1. 5
    cat file.txt \
      | sort           `# I *can* put a comment here` \
      | cut -f 1 \
        `# And I can put one here too` \
      | grep foo
    

    :^)

    1. 7

      I would usually just write:

        cat file.txt | # "useless cat" recapitulated
        sort         | # I *can* put a comment here
        cut -f 1     | # And I can put one here too
        grep foo       # and even here
      
      1. 6

        That’s OK for pipelines, but it falls down for the equivalent of

        ... find /bin               # traverse this directory
            -type f -a -executable  # filter executable files
            -a -printf '%s %P\n'    # print size and path
          | sort -n                 # sort numerically
          ;
        

        The Tour has an example like this, probably should have put it in the blog post too …

        1. 2

          This screams to me a future accidental bug by not leading with the pipe, and needless whitespace ceremony. It looks nice in a small example like this, but say you need to pipe that input into another command: if it’s a longer command, you’ll need to move all of the pipes and then possibly adjust all of the comments to the new whitespace if you’re on a char limit, and then you may forget to add back the pipe after grep because it’s easy to miss it on a scan. And this doesn’t even consider that it’s not indented, which is hard to parse visually.

          1. 1

            how would you do this without the useless cat?

            1. 2
              1. You can do: sort < file.txt | # comment ….
              2. UUoC sometimes makes sense and does not deserve as much hate as it gets in internet discussions.
              3. I prefer | + indentation at the beginning of the next line, so this style of comments is not very useful for me (if I have to, I will probably rather use the backtick trick).
              1. 2

                Also, just sort file.txt | ...

                1. 4

                  Or if you want to keep the order consistent with the data flow:

                  < file.txt \
                  sort | \
                  ...
                  
          2. 3

            I’ve seen this trick! But I don’t like all the noise with \ and backticks.

            I probably should have mentioned it … Right now it actually doesn’t work in OSH because # goes to the end of the line – we don’t look for a backtick. But that is not fundamental and could be changed. I haven’t seen it enough in real code to make it a priority. I think I saw it in some sandstorm shell scripts, and that’s about it.

            The 1.2M-line shell corpus (https://www.oilshell.org/release/0.9.2/test/wild.wwz/) contains either 0 or 1 instances of it, I forget …

            1. 4

              But I don’t like all the noise with \ and backticks.

              i agree it’s a total hack. it wasn’t meant as a serious proposal (although i have used it in the past for a lack of alternatives), i was just being a smart-ass.

              oil’s approach looks way better :)

              1. 1

                Oh also another reason not to use that trick is that it actually relies on “empty elision”. Shell omits empty strings from argv, which is related to word splitting. Consider:

                $ argv X $(echo) `echo` Y
                

                This results in an argv of ['X', 'Y'] in shell, but ['X', '', '', 'Y'] in Oil. Compare with:

                $ argv X "$(echo)" "`echo`" Y
                

                I deem this too confusing, which is why Oil Doesn’t Require Quoting Everywhere!

            1. 9

              Another language (albeit more high-level than Zig) that has a great C interop story is Nim. Nim makes it really simple to wrap C libraries for use in Nim.

              1. 6

                One thing that I really like about Zig is how good it is going the other way: making libraries that can be called from C easily (and thence other languages). How does Nim handle that case? It’s an often neglected case.

                1. 4

                  I used Nim to write a library callable by JNI on Android (https://github.com/akavel/hellomello), totally fine. The current version of the project (unfortunately I believe it’s somewhat bitrotten now) is macroified, but an earlier iteration was showing clearly how to do that by hand.

                  1. 2

                    While I doubt anyone uses it, my very fancy ‘ls’ has a shared library/.so extension system where you can build .so’s either in Nim or in C and either way load them up. A C program could similarly load them up no problemo with dlopen & dlsym. That extension/.so system may constitute a coding example for readers here.

                    1. 1

                      It should be fairly easy, though I can’t attest to that through personal experience. The GNUNet project calls Nim code from their C code IIRC.
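
                      For the curious, the usual by-hand recipe is: mark the procs you want visible with exportc (plus cdecl for a stable calling convention) and build with --app:lib. A minimal sketch, untested here and with made-up names:

                      # adder.nim -- build a C-callable shared library with:
                      #   nim c --app:lib --noMain adder.nim
                      proc NimMain() {.cdecl, importc.}   # runtime init, generated by the Nim compiler

                      proc adder_init*() {.exportc, dynlib, cdecl.} =
                        NimMain()                         # call once from C before using anything else

                      proc adder_add*(a, b: cint): cint {.exportc, dynlib, cdecl.} =
                        a + b

                      # C side: declare `int adder_add(int, int);`, link against (or dlopen) libadder.so,
                      # call adder_init() once, then adder_add(2, 3) just works.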

                  1. 1

                    There is another way to skin this cat. Awk is mostly the “programming language” version of a generic row processor. You can also just generate the whole program (pretty easily in almost any language you already like). E.g., here is one for Nim: https://github.com/c-blake/cligen/blob/master/examples/rp.nim with example “code” in the main doc comment. (They need input like seq 1 1000 or pastes of such.)

                    With generated C rather than Nim, and TinyC/tcc, you can even get the start-up time down to something similar to mawk/gawk. Row processing itself runs at full compiled speed, and optimize-compiled speed is only a gcc -O3 away if you have single-machine data (a common case for me).

                    Besides probably being faster on “enough” data to amortize compilation costs, “native compiled awk-ish-XYZ” can also be more type-safe (with slightly more type ceremony like my s(0)/f(0)/i(0) instead of just $1). Depending upon your “target language”, keystrokes for “one liners” might well be fewer than in any awk (e.g. Nim has less need for (), {}s, ‘;’). Adding new “powers” is often as close at hand as the stdlib of your target language and a simple “include”, perhaps stowed just once in some config file.

                    Type systems & brevity & start-up cost amortization aside, this “rotated idea” is worthy of serious consideration if for no other reasons than being “portable in concept” and about 100x simpler to implement something that can run faster.
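
                    To make the “rotated idea” concrete, here is a toy sketch of a driver (not cligen/rp; the wrapper text and file names are invented) that splices a user expression into a tiny Nim row-processing program, compiles it, and streams stdin through it:

                    # gen.nim -- toy "generate the row processor" driver
                    # usage: seq 1 1000 | ./gen 'echo f[0]'
                    import std/[os, osproc, strformat]

                    proc rowProgram(expr: string): string =
                      # wrap the user's expression in a per-row loop over whitespace-split fields `f`
                      &"""
import std/strutils
for line in stdin.lines:
  let f = line.split()
  {expr}
"""

                    when isMainModule:
                      let src = getTempDir() / "rowprog.nim"
                      let exe = getTempDir() / "rowprog"
                      writeFile(src, rowProgram(paramStr(1)))
                      # --cc:tcc (if installed) keeps the compile latency low, as discussed elsewhere
                      if execCmd(&"nim c --hints:off -o:{exe} {src}") == 0:
                        quit execCmd(exe)   # the child inherits our stdin, so rows stream straight through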

                    1. 3

                      If folks are interested in binary parsing, the canonical repo is https://github.com/dloss/binary-parsing

                      Don’t underestimate https://docs.python.org/3/library/struct.html

                      1. 4

                        The perl/python struct modules were my original inspiration 20 years ago for what has lately become https://github.com/c-blake/nio
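
                        (For a flavour of the struct-module idea in plain Nim, std/streams can read packed records field by field; this is just an illustration, not nio’s API:)

                        import std/streams

                        type Rec = object
                          id: int32
                          value: float64

                        proc readRecs(path: string): seq[Rec] =
                          ## read back-to-back (int32, float64) pairs, native endianness
                          let s = newFileStream(path, fmRead)
                          doAssert s != nil, "cannot open " & path
                          defer: s.close()
                          while not s.atEnd:
                            let id = s.readInt32()        # fields are read in declaration order
                            let value = s.readFloat64()
                            result.add Rec(id: id, value: value)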

                      1. 1

                        A shell script name, like any program name, can be any file name not containing ‘/’ or ‘\0’. Shell functions are largely faster versions of scripts (except for things like “cd” that change aspects of the current process). ‘[’ used to be an external command.

                        1. 9

                          I wish Rust had a bigger standard library (“batteries included”, like Python to some degree)

                          See, for example, sort. I realise all of us download and run programs with lots of dependencies most days, but I feel like core utils should not pull in non-standard dependencies.

                          1. 13

                            Note that among those 12 direct dependencies, Python’s stdlib has direct equivalents for only 4: clap, itertools, rand, tempfile. Things like unicode-width, rayon, semver, binary-heap-plus are not provided by Python. compare, fnv, memchr and ouroboros are somewhat hard-to-qualify Rust-isms.

                            1. 2

                              In addition, it’s worth noting that a lot of projects eschew argparse (which is what the alternative to clap would be) for click. If a similar project were done in Python, I’d almost bet money that they’d use click.

                              rand being separate has some advantages, largely that it is able to move at a pace that’s not tied to language releases. I look at this as a similar situation that golang’s syscall package has (had? the current situation is unclear to me rn). If an OS introduces a new random primitive (getrandom(2), getentropy(2)), a separate package is a lot easier to update than the stdlib, which is tied to language releases.

                              Golang’s syscall package has (had?) a similar problem, which led to big changes being locked down, and the recommended pkg being golang.org/x/sys. There’s a lot more agility to be had to leverage features of the underlying OS if you don’t tie certain core features to the same cadence as a language release. (this is not to say that this is the only problem with the syscall package being in the stdlib, but it’s definitely one of them. more info on the move here: https://docs.google.com/document/d/1QXzI9I1pOfZPujQzxhyRy6EeHYTQitKKjHfpq0zpxZs/edit)

                              1. 1

                                argparse

                                I’d use getopt over argparse. argparse just has really abysmal parsing which is different from other shell tools, especially when dealing with subcommands.

                              2. 1

                                True. They could just be rust-lang crates, like futures-rs or cargo, instead of being in the stdlib.

                              3. 12

                                This problem is Software Engineering Complete (like NP-Complete - transformable to any other SW Eng Complete thing). As just one other example, Nim also struggles with what should be in the stdlib vs. external packages. Rust has just about 1000x the resources of Nim for a much more spartan core stdlib, but of course the Nim stdlib almost surely has more bugs than that Rust spartan core. So, a lot of this boils down to A) tolerance for bugs, B) resources to maintain going forward, and C) the cost of complexity/generality in the first place, and probably a factor or two I’m forgetting/neglecting. A, B, and C relate far more to community & project management/attitudes than to language details themselves. Also, presence in the stdlib is not a panacea for discoverability because as the stdlib grows more and more giant, discoverability crashes.

                                Note this is neither attack nor defense but elaboration on why this is not an easy problem.

                                1. 2

                                  I wonder if the reason Nim has to have a big standard library is to attract people. Rust already has the following, as you said, and people are sure to create all kinds of things. Whereas if one were to try Nim without the big stdlib, they would have to do everything on their own.

                                2. 3

                                  Same. A big part of the learning curve for me was discovering modules like serde, tokio, anyhow/thiserror, and so on that seem necessary in just about every Rust program I write.

                                  1. 3

                                    Not providing a standard executor was the only complaint I had about async Rust.

                                    1. 2

                                      I like that there is no built-in blessed executor - it keeps Rust runtime-free.

                                      I’ve worked on projects where using an in-house executor was a necessity.

                                      Also gtk-rs supports using GTK’s event loop as the executor and it’s very cool to await button clicks :)

                                      1. 1

                                        Yeah, I’ve used it and it felt refreshing :) But for other small tools perhaps having a reference, minimal implementation would be good. I like the smol crate and I think it would be perfect for this.

                                    2. 3

                                      All of them developed over time and became de-facto standards. But it was always the intention that the std doesn’t try to develop these tools, as you need some iterations, which won’t work with a stability guarantee. tokio just went to 1.0 this(?) year; I’ve got code lying around using 0.1, 0.2 and some 0.3 (and don’t forget futures etc.).

                                      anyhow/thiserror? Well, there is failure, error-chain, quick-error, snafu, eyre (stable-eyre, color-eyre), simple-error… And yes, some of them are still active as they solve different problems (I specifically had to move away from thiserror) and some are long deprecated. So there was a big amount of iteration (and some changes to the std Error trait as a result).

                                      You don’t want to end up like C++ (video), where everybody treats the std implementation of regex as something you don’t ever want to use.

                                    3. 3

                                      There is a problem with such a “batteries included” approach to the standard library - development of those libraries slows down or stagnates. Actually, I prefer to use a set of well-behaved external libraries rather than needing to replace “built-ins” because they are too simplified for any reasonable usage.

                                      1. 3

                                        The thing is that as a user I “trust” standard rust-lang crates at first glance; surely if I check the external libraries out or recognize them I will know they are well behaved and performant. Trust and trusting trust is such a big problem in software in general.

                                        1. 5

                                          Yes, that’s why there are some “blessed” crates, as well as the Crev project to expand the trust.

                                        2. 1

                                          just version the standard library interface, though

                                          1. 4

                                            And point me to one, just one, example where it worked. If something is merged into the core, then it will die there. Python has examples of such code, Ruby has examples of such code, etc. How often, for example, is the built-in HTTP client good enough to be used in any serious case? How often do you instead pull in a dependency to handle timeouts, headers, additional HTTP versions, etc. better/faster/easier?

                                        3. 2

                                          The rust ecosystem, IMO, is far too eager to pull in third party dependencies. I haven’t looked deep into this tool, but a quick glance leads me to believe that many of these dependencies could be replaced with the standard library and/or slimmed down alternative libraries and a little extra effort.

                                          1. 2

                                            Unfortunately it’s not always that simple. Let’s see the third party dependencies I pulled for meli, an email client, which was a project I started with the intention of implementing as much as possible myself, for fun.

                                            xdg = "2.1.0"
                                            crossbeam = "0.7.2"
                                            signal-hook = "0.1.12"
                                            signal-hook-registry = "1.2.0"
                                            nix = "0.17.0"
                                            serde = "1.0.71"
                                            serde_derive = "1.0.71"
                                            serde_json = "1.0"
                                            toml = { version = "0.5.6", features = ["preserve_order", ] }
                                            indexmap = { version = "^1.6", features = ["serde-1", ] }
                                            linkify = "0.4.0"
                                            notify = "4.0.1"
                                            termion = "1.5.1"
                                            bincode = "^1.3.0"
                                            uuid = { version = "0.8.1", features = ["serde", "v4"] }
                                            unicode-segmentation = "1.2.1"
                                            smallvec = { version = "^1.5.0", features = ["serde", ] }
                                            bitflags = "1.0"
                                            pcre2 = { version = "0.2.3", optional = true }
                                            structopt = { version = "0.3.14", default-features = false }
                                            futures = "0.3.5"
                                            async-task = "3.0.0"
                                            num_cpus = "1.12.0"
                                            flate2 = { version = "1.0.16", optional = true }
                                            

                                            From a quick glance, only nix, linkify, notify, uuid and bitflags could be easily replaced by “invented here” code, because the part of each crate I use is small.

                                            I cannot reasonably rewrite:

                                            • serde
                                            • flate2
                                            • crossbeam
                                            • structopt
                                            • pcre2
                                            1. 1

                                              You could reduce transitive dependencies with:

                                              serde -> nanoserde

                                              structopt -> pico-args

                                              Definitely agree that it isn’t that simple, and each project is different (and often it’s not worth the energy, esp for applications, not libraries), but it’s something I notice in the Rust ecosystem in general.

                                              1. 4

                                                But then you’re getting less popular deps, with fewer eyeballs on them, from less-known authors.

                                                Using bare-bones pico-args is a poor deal here — for these CLI tools the args are their primary user interface. The fancy polished features of clap make a difference.

                                          2. 1

                                            Why do you think an external merge sort should be part of the Rust stdlib? I don’t think it’s part of the Python stdlib either. Rust already has sort() and sort_unstable() in its stdlib (unstable sort should have been the default, but that ship has sailed).

                                          1. 2

                                            Maybe worthy of note is the old Reactive Keyboard system:

                                            Darragh, J. J., Witten, I. H., & James, M. L. (1990). The Reactive Keyboard: a predictive typing aid. Computer, 23(11), 41–49. doi:10.1109/2.60879

                                            I’m not sure if there is a way to integrate this into a Unix terminal emulator smoothly. Integration via shell/vim/emacs/etc. might be better, but there is both more than one source of text and more than one place of applicability.

                                            1. 4

                                              Is anyone here using Nim for something serious in production? It looks really nice and I am surprised it is not more popular.

                                              1. 5

                                                How about the Nim Forum? The source code is at nim-lang/nim-forum on github.

                                                This really shows off the capabilities of the language: both the backend and frontend are written in Nim. The frontend is Nim compiled down to a JavaScript SPA.

                                                I’m the kind of person who is sensitive to latency; I dislike most JS-heavy browser-based things. The Nim Forum is as responsive as a JS-free / minimal JS site.

                                                1. 3

                                                  Not sure what counts as “serious in production”, but I’ve been running a kernel-syslogd on four or so machines ever since I wrote kslog. I also have several dozen command-line utilities including replacements for ls and procps as well as a unified diff highlighter. The Status IM Ethereum project has also been investing heavily in Nim.

                                                  1. 2

                                                    I’ve been working on a key-value data store; it’s a wrapper around the C library libmdbx (an extension of LMDB), but with an idiomatic Nim API, and I’m working on higher level features like indexes.

                                                  1. 12

                                                    Heck, if people can bring up Ada (which until I joined lobste.rs I thought was a historical footnote), I can bring up Nim :) It even compiles straight to C. Strings are pretty ergonomic, cleanup is automatic thanks to ref-counted GC, the language has an excellent tutorial, good reference docs, mediocre stdlib docs (fairly complete but hard to navigate.)

                                                    1. 5

                                                      For RosettaCode-like edification purposes, and maybe to give more detailed color on @snej’s comment, this is what it looks like in Nim. With the Nim tcc backend, it compiles in 475 milliseconds for me (from scratch). “UX Benchmark”-wise, it took me about 6 minutes to just port it from his C++, mostly deleting chatter/noise to get this (and another 90 seconds to fix up his glob_test).

                                                      import os

                                                      proc glob*(pattern, text: string): bool =
                                                        ## iterative '*'/'?' glob match with a single backtrack point
                                                        var p, t, np, nt: int
                                                        while p < pattern.len or t < text.len:
                                                          if p < pattern.len:
                                                            case pattern[p]
                                                            of '*':     # zero-or-more wildcard: try here; on mismatch retry from t+1
                                                              np = p
                                                              nt = t + 1
                                                              p.inc
                                                              continue
                                                            of '?':     # single-character wildcard
                                                              if t < text.len:
                                                                p.inc
                                                                t.inc
                                                                continue
                                                            else:       # ordinary character
                                                              if t < text.len and text[t] == pattern[p]:
                                                                p.inc
                                                                t.inc
                                                                continue
                                                          if nt > 0 and nt <= text.len:   # mismatch: backtrack to the last '*'
                                                            p = np
                                                            t = nt
                                                            continue
                                                          return false
                                                        return true

                                                      proc walk*(pattern: string, dir=".") =
                                                        for path in walkDirRec(dir, relative=true):
                                                          var file = open(dir / path)     # walkDirRec yields paths relative to `dir`
                                                          var lineNo = 0
                                                          for line in file.lines:
                                                            lineNo.inc
                                                            if glob(pattern, line):
                                                              echo path, ":", lineNo, "\t", line
                                                          file.close

                                                      when isMainModule:
                                                        proc main =
                                                          if paramCount() != 1:
                                                            echo "USAGE: ", paramStr(0), " <pattern>"
                                                            quit 1
                                                          walk paramStr(1)
                                                        main()
                                                      

                                                      As is, it runs as fast as the C++ (with both compiled with optimizations turned on). It could be optimized in a few obvious ways, of course, but run-time speed was also explicitly not the point of this “benchmark”.

                                                      Note that this “benchmark” is probably even more dependent upon developer-language familiarity than the usual fare (and even dependent upon text editor search-replace-delete-fu/typing speed).

                                                      1. 2

                                                        Cool, thanks!

                                                        As for tinycc — the speed sounds great, but is its optimizer competitive with GCC or Clang? I could see using it in debug/development builds.

                                                        1. 2

                                                          TinyCC barely has an optimizer, so it is absolutely not competitive. And yes, the idea is to use it for debug/rapid dev cycles, not release builds (as I think I alluded to in my first reply to @akavel). That said, unoptimized code tends to be “only” 2.5-10x slower than optimized, though YMMV (a lot). It’s effective for me for rapid edit-compile-test on small data/whatever/edit again cycles, but, as always, all are encouraged to do their own experimentation. :-)

                                                        2. 1

                                                          How do I set up Nim to work with tcc? Also, do you maybe know if this would work on Windows? Including the multithreaded features?

                                                          1. 2

                                                            Well, I just install the mob branch of tcc and then you can say nim c --cc:tcc foo.nim. For extra credit you can edit $HOME/.config/nim/nim.cfg to default to tcc & switch back to gcc/etc. if you define r so you can say nim c foo.nim for a rapid devel cycle and nim c -d:r foo.nim for an optimized output. You can also just say nim r foo.nim to just run it right away, of course.
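
                                                            (The nim.cfg bit can look something like this sketch; the @if switches the compiler back to gcc when you build with -d:r:)

                                                            # ~/.config/nim/nim.cfg -- sketch: default to tcc for fast edit-run cycles
                                                            --cc:tcc
                                                            # `nim c -d:r foo.nim` switches back to gcc with optimization
                                                            @if r:
                                                              --cc:gcc
                                                              --define:release
                                                            @end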

                                                            I’ve used tcc on Windows, but it was like 7 years ago and not with Nim. There are a few adjustments like --tlsEmulation:on for multi-threaded which seems to work ok on Linux. Both Windows & threaded & tcc…you may be off the map. :)

                                                            There is rapid progress/hard work on incremental compilation going on in Nim that should make having a lightning-fast C compiler less important (famous last words…).

                                                            1. 2

                                                              Thanks for the reply! For my hobby projects I seem to be having fast enough compile times that I usually don’t think about them. The tcc idea interested me more from a minimalism point of view - GCC is not exactly tiny, and I assume tcc is much smaller. But if you say there are some extra adjustments that need to be discovered, I’ll probably pass for the time being; I’m having a hard enough time with multithreading in the default Nim setup that I couldn’t stomach an extra challenge this time. But I’ll keep the idea in mind; maybe one day. It definitely sounds alluring, thanks!

                                                              edit: oh, the mob branch idea is crazy fun! I’m going to submit it as a separate post :)

                                                      1. 3

                                                        It may be low profile which certainly adjusts risk assessment, but this Wikipedia/Wiki-like model for tcc has seemed to work well for over 10 years now, i.e. about as long as Github has been around (or git for that matter).

                                                        1. 7

                                                            I don’t know to what extent kids these days see Wikipedia as obvious, but I remember how, when I first stumbled upon this idea of a wiki, my mind was completely “oh shit, this is crazy, this can’t possibly be working… yet, here it is?!?”, including the guilty pleasure of being able to edit someone else’s website with no strings attached… Now that we know it does work, transplanting the idea to VCS seems to me similarly crazy, genius and exciting!

                                                          1. 4

                                                              Oh sure. Your excitement is understandable. It is kind of natural, though – wikis usually use version control as a way to make repairing mistakes/vandalism easy.

                                                              Why, in some alternate timeline/history, VCs might have been invented for Wikis before source code… :-) Even in our own timeline, automatic management of document versions in word processors may have pre-dated RCS. diff certainly did, and for like over a decade the Linux kernel worked off of just diff/patch. So maybe this did happen for VC in our own timeline, but it only got really fancy for devs into fancy tools.

                                                            It is a pleasant surprise that mob revision can/does mostly work, as I think Jimmy Wales (founder of Wikipedia) continues to say. His first real-life version was purportedly (EDIT: a slightly upscale version of) yellow Post-It notes on a paper encyclopedia - the OG diff/patch. ;-)

                                                            1. 4

                                                                One difference I noticed long ago between git and wikis’ history is that when writing text, I don’t seem to need or want the extra step of commits spanning multiple files. This is rather curious to me, and it also showed up when I tried to write an editor with versioning/history support; I started thinking to maybe “just” reuse git, but it quickly began to feel like a major impedance mismatch to me.

                                                        1. 2

                                                            Rosetta Code already has a task for this (without the simplification), with many more solutions.

                                                          1. 2

                                                            Nim used to do this experimentally with a “strong spaces” setting in 2014, but I guess it was not popular enough or viewed as too error prone { or unpopular due to that view :-) } and got removed in 2019.

                                                            1. 4

                                                              For an update, see On the Information Bottleneck Theory of Deep Learning.

                                                              The article itself asks:

                                                              It remains to be seen whether the information bottleneck governs all deep-learning regimes, or whether there are other routes to generalization besides compression.

                                                              The paper I linked answers:

                                                              Moreover, we find that there is no evident causal connection between compression and generalization: networks that do not compress are still capable of generalization, and vice versa.

                                                                I think it was a great idea. But the evidence suggests that it’s just not true.

                                                              1. 2

                                                                  My intuition has been that discovering sparse representations is usually necessary for any kind of generalization – a model learning speech-to-text, for instance, will necessarily have somewhere inside it an understanding of the individual vowel/consonant sounds and utterances, which are then building blocks for generating text.

                                                                “Compression” ~= “sparse representation”, right? So the paper refutes that idea?

                                                                1. 1

                                                                  thank you kindly for the link! having cursorily looked at it and the arguments raised by Tishby et al., it seems that the information bottleneck might still be relevant…

                                                                  1. 1

                                                                    Why do you think information bottleneck might still be relevant? I am curious. (I consider the theory mostly failed at this point.)

                                                                    1. 2

                                                                      In that link @sanxiyn posts, there seems to be a very vigorous back and forth between Tishby et al. (the IB theory crew) and the article criticizing IB (EDIT: with neither side conceding defeat). The program committee accepting the paper to the conference may only mean they thought it worthy of a much broader discussion in the community than their review process.

                                                                      Since that was 2 years ago, perhaps other papers or discussion have taken place in the understanding of IB or its critique. I think the link itself and publication is non-conclusive, even of community opinion, never mind the fact of the matter.

                                                                      One kind of “obvious” point about “compression” and “generalization” is that they are almost semantically the same. To find a well-generalizing representation means to have some representation that has been “properly de-noised”. Representing noise takes up space (probably a lot, usually, but that is problem-specific). This applies to all fitting, from multi-linear regression on up, and maybe to all “induction”. (The transduction case, more analogous to “interpolation”, is different.)

                                                                      That is just one piece of the puzzle, of course, and there may be controversy over how to define/assess “compression” (e.g. a neural net with a bunch of near-zero weights may take up computer memory, but be the same as one without those weights at all), and also controversy over specific quantitative relationships between compression, however assessed, and out-of-sample error rates.

                                                                      TL;DR - I think @sanxiyn has more work to do in order to establish “mostly failed” or “debunked” status.

                                                                      1. 2

                                                                        @cblake, i don’t think i could have said it better than you did. thank you!

                                                                        1. 2

                                                                          You’re welcome. The Wikipedia entry on the Information Bottleneck Method covers some of this controversy in the “Information theory of deep learning” section (today’s version… future folk may someday have to go back to that in the wiki history). They also have more references.

                                                                1. 3

                                                                  As far as the original article goes, I think that the universality (and hence possible desirability as a “standard”) of -h as a short option for --help is often overstated. For many tools that means “human readable”, nothing at all, or something totally different. I once did a survey on my system /bin & /usr/bin and under 50% of the programs had -h as a short option for --help. I suspect many thousands of programs are non-conforming in this way and that it is probably an unwinnable battle to change that.

                                                                  1. 31

                                                                    Good advice, except the one about skipping man pages. Man pages are easier to use and almost always more useful than --help. They’re “rich text”, they automatically open in a pager, and they have a pretty much standardized structure and format.

                                                                    The only exception is when developers are too lazy to write them and instead generate them from some other, non-man documentation. These man pages usually turn out terrible. But honestly, mdoc is not that hard. If you care about user experience, you should write a man page for your program.

                                                                    They don’t work on Windows? Says who? I’m sure man is ported to Windows. And besides, why should the way Windows works dictate how I write UNIX tools? I use both Windows and UNIX, and I don’t expect them to be the same type of system – that’s why I use both.

                                                                    Also, citation needed on the claim that “not enough people” use man pages. How can you know that?

                                                                    1. 5

                                                                      I think your advice on man pages here is a bit off, or at least, it’s not the advice I would give.

                                                                      The only exception is when developers are too lazy to write them and instead generate them from some other, non-man documentation.

                                                                      You seem to be using “lazy” as a pejorative here, but I see it as a positive in this case. My main problem with writing the entire man page from scratch is that it duplicates a lot of the documentation that is shown in the --help output. So when the docs change in one place, you have to remember to make the same change in the other place too. This is annoying. So it’s not just a matter of sucking it up and writing the man page in the first place. There’s a question of continued maintenance here.

                                                                      I write a portion of my man page in asciidoc while the rest is generated straight from the argv definitions. Overall, I think the resulting man page looks pretty good. Certainly not terrible.

                                                                      They don’t work on Windows? Says who? I’m sure man is ported to Windows. And besides, why should the way Windows works dictate how I write UNIX tools? I use both Windows and UNIX, and I don’t expect them to be the same type of system – that’s why I use both.

                                                                      I don’t use Windows and I don’t know the state of man tooling on Windows. But I don’t know anyone who uses man on Windows regularly. People who use my CLI tools on Windows might not have cygwin or WSL or whatever else installed. It’s a native Windows program, so they don’t need it. It would be awfully user-hostile of me to only offer the complete docs for the tool in a way that doesn’t let them read it natively on their system. “Oh no sorry, in order to read the docs for this tool, you need to go out and download this other port of a UNIX program. Good luck setting it up.”

                                                                      That’s why I try to write man-level documentation in my argv definitions. So that when a user uses --help, they see exactly the same documentation that appears in the man page. And they don’t need a special tool to read it.

                                                                      Now obviously, man-level documentation is quite verbose, so the compromise I settled on is that the -h flag will show a much briefer set of docs.

                                                                      But the point is this. All the docs for all the flags are located in exactly one place and that place occurs right next to that flag’s definition in the code. So when you need to add a flag or change docs or whatever, everything is right there near each other. Locality is a useful feature.

                                                                      So I’d say we agree that “forget about man pages” is bad advice. It makes me sad that it’s being promoted here. But let’s not be blind to their downsides and the realities of maintaining cross platform tools.

                                                                      1. 2

                                                                        I think you have presented a very responsible way of generating and thinking about man pages here. I like your compromise between -h and --help. I agree with you for the most part!

                                                                        However, I think that your generated man page (rg.1), at least as it appears on my system (Alpine Linux), is an example of the risks of such generation. The NAME section is preceded by six empty lines, and the definition lists likewise have too much spacing in comparison to traditional man pages (example). These small details are easy to miss when generating man pages from another format.

                                                                        But this is not a fundamental problem with your method (it is probably an easy fix), nor am I against such generation in general, as long as it is done responsibly and consciously. You have clearly put good effort into it. Other than the spacing, rg.1 looks like any other man page.

                                                                        1. 1

                                                                          Yes, to your point, I used to use Markdown and converted it to a man format using pandoc. But the results were quite terrible. I almost just decided to learn mdoc and figure out how to use that, but noticed a lot of other projects using asciidoc and getting decent results.

                                                                      2. 5

                                                                        I think man usage is closely tied to what type of programming and what environment/OS is being worked on.

                                                                        If you work mostly on JavaScript on Windows, almost all of the documentation (command- or library-wise) you regularly interact with is going to be on the web. If you are doing C on BSD, then you probably lean on man pages a lot. Most of the devs I know at work do Java on Linux or C# on Windows, and I doubt most of either group is used to man pages.

                                                                        And if there’s a windows implementation of man (not cygwin or WSL), I’d love to see it.

                                                                        That being said, mandoc/mdoc is delightful, and I wish it would become more popular and cause the authors to change their mind.

                                                                        PS powershell help is infuriating - it seems comprehensive, but built for a different thought process than I’ve been able to force myself into. Tips would be welcome.

                                                                        1. 7

                                                                          If you work mostly on JavaScript

                                                                          Indeed, I think this goes for web technologies in general. They don’t use man pages because they are not UNIX tools. The same goes for C++ and Java to some extent. No problem there.

                                                                          But when JavaScript developers start building UNIX tools, they should write man pages for them, because the man page is part of what a UNIX tool is. UNIX users expect their programs to have manuals.

                                                                          And if there’s a windows implementation of man

                                                                          I’m not sure to what extent the man program itself works on Windows, with how it locates man files across the system, but I know mandoc, which man uses to render man pages, is ported. You don’t need Cygwin or WSL to use it.

                                                                        2. 4

                                                                          To be fair, if your only experience with man pages is Linux’s, then you probably would almost always skip them.

                                                                          It was only when I started using OpenBSD that I realised the true power of the manpage.

                                                                          1. 4

                                                                            I’ve used both OpenBSD and various distributions of Linux, and I think this is somewhat true, but not entirely. OpenBSD does make a greater effort to document the system as such, which is difficult to do for a Linux distribution, because it uses a bunch of different parts written by different people. But reasonably, program documentation must be pretty much the same on both systems, at least for programs that exist on both. Linux also has a lot of good documentation for C functions.

                                                                            1. 4

                                                                              ^^ this. I literally have a dedicated, vertically oriented “man screen” that syncs from word-at-cursor with a vim keybinding against the OpenBSD man pages first, with a few exceptions (ptrace, mmap, …), even if I am on a Linux machine at the moment.

                                                                            2. 3

                                                                            I’ve always got the feeling that the man pages themselves are great, but navigating them is not what it could be. Try finding something in man bash. All the information is there, but you just can’t find it if you’re not already experienced.

                                                                            And ironically, understanding less(1) through its man page is horrible. There are just too many not-that-useful esoteric options, commands, special notations for key combinations, and so on for a tool with such a simple and important purpose.

                                                                              Using manpages should be dead simple (like markdown for example).

                                                                              1. 2

                                                                              My main issue with man is that I often struggle to find exactly the flag I need, as man pages tend to be rather lengthy. However, with --help, I can search in my terminal and often find what I need in fewer steps.

                                                                                As for your point, you should definitely discuss it with them, then the post can be improved for everyone else.

                                                                                1. 12

                                                                                  FYI, man usually supports / to search

                                                                                  1. 4

                                                                                    Yes, but still, there is way more information, so it becomes harder to find exactly what I need. Sometimes such a simple search for a substring doesn’t cut it (if less has more fancy searches, I’m not aware of them).

                                                                                    1. 1

                                                                                      Yes, man, at least mandoc, supports tags – type ‘:t’ to jump. It also supports full text search across all manpages at once, using either whatis or apropos.

                                                                                      For writing manpages, there are also more friendly options than troff these days. For example, scdoc is reasonable. Here’s an example of scdoc documenting its own file format: https://git.sr.ht/~sircmpwn/scdoc/tree/master/scdoc.5.scd

                                                                                  2. 3

                                                                                    As was said, less (the pager man usually uses) supports / for searching and & for grep-like filtering. Much more convenient, in my opinion, than running --help output through grep.

                                                                                    That said, --help is fine for a concise specification of the tool’s options. It is not meant for more descriptive explanations. That’s where man pages are useful.

                                                                                  3. 1

                                                                                    I use app help all which prints out all help topics, and by piping that to less or your pager of choice, you’ve got essentially the same as you would have with a manpage, including formatting.

                                                                                    I could work on also providing a manpage with more or less the same text, but … what’s the point? I suppose that man app is kinda convenient, but it’s a very small convenience and comparatively a lot of work.

                                                                                    1. 4

                                                                                    The problem is that your x help all interface will always be inferior to man x because it is not a consistent interface that the user can expect every program to work with. I don’t think we appreciate that enough – being able to type man x to read well-written documentation for every program on the system is a dream come true.

                                                                                      By not providing man pages, the developer not only annoys the user; he lowers the user’s expectation that man x will work for any x, which causes the user to look elsewhere for documentation in general, making man less popular, which in turn developers see as an excuse not to write man pages, which causes users not to expect man pages to be available, and so on forever, in a vicious circle. The more UNIX tools that are created without a man page, the less the man ecosystem will prosper.

                                                                                      In other words, I think providing man pages is responsible and commendable. For once in the UNIX world, there is a canonical and unified way of doing something. It would be a tremendous loss if we reverted to a thousand different and mutually incompatible help systems.

                                                                                      1. 2

                                                                                        Manpages aren’t consistent either; conventions differ widely. Here are four manpages from some fairly standard tools (curl, ls, top, xrandr), and the formatting of the flags is all different. I didn’t have to try many manpages to get this either: just load 4 tools from 4 different authors/ecosystems.

                                                                                        I’m not supposed to generate these manpages either according to your previous comment, so I’m supposed to endlessly muck about with this unreadable and uneditable troff nonsense for every update, which would be a duplicate of the inline CLI help and a massive waste of time IMO, just so you can type man app. Call it “lazy” if you will, but if we want to sling those sort of adjectives around then not wanting to type app help all – which gives almost identical results to man app – is much lazier, as the amount of effort for that is minimal, whereas the effort for writing non-generated manpages is pretty large.

                                                                                        1. 2

                                                                                        The formatting in those examples is somewhat different, but mostly the same. The differences seem fairly irrelevant because I’ve read all of those man pages before and never noticed that they were slightly different in how they use indentation and bold text. You could say there’s a forest here to be seen beyond the trees.

                                                                                          What I said about generation in my original comment only goes for what one might call lazy and ignorant generation, characterized by these things:

                                                                                          • The end result does not read like a man page, because it is generated from a type of documentation entirely unlike a man page.
                                                                                          • The developer is largely clueless about troff/mandoc and the man page system and trusts a generation system written by somebody else, who might or might not know what they’re doing.

                                                                                          This is the type of generation I call lazy. Not generation in general. In fact, troff is a great target for generation – if you know what you’re doing. Your help x/y/z documentation, which I am not saying is bad, might be generated to troff with great success, but it depends on how it is written, and it must be done with care.

                                                                                          I realize it might sound a bit demanding: not only must you provide a man page, it must also be well-made. But naturally, if I think man pages should be provided, I also think they should be good. The way I see it, the user is always going to be lazy, but the developer should try not to be, and it is noble if one tries.

                                                                                          (Besides, developers regularly work with all kinds of arcane languages, not least the UNIX shell. Troff is just another one, and a relatively simple one at that, especially if you stick to m(an)doc, which doesn’t actually use the troff program per se AFAIK.)

                                                                                          (By the way, Git is a good example of a UNIX tool that has man pages like help x/y/z, except they’re man git-x/y/z.)

                                                                                          1. 3

                                                                                            But the results are exactly the same. Why spend a day (optimistic, it will probably be more) writing some troff tool? Just because I know one “arcane” language doesn’t mean I should learn another just to fit the personal preference of some random person on the internet. Life is short. There are a lot of things of value I’d like to do. Dealing with troff is not one of them.

                                                                                            Good documentation doesn’t depend on the format; that’s just a boring implementation detail. Docs in HTML or PDF are far from my favourite format, but I’ll choose well-written HTML or PDF docs over some manpage any day, especially if that makes things easier for the author. Good documentation depends on clear writing, which is hard and time-consuming, so it’s probably better to spend time there than on technical implementation stuff. If manpages work for you: great, go for it! If something else works for you: use that.

                                                                                            1. 2

                                                                                              But the results are exactly the same.

                                                                                              What does this refer to? Maybe you can clarify.

                                                                                              Regarding format, it is not just an implementation detail. Man pages can be compared with academic papers, which also follow a more or less standardized format. People expect an abstract, an introduction, methodology, background, results and so forth. This format may sometimes feel like a burden for the authors, but ultimately, it helps them. And paradoxically, creativity tends to flourish precisely when it must work under restrictions.

                                                                                              I’ll choose well-written HTML or PDF docs over some manpage any day, especially if that makes things easier for the author

                                                                                              I think most UNIX users would find it very annoying if some of their programs were documented in HTML, others in PDF, yet others in mdoc, etc. Man is valuable because it is a universal documentation system. That doesn’t mean it has to be the only documentation system. Very complicated programs, such as Emacs, cannot and probably should not be documented entirely in man.

                                                                                              If manpages work for you: great, go for it! If something else works for you: use that.

                                                                                              Again, from the perspective I’ve represented here, I don’t think this is a proper way of thinking about it.

                                                                                              If man pages are an intrinsic part of what UNIX tools are, which I argue, then not providing a man page for a UNIX program is akin to not providing an installer for a Windows program. Some Windows programs are provided as zip files, which the user must install manually, but that is not in line with what Windows programs should be.

                                                                                              I think we should not move towards more fragmentation here. It would surely be easier for developers, because it is always easier (or at least it feels easier) to work without restrictions, but would be a great loss for users and the UNIX ecosystem in general.

                                                                                              1. 3

                                                                                                But the results are exactly the same.

                                                                                                What does this refer to? Maybe you can clarify.

                                                                                                That a help command (or -h/--help flag) doesn’t need to be that different from what you get from a manpage; see e.g. this or this. You could argue this or that should be bold or underlined or whatnot, but I’d format it more or less the same in a manpage (not a huge fan of a lot of bold/underline text), and the principle is the same: short, concise plain-text documentation.

                                                                                                The only tool that I still maintain with a manpage essentially just duplicates the --help output. It’s fairly pointless IMO, and just increases the maintenance burden for little reason (the --help output is actually better, because it’s easier to align things).


                                                                                                Manpages haven’t been “the standard” for years; loads of CLI tools come without manpages, or ship essentially just a manpage for the sake of it, without the full (or even useful) docs. GNU tools actually do this: the full documentation is usually available in info. For example, yesterday I wanted to set the time to the year 2100 to test some potential 2038 bug, and date(1) has just:

                                                                                                   -s, --set=STRING
                                                                                                          set time described by STRING
                                                                                                

                                                                                                STRING … yeah, useful … what kind of string? It’s not really mentioned anywhere, but info date has much more detailed (actually useful) docs on this. This is common for a lot of CLI tools, not just GNU ones (although only GNU tools use that weird info thing).
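
                                                                                                (For the record, the string it wants is a free-form date; the sketch below, assuming GNU date and root, is roughly what that 2100 test ends up looking like:)

                                                                                                  sudo date -s '2100-01-01 00:00:00'   # jump the clock to 2100 to poke at 2038-style bugs
                                                                                                  date                                 # check it took; restore via NTP or another date -s afterwards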


                                                                                                Your entire argument is essentially just “I prefer it like this, therefore everyone should be doing it like this”. It’s fine to prefer things and argue the advantages of this – I do this all the time – but I really don’t like it when people go around telling me what I “should” be doing with my spare time, that I’m “lazy”, or that it’s not “the proper way to think about it”. I have no doubt this was not your intention, but it comes off as fairly unpleasant and even aggressive to me. Do with that feedback what you will.

                                                                                                1. 1
                                                                                                  1. Thanks for clarifying.

                                                                                                  2. I still maintain that man pages are a de facto standard, and I think your example proves that. Otherwise, why would the GNU tools even have man pages?

                                                                                                    Further, I would estimate that the majority of CLI tools packaged by popular Linux distributions actually do come with man pages. Sometimes, package maintainers even add them manually. Why would they do this, if they didn’t have an idea that man pages are or at least should be a standard documentation system that their users can rely on being available?

                                                                                                     I think your argument is akin to saying that there is no “standard” sense of morality or values within a culture, just because some people within that culture disagree with it. Democracy and liberalism are “standard” values in most Western countries even if some neo-fascists who live there disagree with them.

                                                                                                    A couple of exceptions don’t disprove the rule, and even large disobedience to a standard (e.g. the ancient Israelites) does not remove the standard (i.e. their ideal law).

                                                                                                  3. No, I argue for man because I think it is a valuable standard. I don’t think it’s perfect, it’s just okay. In fact, some mix between man and info would perhaps be better, in theory. But in practice man is the closest we’ve got.

                                                                                                     That’s why I think UNIX developers should put in the effort to write man pages: because I care about preserving the man system.

                                                                                                  Sorry if I came across as arrogant. I tried as much as I could to argue as clearly and convincingly as possible and limit the claims of my own opinions by saying “I think”.

                                                                                                  I think this discussion has reached its limits, at least for now, but I’d be glad to discuss this some other time and really try to understand each other. Thank you for the discussion!

                                                                                                  1. 1

                                                                                                    By the way, one of the things I’ve been meaning to write for years is a “unified documentation viewer”; typing doc ls will display the info page if it exists, or the man page if it exists, or the output of ls -h and/or ls --help if that works, etc. This would also include native support for rendering asciidoc and/or Markdown, as well as a pager that makes some things a bit easier.
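
                                                                                                    A very rough sketch of the dispatch I have in mind (nothing here exists yet; the exact checks, such as whether info --where and man -w fail cleanly for missing pages, are assumptions):

                                                                                                      #!/bin/sh
                                                                                                      # doc: show the "best" available documentation for a command (sketch).
                                                                                                      cmd=$1
                                                                                                      if info --where "$cmd" >/dev/null 2>&1; then      # info page, if any
                                                                                                        exec info "$cmd"
                                                                                                      elif man -w "$cmd" >/dev/null 2>&1; then          # otherwise a man page
                                                                                                        exec man "$cmd"
                                                                                                      elif "$cmd" --help >/dev/null 2>&1; then          # otherwise whatever --help prints
                                                                                                        "$cmd" --help 2>&1 | "${PAGER:-less}"
                                                                                                      else
                                                                                                        "$cmd" -h 2>&1 | "${PAGER:-less}"
                                                                                                      fi

                                                                                                    Most of the actual work would be in the pager and the asciidoc/Markdown rendering, not in this dispatch.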

                                                                                                    I think something along those lines is probably a more useful path forward than telling people to write manpages, and if done well it probably has the potential to outright replace manpages as the de-facto standard Unix documentation tool. I mean, there’s a reason people aren’t writing them, and addressing these reasons is more helpful (I have similar feelings to the “you should use {IRC,Email,…}” arguments that people use).

                                                                                                    1. 1

                                                                                                      doc ls

                                                                                                      That’s not a bad idea. I’d be very interested in such a tool. It would certainly be very useful, sort of along the lines of tldr.

                                                                                                      there’s a reason people aren’t writing them, and addressing these reasons is more helpful

                                                                                                      Interestingly, we both seem to take a pragmatic approach to the way we think about these things. In my case, I think that neither e-mail nor man is perfect, but I nonetheless care about preserving them because they’re already so widespread, and having a standard, regardless of how imperfect that standard is, is better than fragmentation.

                                                                                                      I agree that it is ultimately ineffective to yell at people to use e-mail or man, when they’d be more persuaded if e-mail/man were actually made easier to use, which they certainly can be.

                                                                                                      Sometimes, though, people just think that man is hard to use. They haven’t actually read the mandoc/mdoc manuals or really tried to find any information on how man pages are created. They’ve ruled out man(doc) beforehand, simply because a lot of people online, who in turn might not have any actual experience with man(doc) either, say that man is not modern or is hard to use or that not enough people read man pages anyway (for example, the page discussed in this thread).

                                                                                                      In these cases, while it isn’t helpful to yell at people, it might be helpful to suggest to them that their prejudices about man are incorrect and that they should look into it honestly and reassess it.

                                                                                              2. 1

                                                                                                I wonder what @johnaj thinks of help2man or if he has another suggestion. help2man is what I’ve always suggested to cligen users.

                                                                                                It might be “not so hard” to write in a common sub-dialect that winds up formatting well-ish in both help & man formats. Then muscle-memory man cmd folks and --help folks can all get along with minimal extra effort. Maybe sometimes a little auto-postprocessing on the output of the generated troff stuff could help.
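
                                                                                                  For anyone who hasn’t used it, the basic flow is tiny; the program name and description below are made up:

                                                                                                    # generate mycmd.1 from the program's own --help/--version output
                                                                                                    help2man --no-info --name='do something useful' --output=mycmd.1 ./mycmd
                                                                                                    man ./mycmd.1    # preview the generated page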

                                                                                      2. 1

                                                                                        This has been one of our more controversial suggestions with reviewers too! This guide is intentionally meant to be a bit opinionated and to open discussion, so I’m enjoying the debate. ;)

                                                                                      1. 2

                                                                                          nim doc supports reStructuredText (rST) to format doc comments into HTML. So cligen, which grabs documentation from the very same doc comments, supports “rich text”. If the user turns on colors via their ~/.config/cligen/ files then the help messages can be (almost) as nicely formatted as man pages, but everything is maybe a bit more terse by its nature. (Well, it doesn’t need to be, as the help message for lc perhaps demonstrates.)

                                                                                        1. 5

                                                                                          Also, it depends on which AWK implementation. mawk is generally much faster than gawk.

                                                                                          Setting up the rough equivalent to what is in the post for the first “benchmark”, I get 0.164s for cut, 0.225s for mawk, and 0.413s for gawk. Similar ratios with the other test.
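
                                                                                            (Something along these lines; the file name and field below are placeholders rather than the post’s exact setup:)

                                                                                              time cut -f2                  big.tsv > /dev/null
                                                                                              time mawk -F'\t' '{print $2}' big.tsv > /dev/null
                                                                                              time gawk -F'\t' '{print $2}' big.tsv > /dev/null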

                                                                                          I find the conclusion of the post to be pretty flimsy.

                                                                                          1. 2

                                                                                              I agree the conclusion is flimsy. Much also depends upon the CPU. I just measured gawk-5.1.0 as between 1.20x and 1.55x faster (on 4 different CPUs) than mawk-1.3.4_p20200106 for “extracting column 2”.

                                                                                          1. 2

                                                                                              Actually, even small writes to a pipe are not atomic once the pipe buffer fills up. The normal thing that happens there is that the write partially completes, then blocks, and the process is put to sleep. The latter parts of that write call only happen once a reader has read from the pipe to make room. If there are >1 writers to the pipe, then they can be awoken in any order, in which case their writes are interleaved. This is avoidable with O_NONBLOCK, of course, but can be a real gotcha.

                                                                                              As a concrete example, this is not reliable unless you change it to -P1 (defeating the purpose of a real-time speed-up of the very slow file program):

                                                                                            find . -print |
                                                                                              xargs -P4 file -n -F:XxX: |
                                                                                              grep ":XxX: .*script" |
                                                                                                  sed -e 's/:XxX: .*$//'
                                                                                            

                                                                                            To make it fail you want to save the output and run this in a tree with a boatload of scripts or just change “script” to “.” to match everything. Then check that output against names in the tree. It may take multiple trials and your -P might need to be about as large as the number of CPU core threads you have, depending upon how busy/idle your system is.

                                                                                            Anyway, the point is this fails even though every individual write(2) call by the file children of xargs is well below the “atomicity size limit” “guarantee” due to the sleeping/scheduling pattern noted above the example pipeline. (At least they’re well below if you are in a normal file tree where path + separator + file type is a reasonably short string.)

                                                                                            1. 6

                                                                                              Actually, even small writes to a pipe are not atomic once the pipe buffer fills up.

                                                                                              That’s incorrect. According to POSIX: “Write requests of {PIPE_BUF} bytes or less shall not be interleaved with data from other processes doing writes on the same pipe.” The entire small buffer is written in one go, blocking the process first if necessary.

                                                                                              https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html

                                                                                              1. 3

                                                                                                Oops. You are right. I stand corrected. Apologies!

                                                                                                  That pipeline (and a simpler one with just grep 'XxX.*XxX') does interleave, but that comes from stdio buffering. Pipelines work fine with stdbuf -oL file. I should have been more careful about concluding something about the OS.

                                                                                                  Reading the source, it turns out that file -n|--no-buffer only does unbuffered output when also using --files-from. The file man page (in combination with an strace test to a tty with line buffering) fooled me by saying the option was “only useful” (not “only active”) with --files-from.
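
                                                                                                  In other words, something like this keeps the output intact even with -P4 (a sketch; stdbuf is the GNU coreutils tool):

                                                                                                    find . -print |
                                                                                                      xargs -P4 stdbuf -oL file -n -F:XxX: |
                                                                                                      grep ":XxX: .*script" |
                                                                                                      sed -e 's/:XxX: .*$//'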

                                                                                            1. 4

                                                                                              Just to nitpick a bit, because this has been bothering me for a while: Conflict-free replicated datatypes aren’t a solution for conflicts.

                                                                                              Making a CRDT is easy. Whenever you have a conflict you hash both data sets and throw away the one with the lower hash. This may be stupid but it fulfills the criteria for a CRDT.
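
                                                                                                To make that concrete, here is that degenerate merge as a few lines of bash (purely illustrative):

                                                                                                  # "Merge" two replicas by keeping whichever file hashes higher.
                                                                                                  # Commutative, associative and idempotent, so every replica converges,
                                                                                                  # but one author's changes are simply thrown away.
                                                                                                  merge() {
                                                                                                    a=$(sha256sum "$1" | cut -d' ' -f1)
                                                                                                    b=$(sha256sum "$2" | cut -d' ' -f1)
                                                                                                    if [[ "$a" > "$b" ]]; then cat "$1"; else cat "$2"; fi
                                                                                                  }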

                                                                                                The conflict between the authors’ intentions is still there; the only thing that is conflict-free is the state of different copies of the data after all the changes have propagated. The conflict between the changes that were made is semantic in nature and probably has to be resolved at a language level and/or by a human.

                                                                                              Which is probably why Pijul ended up with counter-intuitive behavior by treating text as a graph and using that representation for conflict resolution.

                                                                                              1. 6

                                                                                                I have worked on something similar (filesystem synchronization with a CRDT model) and I agree, people misunderstand CRDTs.

                                                                                                  Behind the theory, the idea of CRDTs is that from the user’s point of view they are what we usually consider “data structures”, plus rules that define how users can modify them and how the data will be merged in all cases. That means that:

                                                                                                • There is no case in which the system will say “I don’t know what to do” and have to ask for outside help synchronously. Adding something to the data structure that says “ask a human later” is fine.

                                                                                                • The end result will not depend on which device does the merge.

                                                                                                • Most importantly those rules are built into the data structure and exposed to the user (they are part of the API).

                                                                                                Now the problem is to design a CRDT that does what the end user wants, and that’s a lot harder than just making a CRDT “that works”…

                                                                                                1. 6

                                                                                                  Just to nitpick a bit, because this has been bothering me for a while

                                                                                                  You seem to be in agreement with the post. That section of the post just says “here is some related work, it’s called CRDT, and wasn’t enough to solve it”.

                                                                                                  1. 1

                                                                                                    I think the directed graph representation comes from the original paper as explained well for non-mathematicians here.

                                                                                                    1. 2

                                                                                                      It’s inspired by that, but does a lot more. The actual thing used by Pijul is explained in the blog post linked here as well (and your link comes from that person reading Pijul’s source code and asking us questions about it).

                                                                                                  1. 4

                                                                                                    I was going to share this in a new thread, but since it’s related I’ll share it here.

                                                                                                    (Not trying to take over your thread, just trying to keep the home page clean.)

                                                                                                    I found a really nice locate alternative written in Rust, and it’s really good (at least in my opinion).

                                                                                                    https://github.com/mosmeh/indexa

                                                                                                    1. 3

                                                                                                      Is it as fast as this one though?

                                                                                                      1. 3

                                                                                                        In terms of searching for the file based on the query, yes. It feels almost instantaneous.

                                                                                                        However, it’s used for interactive selection. That way you can wrap commands around it like:

                                                                                                        emacs "$(ix)", vim "$(ix)", or mpv "$(ix)"

                                                                                                        Selecting the file you want in indexa will output the full path to stdout.

                                                                                                        1. 2

                                                                                                          I just tried out both. plocate is much more of a near drop-in replacement for mlocate. plocate ingests (usually pre-built) mlocate databases while ix does its own file tree walking. This makes plocate’s DB build 100s of times faster than ix’s, as well as letting it share the DB builds that usually already run from cron.

                                                                                                          In terms of query time, plocate runs in low single-digit milliseconds. ix seems to have no non-interactive mode. The only way to make it non-interactive would appear to be a pseudo-terminal (the setup & control of which might well dominate run time).

                                                                                                          1. 1

                                                                                                            Actually, indexa creates its own database, then tree-walks from that.

                                                                                                            And I did say that indexa was used for interactive selection.

                                                                                                            I said that in terms of searching, yes, it’s just as fast.

                                                                                                            1. 2

                                                                                                              You can reimplement indexa with plocate and fzf. It would probably be faster and use less disk space for the file database.
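
                                                                                                              Something as small as this gets most of the way there (a sketch; it relies on every indexed path containing a slash, so plocate dumps the whole database for fzf to filter, and the pfz name is just made up):

                                                                                                                pfz() { plocate / | fzf; }    # then: vim "$(pfz)", mpv "$(pfz)", ...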

                                                                                                              1. 1

                                                                                                                I meant that “just as fast” is hard to know. indexa is only interactive, so one is stuck with “how fast my screen changes”. plocate was taking 5 millisec. A 10x slower indexa at 50 ms might well “look” the same, roughly “movie frame instant”. I’m not saying indexa does take 50 ms on my test file hierarchy. I just don’t know. It’s hard to measure. :-) That was the point of my 2nd paragraph. Sorry it was unclear. Could be under 1 ms or maybe up to 100 ms. A more careful comparison is warranted before drawing “just as fast” conclusions, though.

                                                                                                                1. 2

                                                                                                                  For example, if I type time ix -q MissingFile and hit the ENTER key twice in as rapid succession as I am physically able, then the lowest time I can see is about 75 ms. Meanwhile, if I strace -tttfs16 -o/dev/shm/ix.st ix -q MissingFile and do grep -C2 'execve\|read.0,' /dev/shm/ix.st then I see times around 75-85 ms until the calls just before the read(0,..). That is some 15..17x slower than plocate on the same test data.

                                                                                                                  These are admittedly lame benchmarks & include all screen/terminal set up time in both cases and strace/ptrace mode overheads in the more precise benchmark. Whoever wrote indexa already added -q. If they just add a -n non-interactive option to just print any answers then performance would be much easier to compare.

                                                                                                                  Looking at the strace shows a lot of milliseconds in memory-allocation system calls, though. So I am not optimistic that, carefully assessed, this Rust tool would come out much better than 10x slower than plocate. Also, for my test data, indexa|ix uses 286 MiB while plocate uses only 4 MiB. So I would have to agree with @Foxboron that plocate + fzf would likely be more efficient on multiple metrics.