1. 2
    def d(w,c)s=->w{c.map{|d|(d&[w])[0]?0:d-[-w]}-[0]};c[0]?(v=c[0][0])?d(w+[v],s[v])||d(w+[-v],s[-v]):!1:w end
    
    1. 4

      I never understood why Linux doesn’t just add that.

      1. 1

        Alternatively, in zsh:

        zmv -n '*' '${f// /_}'
        

        (This will also ensure the replacements don’t overlap.)

        1. 6

          A different, more imperative approach: A paper algorithm notation, by Kragen Javier Sitaker.

          1. 1

            I tried writing a suffix tree class with Kragen’s paper algorithm notation the other day. The problem is that suffix trees have many nested layers of ifs so I ended up running out of horizontal space and then getting frustrated. I barely understand suffix trees well enough to hold them in my head, so I’m thinking I should start by writing out something I understand better. That’ll separate the concern of developing muscle memory and fluency for the notation from the concern of improving my understanding of suffix trees.

            As a side note, writing a suffix tree in J would be even more painful.

          1. 26

            Forgive the wall of text. Text editor data structures are something I’m particularly passionate about.

            The piece table is superior to the gap buffer only if the following are all true:

            1. Files are stored in the encoding they will be manipulated in, and/or it is trivial to synchronize the encoding in memory. Fixed-length encodings and UTF-8 fit these criteria, most other variable-length multibyte encodings do not.
            2. Your system does not have efficient paged virtual memory with a secondary store.
            3. Filesystem operations are not more than some fairly small constant multiplier more expensive than memory operations.

            As a big fan of Oberon, which used piece chains as its mechanism for storing text (derived from Gutknecht’s earlier work on Lara), I’m pretty familiar with the piece chain data structure. The piece chain made perfect sense in Oberon, because the machines in question didn’t have virtual memory and the programming language itself didn’t support dynamically-sized allocations, so it couldn’t reallocate a gap buffer even if it wanted to.

            (Some later versions of Oberon did support dynamically-sized allocations…)

            Piece chains have the advantage that all or almost all of the text stays on disk. This is very useful in that file loads are essentially instantaneous and memory usage is proportional to the number of changes rather than the size of the file. It makes absolutely perfect sense if you don’t have virtual memory that can be paged to and from disk.

            (Pike, in his paper The Text Editor sam, extends this argument: it doesn’t work if you have crappy virtual memory either.)

            Virtual memory, though, lets the operating system handle the disk operations too. And the rapid loading benefit goes away if you have to decode the text (e.g. from some multibyte encoding to wide characters in memory).

            In terms of the expense of editing operations, gap buffers and piece chains are more-or-less identical in the asymptotic sense: the piece chain maintains a cached (offset, piece) pair that corresponds directly to the location of the gap in a gap buffer. The performance of a piece chain degrades as you add more edits, though, since you have to traverse more and more pieces to reach a given spot. Gap buffers lack that issue.
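
            For concreteness, here’s a minimal gap buffer sketch in Ruby (illustrative only, not any particular editor’s implementation): the cursor is the gap, insertion writes into it, and moving the cursor shifts characters across the gap rather than shifting the whole tail of the text.

            ```ruby
            # Minimal gap buffer: text lives in one array with a "gap"
            # (free region) at the cursor position.
            class GapBuffer
              def initialize(capacity = 16)
                @buf = Array.new(capacity)
                @gap_start = 0          # first free slot
                @gap_end = capacity     # one past the last free slot
              end

              def insert(ch)
                grow if @gap_start == @gap_end
                @buf[@gap_start] = ch
                @gap_start += 1
              end

              def delete_backward
                @gap_start -= 1 if @gap_start > 0
              end

              # Move the cursor (gap) to absolute text position pos.
              def move_to(pos)
                while pos < @gap_start              # shift chars rightward across the gap
                  @gap_start -= 1
                  @gap_end -= 1
                  @buf[@gap_end] = @buf[@gap_start]
                end
                while pos > @gap_start              # shift chars leftward across the gap
                  @buf[@gap_start] = @buf[@gap_end]
                  @gap_start += 1
                  @gap_end += 1
                end
              end

              def to_s
                (@buf[0...@gap_start] + @buf[@gap_end..]).join
              end

              private

              # Reallocate with a larger gap -- the step early Oberon
              # couldn't do, lacking resizable allocations.
              def grow
                tail = @buf[@gap_end..]
                extra = @buf.length
                @buf = @buf[0...@gap_start] + Array.new(extra) + tail
                @gap_end = @gap_start + extra
              end
            end
            ```

            Everything except grow is O(distance moved) array shuffling, which is why the cached cursor makes typical editing cheap.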

            Piece chains have two important advantages: it’s easier to store metainformation about text if the metainformation applies to “runs” of text, since the formatting information can be stored in the piece headers; and undo/redo is easy to implement (but not incredibly so).

            In other words, piece chains are awesome and a particularly beautiful data structure, but their actual advantage over gap buffers on modern hardware under modern operating systems is kind of a mirage in most use cases.

            1. 4

              First, thanks for your insightful comment. You seem knowledgeable on the subject so I have a question for you.

              I discovered the piece table data structure while reading the linked article (so, just now), and my first thought was that it would be a good fit to implement a collaborative editor. I have not thought this through yet but I guess that instead of a single “new” file it would be possible to have one per participant. What do you think about this idea?

              1. 2

                Thinking about it, yeah, that would be an excellent use case. Each user could have their own append-only key file from which pieces are carved and linked into the chain. Each user would also need their own point offset cache, so you’d need to be careful about managing those but I think that would be considerably easier than maintaining multiple gaps.
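
                A hedged Ruby sketch of that idea (all names invented for illustration): each author appends to a private add buffer, and every piece records (author, offset, length), so concurrent writers never touch each other’s buffers.

                ```ruby
                # Sketch of a piece table with one append-only add buffer
                # per participant.
                class SharedPieceTable
                  Piece = Struct.new(:author, :offset, :length)

                  def initialize(original)
                    @buffers = { original: original.dup }
                    @pieces = [Piece.new(:original, 0, original.length)]
                  end

                  # Insert `text` by `author` at absolute position `pos`.
                  def insert(author, pos, text)
                    buf = (@buffers[author] ||= +"")
                    piece = Piece.new(author, buf.length, text.length)
                    buf << text                     # append-only: old pieces stay valid
                    idx, inner = locate(pos)
                    if inner.zero?
                      @pieces.insert(idx, piece)    # falls between two pieces
                    else                            # split the piece we landed in
                      old = @pieces[idx]
                      left  = Piece.new(old.author, old.offset, inner)
                      right = Piece.new(old.author, old.offset + inner, old.length - inner)
                      @pieces[idx, 1] = [left, piece, right]
                    end
                  end

                  def to_s
                    @pieces.map { |p| @buffers[p.author][p.offset, p.length] }.join
                  end

                  private

                  # Walk the chain to find (piece index, offset within that piece).
                  def locate(pos)
                    @pieces.each_with_index do |p, i|
                      return [i, pos] if pos < p.length
                      pos -= p.length
                    end
                    [@pieces.length, 0]
                  end
                end
                ```

                A real collaborative editor would still need to reconcile concurrent positions (OT/CRDT territory); this only shows the storage layout.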

                1. 2

                  Cool. So this will be another project for motivated master students ^^. Thanks!

              2. 1

                Can you explain how to implement gap buffers in a way to exploit virtual memory? This seems to require quite complicated and probably nonportable overlapping mappings, but perhaps I’m missing something obvious.

                Also, piece chains trivially and efficiently support “multiple cursors” which are en vogue today.

                1. 1

                  Re: the multiple cursor thing, yeah, that’s true. There was another comment above talking about that (well, collaborative editors, which are essentially the same thing). Multiple cursors are neat but I never got into them. I’m too old and set in my ways.

                  Gap buffers taking advantage of virtual memory is automatic, assuming the VM implementation is reasonable. Allocate a buffer of whatever size and the individual pages of that allocation will be paged in and out as needed.

                  One major advantage of piece chains is that most or all of the text remains or can remain on disk, which is important when you have limited memory. When you have an essentially infinite virtual memory that the OS automatically pages to and from disk as needed, there’s no longer any reason to not simply load the whole file into memory.

                  1. 1

                    I don’t see how this will work. You can mmap the whole file into a contiguous area, but then you cannot add a gap… or you can map it twice, but then you need to remap every time you move the gap… just making use of VM is either “load whole file into memory” or, really, “copy whole file into swap partition”.

                    For piece chains, you just map once and look for the unchanged text there.

                    1. 2

                      That’s it. Read the whole file into memory. You’re done. When memory runs low/something else needs physical memory, the OS will page out part of the buffer to disk. When you access that part again, a page fault happens and the page is brought back in.

                      No mmapping or anything. Just allocate as normal and let the virtual memory subsystem do its job.

                      You pay the price of the initial load, but if you have to decode the text in any way (e.g. UTF-8 to wchar), you were gonna have to do it anyway.

              1. 4

                Hey, Lobsters! I’ve been sitting on this project for a while and a buddy recently recommended sharing it here. I built it to scratch my own itch, so it’s still very MVP. Its biggest shortcoming is definitely the lack of the ability to apply AND and OR boolean logic to filters. In the long term I’d like to add the ability to combine multiple feeds into one as well, assuming there’s demand.

                My girlfriend was looking for a new job in a rather niche field so I had subscribed to some RSS feeds, only to find they were country-wide and there was no good way to narrow it down geographically. I didn’t want to wade through hundreds of job postings just to nab a couple, so I built siftRSS. Laziness really is a virtue.

                I’d really appreciate any feedback. Thanks!

                1. 2

                  Support for regexp would be great too. ;)

                  1. 1

                    Ahh, good call! It was in my original plans (like many things), but I was focused on cutting the scope to the point of embarrassment. ’Tis the nature of MVP, after all. :)

                    If you like, I can shoot you a message when I implement regex matching.

                1. 2

                  Most of the benefit from this approach would come if you can often expect streams of 16 bytes to contain no white space character. This seems like a good guess in many applications.

                  Sounds quite unlikely to me.

                  1. 3

                      Pruning spaces from strings quickly (https://lobste.rs/s/fajid3/pruning_spaces_from_strings_quickly_on) has 80 consecutive bytes of non-whitespace, which is 4 or 5 chunks depending on how it ends up aligned.
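
                      The fast path under discussion can be sketched like this in Ruby (with String#count standing in for the SIMD chunk test a real implementation would use):

                      ```ruby
                      # Scan the input 16 bytes at a time; a chunk with no
                      # whitespace is copied wholesale, which is where the
                      # speedup comes from.
                      def prune_spaces(s)
                        out = +""
                        i = 0
                        while i < s.length
                          chunk = s[i, 16]
                          if chunk.count(" \t\n\r").zero?
                            out << chunk                    # fast path: keep whole chunk
                          else
                            out << chunk.delete(" \t\n\r")  # slow path: per-byte filtering
                          end
                          i += 16
                        end
                        out
                      end
                      ```

                      The more often a 16-byte chunk is whitespace-free, the more the fast path dominates, which is exactly what the 80-byte run above illustrates.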

                  1. 3

                      I guess most greps will be quicker than awk, however.

                    1. 1

                      Unless we are talking an order of magnitude slower, YAGNI.

                      1. 3

                            ( repeat 100 cat /usr/share/dict/words ) | time grep '/[aoeiu]..[aeoiu]/' >/dev/null

                            p9p grep:        0.55s
                            GNU grep 3.1:    1.3s
                            busybox grep:    5.2s
                            ripgrep:         11s

                            mawk:            2.5s
                            gawk:            3.7s
                            nawk:            6.6s
                            p9p awk:         10s
                            busybox awk:     11s

                            GNU sed -n:      4.5s
                            p9p sed -n:      18s
                            busybox sed -n:  6.6s

                            perl:            10s
                            ruby:            14s

                        1. 1

                          How in the world did you get those timings? I can’t reproduce anything close to them with either GNU grep or ripgrep.

                        2. 1

                          Depends on the context. For a script processing large quantities of data, 2x could be a pretty big win. (Though in the specific case of GNU tools, awk actually uses the same DFA matcher that grep does, so it’s probably a wash in any case.)

                          1. 1

                            Agree on context. For everyday use it seems unlikely to matter too much.

                        3. 0

                          On that basis, ag is much faster than grep.

                        1. 4

                          This works fine for my purposes:

                              alias homegit="GIT_DIR=~/prj/dotfiles/.git GIT_WORK_TREE=~ git"
                          
                          1. 1

                            It really bugs me why this sort of solution (without some intermediate layer) doesn’t get more attention. I use alias dotfiles='git --git-dir=.dotfiles.git' myself.

                          1. 1

                            I use this in zsh:

                            # setpath [FRONT] [-- BACK] - set $PATH, optionally put FRONT first, BACK last
                            setpath() {
                              path=(
                                ~/bin
                                ~/prj/mblaze{,/contrib}
                            #   ~/src/vis
                                ~/.gem/ruby/*/bin(Nn[-1])
                                ~/.opam/current/bin
                                ~/.cabal/bin
                                ~/.go/bin
                                /usr/local/sbin
                                /usr/local/bin
                                /usr/sbin
                                /usr/bin
                                /sbin
                                /bin
                                /usr/games
                                /usr/games/bin
                              )
                              path=( "$@[1,$@[(i)--]-1]" ${(u)path:A}(N-/) "$@[$@[(i)--]+1,-1]" )
                            }
                            setpath
                            
                            1. 4

                              I think this is quite intriguing, I can build a Beamer slide set with only fetching 20MB of data (+ 48MB Tectonic).

                              But I really wish it used LuaTeX, especially now as it moved to C. I think many future TeX developments will make use of Lua.

                              1. 1

                                Is LuaTeX part of the standard distribution? Just wondering what will make LuaTeX take off.

                                1. 3

                                  Time, I think, will make LuaTeX take off. It’s already a huge part of ConTeXt. Any large macro package would be easier to implement in Lua than TeX.

                                  1. 3

                                    I love LuaTeX and ConTeXt, but they heavily suffer from a documentation problem. It’s always a struggle to find any practical advice. :(

                                    (I write all my slides in ConTeXt through SHSC): https://github.com/skade/SHSC#usage

                                    1. 1

                                      I agree, but I have found whenever doing ConTeXt that I have fewer questions than I do using the alternatives. So, I get further before hitting a brick wall, but the brick wall’s a doozy. :)

                                  2. 1

                                    It is part of TeXLive, has good support for modern things like OTF and PDF, and the embedded Lua makes much logic a lot easier. Also, it seems to be one of the few engines that are actively developed.

                                1. 4

                                  Even stranger for me is that 30% of bugs were filed by MinGW users. It’s hard to draw anything more from that figure alone (we don’t know how many unique users there are, for example), but it still strikes me as surprising.

                                  1. 2

                                    My normal Windows build with GUI and everything identifies as “i686-pc-mingw32.” It’s just built that way regardless of whether you’re using MinGW or not.

                                    1. 4

                                      Sorry, was a bit unclear there - it was more the fact that there are so many Windows users (well, bugs filed by Windows users). My naive assumption was that a minority of Emacs users are Windows users.

                                      1. 5

                                        Perhaps there simply are more bugs that occur on Windows? ;)

                                        1. 1

                                          That would probably make sense! My assumption is that there are more contributors contributing from a {Uni,Linu}x machine than there are from Windows. This wouldn’t have anything to do with Windows itself, though.

                                        2. 1

                                          There were a lot of Unix developers forced to work on Windows for market reasons in the 90’s and such. So I’m guessing they took their text editor with them.

                                          I worked at all-Windows shops in game dev where at least one person used Emacs.

                                          The same dynamic applied to Python. I started out with Python on Windows when I worked in games. Proprietary game SDKs only work on Windows, at least back then. Even though like 95% of the Python devs use Unix, probably over 50% of users use it on Windows (as judged by their download stats). It was super useful on Windows, and probably the same goes for Emacs.

                                      2. 1

                                        Looks like it was actually closer to 10% (from a reply in the same thread):

                                        I’ve now done this with a bit more rigour. And graphs! Now each bug reporter email is only counted once per year. (The biggest effect of this is a drop in the mingw %.)

                                        See: http://debbugs.gnu.org/stats/emacs.html

                                      1. 13

                                        …nobody tell him about Perl’s syntax.

                                        1. 2

                                          Or Ruby’s unless.

                                          1. 6

                                            Both of those don’t have else.

                                            (The correct solution is to make if an expression)

                                            1. 1

                                              When I first read this I worried he was railing on if as an expression, but this is more about the loss of readability that comes from tacking if onto expressions.

                                          1. 3

                                            Note that the SSH protocol (as defined by RFC 4254) itself only supports passing a single string when executing a command:

                                              byte      SSH_MSG_CHANNEL_REQUEST
                                              uint32    recipient channel
                                              string    "exec"
                                              boolean   want reply
                                              string    command
                                            
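
                                            Because “exec” carries the command as one opaque string, a client holding an argument vector has to flatten it into a shell command itself, and the remote shell re-splits it. A hedged Ruby illustration using the stdlib Shellwords module (the argv below is made up):

                                            ```ruby
                                            require 'shellwords'

                                            # Four local arguments, two of which contain spaces.
                                            argv = ["grep", "-r", "foo bar", "my dir"]

                                            # Naive joining loses the argument boundaries: the remote
                                            # shell would see six words instead of four.
                                            naive = argv.join(" ")

                                            # Shellwords.join escapes each word so the remote shell's
                                            # re-splitting reconstructs the original vector.
                                            safe = Shellwords.join(argv)
                                            Shellwords.split(safe) == argv  # => true
                                            ```

                                            This is why quoting bugs in `ssh host cmd args…` are so common: the flattening happens on the client, silently.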
                                            1. 2

                                              rdumpfs and borg (for encrypted off-site backup).

                                              1. 5

                                                Re-reading papers on early version control systems (RCS, et al) and diffing for a presentation on…version control. I am re-saddened by the lack of emphasis on developer experience for these tools and re-surprised by the rare usage of semantic information in diffing (or even basic syntactic awareness, such as curly braces defining blocks).

                                                  1. 1

                                                    Certainly can give you my experience of using RCS, CVS, svn, hg, git…

                                                    I’m sort of not surprised about the rare usage of semantic information in diffing, in that the semantics shift laterally and longitudinally….

                                                    i.e., we version many different languages, and the semantics of those languages have all shifted over the history of the project!

                                                    1. 1

                                                      semantics of those languages have all shifted over the history of the project!

                                                      I wasn’t talking “deep” semantics, more about things like (true, behavior-preserving) refactorings where methods might be extracted, variables inlined, etc., but the behavior hasn’t changed. It’s too bad tools like IDEA can’t store the valuable higher-level operations (“renamed class A to B”, “extracted this code into a method C”) somewhere in a version control system. Instead, we’re still dealing with diffs of lines of text.

                                                      But semantics aside, language-specific syntactic comparison/merging is just not that hard (I did this for my version control tool back in the early 90’s).

                                                      1. 1

                                                        Which tool was that? Just curious.

                                                        1. 1

                                                          It was a vcs for Visual Basic called “VB ProjectWorks”. I succumbed to Microsoft’s FUD at the time, and shut down the company because they had come out with their (awful, horrible) vcs tool “Delta”.

                                                    2. 1

                                                      This list is not complete without SCCS: http://www.basepath.com/aup/talks/SCCS-Slideshow.pdf

                                                      1. 1

                                                        I do cover that, though didn’t put it in my list above, so thanks for the addition! What’s hilarious is that the IEEE version of this paper (from their digital library) is worse than the PDF you reference, as it looks like they tried to OCR it and mangled it up.

                                                    1. 8

                                                      I find the lack of dynamic allocation especially interesting. In what sort of situations would this be necessary?

                                                      1. 5

                                                        Embedded systems?

                                                        1. 1

                                                          As an example though, what sort of system requires SSL and does not allow dynamic allocation? Not questioning the purpose of the feature, just curious :)

                                                          1. 5

                                                            It’s not about “not allowing dynamic allocation”.

                                                            Statically allocating everything is more predictable, since you don’t have to worry about e.g. heap fragmentation. It reduces some amount of security issues, like failing to handle malloc errors, and use-after-frees, and double-frees. It makes it impossible (ish) to have memory leaks, which can also be security-relevant. It makes it easier to write thread-safe code.
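
                                                            A tiny illustration of the trade-off (a hypothetical fixed-capacity buffer pool, not any particular library’s API): every buffer is allocated once, up front, and exhaustion is an explicit, predictable error instead of a hidden allocation failure deep inside a library.

                                                            ```ruby
                                                            # All allocation happens in initialize; after that the pool
                                                            # only hands out and takes back existing buffers.
                                                            class FixedPool
                                                              def initialize(count, size)
                                                                @free = Array.new(count) { "\0" * size }
                                                              end

                                                              def acquire
                                                                @free.pop or raise "pool exhausted"  # no growth at runtime
                                                              end

                                                              def release(buf)
                                                                @free.push(buf)
                                                              end
                                                            end
                                                            ```

                                                            In C on an embedded target the pool would be a static array, but the property is the same: memory use is bounded and knowable before the program runs.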

                                                            1. 1

                                                              Ahhh, that makes a lot of sense. Thank you!

                                                            2. 2

                                                              You don’t know how annoying it can be when a library uses something you don’t have in some corner you have backed yourself into. It’s easier to just be careful from the start.

                                                              I can imagine needing to use amazon IOT with client certificates on some bare metal platform for example.

                                                              1. 1

                                                                for when you need your refrigerator to make purchases with a credit card

                                                                self-stocking fridge

                                                            3. 1

                                                              He wrote this book: http://www.prometheusbrother.com/ Which is interesting. All SSL hackers have very varied interests.

                                                              1. 1

                                                                On top of embedded, they can be designed to use fewer resources if it’s a fixed-allocation scheme or easier to analyze. In terms of analysis, you might be able to do timing analysis for covert channel mitigation, same analysis for real-time situations, static analysis with tools like Astree Analyzer to show absence of critical errors, or whole app analysis showing it follow your policies in every successful or failed state due to determinism. Always go for static designs when you can. If you can’t, then mix the two so at least portions can be exhaustively shown to be correct with extra eyeballs on the dynamic stuff. Similar to mixing pure and impure components in languages like Ocaml.

                                                                1. 1

                                                                  Dumb CS question: if you restrict yourself to static allocation, are you Turing-complete?

                                                                  1. 2

                                                                    Individual programs are generally not Turing-complete (what would that even mean?) - the question only makes sense for language implementations. (Admittedly any program that accepts input is in some sense a language implementation, but for something that just decrypts SSL not really).

                                                                    A Turing-complete language implementation necessarily needs access to unbounded storage. In a sense such storage has to be “dynamic”, but it could e.g. use the C stack and potentially recurse to arbitrary depth (which I believe would qualify as “static allocation only” in the casual sense; it would be limited by stack size but you can set an arbitrarily large stack at runtime). Or use a fixed amount of memory and a file/database/etc. for storage.

                                                                    1. 2

                                                                      Unless you are statically allocating an infinite amount of memory, no

                                                                      1. 2

                                                                        …but this is true whether or not you restrict yourself to static allocation. There’s always an upper bound on memory on physical systems.

                                                                        1. 3

                                                                          What do physical systems have to do with this? The same principle applies to Turing machines, you can’t simulate an arbitrary TM using a predetermined (finite) subset of the tape.

                                                                      2. 1

                                                                        Stack machines with just one stack and no other storage are not Turing complete. Turing completeness is not a requirement for certain algorithms, though, and is sometimes avoided, because non-Turing-completeness allows for more interesting proofs about the program at hand (e.g. liveness).

                                                                        Turing completeness is also interesting because it is not hard to reach, making some things accidentally Turing complete, such as Magic: The Gathering.

                                                                    1. 11

                                                                      I think the prgmr.com VPS hosting company fits the description.

                                                                      (Disclaimer: I’m a paying customer.)

                                                                      1. 2

                                                                        It absolutely does. I am a customer too, although I forgot to mention it in this thread.

                                                                      1. 1

                                                                         Hardly retro, but I run an Eee PC 701 from 2008 for various low-performance tasks (primarily as an IPv6 tunnel endpoint).

                                                                        Also the only 32-bit x86 machine in my flat.