1. 50
  1. 19

    I find it funny that everyone is saying NixOS is ‘doing weird things’. It is just specifying exact versions of dependencies, which seems sensible to me.

    I guess it could be done via some other means, like wrapper scripts.

    1. 1

      The “weird things” isn’t what it’s doing (specifying dependencies), it’s how it’s doing it (a mammoth shebang).

      1. 2

        I didn’t know shebangs had a 128 byte limit, that seems the stranger thing to me, but good to know.

    2. 16

      perl rereads the file

      Abandon all hope ye who enter here expecting predictable interpreter behavior

      1. 6

        Why is this functionality in the kernel? Two bytes are enough to know that the program start needs interpreter dispatch, and that dispatch could have been done via a userspace service (khm… systemd? :D). Such a service could parse arbitrary-length shebangs, fork, exec, et voilà!

        Other binary handlers can be registered. I remember when Mono could be registered, and .NET exe programs were dispatched to Mono based on their binary headers, I guess…

        1. 3

          Wikipedia seems to offer a brief explanation (by Dennis Ritchie) as to why it exists: mainly to make scripts more seamless with the rest of the system and to remove inconsistencies.

          1. 4

            I know why it exists, but I don’t know why it is in the kernel.

            It would be enough to simply detect the magic #! and then dispatch to a shebang-invoker binary in userspace. See: https://en.wikipedia.org/wiki/Binfmt_misc
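
            To illustrate the userspace-dispatch idea, here is a minimal Python sketch. It only covers the parsing step: how a handler might turn the first line of a script into an argv. The function names `parse_shebang` and `build_argv` are hypothetical, and splitting into an interpreter plus at most one argument is an assumption that mirrors the Linux behaviour described elsewhere in this thread.

            ```python
            def parse_shebang(first_line: bytes):
                """Return (interpreter, optional_arg) for a '#!' line, or None."""
                if not first_line.startswith(b"#!"):
                    return None
                # Keep only the first line and drop surrounding blanks.
                rest = first_line[2:].split(b"\n", 1)[0].strip(b" \t")
                if not rest:
                    return None
                # Split once: everything after the interpreter stays one argument.
                parts = rest.split(None, 1)
                interpreter = parts[0]
                arg = parts[1] if len(parts) > 1 else None
                return interpreter, arg


            def build_argv(script_path: str, first_line: bytes):
                """Build the argv a userspace dispatcher could exec (sketch only)."""
                parsed = parse_shebang(first_line)
                if parsed is None:
                    return None
                interpreter, arg = parsed
                argv = [interpreter.decode()]
                if arg is not None:
                    argv.append(arg.decode())
                argv.append(script_path)  # the script itself comes last
                return argv
            ```

            A real handler would then pass the result to something like `os.execv(argv[0], argv)`; that step is left out here.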

            1. 2

              Ah, sorry, I misread. You make a good point, and by now it’s probably just “historical reasons”.

              1. 1

                No problem. Actually, I had no time to research the topic and just posted my point from my phone while commuting, and your link led to the explanation of the actual binfmt handler that I remembered and mentioned.

        2. 5

          Just don’t try this on OpenBSD. Or probably any BSD.

          1. 5

            What shouldn’t be tried on a BSD? Long shebangs, or changing the kernel’s handling of long shebangs?

            I’m not sure what restrictions (if any) the BSD kernels place on shebang line length.

            1. 3

              If the shebang is longer than 128 bytes, you get ENOEXEC. Like in the “fixed” Linux kernel.

              1. 1

                That’s interesting. It shows that there is “form” for having this model in other Unixes, and presumably as Perl doesn’t break on these machines the change was deemed feasible.

                The real issue is how this change was backported to older kernels, which is the meat of the linked article. But that’s fine, I’ve learned a lot in this discussion!

          2. 5

            128 bytes is not very long; it’s even shorter than PATH_MAX, which has long had a minimum length of 256.

            1. 4

              Shebangs suck, and I’ve hit things like this before with Python and very long usernames in home dirs. What doesn’t make sense, though, is that Perl not only detects truncated shebangs but also does word splitting on the first argument it gets? Everything after the interpreter is passed as one argument, plus the implicit filename as the second argument. What a horrifying interpreter.

              1. 5

                I believe this behavior is to enable very long shebang paths, but this is just a guess. I will try to learn more about this and update accordingly.

                (Edit: rewording, clarification.)

                Update - long story short, many platforms have weird restrictions on how long a shebang line can be, and some (like Windows) don’t even know what it is! Perl has to run on all these platforms, of which Linux is only one.

                One can also enable “taint mode” as a command line argument in Perl, and in the shebang, so parsing of this line has to be done by Perl.
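
                As a rough illustration of why re-parsing helps, the Python sketch below compares what a truncating kernel would pass along with what an interpreter can recover by reading the first line itself. It is not Perl’s actual implementation; the 128-character limit is taken from this thread, and the `/nix/store/example-N` paths and function names are made up for the example.

                ```python
                KERNEL_LIMIT = 128  # assumed truncation point, as discussed in this thread


                def kernel_view(first_line: str, limit: int = KERNEL_LIMIT) -> str:
                    """Model a kernel that silently keeps only the first `limit` characters."""
                    return first_line[:limit]


                def switches_from_line(first_line: str) -> list:
                    """Return every word after the interpreter path on a '#!' line."""
                    if not first_line.startswith("#!"):
                        return []
                    words = first_line[2:].split()
                    return words[1:]


                # Hypothetical shebang with ten long include paths.
                full_line = "#!/usr/bin/perl " + " ".join(
                    f"-I/nix/store/example-{i}/lib" for i in range(10)
                )

                from_kernel = switches_from_line(kernel_view(full_line))
                from_file = switches_from_line(full_line)
                print(len(from_kernel), "switches survive truncation")
                print(len(from_file), "switches recovered by re-reading the file")
                ```

                The interpreter that re-reads its own script sees all ten switches, while the truncated view loses some and cuts one off mid-path.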

              2. 3

                Yes, maybe it never should have worked. And yes, it’s sad that people apparently had cases that depended on this odd behavior, but there we are.

                The wording here is a bit odd, in that it implies that NixOS worked because of the kernel’s quirky shebang handling, whereas in fact it worked despite the kernel’s quirky shebang handling.

                1. 2

                  This is a common and accepted use of “depend” in English. NixOS does depend on the kernel executing malformed shebang lines, rather than rejecting them. Just as one could depend on anything similarly dubious, like depending on their drug dealer to bail them out of jail.

                  1. 2

                    My reading of this entire issue is that all unixen (including Linux) have silently truncated the shebang line after a number of characters (32 is mentioned in the perlrun manual), and this is what Perl has been designed to work around. This has been the case since Perl 3, so late 1980s.

                    In my view, changing this behavior as radically as the kernel change did can hardly be considered “correct”.

                    1. 1

                      Of course making a breaking change is incorrect, I never claimed otherwise. That has literally nothing whatsoever to do with the wording Linus chose. NixOS depends on that behavior. If it didn’t, it wouldn’t have broken.

                      There are other fixes that wouldn’t have broken NixOS. That’s irrelevant to the correct usage of the word. If I depend on lib1, and there also exists a lib2 that solves the same problem, does that mean I actually don’t depend on lib1? Of course not, that’s absurd.

                      The wording is 100% correct and normal.

                      1. 1

                        After reading your comment and reflecting a bit, I have come to the conclusion that I agree with you. Thanks for taking the time to expand.

                    2. 2

                      “NixOS does depend on the kernel executing malformed shebang lines, rather than rejecting them”

                      Well, it really depends on what the “correct” behaviour for handling shebangs longer than 128 bytes is. There are three obvious possibilities:

                      1. Truncate + execute (existing kernel behaviour).
                      2. Give up and execute a ‘default shell’ instead.
                      3. Execute the whole thing regardless of its length.

                      NixOS ‘depends’ on Linux doing either 1. or 3., so it doesn’t strictly speaking depend on “this odd behavior”, in that it would work just fine if they’d chosen 3. as their “fix”.

                      You wouldn’t claim that you ‘depend’ on your drug dealer to bail you out of jail if you also have a host of friends and family who are willing and able to bail you out.
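
                      The three options can be sketched in a few lines of Python. This is only a model of the policies as listed above, not kernel code; the `interpreter_line` name, the policy labels, and `/bin/sh` as the ‘default shell’ are assumptions for illustration.

                      ```python
                      LIMIT = 128  # shebang length limit discussed in this thread


                      def interpreter_line(first_line: bytes, policy: str, limit: int = LIMIT) -> bytes:
                          """Return the shebang text that would be acted on under a given policy."""
                          line = first_line.split(b"\n", 1)[0]
                          if len(line) <= limit:
                              return line  # short lines behave the same under every policy
                          if policy == "truncate":  # option 1: truncate + execute
                              return line[:limit]
                          if policy == "default_shell":  # option 2: fall back to a shell
                              return b"#!/bin/sh"  # assumed default shell
                          if policy == "whole":  # option 3: ignore the limit
                              return line
                          raise ValueError(f"unknown policy: {policy}")
                      ```

                      Under this model, a long shebang keeps its interpreter path in options 1 and 3 (as long as the path itself fits in the limit), which is the sense in which either would have kept NixOS working.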

                      1. 5

                        1. is just weird though, it’s “instead of probably silently doing the wrong thing, definitely silently do the wrong thing”.

                        1. 2

                          I think returning ENOEXEC if the file starts with #! and there is no newline within the first 128 bytes would be decent behavior. It would allow expansion of the length limit without breaking anything in the future.

                          That is, things that worked would continue to work that way, but some new things might start to work.
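
                          A minimal Python sketch of that check, assuming the caller has already read the start of the file into `head`. The function name `check_shebang` is hypothetical, and the rule is applied literally, so a `#!` file with no newline at all is rejected too.

                          ```python
                          import errno


                          def check_shebang(head: bytes, limit: int = 128) -> int:
                              """Return 0 if the file may run, or errno.ENOEXEC for an over-long '#!' line."""
                              if not head.startswith(b"#!"):
                                  return 0  # not a script; other binary handlers decide
                              if b"\n" in head[:limit]:
                                  return 0  # the whole shebang line fits within the limit
                              return errno.ENOEXEC  # refuse rather than silently truncate
                          ```

                          Raising `limit` later only turns some rejections into successes, which matches the point that nothing that already worked would change.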

                          1. 1

                            Just because there are other behaviors NixOS could depend on doesn’t mean it doesn’t depend on this behavior. Your interpretation suggests that because solutions exist, there is no problem.

                            When a particular behavior changed, NixOS broke. Thus NixOS depended on that behavior in some way. This is common and accepted usage of depend, especially in technical contexts.

                      2. 1

                        I wonder: if shebangs have a maximum size of 128 bytes, why do those Nix scripts have more than that? Won’t the line always be truncated, assuming a Linux version older than 5.0-rc1? What’s the point of having those big lines if they will never be used for whatever purpose they were intended for in the first place?

                        1. 3

                          Perl reads and parses the shebang itself again when executed, and doesn’t use the command line passed by the kernel if it sees that it’s truncated.

                          https://lobste.rs/s/zmxyhk/case_supersized_shebang#c_dfkskv

                          1. 1

                            Thanks for answering. I assume such behaviour exists only inside the Perl interpreter. Is it different in other versions of Perl (say, version 1)? Are there other dynamic language interpreters that do it? It seems a little weird as a hack, and thus non-standard and incorrect, but I may be the one who is incorrect, because I don’t know much else. By the way, thanks for your work on Void.