1. 5

    This blog is full of inflammatory low-effort content; I am disappointed that things like this make their way to lobste.rs.

    1. 5

      This is disappointing. I did not know there were competing standards for URLs now.

      1. 15

        The WHATWG standard isn’t much of a standard - the document changes right out from under you as it’s a “living standard”. AFAIK there are no stable URLs for particular versions. And there’s no BNF anymore, either, as they ripped it out. I complained about that before but they seem uninterested in fixing it, which also gives me little confidence that if someone added it as a contribution, it would be maintained.

        1. 3

          It only changes under you if you don’t pay attention ;-)

          1. 1

            I guess it’s good that I just always go with the IETF standard then. I wonder how the WHATWG has diverged though? What things can break if I send a valid IETF standard URL to a browser?

            1. 11

              One example is that WHATWG URLs allow \ as a path separator, and IETF URLs only allow /. This means that if you use an IETF URL parser to get the hostname of https://evil.com\@good.com/ it will parse as good.com, but with a browser’s URL parser you get evil.com.
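
              If it helps to see that concretely, here is a quick sketch using Python’s urllib.parse as a stand-in for an RFC 3986-style parser (the exact output of any given library is my assumption here, so check the parsers you actually use; the WHATWG behaviour described above is what a browser gives you):

              # An RFC 3986-style parser treats "\" as an ordinary character, so the
              # authority runs to the first "/" and the hostname is whatever follows
              # the last "@".
              from urllib.parse import urlsplit
              url = "https://evil.com\\@good.com/"
              print(urlsplit(url).hostname)  # "good.com" with CPython's urllib
              # A WHATWG-style parser treats "\" like "/" for http(s) URLs, so the
              # authority ends at the backslash and the host comes out as evil.com.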

              1. 2

                I think the WHATWG standard is laxer than the IETF standard, as it allows for “auto-correcting” URLs with problems (search for “spaceAsPlus” in the text, for example - that allows for optionally converting spaces in the URL to plus signs, depending on a parser mode flag). I guess this means a browser will (should?) accept IETF URLs, but there will be URLs accepted by browsers which won’t be considered URLs by other parsers.
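
                As a loose analogy (this is Python’s standard library, not the WHATWG flag itself), the same space-versus-plus ambiguity shows up in urllib, where the result depends on which encoding/decoding mode you pick:

                from urllib.parse import quote, quote_plus, unquote, unquote_plus
                print(quote("a b"))         # 'a%20b' - plain percent-encoding
                print(quote_plus("a b"))    # 'a+b'   - form-style, space becomes '+'
                print(unquote("a+b"))       # 'a+b'   - '+' left alone
                print(unquote_plus("a+b"))  # 'a b'   - '+' decoded back to a space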

                But of course, without a proper BNF, who knows exactly what’s different? There might be some edge cases where an IETF URL would be rejected by a browser. This verbose step-wise description is less formal and more wordy, which makes it more difficult to figure out what grammar is being described.

              2. 1

                I am really disappointed in Firefox for having helped Google do this. URLs are the foundation of the modern web, and if there is no stable definition (as you said, there is no formal grammar definition any more) then what exactly are we building on?

                These replies are especially sad. If parsing requires Turing-complete steps, what can you guarantee about how a given URL is processed?

                1. 1

                    Despite those comments I would be extremely surprised if the URL format was actually Turing complete. There’s an obvious preference to express the parsing algorithm using a Turing-complete language, but that doesn’t mean that the URL grammar is actually Turing complete.

                  1. 1

                    Why are you disappointed? Mozilla and Google folk were trying to ensure that the standard matched reality.

                2. 12

                  The WHATWG wants to claim it owns everything, but then ignores all the interesting cases since it only needs things to work in Chrome…

                  1. 9

                    That’s incorrect. Changes in a WHATWG spec require at least two implementations (e.g., WebKit, Blink, Gecko) to agree.

                    1. 8

                      That’s barely a difference, since it’s still just a list of large web browser engines. And in practice, if Google wants something, do you really think Mozilla and Apple could block it? Chrome would implement it and force the issue in the name of compatibility 🤷‍♂️

                      1. 7

                        And in practice, if Google wants something, do you really think Mozilla and Apple could block it?

                        Considering there have already been multiple things that Google has implemented but Apple and Mozilla have refused to (citing security/privacy concerns), I think that it is in fact possible Google might implement something and Apple and Mozilla would refuse to.

                        1. 1

                          What other browsers should matter? There aren’t any that have sufficiently complete implementations to provide meaningful info regarding implementation issues, and there aren’t any others that have sufficient users to be relevant.

                          1. 1

                            For URLs/URIs especially, it’s mostly things that are not browsers or web-related at all: cURL, sure, but also XMPP clients, telecom equipment, git, lots of stuff that relies on this syntax!

                            1. 0

                              They don’t rely on the URL format that browsers have to support.

                      2. 2

                        In fairness, it’s worth asking if that is because the IETF was not willing to consider the needs of the browser and/or moved too slowly, or just because they are bad actors.

                        1. 10

                          The IETF barely exists as an “entity”, it’s made up of the people who show up to do the work. I don’t think the WHATWG are “bad actors” but rather I think they’re only interested in documenting whatever Google was going to do anyway rather than improving anything.

                          Don’t get me wrong, having public documentation on what Google was going to do anyway is great! Way better than reverse engineering their nonstandard stuff. But what’s dangerous is when non-Google people treat it as a standard and implement it instead of the actual standard.

                          1. 6

                            I think this misrepresents and underplays both the historical purpose and current value of WHATWG. When half of the web is violating e.g. the html standard in one way or another, and it all just so happens to work because of browser bugs or misfeatures from IE6, it is very useful to have a spec that fully describes error recovery. It is not useful to have a standard with ebnf that describes an ideal world but doesn’t really tell you how to write a real parser that actually works on 99% of websites.

                            I’m not aware of any inherent reason those specs couldn’t live within the IETF body, but I know of and have experienced the editorial process of the IETF. It is absolutely not frictionless and I can imagine the writing style of WHATWG would not pass their editorial guidelines.

                            I think you should take a look at WHATWG’s public issue tracker before asserting that the IETF is fundamentally a more open venue than the WHATWG. I feel like half of the assertions you make are coming from your perception of what the Chromium team is doing, not from actual problems observed with the spec body.

                            1. 2

                              When it comes to something that is web specific, I agree. And in general, as I said “having public documentation on what Google was going to do anyway is great!”

                              But for things that are used far beyond the web and in all sorts of contexts, like URLs/URIs, having web browsers write their own incompatible “spec” for how these should work is… strange at best. And again, having something written to document what they do is still useful. But it’s not a standard, and it should not be implemented by anyone not trying to be compatible with browser behaviour for a specific reason.

                              1. 3

                                “whatever Google is going to do” is exactly the misrepresentation I am talking about. This is not how it started and this is not how it works today. The fork did not just happen due to ideological resentment of existing spec bodies and corporate greed; the fork happened in practice much sooner, in browsers like IE6. And the earliest WHATWG spec was written at Opera.

                                I can appreciate your other points and the complaint in the article. The HTML spec in particular is written with a very particular audience (browser implementors) in mind.

                                1. 3

                                  It’s great that Opera started documenting the insanity that is browser URL parsing.

                                  So your perspective is, if Google wanted google!/blahblah as a valid URL, implemented it in Chrome and pushed it to production, WHATWG wouldn’t accept it?

                                  Note: I’m not suggesting Google would ever want such a thing, and I just randomly made up URL nonsense on purpose :)

                                  My guess is, and I imagine @singpolyma’s perspective also, WHATWG would accept the change, perhaps with some token amount of GRRR.. but it would happen.

                                  1. 3

                                    I’m not going to speculate as to what might happen in such a scenario. Can you point out a comparable scenario where Google did something like this and the standardization process was as you described? Otherwise we’re talking about projected fears, not reality.

                                    What I have seen is that Google does indeed push forward with feature impls, but the standardization process in WHATWG is not as lackluster as you describe.

                                    1. 2

                                      It’s happened in other aspects of web browsing, but I’m not currently aware of it in URL parsing, specifically.

                                      1. 1

                                        I can recall a few cases of the opposite: Google went on to implement something, others refused, so it did not become standard, and Google rolled back its implementation.

                            2. 2

                              What needed to change in the URL spec that you think the IETF was so slow about?

                              1. 1

                                I don’t know the history here, hence my question. I’ve always used IETF-compliant parsers and was surprised there were two competing standards. I was merely musing on the potential reasons for another standard having been started.

                                1. 1

                                  Ah, it sounded like you were making a claim about the IETF, not asking a question. Got it.

                            3. 0

                              The WHATWG predates Chrome by many, many years, and is part of the reason Chrome was possible. Having an actual, accurate specification meant WebKit was able to resolve a number of compatibility problems, so when Chrome came out it was compatible with the majority of the web.

                            4. 5

                              Besides the deficiencies of WHATWG, you can always have bugs in any one parser, which is another reason not to mix parsers.

                              1. 1

                                The core problem is that the IETF specification does not reflect the reality of what browsers have to handle. It doesn’t matter how much you might say “this is an invalid url”: If it works in some browsers but not others, users will point to those browsers as evidence that your browser is wrong.

                                The same happened with the W3C. Instead of attempting to define a standard that actually matched reality, they went off into the xhtml weeds, and never tried to address the issues that actually mattered: the actual content browsers had to support. That is the primary reason the WHATWG ended up being responsible for the actual html standard.

                                It does not matter how much you may want to be the sole definition, if that spec fails to match actual content it is irrelevant.

                                It’s why JavaScript supports <!-- comments, despite that being insane.

                                1. 1

                                  Javascript supports <!-- comments for backwards-compatibility with browsers that didn’t implement <script>. You use it like this:

                                  <script>
                                  <!-- This starts a multi-line HTML comment and is a single-line JS comment
                                  alert("this is JS, still in the HTML comment tho")
                                  // time to end the HTML comment! -->
                                  </script>
                                  
                                  1. 2

                                    Oh I know. I worked on the portions of the spec that standardized it :D

                                    I was never entirely aware of the why of it existing, it honestly seemed to line up more (at the period I was dealing with it) with “xhtml” that was being served as html. Sidenote: more or less every single “xhtml” document I ever encountered was not treated as xhtml due to various response errors, and more or less none of them would actually validate as xml :D

                              1. -1

                                The complicators won.

                                We just can’t have simple binary formats.

                                1. 19

                                  I think this is a nice bit of history, and describes a very good reason for folks moving away from a.out format executables:

                                  When introducing shared libraries certain design decisions had to be made to work in the limitations of a.out. The main accepted limitation was that no relocations are performed at the time of loading and afterward. The shared libraries have to exist in the form they are used at run-time on disk. This imposes a major restriction on the way shared libraries are built and used: every shared library must have a fixed load address; otherwise it would not be possible to generate shared libraries which do not have to be relocated.

                                  —Ulrich Drepper, How To Write Shared Libraries

                                  So someone would have to manage a registry of library load addresses, and ensure they didn’t try to load into the same bit of address space.

                                  In this example, it largely comes down to whether you prefer your complications automated or artisanally hand managed.

                                  1. 2

                                    Relevant if you want standardized shared libraries.

                                  2. 16

                                    I am not knowledgeable enough to contribute anything meaningful on the technical side of this discussion, but I think we should really avoid this type of discussion style. I’ve seen it over and over that people who lament the complexity of new systems and romanticize the simplicity of old ones simply don’t recognize the usecases of others, or are actively dismissive towards them. Accusing them of introducing complexity for its own sake is even worse than that.

                                    1. 8

                                      Isn’t it simpler to support the one binary format than two? Even if that one format is more complex.

                                      1. 1

                                        Sure, you can argue that supporting more than one format (ELF) is itself unnecessary complexity, but that’s reduction to absurd, once we’re considering the complexity forced by ELF itself.

                                      2. 3

                                        What’s a good reason to use a.out these days? I mean, we need the elf for things that use metadata, extra sections, fat binaries, etc. so if it’s already available, why would I want to use a.out?

                                        1. 3

                                          Simplicity. You can do everything else manually. Same sort of appeal as COM executables on Freedos.

                                          1. 7

                                            Sure, we can do a lot of things in a different way, but why simplicity? There are existing interfaces to the format so you mostly don’t need to think about that, and when you do, it’s not that hard (I’ve parsed ELF for specific info and it was OK). I’m trying to understand what we would gain from simplicity apart from an aesthetically purer result that’s functionally weaker.
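
                                            For what it’s worth, pulling one specific bit of information out of an ELF isn’t a huge amount of code either. A rough sketch of my own (64-bit little-endian ELF only, no validation, offsets taken from the ELF spec) that lists section names with just the Python standard library:

                                            import struct
                                            import sys
                                            def elf_section_names(path):
                                                with open(path, "rb") as f:
                                                    data = f.read()
                                                # ELF64 header: section header table offset, entry size/count, and
                                                # the index of the section holding the section-name string table.
                                                e_shoff, = struct.unpack_from("<Q", data, 0x28)
                                                e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)
                                                def section(i):
                                                    # sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size
                                                    return struct.unpack_from("<IIQQQQ", data, e_shoff + i * e_shentsize)
                                                _, _, _, _, str_off, str_size = section(e_shstrndx)
                                                shstrtab = data[str_off:str_off + str_size]
                                                names = []
                                                for i in range(e_shnum):
                                                    name_off = section(i)[0]
                                                    names.append(shstrtab[name_off:shstrtab.index(b"\0", name_off)].decode())
                                                return names
                                            print(elf_section_names(sys.argv[1]))  # e.g. ['', '.interp', '.text', ...]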

                                            1. 2

                                              Simplicity is valuable itself.

                                              1. 12

                                                No it isn’t. Simplicity is a means to an end. That end is software that can be understood, used and modified easily. Don’t lose sight of the forest for the trees.

                                                1. 2

                                                  No, it really is valuable itself.

                                                  Complexity is inherently bad. Justification is needed for any complexity added: any complexity needs to be weighed against the value of simplicity (which is even higher than what you already seem to be aware of).

                                                  It is indeed a high burden, but that’s reality.

                                                  1. 3

                                                    it’s simpler having 1 binary format instead of 5 though?

                                                    1. 1

                                                      Addressed here.

                                                2. 9

                                                  Simplicity at one layer in the stack often drives complexity at another level. ELF was driven by the ‘simple formats, complex tools’ model and the complexity of an ELF run-time linker is huge but at least it’s standardised compared to a.out, where any kind of dynamic code loading or shared library usage had to be bespoke for the application(s) concerned.

                                                  1. 1

                                                    had to be bespoke for the application(s) concerned.

                                                    I prefer to think about it as “able to do bespoke”. I am not arguing for the removal of ELF.

                                                    1. 3

                                                      You can also do bespoke things with ELF (I have) and the container format makes it a lot easier by providing a richer mechanism for expressing metadata to drive your own mechanism.

                                            2. 3

                                              I mean, we need the elf for things that use metadata, extra sections, fat binaries, etc.

                                              Why do we need these?

                                              1. 2

                                                Fat binaries are convenient for distribution. Metadata or extra sections are useful for things like binary signing, embedding icons, annotating the source package, tagging versions/names understood by CVE scanners, lots of other uses…

                                                1. 4

                                                  ELF doesn’t support fat binaries. The rest can be done with symbols instead of sections.

                                                  It seems that in practice, few enough people are aware of the intricacies of ELF that they do it with symbols regardless of what ELF permits.

                                                  The big advantage of ELF is dynamic linking, and the consensus around that is changing.

                                                  1. 1

                                                    That part is aspirational, correct. Unfortunately FatELF was not merged, but if we ever want to revisit it, we need something with dynamic sections, like ELF.

                                                    1. 1

                                                      I like AmigaOS’s HUNK executables, as they do not force complexity upon implementations.

                                                      HUNK is structured as a linked list. It is possible to just ignore the hunks that aren’t essential for what you’re trying to do (e.g. running the program), so as long as the essentials happen early in the list, implementations aren’t burdened no matter how many non-essential sections are added.

                                                      With ELF, there are a lot of structs to navigate through, no matter what you’re trying to do. The barrier to entry is much higher, and I argue it isn’t justified, particularly when HUNK does not need anywhere near as much work.

                                                      1. 2

                                                        Why do you think it’s easier to iterate hunks than iterate over the section list? It feels almost the same to me (you can skip unknown entries in either case)

                                                        1. 1

                                                          Because there’s only one structure to care about, a quite simple one which forms the linked list.

                                                          Then you can care/not care about a node. If you care, you implement that.

                                                          1. 3

                                                            Ok, but that’s basically hunk iteration:

                                                            for(cur=...;cur<...;cur+=cur->len)
                                                              if (cur->id is interesting)
                                                                do_something(cur->data, cur->len);
                                                            

                                                            Elf exec iteration (either sections or exec table)

                                                            for(table=elf->table,i=0;i<elf->table_size;i++)
                                                              if (table[i].id is interesting)
                                                                do_something(table[i].offset, table[i].len);
                                                            

                                                            Ignoring the necessary casting and precise offset calculation… The difference seems trivial.

                                                            1. 1

                                                              Except you need a bunch of structs for even minimal access, rather than just one.

                                                              I admit bias; I got traumatized a couple decades ago, when I wrote a Linux virus in x86 asm which had to touch ELFs back in high school.

                                              2. 2

                                                Surely you don’t need ELF for any of these?

                                                1. 1

                                                  In theory, no, you could create a different one. In practice ELF and PE are the only supported options. Or did you mean something else?

                                                  1. 3

                                                    In practice ELF and PE are the only supported options.

                                                    And Mach-O.

                                                    1. 2

                                                      Yes, that’s what I meant. If you’re doing an OS, you are free to use and design what you want. That may also entail maintaining your own Binutils, but it’s still for you to decide.

                                              1. 7

                                                My primary use case for an adblocker is to make sites load faster; I don’t care about tracking. Ad fraud detection will probably catch this (therefore making tracking work again), and if not, the extra clicks just drive the ad network’s revenue. I don’t understand the point.

                                                1. 5

                                                  The ad sector is mostly a scam: a good chunk of clicks are made by bots or automated systems of some kind. The assumption is that if you can push this ratio higher and higher, eventually everybody will take note, stop buying targeted ads, and bring down the monstrosity that is the ad-driven internet, and Facebook and Google with it.

                                                1. 3

                                                  Does anybody know an efficient or even production-ready implementation of this?

                                                  1. 3

                                                    I have used Rinda, the Ruby implementation of Linda, once at work to avoid depending on an external message broker.

                                                    Basically, there was one server process that collected all messages sent by the computers on the network and a web service implemented in Rails that queried the tuple space. There was also the possibility to post messages to the tuple space that were essentially commands for certain computers to run.
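
                                                    In case the “tuple space” part is unfamiliar: it’s essentially a shared bag of tuples with pattern-matched read/take operations. A toy Python sketch of the idea (just an illustration of the model, not Rinda’s actual API):

                                                    import threading
                                                    class TupleSpace:
                                                        def __init__(self):
                                                            self._tuples = []
                                                            self._cond = threading.Condition()
                                                        def write(self, tup):              # Linda's "out": publish a tuple
                                                            with self._cond:
                                                                self._tuples.append(tup)
                                                                self._cond.notify_all()
                                                        def take(self, pattern):           # Linda's "in": remove and return a match
                                                            with self._cond:
                                                                while True:
                                                                    for tup in self._tuples:
                                                                        if len(tup) == len(pattern) and all(
                                                                                p is None or p == t for p, t in zip(pattern, tup)):
                                                                            self._tuples.remove(tup)
                                                                            return tup
                                                                    self._cond.wait()      # block until someone writes
                                                    space = TupleSpace()
                                                    space.write(("command", "host42", "reboot"))
                                                    print(space.take(("command", "host42", None)))  # None acts as a wildcard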

                                                  1. 3

                                                    There are no machine specs in this benchmark – on its homepage swc essentially claims to utilize cores better than babel. I’d assume babel is worse at parallelizing than either esbuild or swc, but I’d be interested in seeing whether swc or esbuild makes better use of multiple cores.

                                                    esbuild’s own benchmarks show a much larger gap between esbuild and parcel (swc). I can’t tell at a glance whether this is because of different benchmark inputs and a different machine, or because parcel adds significant overhead.

                                                    EDIT: a functional comparison (bundle size) would probably make sense to add too, considering that NextJS is opting for SWC instead of esbuild even though there’s this much of a performance gap.

                                                    1. 1

                                                      There are no machine specs in this benchmark

                                                      Good point, thanks! Just pushed it with this detail: It’s a dedicated instance on OVH, OVH Rise-1.

                                                      • RAM: 64 GB DDR4 ECC 2,133 MHz
                                                      • Disk: 2x450 GB SSD NVMe in Soft RAID
                                                      • Processor: Intel Xeon E3-1230v6 - 4c/8t - 3.5 GHz/3.9 GHz
                                                      1. 1

                                                        even though there’s this much of a performance gap

                                                        What do you mean by this? esbuild and swc were basically identical in performance (especially in human timescales) and both much better than typescript and babel.

                                                        1. 2

                                                          right, in your benchmark, but not in the one provided by esbuild. should’ve added “potential”

                                                          1. 1

                                                            Oh I see, thanks for clarifying.

                                                    1. 1

                                                      I find it very surprising that esbuild, written in Go, is faster than swc, written in Rust. Speed is one of the main arguments behind the “rewrite everything in Rust” wave we are having.

                                                      1. 4

                                                        It is possible for an O(n) implementation in Python to outperform an O(n^2) implementation in C. To me, a Go program being faster than a Rust program at a similar task is not surprising. The program with better data-oriented design will come out ahead. I view the RIIR brigade as mostly about safety, anyway.
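
                                                        A toy, Python-only illustration of that point (numbers are machine-dependent, so treat them as illustrative): the O(n) version wins regardless of language because the algorithm dominates.

                                                        import random, time
                                                        items = [random.randrange(10**9) for _ in range(5_000)]
                                                        targets = [random.randrange(10**9) for _ in range(5_000)]
                                                        t0 = time.perf_counter()
                                                        slow = sum(t in items for t in targets)      # O(n^2): linear scan per lookup
                                                        t1 = time.perf_counter()
                                                        item_set = set(items)
                                                        fast = sum(t in item_set for t in targets)   # O(n): hash lookup per target
                                                        t2 = time.perf_counter()
                                                        print(slow == fast, round(t1 - t0, 3), round(t2 - t1, 3))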

                                                        Edit: here’s the esbuild FAQ answer for “why is it fast?”, https://esbuild.github.io/faq/#why-is-esbuild-fast

                                                        1. 1

                                                          It seems to me that esbuild is less configurable, but it’s important to keep in mind that the RIIR argument only holds insofar as swc is comparing itself to babel, not esbuild, just yet.

                                                        1. 2

                                                          Neat, I wanted to build a static site generator that works this way! Run a bunch of cgi scripts and dump the output somewhere

                                                          1. 1

                                                            Might consider taking a look at https://mkws.sh.

                                                          1. 1

                                                              I’m not up to speed on writing gRPC plugins, but it seems that putting a client TLS cert in an environment variable is a gaping security hole, which I think the OP is alluding to by the way they couched the statement. I’d be curious to know if this is a common practice, and whether or not it actually presents a security risk, given that I imagine a fair number of secrets traverse plugins.

                                                            1. 4

                                                              A cert should be fine. It only has a public key in it, not the private key. Someone else being able to observe that public key by seeing the env vars in the output of ps is pretty much harmless.
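
                                                                If you want to double-check that, a certificate object really does only expose public material. A sketch using the third-party cryptography package (the file name is just a placeholder):

                                                                from cryptography import x509
                                                                from cryptography.hazmat.primitives import serialization
                                                                with open("plugin-cert.pem", "rb") as f:   # placeholder path
                                                                    cert = x509.load_pem_x509_certificate(f.read())
                                                                # A certificate carries a public key plus metadata; there is no
                                                                # private key inside it to leak via the environment.
                                                                print(cert.subject)
                                                                print(cert.public_key().public_bytes(
                                                                    serialization.Encoding.PEM,
                                                                    serialization.PublicFormat.SubjectPublicKeyInfo).decode())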

                                                              1. 2

                                                                Given that this is all intended to run on the same machine, or even container, I’m kinda surprised they even bothered with encryption? Like, what would be an actual scenario where this could be exploited?

                                                                1. 3

                                                                    Perhaps it was easier to make gRPC do mTLS than to make gRPC run over an AF_UNIX socket or a socketpair()?

                                                                  1. 2

                                                                      You can actually make it run over a Unix socket by sending “unix” instead of “tcp” along with a path to a socket, and the code seems to indicate the certificates are still there. Not sure offhand how mTLS works in that case, because I haven’t tried that path yet.

                                                                    1. 1

                                                                      Stupid question, can’t the client turn mTLS off? It looks like there’s a field in a config struct for it.

                                                                  2. 1

                                                                    What if you’re starting a small netcat program that points to another host? I’m not sure why I would do that, but possibly running in a virtual machine with a different architecture or os.

                                                                    1. 1

                                                                      I just don’t remember ever seeing any mentions of running providers non-locally. Like, ever.

                                                                1. 8

                                                                  I would recommend Nix over Conda for new projects. The NixOS community has a comparison between Nix and Conda. While there may be many ways in which Nix is slightly painful compared to Conda, the terms of service alone seem like a motivating reason to avoid Conda.

                                                                  1. 1

                                                                    What problems (particularly ones stated in the article) does Nix solve here?

                                                                    1. 3

                                                                      From the article’s perspective, Nix is like Conda but more reproducible. The running example from the article becomes a single-line expression with nixpkgs:

                                                                      python39.withPackages (ps: [ ps.numpy ps.pandas ps.pillow ])
                                                                      

                                                                      To make a third column in the first table, adding Nix alongside pip and Conda, Nix does everything that Conda can do. The third table is similarly simple; features like reproducibility and virtual environments are baked into Nix’s design, and nixpkgs has a security team. To make a third column in the second table, comparing nixpkgs to PyPI and conda-forge:

                                                                      • package names are maintained by consensus and anybody can contribute
                                                                      • build infrastructure is centralized but optional since Nix can build packages locally
                                                                      • any open-source or Free Software Python packages can be installed as long as source code is available
                                                                      • more tools are available than with any traditional Linux distribution (visual comparison)
                                                                      • Linux is first-class, Darwin is well-supported but has fewer packages, and Windows is unsupported

                                                                      Finally, but non-trivially, the production of Docker-compatible container images is reproducible when using nixpkgs’ tools to assemble the image. This provides a reproducible alternative to Docker’s builder. In contrast, the article gives two non-reproducible Dockerfiles which depend on the state of global package repositories.

                                                                      On serious examination, the only reason to recommend Conda might be for Windows users, but I usually recommend that they change their entire software stack at that point.

                                                                      1. 2

                                                                        Yes, Nix can do everything Conda can. The problem I see is that you may want to install precise versions of your dependencies, but if you stick to a certain Nixpkgs release or commit, you have to use the versions in the distro. I don’t know if that affects Conda as well. That’s the reason why for most of my Python projects I use few distro packages and instead rely on mach-nix, which works very well together with a private Nix binary cache so that I don’t have to rebuild the packages every now and then.

                                                                    2. 1

                                                                      Can nix do non-root install now (but still use a binary cache)? That was the one thing keeping me from using nix for this last time I checked (which was a while ago).

                                                                      1. 3

                                                                        It needs root just to create the /nix directory during installation (in single-user mode; in multi-user mode it will also need root to create some additional users to run the nix build service as), and yes, it does support binary caches in such a setup.

                                                                        1. 1

                                                                          Right, thanks. So that’s pretty much what I remember. Without a non-root install option (that also supports binary caches) it’s not a good fit for what I need to do (I can’t assume that I’ll have root access where I need to run my software, or that anyone would install it for me). Conda ticks those boxes, but I’d prefer the stronger guarantees that come with nix. Also the tooling.

                                                                          1. 3

                                                                            If the problem is one of distribution alone, then nix-bundle is a great prototyping tool, and nixpkgs supports static linking for many languages. As long as you’re not trying to share development environments too, Nix would still work for that situation.

                                                                            For the specific case of living in homedirs on somebody else’s hardware as a permanent tenant, I personally think that specialized toolchains should be developed which use “living off the land” techniques. These techniques are typically used by malware, but they could be used for good, too.

                                                                            1. 1

                                                                              living off the land

                                                                              TIL. Thanks for that, very interesting! I guess having scripts set up conda environments (or similar) in a semi-transparent way is not too far off that concept, which is what I’m doing at the moment. miniconda (or a custom installer created via constructor) takes quite a bit of the pain out of that work, because they’ve already taken care of wrapping the install process into scripts, and they also take care of a lot of the inconsistencies between target platforms, and so on. As I said earlier, conceptually I’d prefer having something more rigorous like Nix, but there is only so much time I can afford to spend on making deployment as simple as possible for my users…

                                                                              Also, thanks for mentioning nix-bundle, I think I stumbled upon it when reading about the packaging of Nyxt (or some other tool), but forgot about it again. I could probably make it work somehow, but the advantages compared to Conda would probably not be worth it in my scenario (modular, extendable data-science tool for non- (or not-so)-technical users, basically). And then I’d still have to worry about Darwin and Windows.

                                                                              If I ever find the time I would really love to look into it even if it’s just for Linux. Being able to create nix-based containers and AppImages and all that easily would be something very nice to have.

                                                                    1. 8

                                                                      I see an implicit, glossed-over assumption that autocomplete is only powered by user input. What if autocomplete is also powered by product descriptions?

                                                                      1. 2

                                                                        I tried a whole bunch of prefixes to generate URLs and they all appear to be ones that appear in Amazon product descriptions.

                                                                      1. 5

                                                                        I usually type cd . instead of cd "$(pwd)"

                                                                        1. 3

                                                                          That doesn’t work inside a deleted directory, since the . and .. directory entries have been removed.

                                                                          1. 3

                                                                            How would “cd $(pwd)” work inside a deleted directory?

                                                                            [jeeger@dumper /tmp] $ mkdir -p /tmp/test/test
                                                                            [jeeger@dumper /tmp] $ cd /tmp/test/test
                                                                            [jeeger@dumper /tmp/test/test] $ rm -rf /tmp/test
                                                                            [jeeger@dumper /tmp/test/test] $ mkdir -p /tmp/test/test
                                                                            [jeeger@dumper /tmp/test/test] $ cd .
                                                                            [jeeger@dumper /tmp/test/test] $ ls
                                                                            [jeeger@dumper /tmp/test/test] $
                                                                            
                                                                            1. 5

                                                                              That’s a good point. I ran some tests of my own:

                                                                              • start a shell, change to /tmp/test/test
                                                                              • in another terminal, remove and recreate that directory
                                                                              • in the shell in question, run /bin/pwd to verify the problem is detected
                                                                              • run cd . or cd $(pwd)
                                                                              • run /bin/pwd again to verify the problem is fixed

                                                                              It turns out cd . and cd $(pwd) work equally well in posh, mksh, dash, zsh and bash, in the default configuration. However, for historical reasons I have set -P in my .bashrc file, which (as an unintended side-effect) turns off the magic that allows cd . to work.

                                                                              1. 1

                                                                                Very interesting, it’s only after bash resolves /tmp/test/test/. to /tmp/test/test that they become equivalent.

                                                                              2. 1

                                                                                I’m not sure what you’re demonstrating but please be advised that ls does not error in deleted cwds anyway, it simply exits

                                                                          1. 23

                                                                            The lobste.rs title is truncated from the actual title of

                                                                            Relative paths are faster than absolute paths on Linux (and other jokes I tell myself)

                                                                            Which has a different meaning than

                                                                            Relative paths are faster than absolute paths on Linux

                                                                            But also, both titles are unrelated to the contents of the blog post, which is about the semantics of the openat flavor of syscalls and not in any way about performance.

                                                                            1. 4

                                                                              I had hoped that the last few paragraphs would explain the title, because performance was the reason I’ve been looking into this.

                                                                              1. 4

                                                                                It also has very little to do with rust. The examples could be written in any language with the same result

                                                                              1. 19

                                                                                Huh, interesting article :)

                                                                                Not to take away from the interesting content, but the headline isn’t a surprise, like many (most?) typed languages

                                                                                • Rust’s type system is turing complete

                                                                                • You must typecheck rust to compile it (otherwise you can’t do things like #[no_mangle]pub static X: usize = mem::size_of::<T>())

                                                                                • Therefore compiling rust isn’t even computable.

                                                                                • Therefore rust is trivially NP-hard, and exptime-hard, and expspace-hard, and …

                                                                                1. 3

                                                                                  I’d argue that the type system, as actually implemented, is barely not Turing-complete, as you have to set a fixed recursion_limit in advance.

                                                                                  1. 21

                                                                                      That’s still “morally” Turing complete. You still can’t determine whether a program will halt naturally or will only halt by hitting the recursion limit. From there you get Rice’s theorem and all that other fun nonsense.

                                                                                    1. 3

                                                                                      Rust is a Turing completionist language.

                                                                                    2. 3

                                                                                      Fair point.

                                                                                      I’d have to check, but I think that with this stabilized it should be possible to make a procedural macro that expands to a string of usize::max() and then set #![recursion_limit = usize_max_macro!()].

                                                                                      At that point I’d argue we’re as turing complete as a full rust program is. On any given system there is a recursion limit equal to 2^(machine pointer size) but that’s not hugely different from the fact that on any given system there is a memory limit of 2^(machine pointer size).

                                                                                      Incidentally with const generics there also seem to be plans to give users the ability to disable the current instruction limit that prevents using loops to achieve turing completeness, but that’s not stable yet (not sure if it’s there in nightly).

                                                                                      1. 3

                                                                                        Any real Turing machine will have a finite tape length too though. I think this has been recently discussed here.

                                                                                        1. 3

                                                                                          Well, there are languages where the spec imposes no limit on memory usage. For example, you can write code to generate an infinitely long linked list in JavaScript, and, to my knowledge, there is nothing in the spec which requires that you would ever run out of memory. Compare that to C, where all object references are on the form of a fixed-size pointer. The spec doesn’t specify what the size of a pointer should be, but it does require that a pointer is some finite number of bits long, so you fundamentally can’t express an infinitely long linked list in C.

                                                                                          Obviously an implementation of JavaScript wouldn’t actually be able to store an infinitely long linked list. But the difference is in whether the language even lets you express it or not. A machine with an infinite amount of memory could store an arbitrarily long linked list in javascript, but not in C.

                                                                                        2. 2

                                                                                          Well, by that same logic, C isn’t Turing complete because you have to set a fixed pointer size in advance which sets a hard limit on the amount of memory you can have allocated at once.

                                                                                          And that’s true, but we usually consider C “fairly Turing complete” even though it technically isn’t.

                                                                                          1. 3

                                                                                            C can do I/O. This effectively mirrors the infinite tape: you can always read and write data to some external infinite storage medium. I don’t believe the Rust compiler can (Rust has a sane module system, rather than a small extension to cat). The C preprocessor can perform input (and can include files whose names are specified by macros from other files, though there’s normally a fixed recursion limit) but it can’t write some output to a file and then read it back with a subsequent #include, so the total state is always bounded.

                                                                                            In C++, parsing is not distinct from type checking. It is in Rust, I believe, because Rust has generics that must type check for any possible instantiation, whereas C++ has templates that may or may not type check for any given instantiation and so producing the parse tree requires some type checking. This means that even parsing C++ is undecidable.

                                                                                            1. 2

                                                                                              Since the title is “compiling rust”, beyond the type checker there are mechanisms in Rust to execute arbitrary Rust code and read random files at compile time

                                                                                              1. 1

                                                                                                C can do I/O. This effectively mirrors the infinite tape: you can always read and write data to some external infinite storage medium.

                                                                                                Hmm – is the amount of storage addressable via C’s I/O capabilities truly infinite? I’m no theoretician, but it’s not immediately obvious to me that it is…within a file your random-access ability is limited by the size of loff_t or whatever, though you can of course extend that by spanning multiple files and adding more “address bits” in the names of your files. But even setting aside limitations like PATH_MAX and such, it seems like whatever storage you’re accessing via I/O operations (even if you expand beyond the filesystem and start using network URLs or something), wouldn’t your “addresses” still need to fit in memory and hence still be finite? (Even if it grows you from 2^64 to 2^(2^64), say.) I suppose maybe you could cook up some scheme for storing secondary addresses in primary storage, tertiary addresses in secondary storage, etc…but then it seems like you’d need to have magic I/O primitives that could innately handle the N-deep indirection. But even given an I/O operation that takes that indirection-count as a parameter, wouldn’t the representation of that number still need to be finite? [End ramble – I’m probably out of my depth here, but it’s interesting to think about…]

                                                                                                1. 1

                                                                                                  It depends on the hardware interface, and (unlike david) I take objection to calling any access to storage ability part of C (C the language doesn’t guarantee access to any hardware), but for the right interface it’s truly infinite.

                                                                                                  You just need a hardware interface where you never have to send an absolute offset; that hardware is the tape, and the rest of the machine is your finite state/head. E.g. if you’re given access to a storage device where you just say “give me the previous byte” or “give me the next byte” (the classic Turing machine move left/right ability). More realistically you might make a cloud API with that sort of interface for your tape.

                                                                                                  1. 1

                                                                                                    For random access, you are limited to a fixed displacement, but C FILEs are streams: they can model an infinite tape (within the real universe, you obviously can’t actually implement one, but C can read from an arbitrarily long one and advance the read pointer one step in either direction for every read).

                                                                                                2. 1

                                                                                                  Technically you don’t set a fixed pointer size, the compiler chooses one.

                                                                                                  As long as you never do anything that assumes a maximum pointer size your program should be able to use as much memory as is made available to it by the compiler (by picking pointer size) + hardware (by actually giving it memory).

                                                                                                  I think that’s a step up from the user defined limit being discussed. With a constant user defined limit the program is limited in it’s number of steps regardless of what hardware you run it on. Which is why I suggested the “proc macro usize::max()” workaround in my own reply (which mimics the C behavior in that it then scales the complexity of the problems it can solve by the complexity of the hardware).

                                                                                            1. 10

                                                                                              IIRC the tracking was being done as a part of understanding how people use Audacity.

                                                                                              What’s always interesting to me is the lack of information actually being propagated in this kind of setup. If you only see which buttons someone clicks and at what times, how are you going to figure out what problem they are actually trying to solve? And how are you going to improve the UX if you don’t clearly understand the problem?

                                                                                              I felt there was some similar sentiment towards the Firefox redesign which moved features because they weren’t being clicked on as much.

                                                                                              1. 5

                                                                                                I would be much more interested to see the reasons why people aren’t using certain features. With widely used features, there’s usually enough natural feedback: you can learn about their deficiencies by looking at beginner questions, reading users’ rants and so on.

                                                                                                An unused feature is a real puzzle. Is it broken in a way that doesn’t affect me but affects a majority of users? Is its UX so poor that people would rather use a workaround? Is there simply no need for that feature among the user community?

                                                                                                1. 2

                                                                                                  That’s still hard to get from usage analytics, because an unused feature could also be the result of a lack of discoverability.

                                                                                                  1. 2

                                                                                                    I think I didn’t communicate my point clearly. That’s exactly what I mean: the hardest questions are also the questions that can’t be answered using analytics!

                                                                                                    One way to answer them is in-depth user surveys.

                                                                                                    1. 1

                                                                                                      I hear you!

                                                                                                2. 8

                                                                                                  Amusing that the ostensibly benign reasons for data collection (which you should never believe) lead to concrete harms like disruptive UI redesigns, which are more acutely felt than loss of privacy.

                                                                                                  1. 2

                                                                                                    You can mostly figure out how to reproduce crashes that way

                                                                                                  1. 7

                                                                                                    Tiny nit: Sentry is not open source. They use the Business Source License.

                                                                                                    1. 3

                                                                                                      Huh. I remember running it internally for a while; I wonder if they changed their license or if we were just abusing it. We never even got out of beta internally as it kept breaking, and upstream seemed to have zero desire to make it stable (for us at least), even if we sent patches.

                                                                                                      1. 6

                                                                                                        idk when you had this experience, but nowadays we (I am an employee of Sentry) have a dedicated team for open-source work that makes sure the issue tracker gets triaged and external PRs don’t fall through the cracks. We also have a docker-compose setup, since the complexity of the service has increased over time.

                                                                                                        We do still get a large amount of bug reports that we have a hard time remote-diagnosing, particularly around Kafka/Zookeeper and networking.

                                                                                                        Yup, we changed the license. Unless you were trying to build a direct competitor to Sentry you were probably not abusing either license, but IANAL.

                                                                                                        1. 2

                                                                                                          This was all just for internal use, and it was many years ago (5-ish or maybe more, I’m not sure). I’m glad they/you seem to be doing better around open source stuff! We just wrote our own, very simple system that basically amounts to an issue in our issue tracker with a stack trace attached.

                                                                                                          When we were running it, there was no Kafka or Zookeeper, so this was before those dependencies came in. As I remember, it was strictly Python, with maybe a Redis or SQL dependency, and that was basically it. It sounds drastically more complicated now.

                                                                                                          I make no claims that it won’t work for someone else, just that it was (at the time) terrible for us, stability-wise. People should evaluate it for themselves and whether it will work for their use case.

                                                                                                      2. 2

                                                                                                        Thank you very much for pointing this out. I updated the article adding this!

                                                                                                      1. 1

                                                                                                        It’s a great read for me, but I don’t quite understand the following part:

                                                                                                        Since each closure has its own type, there’s no compulsory need for heap allocation when using closures: as demonstrated above, the captures can just be placed directly into the struct value.

                                                                                                        How does each closure having a distinct type eliminate the need for heap allocation? While i32 and Vec have different types, they both normally live on the stack (unless you explicitly use something like Box).

                                                                                                        1. 3

                                                                                                          He’s saying that since each closure has its own distinct type, there’s no inherent need to abstract them behind a trait object. They can be stored directly on the stack instead of being behind a trait object pointer (Box<dyn Trait> or &dyn Trait).
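
                                                                                                          Roughly something like this sketch (the Callback struct is made up for illustration): the closure and its capture are stored directly in a field, and Box<dyn Fn> is an opt-in rather than a requirement.

                                                                                                              struct Callback<F: Fn(i32) -> i32> {
                                                                                                                  f: F, // the closure and its capture live inline in this field
                                                                                                              }

                                                                                                              fn main() {
                                                                                                                  let offset = 10;
                                                                                                                  // Stored directly: no heap allocation, the capture sits in the struct.
                                                                                                                  let direct = Callback { f: move |x: i32| x + offset };
                                                                                                                  // Boxed trait object: only needed when you want type erasure.
                                                                                                                  let boxed: Box<dyn Fn(i32) -> i32> = Box::new(move |x| x + offset);
                                                                                                                  assert_eq!((direct.f)(1), 11);
                                                                                                                  assert_eq!(boxed(1), 11);
                                                                                                              }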

                                                                                                          1. 1

                                                                                                            Suppose |x| x + 1 and |x| x + 2 had the same type; then I could do something like vec![|x| x + 1, |x| x + 2], which seems to have the same effect as using trait objects?

                                                                                                            1. 2

                                                                                                              Those two closures do not have the same type. Here’s a Rust Playground example showing this. Each closure is compiled into its own struct which implements the correct traits (from Fn, FnMut, and FnOnce) and which (if it doesn’t capture from the environment) may be coerced to a function pointer. Their types, even if the closures are textually identical, are not the same.
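
                                                                                                              A rough sketch of what the playground demonstrates (assert_same_type is a made-up helper, not anything from the article):

                                                                                                                  // Requires both arguments to have exactly the same type.
                                                                                                                  fn assert_same_type<T>(_: &T, _: &T) {}

                                                                                                                  fn main() {
                                                                                                                      let f = |x: i32| x + 1;
                                                                                                                      let g = |x: i32| x + 1; // textually identical, still a distinct type
                                                                                                                      // assert_same_type(&f, &g); // error[E0308]: mismatched types
                                                                                                                      // Non-capturing closures can be coerced to a common fn pointer type:
                                                                                                                      let h: fn(i32) -> i32 = f;
                                                                                                                      let k: fn(i32) -> i32 = g;
                                                                                                                      assert_same_type(&h, &k); // fine: both are fn(i32) -> i32
                                                                                                                      assert_eq!(h(1) + k(1), 4);
                                                                                                                  }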

                                                                                                              1. 1

                                                                                                                Exactly, and that’s why I said suppose they have the same type :P

                                                                                                                You pointed out that each closure having a unique type lifts the restriction that closures must be put behind a trait object, and I was saying there would be no such restriction even if all closures with the same signature shared one common type, because in that case we could do stuff like vec![|x| x+1, |x| x+2], which normally calls for trait objects.

                                                                                                                1. 2

                                                                                                                  From a type-system POV you’d have two different implementations of Fn/FnMut for the same type, since each closure needs a different function body. That would be kind of weird. If you then put two instances into the same Vec, I’m not entirely sure how Rust would find the correct trait impl without adding extra markers to the struct, which smells like dynamic dispatch already.

                                                                                                                  1. 1

                                                                                                                    I suppose the compiler could look at the closures and figure out the least restrictive trait for each closure (can I make this Fn? No? Then what about FnMut? Also no? Well, I guess it’s an FnOnce then) and find the greatest common divisor of all closures in the Vec.

                                                                                                                    1. 2

                                                                                                                      That’s not the issue. Even if there were only one trait for all of them, you would have overlapping implementations of the trait: its call method would need to be implemented differently, once for x+1 and once for x+2.
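
                                                                                                                      A sketch of that conflict using an ordinary stand-in trait (Fn itself can’t be hand-implemented on stable Rust, so Call here is illustrative only):

                                                                                                                          struct Closure; // pretend every closure of this signature shared this one type

                                                                                                                          trait Call {
                                                                                                                              fn call(&self, x: i32) -> i32;
                                                                                                                          }

                                                                                                                          impl Call for Closure {
                                                                                                                              fn call(&self, x: i32) -> i32 { x + 1 }
                                                                                                                          }

                                                                                                                          // A second body for the "same" type is rejected outright:
                                                                                                                          // error[E0119]: conflicting implementations of trait `Call` for type `Closure`
                                                                                                                          // impl Call for Closure {
                                                                                                                          //     fn call(&self, x: i32) -> i32 { x + 2 }
                                                                                                                          // }

                                                                                                                          fn main() {
                                                                                                                              assert_eq!(Closure.call(1), 2);
                                                                                                                          }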

                                                                                                                      1. 2

                                                                                                                        Oh, I see. In the case where some closures shared a common type, instances of that type could include an additional field holding a pointer to the function body, which is essentially dynamic dispatch. That doesn’t call for heap allocation though, because the function pointers typically point into .text rather than the heap.

                                                                                                                        EDIT: I realized that if you put the instructions for a closure on the stack, then it would not have a 'static lifetime and thus could not be returned from a function. See more discussion here.
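
                                                                                                                        For the heap-free flavor of this that exists today: non-capturing closures already coerce to plain fn pointers, which point into the code section, so something like this sketch works without any Box:

                                                                                                                            fn main() {
                                                                                                                                // One concrete element type, no boxing of the closures themselves.
                                                                                                                                let ops: Vec<fn(i32) -> i32> = vec![|x| x + 1, |x| x + 2];
                                                                                                                                let results: Vec<i32> = ops.into_iter().map(|f| f(10)).collect();
                                                                                                                                assert_eq!(results, vec![11, 12]);
                                                                                                                            }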

                                                                                                        1. 2

                                                                                                          Very cool. Some questions: you mention that the Lambda instances are reused between invocations. Was the 90s measurement with a cold or warm start? How much overhead (in time) do you have for preparing the lambdas and checking caches (I’d expect this to matter for partial rebuilds)?

                                                                                                          Somewhat unrelated, I wish CI would run test suites this way: deep integration into test frameworks such that you can run one test per Lambda function invocation if you’re really in a hurry.

                                                                                                          1. 2

                                                                                                            90s is a cold start; I re-create the Lambda function before benchmarks to make AWS flush any running instances. I could run a second build with everything warmed up and see how different it is – that might be an interesting comparison. I’d expect it to be substantial – uploading files from the client, especially, is a significant part of the latency.

                                                                                                          1. 1

                                                                                                            Company: Sentry (sentry.io)

                                                                                                            Company site: https://sentry.io

                                                                                                            Position(s): Software Engineer (jr/sr/staff) on various teams, SRE, Engineering Manager

                                                                                                            Location: San Francisco, Vienna, Toronto (onsite for all)

                                                                                                            Description: Sentry is a company that sells application monitoring. We have too many openings to describe them all here; check https://sentry.io/careers/#openings

                                                                                                            Tech stack: Python, JS, Rust

                                                                                                            Contact: Please go through the website