1. 14

    This is a pretty common pitfall with hash tables, one which has resulted in vulnerabilities in (at least) PHP and Python in the past, just off the top of my head.

    The general pattern was only identified in 2011, though, so the hash table library involved here pre-dates this being a known common pitfall. Fixing it in Haskell is a bit thorny; the standard approach is to use SipHash instead of a non-cryptographic hash function like FNV. But SipHash is a MAC, not just a hash function; it requires a secret key, normally chosen randomly. This doesn’t really work for purely functional languages like Haskell, at least without changing APIs, since it would break referential transparency. You would need to provide an RNG to the function that creates a map.
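
    To see why an unkeyed hash invites this attack, here is a minimal sketch of 32-bit FNV-1a in C (sketched in C rather than Haskell just to keep it self-contained and runnable): because the function is public and deterministic, an attacker can precompute, offline, as many keys as they like that all land in the same bucket.

    ```c
    #include <stdint.h>
    #include <stdio.h>

    /* 32-bit FNV-1a: public constants, no secret state. */
    static uint32_t fnv1a(const char *s) {
        uint32_t h = 2166136261u;          /* FNV offset basis */
        for (; *s; s++) {
            h ^= (uint8_t)*s;
            h *= 16777619u;                /* FNV prime */
        }
        return h;
    }

    int main(void) {
        /* Anyone can run this offline: with b buckets, any set of keys
         * whose hashes agree mod b collide into one chain, degrading the
         * table to O(n) per operation. A keyed MAC like SipHash mixes a
         * random 128-bit secret into every hash, so bucket assignments
         * can no longer be predicted in advance. */
        printf("%08x\n", (unsigned)fnv1a("a")); /* known test vector: e40c292c */
        return 0;
    }
    ```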

    This is why I tend to use Map from the containers package (which is a balanced tree) rather than HashMap, unless I know the inputs are trusted and I actually need the performance boost that HashMap can provide.

    1. 3

      You could probably fudge something with unsafePerformIO and {-# NOINLINE #-} inside unordered-containers, but it seems fraught, especially if you want to share structure between tables.

      I tend to default to Map as well, because unioning singleton HashMaps is O(n^2), but there’s an optimisation in Map that makes it fast.

      1. 2

        It might be enough to pick a single global key on process startup, using unsafe constructs to make this happen, but you then also have to make sure that nothing in the API makes the difference between runs observable. That probably means sorting the output of toList, for a start, which introduces an asymptotic performance hit.

    1. 3

      Does this bug class have a common name yet? That’s usually the first step towards getting a Wikipedia article and educating language designers.

      1. 7

        The article on SipHash calls it “hash flooding,” though I don’t know if that’s widespread: https://en.wikipedia.org/wiki/SipHash

        1. 5

          I have heard it called hash flooding. The 2003 paper “Denial of Service via Algorithmic Complexity Attacks” by Crosby and Wallach introduced these attacks.

        1. 11

          Consider two recent events: Microsoft killing Windows 7, and the Python community killing Python 2. The story is essentially the same in both cases: the creator of a piece of infrastructure software ends support for an old but still widely used version, forcing its users to move on to a later but not fully backwards compatible version

          This is just silly. When MS killed Windows 7, there was nothing you could do about it. No one killed Python 2. You can still get it, some distros still ship it, and if you really wanted to, you could keep maintaining it yourself or hire people to do so. Some people you don’t pay and who owe you nothing stopped working on it, but that’s not the same kind of killing as an MS EOL.

          1. 4

            Case in point, https://github.com/naftaliharris/tauthon appears to be still maintained.

          1. 17

            Very insightful. I do find type level programming in Haskell (and, to a lesser extent, Rust) to be a confusing nightmare. Nonetheless, Rust could not exist without traits (i.e. without bounded polymorphism). The Sync and Send marker traits (combined with borrow checking) are the basis of thread safety.

            I think Zig takes an interesting approach here, with its compile-time programming (i.e. type-level programming with the same syntax and semantics as normal programming), but it suffers from the same typing issues as C++ templates, i.e. types are only checked after use and monomorphization. Rust’s bounded polymorphism can be, and is, type-checked before monomorphization, so you know whether there are type errors in general. In Zig (and C++), you only know whether there are type errors with a particular type, and only after using a function (template) on that type.

            I think there’s room for an approach that’s more like Zig’s, but with sound polymorphic typing, using dependent type theory. Coq, Agda, and Idris include type classes (aka implicits, bounded polymorphism), but it doesn’t seem like type classes should be necessary in a dependently typed language. In particular, it doesn’t seem like they should provide any increase in expressiveness, though perhaps they reduce verbosity.

            1. 5

              Fwiw, even in Haskell you only really need one extension to obviate type classes in terms of “expressiveness,” namely RankNTypes. See https://www.haskellforall.com/2012/05/scrap-your-type-classes.html

              …though it doesn’t solve the verbosity issues. But I suspect that a language with better support for records might make this a pretty good solution (I have a side project where I am working on such a language).

              1. 2

                RankNTypes is my top pick for something to add to Haskell. However, for common cases type classes have the advantage of decidable inference.

                1. 3

                  Note that in the context of replacing type classes, the usual decidability problem with inference doesn’t really come up, because either way the higher rank types only show up in type definitions. E.g.

                  class Functor f where
                      fmap :: (a -> b) -> f a -> f b


                  data Functor' f = Functor'
                      { fmap' :: forall a b. (a -> b) -> f a -> f b
                      }

                  In the latter case, the problems with inference don’t come up, because the higher-rank quantifier is “hidden” behind the data constructor, so normal HM type inference can look at a call to fmap' and correctly infer that its argument needs to be a Functor' f, which it can treat opaquely, not worrying about the quantifier.

                  You can often make typechecking advanced features like this easier by “cheating” and either hiding them behind a nominal type, or bundling them with other features as a special case.

                  (N.B. I should say that for just Functor' you only need Rank2Types, which actually is decidable anyway, but I don’t think GHC actually decides it in practice, so it’s kind of a nitpick.)

                  Of course, this is all about type inference, whereas type classes are really more about inferring the values, which, as I said, means this doesn’t solve the verbosity issues.

              2. 5

                Type classes aren’t just about verbosity; global coherence is a very important aspect. Any decision on whether to use a type class vs. explicit dictionary passing needs to consider the implications of global coherence. I think the talk “Type Classes vs. the World” is a must-watch in order to engage in productive discussion about type classes.

              1. 15

                Maybe not surprisingly, the GPL leans way too heavily onto the US interpretation of copyright law.

                1. 3

                  As I understand it, correcting this (to the extent possible) was one of the big considerations in GPLv3, though given the timeframe in this case it looks like they were using v2.

                  1. 1

                    I don’t think that’s the conclusion here. This seems like a decision on the way a case must be prosecuted in France. This issue - mostly - doesn’t arise in common law countries because requiring correct “forms of action” was seen as one of the major inconveniences of English law in the Middle Ages which was explicitly reformed. To the extent that there’s any scope for falling into this sort of problem, it is usual practice to plead all viable causes of action as alternates and introduce evidence to support all of them.

                  1. 7

                      I don’t follow the issue here. If glibc always versioned symbols and musl never versioned symbols, then every symbolic reference would be completely unambiguous. Alpine binaries would always use musl, and newly compiled binaries might use glibc while their dependencies use musl (so both would be in one process), but every reference would be unambiguous.

                    This requires basic ABI cleanliness - things like a library cannot malloc() and expect its caller to free(), since those might be calls to different C libraries. Each module needs to not leak its C library implementation, but that’s achievable (although not necessarily achieved.)

                    I think the issue is more that glibc doesn’t always version symbols, and ELF doesn’t have a two level namespace, unlike OS X. OS X uses library + symbol to resolve symbols, whereas ELF traditionally only looked for a symbol (without library) which is very problematic once two C libraries are loaded into one process and are trying to resolve the same symbol name. I’d be interested to know if Linux has changed in this regard; obviously each binary encodes which shared libraries it needs, so the symbol lookup not checking for a library name was always a bit odd. It’s also the basis for how things like LD_PRELOAD work, because they can resolve a symbol from a completely different library.

                    If I’m right, the problem isn’t symbol versioning, it’s the lack of symbol versioning. With no symbol versioning, alpine-glibc wouldn’t have got off the ground.

                    1. 7

                      This requires basic ABI cleanliness - things like a library cannot malloc() and expect its caller to free(), since those might be calls to different C libraries.

                      I think you have partially answered your own question. It’s really common for C libraries to have functions that take ownership of their arguments and require that those are heap allocated values, assuming they can free() them when they’re done, or conversely they’ll malloc() something and return it to the caller, and part of the specified API is that the caller is responsible for free()-ing it. So it’s actually really really common for one library to malloc() and another to free(), and if they’re calling out to different implementations of libc, Bad Things will happen.
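
                        To make the hazard concrete, here is a hedged C sketch of the ownership-transfer pattern just described; the function name is made up, not from any real library:

                        ```c
                        #include <stdlib.h>
                        #include <string.h>

                        /* Hypothetical library function following the common pattern:
                         * the library malloc()s the result, and the documented contract
                         * is that the caller free()s it. */
                        char *lib_render_greeting(const char *name) {
                            size_t n = strlen(name) + sizeof "hello, ";
                            char *out = malloc(n);   /* allocated by the LIBRARY's libc */
                            if (out) {
                                strcpy(out, "hello, ");
                                strcat(out, name);
                            }
                            return out;
                        }

                        int main(void) {
                            char *s = lib_render_greeting("world");
                            /* Fine when both modules share one C library. If the library
                             * were linked against musl and this caller against glibc,
                             * free() would hand a musl-allocated block to glibc's
                             * allocator: undefined behaviour. */
                            if (s) free(s);
                            return 0;
                        }
                        ```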

                      Additionally, any library that expects to “own” part of the process state (which definitely includes libc) is going to run into problems with multiple copies/versions of it linked into the same process. E.g. from “How To Corrupt An SQLite Database File”:

                      As pointed out in the previous paragraph, SQLite takes steps to work around the quirks of POSIX advisory locking. Part of that work-around involves keeping a global list (mutex protected) of open SQLite database files. But, if multiple copies of SQLite are linked into the same application, then there will be multiple instances of this global list. Database connections opened using one copy of the SQLite library will be unaware of database connections opened using the other copy, and will be unable to work around the POSIX advisory locking quirks. A close() operation on one connection might unknowingly clear the locks on a different database connection, leading to database corruption.

                      The scenario above sounds far-fetched. But the SQLite developers are aware of at least one commercial product that was released with exactly this bug. The vendor came to the SQLite developers seeking help in tracking down some infrequent database corruption issues they were seeing on Linux and Mac. The problem was eventually traced to the fact that the application was linking against two separate copies of SQLite. The solution was to change the application build procedures to link against just one copy of SQLite instead of two.

                      1. 5

                        So it’s actually really really common for one library to malloc() and another to free()

                        Well, where I’m from, that’s a very clear no-no. There’s probably cultural differences at play here, but note that on Windows, the version of the C runtime library is determined by whatever compiler the application developer chooses, and any library exposing a stable ABI for application developers can’t know that in advance. Binary compatibility, when combined with continual C runtime library changes, implies a need to eliminate this pattern.

                        This leads to a handful of fairly simple patterns that always need to be followed:

                        1. Expose a function that can indicate the size of an allocation, have the caller allocate, and call a second time (or possibly different function) with an appropriate buffer;
                        2. Have a library allocate an object and expose a handle (opaque pointer) where all operations on the pointer, including its destruction, are owned by the module that allocated it;
                        3. In rare cases, specify the allocator used to interchange allocations across a module boundary. This is probably the least common and least clean, but it lingers in some places, e.g. the clipboard.
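
                          Pattern 2 above can be sketched as follows (hypothetical names; the point is just that allocation and destruction both happen inside the module that owns the type, so malloc() and free() always come from the same C library):

                          ```c
                          #include <stdlib.h>

                          /* Opaque-handle sketch. In a real library the struct definition
                           * would live in the .c file and the header would only declare
                           * "typedef struct widget widget;", so callers can never allocate
                           * or free one themselves. */
                          typedef struct widget {
                              int refcount;
                              /* ... private fields ... */
                          } widget;

                          widget *widget_create(void) {
                              widget *w = calloc(1, sizeof *w);
                              if (w) w->refcount = 1;
                              return w;
                          }

                          void widget_destroy(widget *w) {
                              free(w);   /* same module as the calloc(): one allocator, always */
                          }

                          int main(void) {
                              widget *w = widget_create();
                              widget_destroy(w);   /* handle never crosses an allocator boundary */
                              return 0;
                          }
                          ```
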
                        1. 7

                          On Linux, from-source builds are the norm; binary compatibility is indeed fiddlier than I expect it is on proprietary platforms, and shipping cross-distro dynamically linked binaries can be done but is definitely a second-class citizen. Typically, a distro will have one version of libc installed, and if you want to drop a binary onto that distro you have a couple scenarios:

                          1. The binary was dynamically linked, and built against a compatible version of libc.
                          2. The binary was dynamically linked, and built against an incompatible version of libc. You’re probably SOL if you can’t build from source.
                          3. The binary is statically linked, in which case it doesn’t use any system libraries and talks directly to the kernel ABI (which is very stable).

                          For proprietary software vendors static linking isn’t a great option because LGPL licensed libraries require that the user can swap out a compatible version of the library – dynamic linking is usually how this is achieved if you’re not distributing source. So these vendors tend to go to the trouble of making sure (1) is the case, which can be done if the target system uses some version of glibc and you compile against a version that is at least as old as anything your users might be using.

                          In any case, the fact that this pattern does work on basically any Linux system means that applications will inevitably rely on it, and building a system where you might have two versions of malloc()/free() in the same process is indeed a recipe for disaster; the article is right to suggest that this is a terrible idea.

                          (Vague tangent; I read somewhere that internally at Google, if a service is exceeding its SLO, the maintainers will intentionally introduce artificial downtime, to ensure that users aren’t relying on it being more reliable than it’s supposed to be).

                        2. 4

                          So it’s actually really really common for one library to malloc() and another to free()

                          IME that’s not very common, and it’s actually very poor library design to return pointers that need to be cleaned up by the user directly with free()/delete.

                          The libraries I’ve used almost always provide explicit cleanup functions to handle the pointers they return. GDAL, to pick an example I’ve used recently, has a whole bunch of Create* functions that return pointers, and bunch of corresponding Destroy* functions that clean up after them.

                          Not only does it avoid the problem of C library differences, but it also lets the library swap in different mallocs (such as jemalloc) without impacting developers using the library, and it simplifies using the library from languages other than C (if I use your library from Python or Haskell, how do I ‘free’ a pointer?).

                          I’m not saying it doesn’t happen, but IMO it’s a “code smell”, and a good indicator I should look for a different library.

                          1. 2

                            I’m not saying it doesn’t happen, but IMO it’s a “code smell”, and a good indicator I should look for a different library.

                            Fair enough. “really common” isn’t exactly a precise number and I don’t have hard numbers for you; I would agree that it is the minority relative to encapsulating allocation in wrapper functions, but it is common enough that, in the context of the article, it doesn’t really matter whether it’s good or bad, if your system breaks code that does this you’re going to have a bad time.

                          2. 1

                            It’s really common for C libraries to have functions that take ownership of their arguments and require that those are heap allocated values, assuming they can free() them when they’re done, or conversely they’ll malloc() something and return it to the caller, and part of the specified API is that the caller is responsible for free()-ing it. So it’s actually really really common for one library to malloc() and another to free(), and if they’re calling out to different implementations of libc, Bad Things will happen.

                            I’m sorry, you’re posting with the voice of authority, but I think you’re just wrong. That’s bad API design, and it’s just begging for crashes. APIs that start off like that might be common, but sooner rather than later (assuming user adoption) someone is going to point this out, and after the requisite denial period it’ll either get fixed or the project will go nowhere, because no serious user will integrate such a library into their code.

                            The biggest culprit is poorly designed C++ libraries that pass objects across API boundaries, which are then implicitly freed, but C APIs are much more resilient by nature since they typically pass pointers instead. Example: https://github.com/yue/yue/issues/82

                          3. 3

                            The problem is:

                            However, [alpine-glibc] is conceptually flawed, because it uses system libraries where available, which have been compiled against the musl C library.

                            So you’ve got an unholy, leaky mix of versioned and unversioned symbols from two different implementations calling into each other

                            1. 3

                              …with their own mallocs

                          1. 1

                            Only tangentially related, but am I to infer from this that generics have in fact been merged to tip, and will presumably be in the next major release? Looking forward to it.

                            1. 1

                              They are in heavy development and will most likely be available in the next version of Go, Go 1.18, as a preview (not finalized) feature with finalization happening in Go 1.19.

                              You can try them today if you use tip, but they are in heavy flux.

                            1. 9

                              I increasingly feel the opposite of this; continuously integrating new features into the browsers proper, when folks can already get those via transpilers and the like, serves to further bloat the web platform without a huge upside.

                              Fussing with import maps seems like it would be just as bad as fussing with the tooling that exists now. And the tooling could be much better than it is without the browsers lifting a finger.

                              Furthermore, especially with things like typescript, and languages that compile to wasm or javascript, off-line tooling isn’t going away, no matter what. Browsers will never be able to keep pace with what the latest tooling has to offer, and everything they add is code that has to be carried forever – whereas off-line tooling has softer constraints here.

                              What I’d like to see instead is for web platform development to focus on being a good platform for targeting with said tooling. Fill in gaps that make things harder on tool authors, and work to smooth out the experience of using the offline toolchains.

                              1. 8

                                Can someone explain this to me? I assume by “copy to another computer” they mean another computer with the same CPU type and OS. But then how is a compiled C program not portable? I mean, most of the commercial-software industry is literally based on copying the output of cc onto a {floppy, CD-ROM, Zip archive} and selling it to people to run on their computers.

                                Is this portability problem something specific to Linux? The comment in the table says “no, libc”, but libc is either built into the OS or statically linked into the executable, no?

                                (Also, how can Nim rate as more portable than C, when Nim is translated and run through a C compiler?)

                                1. 6

                                  Yes, I assume the author’s complaints mostly revolve around Linux, where different distros (and different distro versions) may have different versions of libc which are not ABI-compatible. It is possible to build an executable which will run against most glibc versions on most distros, and proprietary software vendors do this, but it is fair to say that there’s effort involved, whereas with say Go it “Just Works.”

                                  I’m given to understand that similar problems arise across major versions of some of the FOSS BSDs, but I’m less familiar there.

                                  Unsurprisingly, the FOSS world is much less built around distributing object code than the proprietary software world, and the toolchains reflect this.

                                  You can get around this by using static linking, since the kernel ABI is stable. In some cases this is easy, in others it can be a little annoying, e.g. because your distro doesn’t actually ship the static libraries for you to link against. But it does work.

                                  1. 9

                                    You can get around this by using static linking, since the kernel ABI is stable. In some cases this is easy, in others it can be a little annoying, e.g. because your distro doesn’t actually ship the static libraries for you to link against. But it does work.

                                     Of course, this only does you good for basic POSIX or server applications. It kinda falls apart when you get into more complex scenarios like, as Andrew pointed out, “displaying a window”. The userland stability of, say, FreeBSD at least gets you libc and perhaps things like audio, albeit not really X. The advantage of Win32 isn’t just the ABI stability, but the fact that it covers at least the foundation to build most scenarios on without fuckery. You might need to ship a static libpng to display an image, but you can talk to USER32.DLL to display your window with minimal fuss.

                                    If I were looking into shipping a desktop Linux application in a sustainable way and couldn’t rely on distros, the options are probably down to AppImage and Flatpak. My tendencies would lean towards Flatpak because that’s seemingly where the puck is going, but you never know with desktop Linux. Or maybe just mandate Wine. Win32 is the stable userland ABI for Linux, after all. Just ask Valve.

                                    1. 3

                                      I’ve wondered for a while why, when asked about an OpenBSD release, the developer of Ripcord (coincidentally also the author of this post) said that it wasn’t going to happen because “The BSDs do not provide ABI stability or compatibility.”.

                                      Your post helped me finally understand, thank you!

                                      1. 4

                                        The BSDs vary wildly here. OpenBSD has zero ABI stability and intentionally breaks it between releases. FreeBSD and NetBSD try to provide dynamic library-level backwards compatibility across releases (so libc plus whatever additional libraries are part of the base system), with optional syscall compatibility through modules with varying degrees of support. (@david_chisnall could certainly elaborate on FreeBSD’s policy here.) edit: FreeBSD’s policies are better than Linux’s here, if only because you at least get libc and supporting cast. I don’t know how ports/pkgs come into play, since you probably want e.g. X.

                                        1. 9

                                          @david_chisnall could certainly elaborate on FreeBSD’s policy here.

                                          For the base system, FreeBSD provides complete backwards (but not forwards) ABI compat within a major release series, including kernel interfaces. A binary (program, library, kernel module) built for FreeBSD 12, for example, is guaranteed to work on any later FreeBSD 12.x release. A binary built for FreeBSD 12.1 may use interfaces not in 12.0 and so not work on older versions. Note that this applies only to things in the base system.

                                          Between major versions, all kernel control-plane interfaces (e.g. the things that ifconfig or fstat depend on) may change. System calls may be removed but if they are then they are hidden behind a COMPAT_ configuration option and so you can build FreeBSD 13 with COMPAT_12 all the way back to COMPAT_4 and get access to old system calls. Most system libraries use symbol versioning and provide compatibility versions of symbols for older versions. The few that don’t will have a .so version bump between major versions if they change (recently, that’s only happened to move from not-using-symbol-versions to using-symbol-versions editions of the libraries) and the old version available in a compat package.

                                          The userspace guarantees are similar to glibc. The kernel guarantees are stronger than Linux within a major release (kernel modules don’t need recompiling) but weaker between major releases (control interfaces can break), on paper at least (Linux changes the behaviour of ioctls periodically, though typically not for anything Linus uses).

                                          Note that this applies only to the base system. Packages are on a rolling release. Some of these have very strong ABI guarantees. For example, X applications that use Xlib directly will work even if their X-facing parts haven’t changed for 30 years. Others, such as GTK and Qt, have much weaker guarantees. LLVM is really bad in this regard, reserving the right to break all of the C++ APIs between every 6-monthly release. We typically end up shipping multiple versions of these in the packages collection and maintaining them as long as they have consumers in the packages collection (llvm 7, 8, 9, 10, 11, 12, 13, and 14 are currently maintained in packages, for example). As long as you don’t link one of them directly and a different one indirectly via a dependency then things tend to be fine.

                                          If you’re shipping a proprietary application (as the linked rant seems to imply the author is doing), you’re better off shipping your own copy of the toolkit that you’re using (either statically linked or installed in a private location and dynamically linked via -rpath if you need to easily comply with the LGPL). If you do this then it’s guaranteed to work on FreeBSD for the lifetime of a major release (typically about 5 years) and will almost certainly work on the next one (which will have the previous version’s compat layer installed by default) and probably the one after that.

                                      2. 2

                                        Containers are the new static linking :P

                                    2. 4

                                      Also, how can Nim rate as more portable than C, when Nim is translated and run through a C compiler?

                                      I was wondering about that too, so I asked him. His reply:

                                      Nim has an extra dependency that requires an actual C compiler installed and working. It requires the Nim compiler, and also a C compiler, because the Nim compiler emits C code which the C compiler then compiles. That’s why it’s more fragile and complicated and more likely to break. (I have experienced this first-hand.)

                                      1. 3

                                        Isn’t that kind of repeating the objection? Sounds like he considers it no more portable than C, so his table has a typo(?) in it.

                                        1. 1

                                          I suppose it just means there are more moving parts that can break. If the path to the C compiler that’s baked into the Nim binary isn’t correct, for example, it won’t work. If the C compiler isn’t invoked with the correct flags, it won’t work, etc.

                                      2. 3

                                        how is a compiled C program not portable?

                                        Unsure given that the author doesn’t provide many rationales, but maybe they’re putting C in the not-portable-by-default bucket because libc isn’t statically linked by default? 🤷‍♂️

                                        1. 5

                                          …on Linux, I guess. As I thought, this article seems implicitly Linux-specific. On Apple/Darwin platforms, and I think on Windows, the C library is built into the OS and everything dynamically links it.

                                          1. 5

                                            Windows would prefer not to have any C runtime, and C programs are supposed to have an installer and bundle their runtime: https://devblogs.microsoft.com/oldnewthing/20140411-00/?p=1273 (statically linking MSVCRT.lib still requires a DLL at run time).

                                            macOS doesn’t have a stable kernel interface, so you can’t link libc (libSystem) statically. Fortunately it’s stable-enough for dynamic linking, provided that you set MACOSX_DEPLOYMENT_TARGET to an old-enough version.

                                            1. 4

                                              Windows doesn’t ship a C library*. What it does guarantee is everything that isn’t the C/C++ library. It ships language neutral (not linked against libc) libraries (not syscalls!) that applications link against instead. In practice, this means ABI stability that works for complex GUI applications.

                                              *: Well kinda. Now there’s UCRT which came after the Chen post, but there was MSVCRT/CRTDLL before it. But shipping the libc means you can’t go wrong.

                                              1. 2

                                                You make it sound exotic to link against libSystem. ¯\_(ツ)_/¯ Everything does. Shipping a C binary isn’t exotic, it’s the way nearly every piece of software on Apple platforms works.

                                                As for Windows … I’ve learned it doesn’t pay to overestimate it.

                                              2. 1

                                                On most Linux-based OS it is the same.

                                            2. 2

                                              Is this portability problem something specific to Linux? The comment in the table says “no, libc”, but libc is either built into the OS or statically linked into the executable, no?

                                              On Linux, you can pick your poison of libc: musl or glibc. Sure, most distros go for glibc, but musl is very popular, especially in container world.

                                              (I personally think this list is mostly author’s personal aesthetics though.)

                                              1. 4

                                                Having a choice of C libs seems more like a curse than a boon, honestly.

                                            1. 6

                                              generally, I’m not a big fan of using Turing complete languages for configuration because the law of entropy dictates that eventually your configuration will depend on side-effects when it has the ability to.

                                              So if the configuration should indeed be changed, IMHO, it should be changed to something entirely declarative that cannot depend on arbitrary side-effects.

                                              1. 4

                                                Oh, that is a very good point. This has not been a problem with elm-review because Elm does not allow side-effects, but it’s different for JavaScript. Knowing JavaScript, it will likely become both a huge problem and a superpower…

                                                I think ESLint rules already feel like they can’t do side-effects, but some do, like the rules in eslint-plugin-import (to resolve imports or something; I can’t remember more details).

                                                1. 1

                                                  You could probably use SES to sandbox the config, so it can’t actually perform external side-effects (though, unlike Elm, it could use mutation internally for its own computation – that seems fine).

                                              1. 9

                                                Interesting list, because of course it’s way more complicated than most of us thought. A shame they don’t take their own advice:

                                                Some people do not own phones, or do not wish to provide you with their telephone number when asked.

                                                Looking at you, GMail signup form. (And you too, Signal.)

                                                1. 9

                                                  It’s a difficult problem, because spam and other types of abuse are rampant, and phone verification is somewhat effective at filtering some of that out. It’s not perfect, of course, and organized criminals especially find ways around it. But it’s reasonably effective.

                                                  To give a simple example, I’ve had people persistently stalk me on WhatsApp; mostly Tinder dates and the like (including some whom I never even met in person!) I can now just block them with little fear of them creating a new account. And I’m fairly sure that women receive considerably more nonsense like this.

                                                  I really dislike it myself too, though. I don’t even want to own a smartphone, and these stupid messaging apps with their phone integration are pretty much the only reason I do :-/

                                                  1. 4

                                                    In fairness to Signal, it started off as an app to send encrypted text messages over SMS, so being usable without a phone number would have been somewhat strange. Obviously that’s changed, and ironically when they stopped supporting SMS some people complained about it requiring a data connection…

                                                  1. 1

                                                    Interleaved with some contract work, I’m going to be dropping some old APIs in haskell-capnp and doing some much needed performance tuning.

                                                    1. 22

                                                      The first case (newtypes and specialization) is actually pretty interesting, because even the “bad” case compiles to a memset, which most would consider “about as fast as feasible”. See the code here: https://godbolt.org/z/rxxhMGjr6

                                                      The trick is that the “good” case is better than you would expect, because it calls __rust_alloc_zeroed, which as described has special handling by the compiler (or rather LLVM, which handles calls to calloc and such similarly, based on the comments in the rustc source). If you change the allocation to vec![1u8; ...] then it compiles to a memset instead and runs at the same speed as vec![WrappedByte(0); ...].

                                                      But we gotta go deeper (cue Inception sound sting): how the heck does allocating zeroed memory run 1,000,000x faster than memset? This is hard to find out for certain, but I think the answer is “kernel trickery”: according to ye olde Stack Overflow, the kernel doesn’t actually zero the memory on allocation, but rather on first read. So it just hands the program a bunch of memory pages and says “these are all zeroed, I promise”. Then when the program tries to actually read from them, it causes a page fault and the kernel says “oh uh, hold up one sec”, zeroes the memory, and then lets the program go about its business. On the other hand, if the program writes to the memory before ever reading from it, the kernel never has to do the extra work of zeroing pages when it will never matter.
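                                                      This zero-on-demand behaviour is visible from any language, not just Rust; here is a small Python sketch (not from the thread), assuming a POSIX-ish system:

```python
import mmap

# Ask the kernel for 1 MiB of anonymous memory. No physical pages are
# handed over up front; the kernel only promises they will read as zero.
buf = mmap.mmap(-1, 1 << 20)

# Touching the pages is what actually materializes them, zero-filled:
assert buf[0] == 0 and buf[(1 << 20) - 1] == 0
```

The `mmap.mmap(-1, length)` call is the stdlib spelling of an anonymous mapping; the allocation itself is nearly free, and the cost is paid page by page on first access.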

                                                      I think Rust needs a better concept of what types have 0 as a valid value. I suspect it would make a lot of the cases where people use uninitialized memory for performance reasons a lot safer.

                                                      1. 7

                                                        On the other hand if the program writes to the memory before ever reading from it, the kernel never has to do the extra work of zero’ing them when it will never matter.

                                                        I don’t think this works out; all the kernel sees is an attempted write to a small part of the relevant page, leaving the rest of the page still supposedly all-zero. At this point the kernel has to provide a writable page, and it has to zero the rest of that page, since it won’t get a page fault on the next access once it marks the page writable.

                                                        There are other reasons why doing this lazily is advantageous though. In particular:

                                                        • If some subset of the requested (virtual) pages are read but not written, the kernel can just map the same all-zero (physical) page to each of them, saving physical memory. It only needs to allocate separate pages on write.
                                                        • It’s common for programs to allocate a big slab of memory up front and not actually use all of it, so this can avoid doing work at all.
                                                        • Even if the whole thing is indeed written to up-front, I suspect doing the page allocation lazily would improve performance, as otherwise you’d be doing it in two separate passes, hurting locality.
                                                        1. 3

                                                          There already exists a trait, IsZero, that tells when the memory representation of a value is zero [0], but the trait is not public, so it is not possible to implement it for an arbitrary type.

                                                          That trait is then used for the specialization, so that it knows it can use calloc instead of malloc+memset [1].

                                                          It is possible to make the WrappedByte example use calloc by consuming the zeroed vector as an iterator, at which point I assume LLVM can see that it is mapped with an identity function and can optimize the map out [2]:

                                                          vec![0u8; len]
                                                              .into_iter()
                                                              .map(WrappedByte)
                                                              .collect::<Vec<_>>()
                                                          1. 2

                                                            Addendum: You can see this in action by changing the allocation to let v: Vec<u8> = Vec::with_capacity(1<<34);, which allocates uninitialized memory. It runs about as quickly as the vec![0u8; 1<<34] case.

                                                          1. 1

                                                            Finishing up the main RPC bits of the API refactor I’ve been working on for haskell-capnp, and porting some of my downstream code to them. Then I get to start slashing & burning legacy code.

                                                            1. 24

                                                              I agree with most of what’s said in the article. However, it misses the forest for the trees a bit, at least considering the introduction. Programs that take seconds to load or display a list are not just ignoring cache optimizations. They’re using slow languages (or language implementations, for the pedants out there) like CPython, where even the simplest operations require a dictionary lookup; or using layers and layers of abstractions like Electron; or making HTTP requests for the most trivial things (I suspect it’s what makes Slack slow; I know it’s what makes GCP’s web UI absolutely terrible). A lot of bad architectural choices, too.

                                                              Cache optimizations can be important but only as the last step. There’s a lot to be fixed before that, imho.

                                                              1. 16

                                                                Even beyond that, I think there are more baseline things going on: most developers don’t even benchmark or profile. In my experience the most egregious performance problems I’ve seen have been straight-up bugs, and they don’t get caught because nobody’s testing. And the profiler basically never agrees with what I would have guessed the problem was. I don’t disagree with the author’s overall point, but it’s rare to come across a program that’s slow enough to be a problem yet doesn’t have much lower-hanging fruit than locality issues.

                                                                1. 3

                                                                  I agree so much! I’d even say that profiling is only one half of it (statistical profiling, that is, like perf). The other half is tracing, which nowadays can be done with very convenient tools like Tracy or the Chrome trace visualizer (“catapult”) if you instrument your code a bit so it can spit out JSON traces. These give insights into where time is actually spent.

                                                                  1. 1

                                                                    Absolutely. Most developers only benchmark if there’s a serious problem, and most users are so inured to bad response times that they just take whatever bad experience they receive and try to use the app regardless. Most of the time it’s some stupid thing the devs did that they didn’t realize and didn’t bother checking for (oops, looks like we’re instantiating this object on every loop iteration, look at that.)

                                                                  2. 9

                                                                    Programs that take seconds to load or display a list are not just ignoring cache optimizations.

                                                                    That’s right. I hammered on the cache example because it’s easy to show what a massive difference it can make, but I did not mean to imply that it’s the only reason. Basically, any time we lose track of what the computer must do, we risk introducing slowness. Now, I don’t mean that having layers of abstractions or using dictionaries is inherently bad (they will likely have a performance cost, but that may be reasonable to pay for another objective), but we should make these choices intentionally rather than going by rote, by peer pressure, by habit, etc.

                                                                    1. 5

                                                                      The article implies the programmer has access to low level details like cache memory layout, but if you are programming in Python, Lua, Ruby, Perl, or similar, the programmer doesn’t have such access (and for those languages, the trade off is developer ease). I’m not even sure you get to such details in Java (last time I worked in Java, it was only a year old).

                                                                      The article also makes the mistake that “the world is x86”—at work, we still use SPARC based machines. I’m sure they too have cache, and maybe the same applies to them, but micro-optimizations are quite difficult across different architectures (and even across the same family but different generations).

                                                                      1. 6

                                                                        The article implies the programmer has access to low level details like cache memory layout, but if you are programming in Python, Lua, Ruby, Perl, or similar, the programmer doesn’t have such access

                                                                        The level of control that a programmer has is reduced in favor of other tradeoffs, as you said, but there’s still some amount of control. Often, it’s found in those languages’ best practices. For example, in Erlang one should prefer binaries for text rather than strings, because binaries are a contiguous sequence of bytes while strings are linked lists of characters. Another example: in Python it’s preferable to accumulate small substrings in a list and then use the join method, rather than using concatenation (full += sub).
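                                                                        For concreteness, a minimal sketch of that Python idiom:

```python
# Accumulate the pieces, then join once; this avoids the repeated
# intermediate strings that full += sub can create in a loop.
parts = []
for word in ["a", "b", "c"]:
    parts.append(word)
full = "".join(parts)  # one final allocation
assert full == "abc"
```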

                                                                        The article also makes the mistake that “the world is x86”—at work, we still use SPARC based machines. I’m sure they too have cache, and maybe the same applies to them, but micro-optimizations are quite difficult across different architectures (and even across the same family but different generations).

                                                                        I don’t personally have that view, but I realize that it wasn’t made very clear in the text, my apologies. Basically what I want myself and other programmers to be mindful of is mechanical sympathy — to not lose track of the actual hardware that the program is going to run on.

                                                                        1. 4

                                                                          I know a fun Python example. Check this yes implementation:

                                                                          def yes(s):
                                                                            p = print  # one-time lookup; the loop now uses a local name
                                                                            while True:
                                                                              p(s)
                                                                          This hot-loop will perform significantly better than the simpler print(s) because of the way variable lookups work in Python. It first checks the local scope, then the global scope, and then the built-ins scope before finally raising a NameError exception if it still isn’t found. By adding a reference to the print function to the local scope here, we reduce the number of hash-table lookups by 2 for each iteration!

                                                                          I’ve never actually seen this done in real Python code, understandably. It’s counter-intuitive and ugly. And if you care this much about performance then Python might not be the right choice in the first place. The dynamism of Python (any name can be reassigned, at any time, even by another thread) is sometimes useful but it makes all these lookups necessary. It’s just one of the design decisions that makes it difficult to write a high-performance implementation of Python.

                                                                          1. 3

                                                                            That’s not how scoping works in Python.

                                                                            The Python parser statically determines the scope of a name (where possible.) If you look at the bytecode for your function (using dis.dis) you will see either a LOAD_GLOBAL, LOAD_FAST, LOAD_DEREF, or LOAD_NAME, corresponding to global, local, closure, or unknown scope. The last bytecode (LOAD_NAME) is the only situation in which multiple scopes are checked, and these are relatively rare to see in practice.

                                                                            The transformation from LOAD_GLOBAL to LOAD_FAST is not uncommon, and you see it in the standard library: e.g., https://github.com/python/cpython/blob/main/Lib/json/encoder.py#L259

                                                                            I don’t know what current measurements of the performance improvement look like, after LOAD_GLOBAL optimisations in Python 3.9, which reported 40% improvement: https://bugs.python.org/issue26219 (It may be the case that the global-to-local transformation is no longer considered a meaningful speed-up.)

                                                                            Note that the transformation from global-to-local scope, while likely innocuous, is a semantic change. If builtins.print or the global print is modified in some other execution unit (e.g., another thread,) the function will not reflect this (as global lookups can be considered late-bound, which is often desirable.)
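                                                                            This static determination is easy to check with the dis module; a quick sketch (the function names here are made up):

```python
import dis

def uses_global(s):
    print(s)       # name resolved with LOAD_GLOBAL at each call

def uses_local(s):
    p = print      # one LOAD_GLOBAL here...
    p(s)           # ...then LOAD_FAST for the local name

assert "LOAD_GLOBAL" in [i.opname for i in dis.get_instructions(uses_global)]
assert "LOAD_FAST" in [i.opname for i in dis.get_instructions(uses_local)]
```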

                                                                            1. 8

                                                                              I think this small point speaks more broadly to the dissatisfaction many of us have with the “software is slow” mindset. The criticisms seem very shallow.

                                                                              Complaining about slow software or slow languages is an easy criticism to make from the outside, especially considering that the biggest risk many projects face is failure to complete or failure to capture critical requirements.

                                                                              Given a known, fixed problem with decades of computer science research behind it, it’s much easier to focus on performance—whether micro-optimisations or architectural and algorithmic improvements. Given three separate, completed implementations of the same problem, it’s easy to pick out which is the fastest and also happens to have satisfied just the right business requirements to succeed with users.

                                                                              I think the commenters who suggest that performance and performance-regression testing should be integrated into the software development practice from the beginning are on the right track. (Right now, I think the industry is still struggling with getting basic correctness testing and documentation integrated into software development practice.)

                                                                              But the example above shows something important. Making everything static or precluding a number of dynamic semantics would definitely give languages like Python a better chance at being faster. But these semantics are—ultimately—useful, and it may be difficult to predict exactly when and where they are critical to satisfying requirements.

                                                                              It may well be the case that some languages and systems err too heavily on the side of allowing functionality that reduces the aforementioned risks. (It’s definitely the case that Python is more dynamic in design than many users make use of in practice!)

                                                                              1. 2

                                                                                Interesting! I was unaware that the parser (!?) did that optimization. I suppose it isn’t difficult to craft code that forces LOAD_NAME every time (say, by reading a string from stdin and passing it to exec) but I find it totally plausible that that rarely happens in non-pathological code.

                                                                                Hm. For a lark, I decided to try it:

                                                                                >>> def yes(s):
                                                                                ...  exec("p = print")
                                                                                ...  p(s)
                                                                                >>> dis.dis(yes)
                                                                                  2           0 LOAD_GLOBAL              0 (exec)
                                                                                              2 LOAD_CONST               1 ('p = print')
                                                                                              4 CALL_FUNCTION            1
                                                                                              6 POP_TOP
                                                                                  3           8 LOAD_GLOBAL              1 (p)
                                                                                             10 LOAD_FAST                0 (s)
                                                                                             12 CALL_FUNCTION            1
                                                                                             14 POP_TOP
                                                                                             16 LOAD_CONST               0 (None)
                                                                                             18 RETURN_VALUE
                                                                                >>> yes("y")
                                                                                Traceback (most recent call last):
                                                                                  File "<stdin>", line 1, in <module>
                                                                                  File "<stdin>", line 3, in yes
                                                                                NameError: name 'p' is not defined
                                                                          2. 5

                                                                            and for those languages, the trade off is developer ease

                                                                            I heard Jonathan Blow make this point on a podcast and it stuck with me:

                                                                            We’re trading off performance for developer ease, but is it really that much easier? It’s not like “well, we’re programming in a visual language and just snapping bits together in a GUI, and it’s slow, but it’s so easy we can make stuff really quickly.” Like Python is easier than Rust, but is it that much easier? In both cases, it’s a text based OO language. One just lets you ignore types and memory lifetimes. But Python is still pretty complicated.

                                                                            Blow is probably a little overblown (ha), but I do think we need to ask ourselves how much convenience we’re really buying by slowing down our software by factors of 100x or more. Maybe we should be more demanding for our slow downs and expect something that trades more back for it.

                                                                            1. 2

                                                                              Like Python is easier than Rust, but is it that much easier?

                                                                              I don’t want to start a fight about types but, speaking for myself, Python became much more attractive when they added type annotations, for this reason. Modern Python feels quite productive, to me, so the trade-off is more tolerable.

                                                                              1. 1

                                                                                It depends upon the task. Are you manipulating or parsing text? Sure, C will be faster in execution, but in development?

                                                                                At work, I was told to look into SIP, and I started writing a prototype (or proof-of-concept if you will) in Lua (using LPeg to parse SIP messages). That “proof-of-concept” went into production (and is still in production six years later) because it was “fast enough” for use, and it’s been easy to modify over the years. And if we can ever switch to using x86 on the servers [1], we could easily use LuaJIT.

                                                                                [1] For reasons, we have to use SPARC in production, and LuaJIT does not support that architecture.

                                                                          3. 7

                                                                            The trick with cache optimizations is that, sure, individually you’re shaving nanoseconds off, but sometimes those operations are alarmingly common in the program flow and worth fixing before any higher-level changes.

                                                                            To wit: I worked on a CAD system implemented in Java, and the “small optimization” of switching to a pooled-allocation strategy for vectors instead of relying on the normal GC meant the difference between an unusable application and a fluidly interactive one, simply because the operation I fixed was so core to everything that was being done.

                                                                            Optimizing cache hits for something like mouse move math can totally be worth it as a first step, if you know your workload and what code is in the “hot” path (see also sibling comments talking about profiling).

                                                                            1. 6

                                                                              They’re using slow languages (or language implementations, for the pedantics out there) like cpython where even the simplest operations require a dictionary lookup

                                                                              I take issue with statements like this, because the majority of code in most programs is not being executed in a tight loop on large enough data to matter. The overall speed of a program has more to do with how it was architected than with how well the language it’s written in scores on microbenchmarks.

                                                                              Besides, Python’s performance cost isn’t a just an oversight. It’s a tradeoff that provides benefits elsewhere in flexibility and extensibility. Problems like serialization are trivial because of meta-programming and reflection. Complex string manipulation code is simple because the GC tracks references for you and manages the cleanup. Building many types of tools is simpler because you can easily hook into stuff at runtime. Fixing an exception in a Python script is a far more pleasant experience than fixing a segfault in a C program that hasn’t been built with DWARF symbols.
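                                                                              For instance, reflection makes a generic serializer nearly a one-liner; a small sketch with a made-up Point class:

```python
import json

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def to_json(obj):
    # vars() reflects over the instance's attributes,
    # so no per-class serialization code is needed.
    return json.dumps(vars(obj), sort_keys=True)

assert to_json(Point(1, 2)) == '{"x": 1, "y": 2}'
```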

                                                                              Granted, modern compiled languages like Rust/Go/Zig are much better at things like providing nice error messages and helpful backtraces, but you’re paying a small cost for keeping a backtrace around in the first place. Should that be thrown out in favor of more speed? Depending on the context, yes! But a lot of code is just glue code that benefits more from useful error reporting than faster runtime.

                                                                              For me, the choice in language usually comes down to how quickly I can get a working program with limited bugs built. For many things (up to and including interactive GUIs) this ends up being Python, largely because of the incredible library support, but I might choose Rust instead if I was concerned about multithreading correctness, or Go if I wanted strong green-thread support (Python’s async is kinda meh). If I happen to pick a “fast” language, that’s a nice bonus, but it’s rarely a significant factor in that decision making process. I can just call out to a fast language for the slow parts.

                                                                              That’s not to say I wouldn’t have mechanical sympathy and try to keep data structures flat and simple from the get go, but no matter which language I pick, I’d still expect to go back with a profiler and do some performance tuning later once I have a better sense of a real-world workload.

                                                                              1. 4

                                                                                To add to what you say: Until you’ve exhausted the space of algorithmic improvements, they’re going to trump any microoptimisation that you try. Storing your data in a contiguous array may be more efficient (for search, anyway - wait until you need to insert something in the middle), but no matter how fast you make your linear scan over a million entries, if you can reframe your algorithm so that you only need to look at five of them to answer your query then a fairly simple data structure built out of Python dictionaries will outperform your hand-optimised OpenCL code scanning the entire array.
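                                                                                A small illustration of that point, using nothing beyond the standard library:

```python
import timeit

data = list(range(1_000_000))   # contiguous-ish, but O(n) to search
index = set(data)               # same data, O(1) membership

scan = timeit.timeit(lambda: 999_999 in data, number=20)
lookup = timeit.timeit(lambda: 999_999 in index, number=20)
# No amount of micro-tuning the linear scan closes an O(n)-vs-O(1) gap:
assert lookup < scan
```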

                                                                                The kind of microoptimisation that the article’s talking about makes sense once you’ve exhausted algorithmic improvements, need to squeeze the last bit of performance out of the system, and are certain that the requirements aren’t going to change for a while. The last bit is really important because it doesn’t matter how fast your program runs if it doesn’t solve the problem that the user actually has. grep, which the article uses as an example, is a great demonstration here. Implementations of grep have been carefully optimised but they suffer from the fact that requirements changed over time. Grep used to just search ASCII text files for strings. Then it needed to do regular expression matching. Then it needed to support unicode and do unicode canonicalisation. The bottlenecks when doing a unicode regex match over a UTF-8 file are completely different to the ones doing fixed-string matching over an ASCII text file. If you’d carefully optimised a grep implementation for fixed-string matching on ASCII, you’d really struggle to make it fast doing unicode regex matches over arbitrary unicode encodings.

                                                                                1. 1

                                                                                  The kind of microoptimisation that the article’s talking about makes sense once you’ve exhausted algorithmic improvements, need to squeeze the last bit of performance out of the system, and are certain that the requirements aren’t going to change for a while.

                                                                                  To be fair, I think the article also speaks of the kind of algorithmic improvements that you mention.

                                                                                2. 3

                                                                                  Maybe it’s no coincidence that Django and Rails both seem to aim at 100 concurrent requests, though. Both use a lot of language magic (runtime reflection/metaprogramming/metaclasses), afaik. You start with a slow dynamic language, and pile up more work to do at runtime (in this same slow language). In this sense, I’d argue that the design is slow in many different ways, including architecturally.

                                                                                  Complex string manipulation code is simple because the GC tracks references for you

                                                                                  No modern language has a problem with that (deliberately ignoring C). Refcounted/GC’d strings are table stakes.

                                                                                  I personally dislike Go’s design a lot, but it’s clearly designed in a way that performance will be much better than python with enough dynamic features to get you reflection-based deserialization.

                                                                                3. 1

                                                                                  Every time I’ve had an urge to fire up a profiler, the problem was either an inefficient algorithm (worse big-O) or repeated database fetches (poor cache usage). Never have I found that performance was bad because of slow abstractions. This might be because the ecosystem I work in (Python web services) has a lot of experience crafting good, fast abstractions. Of course, you can find newcomers writing Python who don’t use them, which results in bad performance, but that is quickly learned away. What matters, if you want to write performant Python code, is to use as little “pure Python” as possible. Python is a great glue language, and it works best when it is used that way.
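A small illustration of pushing work out of “pure Python” (my own sketch, not from the comment above): the builtin sum() runs its loop in C, while an equivalent hand-written loop pays interpreter overhead on every iteration.

```python
data = list(range(1_000_000))

# Pure-Python loop: every addition goes through the bytecode interpreter.
def total_loop():
    t = 0
    for x in data:
        t += x
    return t

# The builtin sum() does the same arithmetic in C under the hood.
def total_builtin():
    return sum(data)

assert total_loop() == total_builtin()
```

On CPython the builtin version is typically several times faster (you can check with the `timeit` module); the general lesson is to let C-implemented builtins and libraries do the inner loops.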

                                                                                  1. 1

                                                                                    Never have I found that performance was bad because of slow abstractions.

                                                                                    I have. There was the time when fgets() was the culprit, and another time when checking the limit of a string of hex digits was. The most surprising result I’ve had from profiling is a poorly written or poorly compiled library.

                                                                                    Looking back on my experiences, I would have to say I’ve been surprised by a profile result about half the time.

                                                                                  2. 1

                                                                                    As a pedant out here, I wanted to say that I appreciate you :)

                                                                                  1. 3

                                                                                    As someone who spends an unreasonable amount of time thinking about CLIs, it seems the main reason for all the confusion initially was just that in git checkout <tree-ish> -- <pathspec> the <tree-ish> allowed a default value. I wonder whether, if that hadn’t been the case (you had to specify a branch/ref at all times), the same confusion would have happened.

                                                                                    1. 6

                                                                                      The thing that strikes me as semantically muddled is that if you provide <pathspec>, it doesn’t change what branch/commit you have checked out, it just modifies files in the working directory. But if you don’t, it moves you to a different branch/commit.

                                                                                      1. 1

                                                                                        I think the thinking is that moving to a different commit is just modifying the working directory. You’re just modifying it to look like whatever tree you specify.

                                                                                        1. 7

                                                                                          It isn’t ‘just’ changing the working directory though, there’s a branch pointer that keeps track of where you are in the history, reflected by git status.

                                                                                          $ echo A > foo.txt
                                                                                          $ git add foo.txt
                                                                                          $ git commit -m foo.txt
                                                                                          [main (root-commit) b6197ed] foo.txt
                                                                                           1 file changed, 1 insertion(+)
                                                                                           create mode 100644 foo.txt
                                                                                          $ git checkout -b newbranch
                                                                                          Switched to a new branch 'newbranch'
                                                                                          $ echo B >> foo.txt 
                                                                                          $ git commit -am 'Add a line'
                                                                                          [newbranch a65b94d] Add a line
                                                                                           1 file changed, 1 insertion(+)
                                                                                          $ git checkout main foo.txt 
                                                                                          Updated 1 path from 46733b9
                                                                                          $ git status
                                                                                          On branch newbranch
                                                                                          Changes to be committed:
                                                                                            (use "git restore --staged <file>..." to unstage)
                                                                                          	modified:   foo.txt
                                                                                          $ git checkout main
                                                                                          Switched to branch 'main'
                                                                                          $ git status
                                                                                          On branch main
                                                                                          nothing to commit, working tree clean
                                                                                          1. 1

                                                                                            OK yeah, it changes HEAD, but defaults to keeping it the same.

                                                                                            So the summary is that it moves HEAD to whatever you specify, or leaves it unchanged if you specify nothing, and updates the working directory to reflect the move, filtered by a path list that you specify after --.

                                                                                            1. 3

                                                                                              Note that in the example above, git checkout main foo.txt does not change HEAD, despite having explicitly specified a branch (main). If pathspec is included, specifying a branch explicitly just influences where to get the files from.

                                                                                    1. 5

                                                                                      Quality is not negotiable. […] Our only measure of progress is delivering into our customers hands things they find valuable.

                                                                                      I once asked Holub on Twitter what you should do if the client doesn’t consider security valuable. His response was something like “then don’t make it secure.”

                                                                                      Edit: found the thread, someone else was taking the hardline, but he was still handwaving away a lot of the concerns. I’ll be honest, I don’t really like Holub’s online personality or any of his stances, so that might be coloring how I see this.

                                                                                      1. 3

                                                                                        “effective software development” occurs in a realm abstracted away from “software development that people with money pay for”.

                                                                                        One would assume they were related somehow, but the computer industry has a long history of particularly successful businesses succeeding despite, rather than because of, their development practices. Alas, business success is then equated, by those desirous of business success, with Good/Effective Software Development practices.

                                                                                        And so life gets worse.

                                                                                        1. 3

                                                                                          I think the comparison to radium/asbestos is apt, and ultimately while I try to practice some basic degree of professional ethics myself, I don’t think we can really dig ourselves out of this at scale without externally imposed regulations.

                                                                                          See also: https://idlewords.com/talks/haunted_by_data.htm

                                                                                          1. 1

                                                                                            I’ll be honest, I don’t really like Holub’s online personality or any of his stances, so that might be coloring how I see this.

                                                                                            I appreciate your honesty and share your feelings. I find that Holub’s attitude swings between the extremes of “idealism” at one end and “give them what they ask for” at the other, which can be hard to reconcile. I see shades of that dichotomy in this list.

                                                                                          1. 13

                                                                                            I’m not really sure what would be best for Copilot to do in this situation. I don’t think what it’s doing now is actually useful in practice, although it’s an impressive-looking demonstration.

                                                                                            A big problem that seems pervasive in machine learning systems is that they don’t try to detect when they’ve been given a problem that’s above their pay grade, so to speak; they just do the best they can, which given tough problems is often worse than just giving up and telling the human they’re stumped. Or at least signaling low confidence in their answer.

                                                                                            My gut tells me that this is a shallower problem than its pervasiveness would have you believe, but AI isn’t really my area, so I may be wrong.

                                                                                            1. 4

                                                                                              When I first learned about computers, there was an abbreviation in common use: GIGO. Garbage in, garbage out. This is especially true for any ML system. The output quality depends hugely on the training set. This, unfortunately, composes very badly with Sturgeon’s Law.

                                                                                              Determining whether a problem is ‘above their pay grade’ is a difficult problem because it requires the system to estimate the difficulty of the task. Training a machine-learning system to understand this requires having a data set that is annotated with the difficulty of tasks. You might be able to infer it from something else (for example, the number of commits to a particular bit of code with commit messages labelled some variant of bug fix, or the amount of time people spend in VS Code reading a particular bit of code, on average) but in general it’s very hard. To do it without ML requires being able to articulate a set of rules that define the difference between easy and difficult programming problems and if you could do that then you could probably produce a program synthesis tool that works a lot better than Copilot.
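One common heuristic along these lines (not anything Copilot actually does, just a sketch of the abstention idea) is to refuse to answer when the model’s output distribution is too high-entropy, i.e. when no single answer stands out.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def answer_or_abstain(probs, labels, max_entropy=1.0):
    """Return the top label, or None if the model looks too unsure.

    max_entropy is a tunable threshold; 1.0 bit is an arbitrary choice here.
    """
    if entropy(probs) > max_entropy:
        return None  # signal "I'm stumped" instead of guessing
    return labels[max(range(len(probs)), key=probs.__getitem__)]

# A confident distribution produces an answer...
assert answer_or_abstain([0.9, 0.05, 0.05], ["a", "b", "c"]) == "a"
# ...while a near-uniform one abstains.
assert answer_or_abstain([0.4, 0.3, 0.3], ["a", "b", "c"]) is None
```

This doesn’t solve the hard problem the parent describes (knowing how hard the *task* is), but it at least lets a system report low confidence in its own output rather than always committing to a guess.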

                                                                                            1. 10

                                                                                              I think an important direction for future programming language development is better support for writing single programs that span multiple nodes. It’s been done, e.g. erlang, but it would be nice to see more tight integration of network protocols into programming languages, or languages that can readily accommodate libraries that do this without a lot of fuss.

                                                                                              There’s still some utility in IDLs like protobufs/capnproto in that realistically the whole world isn’t going to converge on one language any time soon, so having a nice way of describing an interface in a way that’s portable across languages is important for some use cases. But today we write a lot of plumbing code that we probably shouldn’t need to.

                                                                                              1. 3

                                                                                                I couldn’t agree more. Some sort of language feature or DSL or something would allow you to have your services architecture without paying quite so many of the costs for it.

                                                                                Type-checking cross-node calls; service fusion (i.e. co-locating services that communicate with each other on the same node to eliminate network traffic where possible); RPC inlining (at my company we have RPC calls that amount to just CPU work, but they live in different repos and on different machines because they’re written by different teams; if the compiler had access to that information it could eliminate that boundary); and something like a query planner for complex RPCs that decay into many backend RPC calls (we pass object IDs between services, but many of those services need the data behind the same underlying objects, so they all go out to the data access layer to look up the same things). Some of that could be done by ops teams with implementation knowledge, but in our case those implementations change all the time, so any hand-tuning would be out of date by the time the ops team figured out what’s going on under the hood. There’s a lot that a Sufficiently Smart Compiler(tm) could do given all of that information.
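A toy sketch of what “RPC inlining” would buy (hypothetical service, names invented for illustration): most of the boundary cost here is the encode/decode on each side, and a compiler that knew the callee was pure CPU work could collapse the whole path to a direct call.

```python
import json

# Toy "service": pure CPU work that happens to sit behind an RPC boundary.
def tokenize(text):
    return text.lower().split()

# The RPC path pays serialization on both sides (network hop elided here).
def tokenize_rpc(text):
    request = json.dumps({"text": text})              # client encodes
    payload = json.loads(request)                     # server decodes
    response = json.dumps(tokenize(payload["text"]))  # server encodes
    return json.loads(response)                       # client decodes

# "Inlining" the RPC is just calling the function directly -
# same result, none of the marshalling overhead.
assert tokenize_rpc("Hello World") == tokenize("Hello World")
```

A real implementation would of course need the cross-team interface information the parent describes; the point is only that the boundary cost is an artifact of organization, not of the computation itself.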

                                                                                                1. 3

                                                                                  There is also a view that it is a function of the underlying OS (not a particular programming language) to seamlessly provide ‘resources’ (e.g. memory, CPU, scheduling) across networked nodes.

                                                                                  This view is sometimes called Single Image OS (I briefly discussed that angle in that thread as well).

                                                                                  Overall, I agree, of course, that creating safe, efficient and horizontally scalable programs should be much easier.

                                                                                  Hardware is going to continue to drive horizontal scalability capabilities (whether it is multiple cores, multiple nodes, or multiple video/network cards).

                                                                                                  1. 2

                                                                                                    I was tempted to add some specifics about projects/ideas I thought were promising, but I’m kinda glad I didn’t, since everybody’s chimed with stuff they’re excited about and there’s a pretty wide range. Some of these I knew about others I didn’t, and this turned out to be way more interesting than if it had been about one thing!

                                                                                                    1. 2

                                                                                                      Yes, but: you need to avoid the mistakes of earlier attempts to do this, like CORBA, Java RMI, DistributedObjects, etc. A remote call is not the same as an in-process call, for all the reasons called out in the famous Fallacies Of Distributed Computing list. Earlier systems tried to shove that inconvenient truth under the rug, with the result that ugly things happened at runtime.

                                                                                                      On the other hand, Erlang has of course been doing this well for a while.

                                                                                      I think we’re in better shape to deal with this now thanks to all the recent work languages have been doing to provide async calls, Erlang-style channels, Actors, and better error handling through effect systems. (Shout out to Rust, Swift and Pony!)

                                                                                                      1. 2

                                                                                                        Yep! I’m encouraged by signs that we as a field have learned our lesson. See also: https://capnproto.org/rpc.html#distributed-objects

                                                                                                        1. 1

                                                                                                          Cap’nProto is already on my long list of stuff to get into…

                                                                                                      2. 2

                                                                                                        Great comment, yes, I completely agree.

                                                                                        This is linked from the article, but just in case you didn’t see it, http://catern.com/list_singledist.html lists a few attempts at exactly that. Including my own http://catern.com/caternetes.html

                                                                                                        1. 2

                                                                                          This is what work like Spritely Goblins is hoping to push forward.

                                                                                                          1. 1

                                                                                                            I think an important direction for future programming language development is better support for writing single programs that span multiple nodes.


                                                                                                            I think the model that has the most potential is something near to tuple spaces. That is, leaning in to the constraints, rather than trying to paper over them, or to prop up anachronistic models of computation.

                                                                                                            1. 1

                                                                                                              better support for writing single programs that span multiple nodes.

                                                                                                              That’s one of the goals of Objective-S. Well, not really a specific goal, but more a result of the overall goal of generalising to components and connectors. And components can certainly be whole programs, and connectors can certainly be various forms of IPC.

                                                                                                              Having support for node-spanning programs also illustrates the need to go beyond the current call/return focus in our programming languages. As long as the only linguistically supported way for two components to talk to each other is a procedure call, the only way to do IPC is transparent RPCs. And we all know how well that turned out.

                                                                                                              1. 1

                                                                                                                indeed! Stuff like https://www.unisonweb.org/ looks promising.

                                                                                                              1. 11

                                                                                I feel they emphasize the wrong things to make modular; I think the battery is the most important by far. Everything else is likely not worthwhile due to signalling changes/bottlenecking/etc, but batteries are perishable and the most important thing for portability. It’d be great if you could easily put in some 18650s without soldering and have it Just Work.

                                                                                                                1. 10

                                                                                                                  The battery is replaceable, although internal so you have to open it up. It is a bit odd to me that that isn’t called out explicitly in their marketing, but I did find this thread: https://community.frame.work/t/framework-team-why-did-you-choose-to-make-the-battery-internal/1187/3

                                                                                                                  which is about why it’s internal vs. external, but one of their employees confirms that it is at least replaceable (it is internal mainly for space savings; there’s a genuine design trade-off there).

                                                                                                                  1. 7

                                                                                                                    I don’t really care about being able to hotswap batteries - that’s a stupid parlour trick. What matters more is if you can get new (not NOS, since those decay) batteries that fit. Prismatic batteries are an essential compromise, but they make this much harder.

                                                                                                                    1. 2

                                                                                                                      True; I think making standardized form factors for prismatic batteries is important future work if this kind of thing is going to take off.

                                                                                                                      1. 2

                                                                                                                        We kinda have this for smaller devices - many Sony, Nokia, and Samsung batteries became de facto standards for things like wireless keyboards.

                                                                                                                  2. 2

                                                                                    18650s are way, way too fat for “ultrabooks”. There should be some kind of new thin-but-big battery standard…

                                                                                                                    1. 8

                                                                                                                      Honestly, I’ve never seen the appeal in ultrabooks. Oh, it’s thin? That’s nice. My mechanical keyboard is at an excellent height, and that’s rather more than an inch off of the table.

                                                                                                                      What matters is weight and mechanical stiffness. If the user can pick it up, open, by one corner, and not get flexing, there’s nothing wrong with the physical specs.

                                                                                                                      1. 10

                                                                                                                        Size and weight matter. I don’t want it to weigh much when it’s in my bag or even carting it around desks, but thickness is underrated in terms of “can I hold it around my arms easily?” and “can I fit more stuff into my luggage?”

                                                                                                                        I will say my MBA is much thinner and lighter than my old X230t, yet is much more physically stiff. Old ThinkPads are bendier than people remember.

                                                                                                                      2. 7

                                                                                                                        Sure but that’s just another way of saying “ultrabook is the wrong form factor for a device that prioritizes long life” if you ask me.

                                                                                                                        1. 4

                                                                                                                          Well… the form factor is just the bigger priority for lots of people, myself included. For long life, max power, upgradeability, and all the other good things I have a big beefy desktop already. I don’t need a laptop that competes with the desktop, I need a laptop I can take anywhere easily — it must occupy minimum weight and space in my backpack, should be light enough to carry around in one hand.

                                                                                                                          1. 2

                                                                                                                            I understand the argument about weight, but what kind of backpack do you have that you can’t fit a non-ultrabook laptop in it? Maybe you should buy a bigger bag instead of a less useful laptop.

                                                                                                                            1. 2

                                                                                                                              “Less useful” for you, maybe. For him, it’s the ideal compromise.