1. 4

    This is an interesting thread on writing POSIX-compatible Makefiles. The interesting thing is that it’s very hard, or even impossible, at least if you want to keep some standard features like out-of-tree builds. I’ve never restricted myself to writing portable Makefiles (I use GNU extensions freely), but I previously assumed it wasn’t that bad.

    That this is so hard is maybe a good example of why portability to different dependencies is a bad goal when your dependencies are already open source and portable. As many posters in the thread say, you can just use gmake on FreeBSD. The same goes for many other open source dependencies: If the software is open source, portability to alternatives to that software is not really important.
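
    As a concrete sketch of what I mean (a made-up layout, not from the thread): the Makefiles I normally write lean on GNU-only features to get an out-of-tree build, and none of it survives a strict POSIX rewrite. (Recipe lines begin with a tab.)

      # Hypothetical GNU Makefile for an out-of-tree build. $(wildcard),
      # $(patsubst), and the pattern rule below are all GNU extensions with
      # no POSIX equivalent.
      BUILDDIR := build
      SRCS := $(wildcard src/*.c)
      OBJS := $(patsubst src/%.c,$(BUILDDIR)/%.o,$(SRCS))

      prog: $(OBJS)
      	$(CC) $(LDFLAGS) -o $@ $(OBJS)

      # GNU pattern rule: compile src/foo.c to build/foo.o
      $(BUILDDIR)/%.o: src/%.c
      	@mkdir -p $(BUILDDIR)
      	$(CC) $(CFLAGS) -c -o $@ $<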

    1. 4

      you can just use gmake on FreeBSD.

      I can, but I don’t want to.

      If you want to require any specific tool or dependency, fine, that’s your prerogative; just don’t force your idea of the tool’s cost on me. Own your decision: if it impacts me, be frank about it, just don’t bullshit me that it doesn’t impact me just because the cost for you is less than the cost for me.

      The question of why don’t you use X instead of Y is nobody’s business but mine. I fully understand and expect that you might not care about Y, please respect my right not to care about X.

      1. 11

        That’s very standard rhetoric about portability, but the linked thread shows it’s not so simple in this case: It’s essentially impossible to write good, portable Makefiles.

        1. 5

          Especially considering how low the cost of using GNU Make is compared to, e.g., switching OS/architecture.

          1. 2

            It’s just as easy to run BSD make on Linux as it is to run GNU make on the BSDs, yet if I ship my software to Linux users with a BSD makefile and tell them to install BSD make, there will hardly be a person who wouldn’t scoff at the idea.

            Yet Linux users expect BSD users not to complain when they do the exact same thing.

            Why is this so hard to understand? The objection is not that you have to run some software dependency; the objection is to people telling you that you shouldn’t care about the nature of the dependency because their cost for that dependency is different from yours.

            I don’t think that your software is bad because it uses GNU make, and I don’t think that using GNU make makes you a bad person, but if you try to convince me that “using GNU make is not a big deal”, then I don’t want to ever work with you.

            1. 2

              Are BSD makefiles incompatible with GNU make? I actually don’t know.

              1. 2

                The features, syntax, and semantics of GNU and BSD make are largely disjoint. Their intersection is POSIX make, which has almost no features.

                …but that’s not the point at all.

                1. 2

                  If they use BSD-specific extensions, then yes.

            2. 2

              Posix should really standardize some of GNU make’s features (e.g. pattern rules) and/or the BSDs should just adopt them.
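
              For anyone who hasn’t hit this: a pattern rule is the GNU spelling of what POSIX can only approximate with suffix rules, and the suffix form cannot place the output in a different directory. Two separate hypothetical fragments, with made-up paths (recipe lines begin with a tab):

                # GNU make only -- a pattern rule; bmake and POSIX make reject this
                build/%.o: src/%.c
                	$(CC) $(CFLAGS) -c -o $@ $<

                # The closest portable form: a suffix rule, accepted by GNU make,
                # bmake, and POSIX make alike -- but it cannot put the .o in
                # another directory
                .SUFFIXES: .c .o
                .c.o:
                	$(CC) $(CFLAGS) -c -o $@ $<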

              1. 5

                I get the vibe at this point that BSD intentionally refuses to make improvements to their software specifically because those improvements came from GNU, and they really hate GNU.

                Maybe there’s another reason, but why else would you put up with a program that is missing such a critically important feature and force your users to go thru the absurd workarounds described in the article when it would be so much easier and better for everyone to just make your make better?

                1. 4

                  I get the vibe at this point that BSD intentionally refuses to make improvements to their software specifically because those improvements came from GNU, and they really hate GNU.

                  Really? I’ve observed the opposite. For example, glibc refused to adopt the strl* functions from OpenBSD’s libc, in spite of the fact that they were useful and widely implemented, and the refusal to merge them explicitly called them ‘inefficient BSD crap’ in spite of the fact that they were no less efficient than existing strn* functions. Glibc implemented the POSIX _l-suffixed versions but not the full set from Darwin libc.

                  In contrast, you’ll find a lot of ‘added for GNU compatibility’ functions in FreeBSD libc, the *BSD utilities have ‘for GNU compatibility’ in a lot of places. Picking a utility at random, FreeBSD’s du has two flags that are listed in the man page as first appearing in the GNU version, whereas GNU du does not list any as coming from BSDs (though -d, at least, was originally in FreeBSD’s du - the lack of it in GNU and OpenBSD du used to annoy me a lot since most of my du invocations used -d0 or -d1).

                  1. 2

                    The two are in no way mutually exclusive.

                  2. 1

                    Maybe there’s another reason, but why else would you put up with a program that is missing such a critically important feature and force your users to go thru the absurd workarounds described in the article when it would be so much easier and better for everyone to just make your make better?

                    Every active software project has an infinite set of possible features or bug fixes; some of them will remain unimplemented for decades. glibc’s daemon function, for example, has been broken under Linux since it was implemented. The BSD Make maintainers just have a different view of the importance of this feature. There’s no reason to attribute negative intent.

                    1. 1

                      The BSD Make maintainers just have a different view of the importance of this feature

                      I mean, I used to think that too, but after reading the article and learning the details I have a really hard time continuing to believe that. We’re talking about pretty basic everyday functionality here.

                    2. 1

                      Every BSD is different, but most BSDs are minimalist-leaning. They don’t want to add features, not because GNU has them, but because they only want to add things they’ve really decided they need. It’s an anti-bloat philosophy.

                      GNU, on the other hand, is basically founded on the mantra “if it’s useful, then add it”.

                      1. 6

                        I really don’t understand the appeal of the kind of philosophy that results in the kind of nonsense the linked article recommends. Why do people put up with it? What good is “anti-bloat philosophy” if it treats “putting build files in directories” as some kind of super advanced edge case?

                        Of course when dealing with people who claim to be “minimalist” it’s always completely arbitrary where they draw the line, but this is a fairly clear-cut instance of people having lost sight of the fact that the point of software is to be useful.

                        1. 3

                          The article under discussion isn’t the result of a minimalist philosophy, it’s the result of a lack of standardisation. BSD make grew a lot of features that were not part of POSIX. GNU make also grew a similar set of features, at around the same time, with different syntax. FreeBSD and NetBSD, for example, both use bmake, which is sufficiently powerful to build the entire FreeBSD base system.

                          The Open Group never made an effort to standardise any of them and so you have two completely different syntaxes. The unfortunate thing is that both GNU Make and bmake accept all of their extensions in a file called Makefile, in addition to looking for files called GNUmakefile / BSDmakefile in preference to Makefile, which leads people to believe that they’re writing a portable Makefile and complain when another Make implementation doesn’t accept it.
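
                          To make the divergence concrete, here is the same conditional and loop written in each dialect, as two separate fragments; the OS variable and module names are invented for illustration and assumed to be set elsewhere.

                            # GNU make dialect (say, in a GNUmakefile)
                            ifeq ($(OS),FreeBSD)
                            CFLAGS += -DHAVE_KQUEUE
                            endif
                            MODULES := net io util
                            OBJS := $(foreach m,$(MODULES),$(m).o)

                            # bmake dialect (say, in a BSDmakefile); same logic, different syntax
                            .if ${OS} == "FreeBSD"
                            CFLAGS += -DHAVE_KQUEUE
                            .endif
                            MODULES = net io util
                            .for m in ${MODULES}
                            OBJS += ${m}.o
                            .endfor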

                2. 7

                  But as a programmer, I have to use some build system. If I chose Meson, that’d be no problem; you’d just have to install Meson to build my software. Ditto if I chose cmake. Or mk. Why is GNU make any different here? If you’re gonna wanna compile my software, you better be prepared to get my dependencies onto your machine, and GNU make is probably gonna be one of the easiest build systems for a BSD user to install.

                  As a Linux user, if your build instructions told me to install bsdmake or meson or any other build system, I wouldn’t bat an eye, as long as that build system is easy to install from my distro’s repos.

                  1. 3

                    Good grief, why is this so difficult to get through? If you want to use GNU make, or Meson, or whatever, then do that! I use GNU make too! I also use Plan 9’s mk, which few people have installed, and even fewer would want to install. That’s not the point.

                    The problem here has nothing to do with intrinsic software properties at all, I don’t know why this is impossible for Linux people to understand.

                    If you say “I am using GNU make, and if you don’t like it, tough luck”, that’s perfectly fine.

                    If you say “I am using GNU make, which can’t cause any problem for you because you can just install it” then you are being ignorant of other people’s needs, requirements, or choices, or you are being arrogant for pretending other people’s needs, requirements, or choices are invalid, and of course in both cases you are being patronizing towards users you do not understand.

                    This has nothing to do with GNU vs. BSD make. It has nothing to do with software, even. It’s a social problem.

                    if your build instructions told me to install bsdmake or meson or any other build system, I wouldn’t bat an eye, as long as that build system is easy to install from my distro’s repos.

                    And this is why Linux users do not understand the actual problem. They can’t fathom that there are people for whom the above way of doing things is unacceptable. It’s perfectly fine not to cater to such people; what’s not fine is to insist that their reasoning is invalid. There are people to whom the extrinsic properties of software are far more important than its intrinsic properties. It’s ironic that Linux people have trouble understanding this, given that this is the raison d’être for the GNU project itself.

                    1. 5

                      I think the question is “why is assuming gmake is no big deal any different than assuming meson is no big deal?” And I think your answer is “those aren’t different, and you can’t assume meson is no big deal” but you haven’t come out and said that yet.

                  2. 1

                    I can, but I don’t want to.

                    Same. Rewriting my Makefiles is so annoying that so far I have resigned myself to just calling gmake on FreeBSD. Maybe one day I will finally do it. I never really understood how heavily GNUisms “infected” my style of writing software until I switched to the land of the BSDs.

                  3. 2

                    What seems to irk BSD users the most is putting GNUisms in a file called Makefile; they see the file and expect to be able to run make, yet that will fail. Naming the file GNUmakefile is an oft-accepted compromise.

                    I admit I do not follow that rule myself, but if I ever thought a BSD user would want to use my code, I probably would follow it, or use a Makefile-generator.
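
                    A cheap way to honor that rule (just a sketch, not something from the thread): keep the real rules in GNUmakefile, which GNU make reads in preference to Makefile, and ship a tiny POSIX Makefile so a BSD user who types make gets a clear message instead of a pile of syntax errors. (The recipe lines begin with a tab.)

                      # Makefile (POSIX stub). The actual build lives in GNUmakefile, which
                      # GNU make picks up automatically; bmake and other makes land here.
                      all:
                      	@echo "This project needs GNU make; please run 'gmake'." >&2
                      	@false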

                    1. 4

                      I’d have a lot more sympathy for this position if BSD make was actually good, but their refusal to implement pattern rules makes it real hard to take seriously.

                      1. 2

                        I’d have a lot more sympathy for this position if BSD make was actually good

                        bmake is able to build and install the complete FreeBSD source tree, including both kernel and userland. The FreeBSD build is the most complex make-based build that I’ve seen and is well past the level of complexity where I think it makes sense to have hand-written Makefiles.

                        For the use case in question, it’s worth noting that you don’t need pattern rules; bmake puts things in obj or under ${MAKEOBJDIRPREFIX} by default.
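
                        For what that looks like in practice, here is a minimal sketch using the BSD mk framework (assuming the mk files that ship with FreeBSD/NetBSD or with the portable bmake package): with MAKEOBJDIRPREFIX set in the environment, make obj creates the object directory, and later builds drop objects and the binary there instead of into the source tree.

                          # Makefile -- bmake with the BSD mk framework; no pattern rules
                          # needed for an out-of-tree build.
                          PROG=   hello
                          SRCS=   hello.c

                          .include <bsd.prog.mk>

                          # Typical invocation (MAKEOBJDIRPREFIX must come from the environment):
                          #   env MAKEOBJDIRPREFIX=/usr/obj make obj   # create the object directory
                          #   env MAKEOBJDIRPREFIX=/usr/obj make       # build into it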

                    2. 1

                      That this is so hard is maybe a good example of why portability to different dependencies is a bad goal when your dependencies are already open source and portable.

                      I mean, technically you are right, but in my opinion, you are wrong because of the goal of open source.

                      The goal of open source is to have as many people as possible using your software. That is my premise, and if it is wrong, the rest of my post does not apply.

                      But if that is the goal, then portability to different dependencies is one of the most important goals! The reason is because it is showing the user empathy. Making things as easy as possible for users is being empathetic towards them, and while they may not notice that you did it, subconsciously, they do. They don’t give up as easily, and in fact, sometimes they even put extra effort in.

                      I saw this when porting my bc to POSIX make. I wrote a configure script that uses nothing other than POSIX sh. It was hard, mind you, I’m not denying that.

                      But the result was that my bc was so portable that people started using it on the BSDs without my knowledge, and one of those users decided to spend effort to demonstrate that my bc could make serious performance gains and helped me realize them once I made the decision to pursue that. He also convinced FreeBSD to make my bc the system default for FreeBSD 13.

                      Having empathy for users, in the form of portability, makes some of them want to give back to you. It’s well worth it, in my opinion. In fact, I just spent two days papering over the differences between filesystems on Windows and on sane platforms so that my next project could be portable enough to run on Windows.

                      (Oh, and my bc was so portable that porting it to Windows was little effort, and I had a user there help me improve it too.)

                      1. 4

                        The goal of open source is to have as many people as possible using your software.

                        I have never heard that goal before. In fact, given current market conditions, open source may not be the fastest way if that is your goal. Millions in VC to blow on marketing does wonders for user acquisition.

                        1. 1

                          That is true, but I’d also prefer to keep my soul.

                          That’s the difference. One is done by getting users organically, in a way that adds value. The other is a way to extract value. Personally, I don’t see Open Source as having an “extract value” mindset in general. Some people who write FOSS do, but I don’t think FOSS authors do in general.

                        2. 4

                          The goal of open source is to have as many people as possible using your software.

                          I actually agree with @singpolyma that this isn’t necessarily a goal. When I write software and then open source it, it’s often stuff I really don’t want many people to use: experiments, small tools or toys, etc. I mainly open source it because the cost to me of doing it is negligible, and I’ve gotten enough neat random bits and pieces of fun or interesting stuff out of other people’s weird software that I want to give back to the world.

                          On the other hand, I’ve worked on two open source projects whose goal was to be “production-quality” solutions to certain problems, and knew they weren’t going to be used much if they weren’t open source. So, you’re not wrong, but I’d turn the statement around: open source is a good tool if you want as many people as possible using your software.

                      1. 5

                        I find myself reluctantly agreeing with most of the article, which makes me sad. Nevertheless, I would like to be pragmatic about this.

                        That said, I think that most of the problems with the GPL can be sufficiently mitigated if we just remove the virality. In particular, I don’t think that copyleft is the problem.

                        The reason is because I believe that without the virality, companies would be willing to use copyleft licenses since the requirements for compliance would literally be “publish your changes.” That’s a low bar and especially easy in the world of DVCS’s and GitHub.

                        However, I could be wrong, so if I am, please tell me how.

                        1. 10

                          The problem with ‘non-viral’ copyleft licenses (more commonly known as ‘per-file copyleft’ licenses) is that they impede refactoring. They’re fine if the thing is completely self-contained but if you want to change where a layer is in the system then you can’t move functions between files without talking to lawyers. Oh, and if you use them you’re typically flamed by the FSF because I don’t think anyone has managed to write a per-file copyleft license that is GPL-compatible (Mozilla got around this by triple-licensing things).

                          That said, I think one of the key parts of this article is something that I wrote about 15 or so years ago: From an end-user perspective, MS Office better meets a bunch of the Free Software Manifesto requirements than OpenOffice. If I find a critical bug in either then, as an experienced C++ programmer, I still have approximately the same chance of fixing it in either: zero. MS doesn’t let me fix the MS Office bug[1] but I’ve read some of the OpenOffice code and I still have nightmares about it. For a typical user, who isn’t a C++ programmer, OpenOffice is even more an opaque blob.

                          The fact that MS Office is proprietary has meant that it has been required to expose stable public interfaces for customisation. This means that it is much easier for a small company to maintain a load of in-house extensions to MS Office than it is to do the same for most F/OSS projects. In the ‘90s, MS invested heavily in end-user programming tools and as a result it’s quite easy for someone with a very small amount of programming experience to write some simple automation for their workload in MS Office. A lot of F/OSS projects have an elitist attitude about programming and don’t want end users to be extending the programs unless they pass the gatekeeping requirements of learning programming languages whose abstractions are far too low-level for the task at hand. There is really no reason that anything other than a core bit of compute-heavy code for any desktop or mobile app needs to be written in C/C++/Rust, when it could be in interpreted Python or Lua without any user-perceptible difference in performance.

                          Even the second-source argument (which is really compelling to a lot of companies) doesn’t really hold up because modern codebases are so huge. Remember that Stallman was writing that manifesto back when a typical home computer such as the BBC Model B was sufficiently simple that a single person could completely understand the entire hardware and software stack and a complete UNIX system could be written by half a dozen people in a year (Minix was released a few years later and was written by a single person, including kernel and userland. It was around 15,000 lines of code). Modern software is insanely complicated. Just the kernel for a modern *NIX system is millions of lines of code, so is the compiler. The bc utility is a tiny part of the FreeBSD base system (if memory serves, you wrote it, so should be familiar with the codebase) and yet is more code than the whole of UNIX Release 7 (it also has about as much documentation as the entire printed manual for UNIX Release 7).

                          In a world where software is this complex, it might be possible for a second company to come along and fix a bug or add a feature for you, but it’s going to be a lot more expensive for them to do it than for the company that’s familiar with the codebase. This is pretty much the core of Red Hat’s business model: they use Fedora to push core bits of Red Hat-controlled code into the Linux ecosystem, make them dependencies for everything, and then can charge whatever they like for support because no one else understands the code.

                          From an end-user perspective, well-documented stable interfaces with end-user programming tools give you the key advantages of Free Software. If there are two (or more) companies that implement the same stable interfaces, that’s a complete win.

                          F/OSS also struggles with an economic model. Proprietary software exists because we don’t have a good model for any kind of zero-marginal-cost goods. Creating a new movie, novel, piece of investigative journalism, program, and so on, is an expensive activity that needs funding. Copying any of these things has approximately zero cost, yet we fund the former by charging for the latter. This makes absolutely no sense from any rational perspective yet it is, to date, the only model that has been made to work at scale.

                          [1] Well, okay, I work at MS and with the whole ‘One Microsoft’ initiative I can browse all of our internal code and submit fixes, but this isn’t an option for most people.

                          1. 3

                            The fact that MS Office is proprietary has meant that it has been required to expose stable public interfaces for customisation. This means that it is much easier for a small company to maintain a load of in-house extensions to MS Office than it is to do the same for most F/OSS projects. In the ‘90s, MS invested heavily in end-user programming tools and as a result it’s quite easy for someone with a very small amount of programming experience to write some simple automation for their workload in MS Office. A lot of F/OSS projects have an elitist attitude about programming and don’t want end users to be extending the programs unless they pass the gatekeeping requirements of learning programming languages whose abstractions are far too low-level for the task at hand. There is really no reason that anything other than a core bit of compute-heavy code for any desktop or mobile app needs to be written in C/C++/Rust, when it could be in interpreted Python or Lua without any user-perceptible difference in performance.

                            I’ve found this to be true for Windows too, as I wrote in a previous comment. I technically know how to extend the Linux desktop beyond writing baubles, but it’s shifting sands compared to how good Windows has been with extensibility. I’m not going to maintain a toolkit or desktop patchset unless I run like, Gentoo.

                            BTW, from your other reply:

                            I created a desktop environment project around this idea but we didn’t have sufficient interest from developers to be able to build anything compelling. F/OSS has a singular strength that is also a weakness: It is generally written by people who want to use the software, not by people who want to sell the software. This means that it tends to be incredibly usable to the authors but it is only usable in general if the authors are representative of the general population (and since they are, by definition, programmers, that is intrinsically not the case).

                            I suspect this is why it never built a tool like Access/HyperCard/Excel/etc. that empowers end users - because they don’t need it, because they are developers. Arguably, that is the original sin of free software (assuming users are developers), and in a wider sense, why its threat model drifted further from reality.

                            1. 2

                              The problem with ‘non-viral’ copyleft licenses (more commonly known as ‘per-file copyleft’ licenses) is that they impede refactoring. They’re fine if the thing is completely self-contained but if you want to change where a layer is in the system then you can’t move functions between files without talking to lawyers.

                              Is it possible to have a non-viral copyleft license that is not per-file? I hope so, and I wrote licenses to do that which I am going to have checked by a lawyer. If he says it’s impossible, I’ll have to give up on that.

                              Oh, and if you use them you’re typically flamed by the FSF because I don’t think anyone has managed to write a per-file copyleft license that is GPL-compatible (Mozilla got around this by triple-licensing things).

                              Eh, I’m not worried about GPL compatibility. And I’m not worried about being flamed by the FSF.

                              That said, I think one of the key parts of this article is something that I wrote about 15 or so years ago: From an end-user perspective, MS Office better meets a bunch of the Free Software Manifesto requirements than OpenOffice. If I find a critical bug in either then, as an experienced C++ programmer, I still have approximately the same chance of fixing it in either: zero. MS doesn’t let me fix the MS Office bug[1] but I’ve read some of the OpenOffice code and I still have nightmares about it. For a typical user, who isn’t a C++ programmer, OpenOffice is even more an opaque blob.

                              This is a good point, and it is a massive blow against Free Software since Free Software was supposed to be about the users.

                              Even the second-source argument (which is really compelling to a lot of companies) doesn’t really hold up because modern codebases are so huge.

                              I personally think this is a separate problem, but yes, one that has to be fixed before the second-source argument applies.

                              The bc utility is a tiny part of the FreeBSD base system (if memory serves, you wrote it, so should be familiar with the codebase) and yet is more code than the whole of UNIX Release 7 (it also has about as much documentation as the entire printed manual for UNIX Release 7).

                              Sure, it’s a tiny part of the codebase, but I’m not sure bc is a good example here. bc is probably the most complicated of the POSIX tools, and it still has less lines of code than MINIX. (It’s about 10k of actual lines of code; there are a lot of comments for documentation.) You said MINIX implemented userspace; does that mean POSIX tools? If it did, I have very little faith in the robustness of those tools.

                              I don’t know if you’ve read the sources of the original Morris bc, but I have (well, its closest descendant). It was terrible code. When checking for keywords, the parser just checked for the second letter of a name and then just happily continued. And hardly any error checking at all.

                              After looking at that code, I wondered how much of original Unix was terrible in the same way, and how terrible MINIX’s userspace is as well.

                              So I don’t think holding up original Unix as an example of “this is how simple software can be” is a good idea. More complexity is needed than that; we want robust software as well.

                              In other words, I think there is a place for more complexity in software than original Unix had. However, the complexity in modern-day software is out of control. Compilers don’t need to be millions of lines of code, and if you discount drivers, neither should operating systems. But they can have a good amount of code. (I think a compiler with 100k LOC is not too bad, if you include optimizations.)

                              So we’ve gone from too much minimalism to too much complexity. I hope we can find the center between those two. How do we know when we have found it? When our software is robust. Too much minimalism removes robustness, and too much complexity does the same thing. (I should write a blog post about that, but the CMake/Make recursive performance one comes first.)

                              bc is complex because it’s robust. In fact, I always issue a challenge to people who claim that my code is bad to find a crash or a memory bug in bc. No one has ever come back with such a bug. That is the robustness I am talking about. That said, if bc were any more complex than it is (and I could still probably reduce its complexity), then it could not be as robust as it is.

                              Also, with regards to the documentation, it has that much documentation because (I think) it documents more than the Unix manual. I have documented it to ensure that the bus factor is not a thing, so the documentation for it goes down to the code level, including why I made decisions I did, algorithms I used, etc. I don’t think the Unix manual covered those things.

                              From an end-user perspective, well-documented stable interfaces with end-user programming tools give you the key advantages of Free Software. If there are two (or more) companies that implement the same stable interfaces, that’s a complete win.

                              This is a point I find myself reluctantly agreeing with, and I think it goes back to something you said earlier:

                              A lot of F/OSS projects have an elitist attitude about programming and don’t want end users to be extending the programs unless they pass the gatekeeping requirements of learning programming languages whose abstractions are far too low-level for the task at hand.

                              This, I think, is the biggest problem with FOSS. FOSS was supposed to be about user freedom, but instead, we adopted this terrible attitude and lost our way.

                              Perhaps if we discarded this attitude and made software designed for users and easy for users to use and extend, we might turn things around. But we cannot make progress with that attitude.

                              That does, of course, point to you being correct about other things, specifically, that licenses matter too much right now because if we changed that attitude, would licenses really matter? In my opinion, not to the end user, at least.

                              1. 5

                                Sure, it’s a tiny part of the codebase, but I’m not sure bc is a good example here. bc is probably the most complicated of the POSIX tools, and it still has less lines of code than MINIX. (It’s about 10k of actual lines of code; there are a lot of comments for documentation.) You said MINIX implemented userspace; does that mean POSIX tools? If it did, I have very little faith in the robustness of those tools.

                                To be clear, I’m not saying that everything should be as simple as code of this era. UNIX Release 7 and Minix 1.0 were on the order of 10-20KLoC for two related reasons:

                                • The original hardware was incredibly resource constrained, so you couldn’t fit much software in the available storage and memory.
                                • They were designed for teaching (more true for Minix, but somewhat true for early UNIX versions) and so were intentionally simple.

                                Minix did, I believe, implement POSIX.1, but so did NT4’s POSIX layer: returning ENOSYS was a valid implementation and it was also valid for setlocale to support only "C" and "POSIX". Things that were missing were added in later systems because they were useful.

                                My point is that the GNU Manifesto was written at a time when it was completely feasible for someone to sit down and rewrite all of the software on their computer from scratch. Today, I don’t think I would be confident that I could rewrite awk or bc, let alone Chromium or LLVM, from scratch, and I don’t think I’d even be confident that I could fix a bug in one of these projects (I’ve been working on LLVM since around 2007 and there are bugs I’ve encountered that I’ve had no idea how to fix, and LLVM is one of the most approachable large codebases that I’ve worked on).

                                So we’ve gone from too much minimalism to too much complexity. I hope we can find the center between those two. How do we know when we have found it? When our software is robust. Too much minimalism removes robustness, and too much complexity does the same thing. (I should write a blog post about that, but the CMake/Make recursive performance one comes first.)

                                I’m not convinced that we have too much complexity. There’s definitely some legacy cruft in these systems but a lot of what’s there is there because it has real value. I think there’s also a principle of conservation of complexity. Removing complexity at one layer tends to cause it to reappear at another and that can leave you with a less robust system overall.

                                Perhaps if we discarded this attitude and made software designed for users and easy for users to use and extend, we might turn things around. But we cannot make progress with that attitude.

                                I created a desktop environment project around this idea but we didn’t have sufficient interest from developers to be able to build anything compelling. F/OSS has a singular strength that is also a weakness: It is generally written by people who want to use the software, not by people who want to sell the software. This means that it tends to be incredibly usable to the authors but it is only usable in general if the authors are representative of the general population (and since they are, by definition, programmers, that is intrinsically not the case).

                                One of the most interesting things I’ve seen in usability research was a study in the early 2000s that showed that only around 10-20% of the population thinks in terms of hierarchies for organisation. Most modern programming languages implicitly have a notion of hierarchy (nested scopes and so on) and this is not a natural mindset of the majority of humans (and the most widely used programming language, Excel, does not have this kind of abstraction). This was really obvious when iTunes came out with its tag-and-filter model: most programmers said ‘this is stupid, my music is already organised in folders in a nice hierarchy’ and everyone else said ‘yay, now I can organise my music!’. I don’t think we can really make usable software until we have programming languages that are usable by most people, so that F/OSS projects can have contributors that really reflect how everyone thinks. Sadly, I’m making this problem worse by working on a programming language that retains several notions of hierarchy. I’d love to find a way of removing them but they’re fairly intrinsic to any kind of inductive proof, which is (to date) necessary for a sound type system.

                                That does, of course, point to you being correct about other things, specifically, that licenses matter too much right now because if we changed that attitude, would licenses really matter? In my opinion, not to the end user, at least.

                                Licenses probably wouldn’t matter to end users, but they would still matter for companies. I think one of the big things that the F/OSS community misses is that 90% of people who write software don’t work for a tech company. They work for companies whose primary business is something else and they just need some in-house system that’s bespoke. Licensing matters a lot to these people because they don’t have in-house lawyers who are an expert in software licenses and so they avoid any license that they don’t understand without talking to a lawyer. These people should be the ones that F/OSS communities target aggressively because they are working on software that is not their core business and so releasing it publicly has little or no financial cost to them.

                                1. 1

                                  To be clear, I’m not saying that everything should be as simple as code of this era.

                                  Apologies.

                                  My point is that the GNU Manifesto was written at a time when it was completely feasible for someone to sit down and rewrite all of the software on their computer from scratch.

                                  Okay, that makes sense, and I agree that the situation has changed.

                                  Today, I don’t think I would be confident that I could rewrite awk or bc, let alone Chromium or LLVM, from scratch, and I don’t think I’d even be confident that I could fix a bug in one of these projects (I’ve been working on LLVM since around 2007 and there are bugs I’ve encountered that I’ve had no idea how to fix, and LLVM is one of the most approachable large codebases that I’ve worked on).

                                  I think I can tell you that you could rewrite awk or bc. They’re not that hard, and 10k LOC is a walk in the park for someone like you. But point taken with LLVM and Chromium.

                                  But then again, I think LLVM could be less complex. Chromium could be as well, but it’s limited by the W3C standards. I could be wrong, though.

                                  I think the biggest problem with most software, including LLVM, is scope creep. Even with bc, I feel the temptation to add more and more.

                                  With LLVM, I do understand that there is a lot of inherent complexity: targeting multiple platforms, lots of needed canonicalization passes, lots of optimization passes, codegen, register allocation. Obviously, you know this better than I do, but I just wanted to make it clear that I understand the inherent complexity. But is it all inherent?

                                  I’m not convinced that we have too much complexity. There’s definitely some legacy cruft in these systems but a lot of what’s there is there because it has real value. I think there’s also a principle of conservation of complexity. Removing complexity at one layer tends to cause it to reappear at another and that can leave you with a less robust system overall.

                                  There is a lot of truth to that, but that’s why I specifically said (or meant) that maximum robustness is the target. I doubt you or anyone would say that Chromium is as robust as possible. I personally would not claim that about LLVM either. I also certainly would not claim that about Linux, FreeBSD, or even ZFS!

                                  And I would not include legacy cruft in “too much complexity” unless it is past time that it is removed. For example, Linux keeping deprecated syscalls is not too much complexity, but keeping support for certain arches that have only single-digit users, none of whom will update to the latest Linux, is definitely too much complexity. (It does take a while to identify such cruft, but we also don’t spend enough effort on it.)

                                  Nevertheless, I agree that trying to remove complexity where you shouldn’t will lead to it reappearing elsewhere.

                                  F/OSS has a singular strength that is also a weakness: It is generally written by people who want to use the software, not by people who want to sell the software. This means that it tends to be incredibly usable to the authors but it is only usable in general if the authors are representative of the general population (and since they are, by definition, programmers, that is intrinsically not the case).

                                  I agree with this, and the only thing I could think of to fix this is to create some software that I myself want to use, and to actually use it, but to make it so good that other people want to use it. Those people need support, which could lead to me “selling” the software, or services around it. Of course, as bc shows (because it does fulfill all of the requirements above, but people won’t pay for it), it should not just be anything, but something that would be critical to infrastructure.

                                  One of the most interesting things I’ve seen in usability research was a study in the early 2000s that showed that only around 10-20% of the population thinks in terms of hierarchies for organisation. Most modern programming languages implicitly have a notion of hierarchy (nested scopes and so on) and this is not a natural mindset of the majority of humans (and the most widely used programming language, Excel, does not have this kind of abstraction).

                                  I think I’ve seen that result, and it makes sense, but hierarchy unfortunately makes sense for programming because of the structured programming theorem.

                                  That said, there is a type of programming (beyond Excel) that I think could be useful for the majority of humans: functional programming. Data goes in, gets crunched, comes out. I don’t think such transformation-oriented programming would be too hard for anyone. Bonus points if you can make it graphical (maybe like Blender’s node compositor?). Of course, it would probably end up being quite…inefficient…but once efficiency is required, they can probably get help from a programmer.

                                  I don’t think we can really make usable software until we have programming languages that are usable by most people, so that F/OSS projects can have contributors that really reflect how everyone thinks.

                                  I don’t think it’s possible to create programming languages that produce software that is both efficient and well-structured without hierarchy, so I don’t think, in general, we’re going to be able to have contributors (for code specifically) that are not programmers. That does make me sad. However, what we could do is have more empathy for users and stop assuming we have the same perspective as they do. We could assume that what is good for normal users might not be bad for us and actually try to give them what they need.

                                  But even with that, I don’t think the result from that research is that 80-90% of people can’t think in hierarchies, just that they do not do so naturally. I think they can learn. Whether they want to is another matter…

                                  I could be wrong about both things; I’m still young and naive.

                                  Licenses probably wouldn’t matter to end users, but they would still matter for companies. I think one of the big things that the F/OSS community misses is that 90% of people who write software don’t work for a tech company. They work for companies whose primary business is something else and they just need some in-house system that’s bespoke. Licensing matters a lot to these people because they don’t have in-house lawyers who are an expert in software licenses and so they avoid any license that they don’t understand without talking to a lawyer. These people should be the ones that F/OSS communities target aggressively because they are working on software that is not their core business and so releasing it publicly has little or no financial cost to them.

                                  That’s a good point. How would you target those people if you were the one in charge?

                                  Now that I have written a lot and taken up a lot of your time, I must apologize. Please don’t feel obligated to respond to me. But I have learned a lot in our conversations.

                              2. 1

                                They’re fine if the thing is completely self-contained but if you want to change where a layer is in the system then you can’t move functions between files without talking to lawyers.

                                Maybe I misunderstand MPL 2.0, but I think this is a non-issue: if you’re not actually changing the code (just the location), you don’t have to publish anything. If you modify the code (changing implementation), then you have to publish the changes. This is easiest done on a per-file basis, of course, but I think you technically only need to publish the diff.

                                This is why it’s non-viral: you say, “I’ve copied function X into my code and changed the input from integer to float”. You don’t have to say anything else about how it’s used or why such changes were necessary.

                                1. 1

                                  Generally, when you refactor, you don’t just move the code, you move and modify it. If you modify code from an MPL’d file that you’ve copied into another file then you need to make sure that you propagate the MPL into that file and share the changes.

                                2. 1

                                  they use Fedora to push core bits of Red Hat-controlled code into the Linux ecosystem, make them dependencies for everything, and then can charge whatever they like for support because no one else understands the code.

                                  How do they make their things “dependencies for everything”? It seems you left out a step where other vendors/distributions choose to adopt Red Hat projects or not.

                                  1. 2

                                    ISTM that quite a number of RH-backed projects are now such major parts of the infrastructure of Linux that it’s quite hard not to use them. Examples: pulseaudio, systemd, Wayland, and GNOME spring to mind.

                                    All the mainstream distros are now based on these, and the alternatives that are not are increasingly niche.

                                3. 4

                                  If you want “non-viral copyleft”, there are options: the Mozilla Public License and the CDDL, which was derived from it. While they have niches in which they’re popular, it’s not like they have taken off, so I’m not sure if “companies would be willing” is the right description.

                                  1. 1

                                    I think you have a point, which is discouraging to say the least.

                                  2. 1

                                    Without the viral nature, couldn’t you essentially whitewash the license by forking once and relicensing as MIT, then forking the MIT fork? It would take any power out of the license to enforce its terms.

                                    1. 1

                                      No, because you can’t relicense someone else’s work.

                                      “Virality” is talking about how it forces other software that depends on the viral software to be released under the same license.

                                      1. 1

                                        So would you have to submit the source of the individual GPL components used as part of a derivative work? I don’t think the GPL would even make sense if it didn’t affect the whole project; that’s what the LGPL is for.

                                        1. 1

                                          I think if you want to add a single GPL component you would need to release the full software under GPL. (Unless there were other licenses to allow the mixing)

                                      2. 1

                                        No.

                                        Virality is a separate thing from copyleft. People just think they are connected because the GPL is the first license that had both.

                                        You can have a clause in the license that says that the software must be distributed under that license for the parts of the software that were originally under the license.

                                        An example is a license I’ve written (https://yzena.com/yzena-copyleft-license/). It says specifically that the license only applies to the original source code, and any changes to the original source code. Anything else that is integrated (libraries, etc.) is not under the license.

                                        Warning: Do NOT use that license. I have not had a lawyer check it. I will as soon as I can, but until then, it’s not a good idea to use.

                                    1. 37

                                      I hate to say it, but while the substance of the article is useful, it disproves the title, in my opinion.

                                      The title says that Nix is close to perfect, and then lays out a lot of shortcomings that take it far away from perfect.

                                      I personally wish that there was a Nix that was well-documented, not so academic and elitist, and in general, had some empathy for users, especially for new users. In fact, lacking empathy for users alone makes something not close to perfect, in my opinion.

                                      Also, the soft split that is mentioned makes me nervous. Why such a split? What makes flakes better enough that some people use them, but not better enough that everyone does?

                                      This all might sound very negative, and if so, I apologize. I want Nix’s ideas to take off, so I actually feel discouraged about the whole thing.

                                      1. 16

                                        Unpopular opinion here. The Nix docs are weird, but they are mostly fine. I usually don’t have any issue with them. The thing that usually gets me is the holes in my knowledge about how certain packaging technologies (both distro- and language-level ones) work and taking some of the things other distros do automatically for granted.

                                        Here’s an example. You are playing in a Ubuntu-based distro, and you are writing some Python. You pip install some-dependency, import it, and everything is easy, right? Well, it felt easy because two months ago you apt install-ed a C dependency you forgot about, and that brought in a shared lib that your Python package uses. Or your pip install fetches a pre-built wheel that “just runs” (only on Ubuntu and few other distros, of course).

                                        Nix is brutally honest and makes this shit obvious. Unfortunately, dealing with it is hard. [1] Fortunately, once you deal with all that, it tends to stay dealt with and doesn’t randomly break on other people’s computers.

                                        Learning Nix has helped me learn Linux in ways I never originally suspected. It’s tons of fun (most of the time)!


                                        [1] The rude awakening that a Python library can require a Fortran compiler is always fun to watch from the side. :)

                                        1. 10

                                          The Nix docs are weird because they’re written for the wrong audience: people who want to learn Nix. I don’t care about Nix. Every moment I spend learning about Nix is just an inconvenience. Most Nix users are probably like that too. Users just want a packaging system that works; all of this discussion about Nix fundamentals is anti-documentation, things we need to skip to get to what we want: simple recipes for common tasks.

                                          But Nix also has at least two fundamental technical issues and one practical issue that exacerbate the doc situation. The practical issue has to do with the name: it’s just a search disaster that Nix is three things (a distro, a language, and a package manager). On to the technical issues.

                                          1. I can’t explore Nix because of the choice of a lazy language. Forcing values by printing them with builtins.trace is a minefield in Nix. Sometimes printing an object will result in it trying to create thousands of .drv files. Other times you find yourself printing one of the many circular objects that Nix uses.

                                            In Haskell and C++ I get to look at types to figure out what kind of object I’ve got, in addition to the docs. In Scheme and Python I get to print values to explore any object. In Nix? I can do neither. I don’t get types and I don’t get to print objects at runtime easily. At least you can print .drv files to figure out that say, a package happens to have a lib output, and that’s what you need to depend on instead of the default out output.

                                          2. There are almost no well-defined APIs within the Nix world.

                                            Aside from derivations, it’s all ad-hoc. Different parts of the Nix ecosystem work completely differently to accomplish the same goals. So learning how to do something to C packages, doesn’t help you when you’re dealing with Python packages, and doesn’t help you when you’re dealing with Haskell packages (where there are two completely different ecosystems that are very easy for novices to confuse). Flakes add a bit of structure, but they’ve been unstable for 3 years now with no stability on the horizon.

                                          1. 2

                                            I agree on both technical issues. Static types and type signatures in Nix would be especially amazing. I spend so much time wondering “what type does this have” when looking at nixpkgs code. :(

                                            As for the fundamentals and anti-documentation features, I am not so sure. I think Nix is such a fundamentally different way of doing things, that you need to start somewhere. For example, I can’t give users a packaging script sprinkled with content SHA’s without explaining what those are and why we need them in the first place (It’s especially baffling when they sit right next to git commit SHA’s). The Nix pills guide has a good way of introducing the important concepts and maybe it can be shortened, so that people can go through most of the stuff they need in half an hour. I don’t know…

                                        2. 10

                                          not so academic and elitist

                                          For a language that is the bastard child of Bash and ML, I would not consider it “academic”. The ugliness of the language is due in no small part to the affordances for real-world work.

                                          As far as elitism…well, it’s a hard tool to use. It’s getting easier. It’s strange to me to expect that such powerful magic shouldn’t take some work to learn (if not master).

                                          1. 19

                                            For a language that is the bastard child of Bash and ML, I would not consider it “academic”. The ugliness of the language is due in no small part to the affordances for real-world work.

                                            I never said the language was academic. I don’t think it is. In fact, it’s less that the language is academic and more that the documentation and culture are.

                                            As far as elitism…well, it’s a hard tool to use. It’s getting easier. It’s strange to me to expect that such powerful magic shouldn’t take some work to learn (if not master).

                                            Power does not imply something must be hard to learn. That is a common misconception, but it’s not true.

                                            As an example, consider Python. It’s not that hard to learn, yet it is enormously powerful. Or Ruby.

                                            In fact, Ruby is a great example because even though it’s powerful, it is mostly approachable because of the care taken in helping it to be approachable, epitomized by _why’s Poignant Guide.

                                            _why’s efforts worked because he made Ruby approachable with humor and by tying concepts of Ruby to what people already knew, even if he had to carefully lay out subtle differences. Those techniques, applied with care, would work for Nix too.

                                            So the problem with Nix is that they use the power as an excuse to not put more effort into making it approachable, like _why did for Ruby. This, I think, is a side effect of the culture I mentioned above.

                                            If someone wrote a Nix equivalent of _why’s Poignant Guide, playing to their own strengths as writers and not just trying to copy _why, I think Nix would have a massive uptake not long after.

                                            In fact, please do write that guide, if you would like to.

                                            1. 12

                                              If I had more spoons I’d definitely do that

                                          2. 9

I agree. It is some of the coolest Linux technology out there, but it is so hard to use, both because of its poor documentation and because of how different it is. When I used it, it felt like once a week I would try to do something not allowed/possible and then have to go multiple pages deep into a thread somewhere to find multiple competing tools that claim to solve that problem best. I think I will try Nix the language on the next personal project I make that involves more than one language, but I haven’t had a chance to do that in a while.

                                            I would love to try NixOS again sometime. Hopefully they will come out with better documentation and/or a “let me cheat just this once so I can continue working” feature.

                                            Edit: I forgot to say, great article though! I enjoyed your perspective.

                                            1. 6

                                              I have found the Guix documentation quite good, and the community very welcoming, for what it’s worth.

                                              1. 6

Keep in mind that the more familiar someone is with the subject, the more issues they can talk about. I could go on for ages about problems with Python, even though it is a perfect language for most of my use cases - it’s not a contradiction.

The post just concentrated on the negatives rather than the positives - and there are some really cool things about Nix, especially if you have use cases where everything else seems to be worse (looking at you, Chef/Puppet/Ansible).

                                                1. 1

I wouldn’t feel discouraged if I were you. Nix’s community is only growing, and most of these issues are warts, not dealbreakers.

                                                  1. 1

                                                    makes me miss WORLDofPEACE. Didn’t know them, but they’re a good example of someone that can make anyone feel welcome IMO.

                                                  1. 4

                                                    I’m going to post my Hacker News comment about this article here.

                                                    This post has good ideas, but there are a few things wrong with this.

First, we forget that filesystems are not hierarchies; they are graphs, whether DAGs or not.

                                                    Second, and this follows from the first, both tags and hierarchy are possible with filesystems as they currently are.

Here’s how you do it (a rough shell sketch follows the steps):

                                                    1. Organize your files in the hierarchy you want them in.
                                                    2. Create a directory in a well-known place called tags/ or whatever you want.
                                                    3. For every tag <name>, create a directory tags/<name>/
4. Hard-link each file you want to tag into every tag directory that applies.
                                                    5. For extra credit, create a soft link pointing to the same file, but with a well-known name.
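In shell, those steps might look something like this (a rough sketch: docs/projects/report.pdf and the tag names are made-up examples, and the .link suffix is just one possible choice of “well-known name” for the soft link):

# Steps 2-3: create the tag directories.
mkdir -p tags/work tags/2021
# Step 4: hard-link the file under every tag that applies.
ln docs/projects/report.pdf tags/work/report.pdf
ln docs/projects/report.pdf tags/2021/report.pdf
# Step 5: a soft link next to each hard link records where the file lives in the hierarchy.
ln -s ../../docs/projects/report.pdf tags/work/report.pdf.link
ln -s ../../docs/projects/report.pdf tags/2021/report.pdf.link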

                                                    This allows you to use the standard filesystem tools to get all files under a specific tag. For example,

                                                    find tags/<name> -type f
                                                    

                                                    (The find on my machine does not follow symbolic links and does not print them if you use the above command.) If you want to find where the file is actually under the hierarchy, use

                                                    find -L tags/ -xtype l
                                                    

                                                    Having both hard and soft links means that 1) you cannot lose the actual file if it’s moved in the hierarchy (the hard link will always refer to it), and 2) you can either find the file in the hierarchy from the tag or you know that the file has been moved in the hierarchy.

                                                    Also, if you want to find files under multiple tags, I found that the following command works:

                                                    find -L tags/tag1 tags/tag2 -xtype l | xargs readlink -f | sort | uniq -d
                                                    

I have not figured out how to find files under more than one tag without following the links, but it could probably be done by pairing each link’s name with its target (target first, separated by a space) and then sorting on the target.
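One possible alternative (an untested sketch) is to lean on the fact that hard links to the same file share an inode number; tag1 and tag2 are placeholder names, and GNU stat’s -c option is assumed (BSD stat spells it -f):

# Print paths whose inode appears under more than one of the given tag directories.
find tags/tag1 tags/tag2 -type f -exec stat -c '%i' {} + | sort | uniq -d |
while read -r ino; do find tags/tag1 tags/tag2 -inum "$ino"; done
# Caveat: a file hard-linked twice under the *same* tag would also be reported.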

                                                    Of course, I’m no filesystem expert, so I probably got a few things wrong. I welcome smarter people to tell me how I am wrong.

                                                    1. 6

The hard/soft link scheme has some problems. The way most applications save files breaks hard links, because for safety you have to write to a new file and then rename the new file over the old. A symlink will survive that, but if you both save and move/rename a file in between checking the tag, your links are both broken.
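A quick way to see the first problem (a sketch; doc.txt is a made-up name):

echo v1 > doc.txt
ln doc.txt tag-link                               # hard link: same inode as doc.txt
ln -s doc.txt sym-link                            # symlink: just points at the name
echo v2 > doc.txt.tmp && mv doc.txt.tmp doc.txt   # the usual "safe save": write new, rename over old
cat tag-link                                      # still prints v1 -- the hard link kept the old inode
cat sym-link                                      # prints v2 -- the symlink followed the name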

                                                      In a way you’re trying to reinvent the file alias, which has existed on macOS since 1991. An alias is like a symlink but also contains the original’s fileID (like a hard link), and if the file’s on a remote volume it has metadata allowing the filesystem to be remounted. macOS got around the safe-save problem with an FSExchangeFiles system call that preserves the original file’s fileID during the rename.

                                                      At a higher level, though, I think your argument is similar to saying “you can already do X in language Y, because Y is Turing-complete.” Which is true, but irrelevant if doing X is too awkward or slow, or incompatible with the way everyone uses Y. Apple’s Spotlight metadata/search system represents this approach applied to a normal filesystem, but it’s still pretty limited.

                                                      As an example of how things could be really, fundamentally different, my favorite mind-opening example is NewtonOS’s “soup”.

                                                      1. 5

                                                        It’s worth noting that this is more or less what BFS did. It provided four high-level features:

                                                        • Storage entities that contained key-value pairs.
                                                        • Storage for small values.
                                                        • Storage for large values.
• Maps from queries to storage entities.

                                                        Every file is an entity and the ‘contents’ is typically either a large or small value (depending on the contents of the file) with a well-known key. HFS-style forks / NTFS alternative data streams could be implemented as other key-value pairs. Arbitrary metadata could also be stored with any file (the BeOS Tracker had some things to grab ID3 tags from MP3s and store them in metadata, for example).

                                                        BeOS provided a search function that would crawl metadata and generate a set of files that matched a specific query. This could be stored in BFS and any update to the metadata of any file could update the query. Directories were just a special case of this: they were saved queries of a key-pair identifying a parent-child relationship.

The problem is not that filesystems can’t represent these structures; it’s that:

                                                        • Filesystems other than BFS don’t have a consistent way of representing them (doubly true for networked filesystems) and,
                                                        • UIs don’t expose this kind of abstraction at the system level, so if it exists it’s inconsistent from one application to another.
                                                        1. 2

                                                          (I think the correct spelling is “BeFS”.) BeFS designer Dominic Giampaolo went on to Apple and applied a lot of these concepts in Spotlight. It’s not as deeply wired into the filesystem itself, but provides a lot of the same functionality.

                                                          1. 6

                                                            I think the correct spelling is “BeFS”

                                                            I take Dominic’s book describing the FS as the canonical source for the name, and it uses BFS, though I personally prefer BeFS.

                                                            BeFS designer Dominic Giampaolo went on to Apple and applied a lot of these concepts in Spotlight. It’s not as deeply wired into the filesystem itself, but provides a lot of the same functionality.

Spotlight is very nice in a lot of ways, but it is far less ambitious. In particular, Spotlight had the design requirement that it should work with SMB2 shares with the same abstractions. Because Spotlight maintains the indexes in userspace, it is possible to get out of sync (and actually quite easy, which then makes the machine really slow for a bit as Spotlight goes and tries to reindex everything, and things like Mail.app search just don’t work until it’s finished). Spotlight also relies on plugins to parse files, rather than providing structured metadata storage, which means that the same file on two different machines may appear differently in searches (saved or otherwise). For example, if you put a Word document on an external disk and then search for a keyword in its metadata, it will be found. If you then plug this disk into a machine that doesn’t have Word installed, it won’t be. In contrast, with the BFS model Word would have been responsible for storing the metadata, and then it would have been preserved everywhere.
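For what it’s worth, the plugin-extracted (rather than file-resident) nature of that metadata is easy to poke at with the stock Spotlight command-line tools (the file name and query value here are made up):

mdls report.docx                                           # metadata the installed importers extracted
mdfind -onlyin ~/Documents 'kMDItemAuthors == "*Smith*"'   # queries the index, not the file itself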

                                                        2. 2

                                                          I like your idea. It made me realize that one little system of my own is, in fact, tagging: I have a folder called to-read that contains symlinks to documents I’ve saved.

                                                          Tangentially: I want rich Save dialogs. Current ones only let you save a file. I would love it if I could

                                                          • Save a file
                                                          • Set custom file attributes like ‘downloaded-from’ or ‘see-also’ or ‘note-to-self’
                                                          • Create symlinks or hardlinks in other directories
                                                          • Check or select programs/scripts to run on the newly-created file?
                                                          • All in one dialog
                                                          1. 2

                                                            Then somebody will edit their file with an editor that uses atomic rename, and your hard links will all be busted.

                                                            1. 2

                                                              TBH This sounds like a fragile system that only addresses the very shallow benefits of a more db-like object store.

                                                            1. 2

It’s amazing how much of a difference something like this makes. There’s a lot of cool stuff in Boost but I’d avoided most of it because either vendoring bits of Boost into my tree or depending on the system-provided Boost libraries (which may be old or shipped in an odd way) was too much effort. I’ve started using Boost since I started using vcpkg: adding a Boost component just requires adding a single line to a JSON file and it works everywhere. Great to see more things supporting it.

                                                              1. 2

I wonder if this might be a sign of a trend towards language package management in C/C++. After all, most projects traditionally just vendored stuff because they couldn’t really rely on the system package manager.

                                                                1. 1

There is definitely a trend, though things are complicated by a zoo of build systems and now a growing number of package managers.

Also, in the case of build2, we don’t aim for it to be a “language package manager for C/C++” but rather a general-purpose build toolchain that starts at the beginning (C/C++). For example, we already have modules for Bash and Rust, though the latter is still very limited.

                                                                2. 1

Not to diminish vcpkg’s achievements (which, for one, has had Boost packaged for several years now and, for two, has quite a few more packages), but this goes a step further and replaces Boost’s own build system with build2, which means everything is built with the same build system and in the same invocation, and that has some benefits (not least of which is build speed).

                                                                  1. 4

That’s cool, but I’m not sure how useful it is for Boost. All of the Boost libraries that I have wanted to use so far are header-only, so they implicitly integrate with my build system (once C++ modules are actually supported by compilers in a sensible way, they’ll actually integrate with my build system, rather than just being visible to my compiler). I have a couple of projects that use vcpkg to pull in LLVM and that’s a lot more painful because vcpkg does a Debug and a Release build of every dependency at configure time. A Debug build of LLVM generates about 70 GiB of object code and each build takes a few CPU hours, so even on my desktop (10-core Xeon) it takes about half an hour to do the configure step and eats a lot of disk. I’d love to have the equivalent of a ninja file that combines them into a single thing, but I’m not sure how feasible that would be for build2: there are at least two non-CMake build modes for LLVM, but each of them supports a subset of the upstream-supported platforms and configurations and it would be a huge amount of work to add a build2-based equivalent.

My biggest complaint about CMake at the moment is that, as far as I can tell, there is no easy way of making an exported project that is consistent between installed and included versions of a project. Most of the projects in vcpkg generate a fooConfig.cmake as part of their install step, so the file that find_package uses doesn’t even exist until the dependency has been built and installed. I’d love it if CMake provided a consistent way of saying ‘if this file is included, export the target from here that would be generated if this project were installed and expose all of the build steps required to build all of the things that this depends on’. I think this is possible but I have not been able to parse the CMake docs sufficiently to be able to handle it.

                                                                    For better or for worse, CMake+Ninja is increasingly the de-facto standard build system for C++ and (in some ways, more importantly) for projects that are written in multiple languages. The big advantage of classic UNIX Makefiles was the latter point: they weren’t a C build system, they were a build system that could drive a lot of different tools. The down side was that they accidentally depended on a lot of things that turned out to be fairly specific to how C code was built with a traditional UNIX toolchain.

                                                                    These days, most big projects are built from multiple languages. Language-specific build systems are great for simple things but then encounter problems when you want to build, for example, a C# project that depends on a Rust crate and a C++ library, each of which has a load of dependencies. CMake has been gradually growing the set of languages that it supports (though it took a really long time to get Objective-C officially supported, even though it uses exactly the same compiler as C/C++ with a tiny extra set of flags). It looks as if build2 has a module interface for adding other languages, so I should probably take a closer look. Unfortunately, the solution to the problem of too many dependency management and build systems for C++ has so far always been for people to create a new dependency management and build system for C++.

I don’t really like CMake, but I’d rather use CMake than rewrite all of the existing CMake in something else. CMake evolving into something I’d actually like seems more feasible than something new that I actually like displacing CMake. For example, I like the fact that xmake is built on top of a real language, but adding Lua as an alternative to CMake’s scripting language and exposing the existing functions into Lua is less work than writing a new build system and getting it adopted, and would give me an immediate path to adopting a Lua-based build system. I like a lot of the things in Meson, and it has a way of pulling in CMake projects as dependencies, so it might provide a migration path, but a Python dependency in a build system is a show-stopper for me.

                                                                    1. 1

                                                                      It looks as if build2 has a module interface for adding other languages, so I should probably take a closer look.

                                                                      It also has ad hoc recipes and pattern rules so for once-off integration of other tools/languages you may not need to go the full-blown build system module route.

                                                                      In fact, with the latest release we have feature parity with GNU make (sans mis-features that we intentionally have no equivalents of) and in many cases with improvements. For example, build2’s pattern rules are regex-based and allow both more precise patterns and multiple stems.

                                                                      1. 1

                                                                        I am also working on a build system, mostly for myself and to support exotic features of my new programming language. But I also like trying to make it as good as possible because I don’t like to do things halfway, even for personal projects. (See FreeBSD’s bc for an example; it’s technically a personal project of mine.)

                                                                        So, that said, if you don’t mind, please explain more about what you mean when you said:

I’d love it if CMake provided a consistent way of saying ‘if this file is included, export the target from here that would be generated if this project were installed and expose all of the build steps required to build all of the things that this depends on’.

                                                                        For starters, where should the file be included, and where should the target be exported to? To a separate project? Or to make it seem like the project is already installed?

                                                                        The way I read what you said, you’d like the build system to be able to build multiple projects together. Is that correct?

                                                                        1. 4

                                                                          In CMake, there are two ways of using a project:

                                                                          In the version that works well with packaging systems, you build the project and it uses install to install a target. This creates a fooConfig.cmake file for you, which contains all of your exported targets, and allows consumers to find the package with find_package(foo CONFIG). You then add all of the compile, link, whatever flags to your target by just saying that you link with foo (or, more likely, with foo::some_target).

In the version that’s typically used with vendored dependencies, you add_subdirectory the child project’s directory. CMake differentiates between internal and external flags, so (assuming the child project’s CMakeLists.txt has sensible encapsulation) you can still use its exported targets from the outer project and it doesn’t pollute your environment with any build flags that are needed to build the project. Generally, this gives you a single Ninja file that builds the child project along with the parent and, in theory, means that you can build only the parts of the child project that are exposed by the parent. In practice, it doesn’t quite give the parallelism that you’d want because CMake can’t (on first build) tell whether your project depends on anything generated by the child project (so you can’t, for example, build a .cc file in the outer project that depends only on .h files in the child until the .a or .so in the child has been built - CMake doesn’t know for sure that one of the build steps in the child doesn’t generate a header).
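From the command line, the two flows look roughly like this (a sketch: foo, app, and the prefix path are made-up names, and the fooConfig.cmake location is the usual convention rather than something guaranteed):

# Installed-package flow: build foo, install it, then let consumers find it.
cmake -S foo -B build/foo -G Ninja
cmake --build build/foo
cmake --install build/foo --prefix "$PWD/prefix"   # typically writes prefix/lib/cmake/foo/fooConfig.cmake
cmake -S app -B build/app -G Ninja -DCMAKE_PREFIX_PATH="$PWD/prefix"   # find_package(foo CONFIG) now resolves
cmake --build build/app
# Vendored flow: app's CMakeLists.txt calls add_subdirectory(foo), so a single
# configure produces one ninja file covering both projects.
cmake -S app-vendored -B build/all -G Ninja
cmake --build build/all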

There’s always a bit of a tension in these build systems between implicit and explicit dependencies. In FreeBSD’s terrifying build system, META_MODE uses the kernel’s filesystem access tracking to be able to dynamically discover all of the dependencies (and then lets you commit these files) but I don’t know how well this works with option changes. If you had infinite memory, the right thing to do would be to make the compiler’s VFS layer talk to the build system and, if you compiled foo.c, tell the build system about any header that it looks for and pause execution if there’s a build rule that would generate it, run that build, and then continue. In practice, this would pause execution and be too slow. The make depend step does this by skipping compilation and just asking the compiler to preprocess things and emit what it looked for. This works, but it is very slow. Neither of these approaches works with negative dependencies. If I’m compiling with -Iinc1 -Iinc2 and I #include <foo.h>, if inc2/foo.h exists, the compiler will emit this as a dependency. If a later build step adds inc1/foo.h then this won’t cause recompilation because the build system didn’t track the fact that my file depended on inc1/foo.h not existing.
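Here is a small demonstration of that last point (a sketch; the file names are made up, and -MD is the usual GCC/Clang flag for emitting a make-style dependency file during compilation):

mkdir -p inc1 inc2
printf '#define GREETING "from inc2"\n' > inc2/foo.h
printf '#include <foo.h>\nint main(void) { (void)GREETING; return 0; }\n' > main.c
cc -Iinc1 -Iinc2 -MD -c main.c
cat main.d                                             # records inc2/foo.h, but says nothing about inc1/foo.h *not* existing
printf '#define GREETING "from inc1"\n' > inc1/foo.h   # now shadows inc2/foo.h, yet nothing forces main.o to rebuild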

                                                                          1. 1

                                                                            In practice, it doesn’t quite give the parallelism that you’d want because CMake can’t (on first build) tell whether your project depends on anything generated by the child project

                                                                            That’s because CMake uses the traditional “extract dependencies as part of the compilation” trick, right?

                                                                             

The make depend step does this by skipping compilation and just asking the compiler to preprocess things and emit what it looked for. This works, but it is very slow.

                                                                            If you don’t throw away the result of preprocessing, this can actually be even faster than the traditional way, see: https://build2.org/article/preprocess-compile-performance.xhtml

                                                                             

                                                                            Neither of these approaches works with negative dependencies.

                                                                            Yes, this is a real PITA. In build2 we have re-mapping mechanisms to deal with this but it’s quite hairy (can elaborate if there is interest). Perhaps the module mapper will be our savior: https://wg21.link/P1842R0

                                                                            1. 2

                                                                              That’s because CMake uses the traditional “extract dependencies as part of the compilation” trick, right?

                                                                              Kind of. CMake does very coarse-grained dependency tracking but generates Ninja files that use that trick for the second build.

                                                                              If you don’t throw away the result of preprocessing, this can actually be even faster than the traditional way, see: https://build2.org/article/preprocess-compile-performance.xhtml

That does look interesting. It’s tricky: clang loses a lot of information when it generates preprocessed output, so you’d need to do it on the AST, not the text. In debug builds, adding a comment will change the line numbers, so you have a trade-off between debug info accuracy and incremental build speed. I’m more interested in something like clangd being able to do this, because it could keep ASTs in memory and serialise them out as required.

                                                                              Yes, this is a real PITA. In build2 we have re-mapping mechanisms to deal with this but it’s quite hairy (can elaborate if there is interest). Perhaps the module mapper will be our savior: https://wg21.link/P1842R0

                                                                              I’d be very interested in this, yes.

                                                                              1. 1

It’s tricky: clang loses a lot of information when it generates preprocessed output, so you’d need to do it on the AST, not the text.

Both Clang and GCC provide a “partial preprocessing” mode (-frewrite-includes and -fdirectives-only, respectively) that, AFAIK, does not suffer any losses (I presume you are talking about diagnostic position information during macro expansion). MSVC doesn’t have such a mode, so it could be a problem.
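Concretely, the invocations look something like this (a sketch; foo.cc is a made-up file name):

clang++ -E -frewrite-includes foo.cc -o foo.ii   # Clang: inline the #includes, leave macros unexpanded
g++ -E -fdirectives-only foo.cc -o foo.ii        # GCC: process directives only, no macro expansion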

                                                                                 

                                                                                In debug builds, adding a comment will change the line numbers, so you have a trade-off between debug info accuracy and incremental build speed.

                                                                                Yes, we actually factor the token line number into the hash (it’s not just the debug info, things like assert() will also break).

                                                                                However, even ignoring same-line whitespace changes helps a lot in some cases. For example, between releases, we make sure each commit results in a unique version by factoring the commit id into the pre-release component of semver. We also auto-generate the version.hxx header which contains something like:

                                                                                #define LIBFOO_VERSION 1.2.3-a.0.af23456fd0
                                                                                

                                                                                Which means this line changes with each commit and without this ignorable change detection the whole idea would be impractical (you would end up with essentially a from-scratch rebuild after each commit).

                                                                                 

                                                                                I’d be very interested in this, yes.

                                                                                The solution has the following parts (it’s hairy, as promised). Also, here I assume the worst case scenario: a public auto-generated header in a library (for example, the above mentioned version.hxx) – for private headers some of these steps (in particular, step 3) may not be necessary:

                                                                                1. Firstly, the header should be included as <libfoo/version.hxx> rather than <version.hxx>. This makes sure that it doesn’t get confused with an identically-named header from libbar. In fact, this is a must for any kind of ecosystem of C/C++ libraries and luckily the idea seems to be catching on.

2. In the libfoo buildfile we put the pair of -I options before any other options (notice the strange-looking =+ – that’s a prepend). This makes sure that if the header is already generated, that’s what gets picked up:

                                                                                cxx.poptions =+ "-I$out_root" "-I$src_root"
                                                                                
3. But that doesn’t solve everything: the header may not yet be generated, but an older version of the library may be installed in, say, /usr/include and the compiler will pick up that old header. To overcome this, we add an empty version.hxx into our source directory ($src_root), which means that if the actual header hasn’t been generated, this empty dummy is next in line.

On the build system side we have a re-mapping logic that goes like this: (1) if we have a pair of -I options with the first pointing to a project’s out and the second – to its src, and (2) the compiler picked up a header from this src, and (3) we have an explicit target for the same header in out, then assume this is a missing auto-generated header, generate it, and re-run the preprocessing step.

                                                                            2. 1

                                                                              Thank you! I think I understand now.

                                                                              You want the child to export all possible targets and then the parent to pick those up during the configure step, so that the parent is aware of all targets during the build, which would give more parallelism. I think I can do that.

                                                                              I also love your idea of “negative dependencies,” including the term! I will be using that, if you don’t mind (with credit, of course, however you would like it).

                                                                              More generally, as bazel has shown, the best build systems are ones that force implicit dependencies to be explicit, so I think it will be important to actually track those negative dependencies. Somehow.

                                                                              Build systems are far more complicated than people think, including me before I started this project. And I only have a requirements list at this point…

                                                                              1. 2

                                                                                You want the child to export all possible targets and then the parent to pick those up during the configure step, so that the parent is aware of all targets during the build, which would give more parallelism. I think I can do that.

Yes, exactly. CMake can do this; it just doesn’t do it the same way as for targets that you’ve exported, which I find a bit frustrating. Over time, we’re moving everything to CMake + vcpkg, so it doesn’t matter in the long term (everything will end up being built once, cached, and installed with vcpkg so we won’t be doing the submodule / vendored dependency thing), but there’s a long transition period where we want to consume the same library in both ways, which is a bit annoying.

                                                                                I also love your idea of “negative dependencies,” including the term! I will be using that, if you don’t mind (with credit, of course, however you would like it).

                                                                                Not my term. I think it came from the project Simon Peyton-Jones was doing on build systems a few years back but there may be older uses.

                                                                                Build systems are far more complicated than people think, including me before I started this project. And I only have a requirements list at this point…

                                                                                Honestly, it makes me a bit sad when I hear about new build systems. CMake isn’t my ideal build system, but it is something that could be incrementally evolved into my ideal build system and that has a lot more of a chance of becoming a universally adopted build system than something new.

If I had the time, my first task would be to decouple the scripting language in CMake from the implementation. The scripting language started as a macro language, and it shows, but it would be fairly easy to replace it with Lua. Once you’ve done that, you can start replacing some of the hard-coded bits that currently have to be written in C (for example, the description of toolchains) with Lua that can be overridden at compile time.

The abstractions for importing and exporting projects are fairly sensible (CMake 3 was almost a completely different system to CMake 2 just from these changes, but it provided an incremental adoption path), and there are a huge number of incredibly useful CMake packages that you’d need to recreate to even get to parity, let alone be better, plus a load of built-in things such as generating Xcode or Visual Studio projects, which are a show-stopper for a lot of folks but which are a lot of tedious work to do.

                                                                                1. 2

The abstractions for importing and exporting projects are fairly sensible (CMake 3 was almost a completely different system to CMake 2 just from these changes, but it provided an incremental adoption path), and there are a huge number of incredibly useful CMake packages that you’d need to recreate to even get to parity, let alone be better, plus a load of built-in things such as generating Xcode or Visual Studio projects, which are a show-stopper for a lot of folks but which are a lot of tedious work to do.

To further amplify this advantage, CLion and Visual Studio now both natively consume CMakeLists.txt files. It’s the only project format for CLion.

                                                                                  This (begins to address) one of my biggest annoyances with cmake, which is that there’s no way to do the moral equivalent of shipping a tarball where autogen has already been run. If Xcode starts bundling cmake and eating CMakeLists.txt natively, flawless victory is within reach.

                                                                                  1. 2

                                                                                    This (begins to address) one of my biggest annoyances with cmake, which is that there’s no way to do the moral equivalent of shipping a tarball where autogen has already been run.

There isn’t, really, even with autotools. You still pick up a run-time dependency on GNU Make (CMake targets POSIX Make, so it works with whatever you have), bash (the output of autoconf is generally not portable POSIX shell, and even if it were, that doesn’t help on non-POSIX platforms such as Windows), and often on libtool and other things. In contrast, CMake is self-contained and will generate whatever your platform’s preferred build system is. I generally end up using Ninja everywhere, but that’s another small self-contained project that is packaged for every OS I have tried.

                                                                                    1. 1

                                                                                      I know it’s not really universal even with autotools. But it was really nice for a lot of systems I cared about. Thinking back on it, I think bakefiles had as much to do with me wanting cmake to work that way as autotools did.

                                                                                      I don’t consider cmake an onerous build dependency, at all, but have found that it causes friction getting people to try my stuff out before. That is decreasing over time, as it gets used for more and more things.

                                                                                  2. 1

                                                                                    Not my term. I think it came from the project Simon Peyton-Jones was doing on build systems a few years back but there may be older uses.

                                                                                    I’ve read that paper and thoroughly digested it, and I don’t remember that ever coming up…but thank you. I’ll look for where it appeared.

                                                                                    Honestly, it makes me a bit sad when I hear about new build systems. CMake isn’t my ideal build system, but it is something that could be incrementally evolved into my ideal build system and that has a lot more of a chance of becoming a universally adopted build system than something new.

                                                                                    I understand. I agree that there are too many build systems out there.

Personally, while I think a better language would make CMake vastly better, and that such a change could happen, the more I study build systems, including “Build Systems à la Carte,” the more I think CMake has other design flaws that cannot be readily addressed.

                                                                                    But to make you happier, this build system is getting built for features in my language, not for a desire to have another build system. Specifically, in my language, a user can have plugins for the compiler to define new keywords. Because it basically requires dynamic dependencies, doing so in CMake would be…hard, if not impossible.

                                                                                    And that’s why I don’t think CMake can be rescued long-term. In fact, I don’t even know if it’s possible to implement what you want in CMake without dynamic dependencies, though I very much acknowledge that I could be wrong and would love to be corrected.

                                                                                    By the way, I could use Shake as my build system, but pulling in Haskell seems…a bit much.

                                                                                    1. 2

                                                                                      I’ve read that paper and thoroughly digested it, and I don’t remember that ever coming up…but thank you. I’ll look for where it appeared.

                                                                                      Unfortunately the paper doesn’t give more than an overview of some of the things that the project did. This fed into the internal build system that Windows uses. This does a lot of caching in the cloud and so negative dependencies are a huge problem.

                                                                                      But to make you happier, this build system is getting built for features in my language, not for a desire to have another build system. Specifically, in my language, a user can have plugins for the compiler to define new keywords. Because it basically requires dynamic dependencies, doing so in CMake would be…hard, if not impossible.

                                                                                      Clang and GCC both also have plugins that can affect code generation, so I don’t think this problem is unique. I understand that CMake doesn’t support dynamic dependencies, I don’t know why you think that it couldn’t evolve support for them.

                                                                                      1. 1

                                                                                        Clang and GCC both also have plugins that can affect code generation, so I don’t think this problem is unique.

No, it’s not a problem unique to my language. In fact, I would argue that, since the headers included by source files can differ based on conditions at build time, C/C++ source files have dynamic dependencies, albeit dynamic dependencies that CMake is sort of able to handle.

                                                                                        But while it’s not a unique problem to my language, I want to solve it in what I believe is the right way.

                                                                                        I understand that CMake doesn’t support dynamic dependencies, I don’t know why you think that it couldn’t evolve support for them.

                                                                                        I believe that the whole “configure, then build” design is antithetical to dynamic dependencies. You can hack it in by detecting a situation where a target has different dependencies, running configure again, and then restarting the build. But it’s only a hack and can significantly increase the build time, especially if there are multiple targets with changed dependencies because either you will repeat the whole build -> reconfigure -> build process multiple times or hack in calculating the dynamic dependencies of every target in the reconfigure stage.

                                                                                        As long as CMake is a “meta-build system,” i.e., it generates the data used by another “dumb” build system, it will, in my opinion, never be able to support dynamic dependencies well (as in, able to always build correctly) nor efficiently because a build system with dynamic dependencies has to be able to start executing a target, calculate its dependencies, suspend the target, execute the dependencies, and come back. This is something a meta-build system and dumb build systems cannot do.

                                                                                        1. 3

                                                                                          As long as CMake is a “meta-build system,” i.e., it generates the data used by another “dumb” build system, it will, in my opinion, never be able to support dynamic dependencies well (as in, able to always build correctly) nor efficiently because a build system with dynamic dependencies has to be able to start executing a target, calculate its dependencies, suspend the target, execute the dependencies, and come back. This is something a meta-build system and dumb build systems cannot do.

I think there’s a lot riding on the ‘well’ part there. CMake builds can effectively use tail recursion, where Ninja (or whatever the build tool is) can finish an incremental build step that then invokes cmake, which then re-invokes ninja after updating the ninja build file. This pattern already exists in CMake in a few contexts (or a variant where cmake or some other tool is run to generate dependency information as an explicit dependency of a build step). There are also some special cases of this, where CMake produces coarse-grained dependency information and then, during the build, emits some other files that Ninja consumes for finer-grained dependency information.

                                                                                          To me, this boils down to two questions:

                                                                                          • Can that flow represent arbitrary dynamic dependencies?
                                                                                          • Does it incur a significant performance penalty compared to merging the configure and build steps that makes it infeasible in practice, even if it is possible?

                                                                                          I haven’t seen any compelling evidence in either direction yet.

                                                                                          1. 1

                                                                                            I think there’s a lot riding on the ‘well’ part there….

                                                                                            I agree, and that’s why I defined it to mean “able to do the build correctly.” All it takes to prove my point is one example where CMake has been given the ability to calculate all of the dependencies in its tail recursion fashion and still gets the build wrong, whether during a clean build or an incremental build.

                                                                                            I don’t have an example yet, but I’m also no CMake expert. I’ll see if I can find one.

                                                                                            To me, this boils down to two questions:

                                                                                            • Can that flow represent arbitrary dynamic dependencies?
                                                                                            • Does it incur a significant performance penalty compared to merging the configure and build steps that makes it infeasible in practice, even if it is possible?

To be honest, I think it can represent arbitrary dynamic dependencies, but more research is needed. If it can represent arbitrary dependencies, then it is plausible that tail-recursive CMake would be able to produce correct builds.

                                                                                            As for the performance penalty, that also needs testing, but I would guess that it can be significant.

                                                                                            Another significant factor, I believe, is the amount of work it would take to do that.

                                                                                            And yet another significant factor is the separation of the builds in a tail recursive CMake. Technically, the reinvocation of ninja is a separate build, though in some sense (the fact that the CMake reconfigure and its ninja invocation are child processes of the main build), it is still the same build. But then you basically have the problems of recursive Make. (I know you probably know about that paper; I’m linking to it just in case.)

                                                                                            The biggest problem of such a system that I see is that it would then become very hard to export targets and dependencies in the way that you mentioned upthread. Could it still be possible? I think so, but I suspect it would be manual, tedious, and hard in the sense that details would matter and be hard to get right.

                                                                                            However, I fully acknowledge that I have no hard numbers. I could do some tests on CMake with some fake builds and write a blog post about it. Would you want to see such a blog post?

                                                                                            So tl;dr: I think it may be possible to get build correctness in CMake with dynamic dependencies, but the cost of maintaining such build scripts and in increased build times would not be worth it.

                                                                                            1. 3

                                                                                              Would you want to see such a blog post?

                                                                                              Text does not allow me to express ‘yes’ as enthusiastically as I want to in response to this question.

                                                                                              So tl;dr: I think it may be possible to get build correctness in CMake with dynamic dependencies, but the cost of maintaining such build scripts and in increased build times would not be worth it.

                                                                                              So, my follow-on question is whether you could incrementally improve CMake to the point where this would be possible. If you start from a design that makes all of this easy, is there an incremental set of changes that you could apply to CMake that would allow it to evolve there? Remember that CMake has something of a history of filling in gaps in existing build systems by adding CMake tools that the build systems can invoke for things that they can’t do.

                                                                                              In my experience, incrementally improving a thing that people use is orders of magnitude more likely to result in widespread adoption than providing something new that is better.

                                                                                              1. 1

                                                                                                So, my follow-on question is whether you could incrementally improve CMake to the point where this would be possible. If you start from a design that makes all of this easy, is there an incremental set of changes that you could apply to CMake that would allow it to evolve there?

                                                                                                It depends on how you would evolve it.

                                                                                                Remember that CMake has something of a history of filling in gaps in existing build systems by adding CMake tools that the build systems can invoke for things that they can’t do.

If you evolve CMake like you suggest, no, it’s not possible. The evolution you are describing assumes that CMake will continue to be just a meta-build system, and that will not work, because the “configure, then build” design is exactly what cannot work. Adding new tools to CMake makes a better leopard, but a leopard cannot change its spots. What is necessary is not just a leopard with different spots, but a completely new animal: a tiger, so to speak.

                                                                                                What is needed is a build system where there is no configure step; instead, there would be just a build step that can execute arbitrary code, including registering targets. And those targets also need to execute arbitrary code. These two requirements mean two things:

                                                                                                1. You cannot know the dependencies of a target ahead of its execution, and
                                                                                                2. You cannot know all of the targets that will be used in the build ahead of the build itself.

These two things are the reason a meta-build system can’t just add tools: it assumes both are false, generates the build and sets it in stone, and passes it off to a dumb build system.

                                                                                                However, if CMake were to get rid of its “configure, then build” design and build straight from CMakeLists.txt, it might be possible.

                                                                                                Honestly, CMake’s language is bad enough that I would think it’s worth it to provide something new and strive for widespread adoption. The problem is deciding which programming language to use as the language for the build system because I personally would want the build system bootstrapped from C, leading to as few dependencies as possible.

As for achieving widespread adoption, I don’t think it would be too hard.

                                                                                                Near where I live, a woman started a business making kolaches from 6am to noon. She did it to put herself through business school, and like any good businesswoman, she had a plan to bootstrap repeat customers.

                                                                                                For her first two weeks of being open, all the kolaches were free, although they were mini-kolaches.

                                                                                                Well, in a college town, free food is definitely a big hit, so people lined up for two weeks. And it turns out that many of them liked the kolaches.

                                                                                                At the end of two weeks, she had enough repeat customers to have lines out the door for half a block and sold out every morning. It was a bunch of extra up-front work for her, but now she has opened multiple locations around the area, and it is how she makes a living.

                                                                                                Similarly, I could see a new build system winning with a similar strategy. After making the build system good and stable, the author of the build system could find important projects using autoconf, redo their build scripts into the build system, and offer the new build scripts for free. Bonus points if the new build scripts are commented in such a way that they are easily understood and added to. And this would have two good side benefits:

                                                                                                1. The disruption to projects would be as small as possible because the projects themselves would have spent no effort on the conversion, and
                                                                                                2. The projects’ maintainers would have the experts on the build system right there to answer questions as the need arises.

                                                                                                Not all projects will take up the offer, but because autoconf was almost dead and only recently revived, it does not seem wise to keep betting on it, so I think a critical mass of projects would take the offer.

                                                                                                And once a critical mass takes the offer, it would snowball.

                                                                                                So basically, the strategy that would work would be to lay the responsibility of converting projects on the shoulders of the authors of the new build system, to do free work for projects.

                                                                                                Of course, just as the kolache success depended on people liking the kolaches, such a strategy depends on people liking the new build system better than autoconf. But that seems like a low bar.

                                                                                                Another reason I think this will work is because it worked for me when I wrote FreeBSD’s bc. I did a lot of work packaging the bc for various Linux distros and even helped with the FreeBSD port. Now, it’s used quite a bit, and I think the market for a bc is far smaller than the market for a build system.

                                                                                                (I will admit that in the case of FreeBSD, I also happened to run into a FreeBSD contributor that is interested in math algorithms and bc. It was fortuitous, but the point of such work would be to make such fortuitous chances more likely.)

                                                                                                Edit: Forgot to mention that I will get working on that blog post as soon as I can. I don’t know how useful it will be until I have a build system to compare it to, though…

                                                                                                1. 2

                                                                                                  Similarly, I could see a new build system winning with a similar strategy. After making the build system good and stable, the author of the build system could find important projects using autoconf, redo their build scripts into the build system, and offer the new build scripts for free.

                                                                                                  That largely worked for CMake, but the landscape has changed. CMake now has strong network effects because it’s now also exporting targets and displacing pkg-config as the way of exposing new projects. I don’t just use CMake because I want to build with CMake, I use CMake because I want to pull in a load of dependencies that export CMake targets and because I want to export targets for CMake to be able to consume, especially via things like vcpkg which have infrastructure that makes it trivial to generate packaged libraries for anything on GitHub that builds using CMake. Meson has some support for importing dependencies exported from CMake projects but I’m not sure if it goes both ways.

                                                                                                  So basically, the strategy that would work would be to lay the responsibility of converting projects on the shoulders of the authors of the new build system, to do free work for projects.

                                                                                                  Maintaining the build system for LLVM is almost a full-time job. LLVM’s CMake build system is one of the most complex that I’ve seen, but even so, rewriting the build systems of the 100 most popular CMake projects would take tens of person-years of effort.

                                                                                                  Another reason I think this will work is because it worked for me when I wrote FreeBSD’s bc. I did a lot of work packaging the bc for various Linux distros and even helped with the FreeBSD port. Now, it’s used quite a bit, and I think the market for a bc is far smaller than the market for a build system.

                                                                                                  It’s also a very self-contained program. Replacing the bc implementation in FreeBSD does not require anyone else to rewrite anything or to maintain any changes. It’s an easy win for FreeBSD to have a better bc. Since you bring up FreeBSD, take a look at the FreeBSD build system: even figuring out what it’s actually doing now is the point at which I gave up in the effort to replace it with something more modern and maintainable. FreeBSD’s build system is entirely self-contained and few other things depend on it, so it could be replaced with something else if there were similar advantages. Replacing it with CMake is attractive because LLVM and a bunch of other things in contrib use CMake, so FreeBSD could stop maintaining a parallel build system for them and just import and wrap their CMake builds.

                                                                                                  1. 1

                                                                                                    Those are all good points, and I already have ideas about how to address them should I decide to try to take over the world, including being able to import CMake targets and export targets to CMake and other build systems.

                                                                                                    However, at this point, I think I’m okay with just making my build system the best I can and maybe seeing if autoconf projects want to switch, because while what you say seems to be true about CMake, I think it might be good if autoconf dies.

                                                                                                    Thank you for the conversation. I’ll keep working on that blog post and post it on this website when I publish it.

                                                                                          2. 2

                                                                                            As long as CMake is a “meta-build system,” i.e., it generates the data used by another “dumb” build system, it will, in my opinion, never be able to support dynamic dependencies well (as in, always building correctly) or efficiently, because a build system with dynamic dependencies has to be able to start executing a target, calculate its dependencies, suspend the target, execute the dependencies, and come back. This is something a meta-build system and dumb build systems cannot do.

                                                                                            I broadly agree and I think it’s part of a more general (and counter-intuitive) observation: “divide and conquer” does not work well for build systems. Any kind of attempt to aggregate the build graph (e.g., recursive make) or segregate a build step (e.g., a meta build system) to make things more manageable leads to the inability to do things correctly and/or efficiently.

                                                                                            1. 1

                                                                                              Agreed.

                                                                                              More generally, build systems should try to do the opposite: unify and conquer.

                                                                                    2. 2

                                                                                      More generally, as bazel has shown, the best build systems

                                                                                      … since when is Bazel considered the “best build system”? It’s terrible when compared to CMake

                                                                                      1. 1

                                                                                        It’s not, but it has something that the best build systems should strive for: sandboxed builds.

                                                                                        1. 1

                                                                                          Do you know, in specific technical terms, what a “sandboxed build” is in Bazel? This term, along with “hermetic builds”, gets thrown around a lot when Bazel comes up, but my previous attempts to understand what exactly they mean didn’t lead to anything specific. So I would appreciate any pointers.

                                                                                          1. 1

                                                                                            I’d start at https://docs.bazel.build/versions/main/hermeticity.html, but the tl;dr is that Bazel will copy all files that are dependencies to a completely new directory structure that matches the real source tree. Then it does the build there.

                                                                                            If the build works, you know all of the dependencies were properly declared. If not, there is one or more implicit dependencies that you need to make explicit.

                                                                                            There’s more detail than that, but that’s the gist. It’s really only a way to discover hidden dependencies, which can then be declared. After they are, it’s easier to get reproducible builds.
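
                                                                                            As a rough illustration of the idea only (this is not how Bazel itself is implemented, and the file names and compile command are made up), the core trick can be sketched in a few lines of C: copy just the declared inputs into a fresh directory and build there, so any undeclared dependency turns into a hard failure:

                                                                                              /* Sketch of sandboxing: build in a directory that contains only the
                                                                                               * declared inputs, so hidden dependencies cannot be satisfied. */
                                                                                              #include <stdio.h>
                                                                                              #include <stdlib.h>

                                                                                              int main(void) {
                                                                                                  const char *declared_inputs[] = { "main.c", "util.c", "util.h" };
                                                                                                  const int n = sizeof(declared_inputs) / sizeof(declared_inputs[0]);
                                                                                                  char cmd[1024];

                                                                                                  /* Start from a fresh, empty sandbox directory. */
                                                                                                  if (system("rm -rf sandbox && mkdir sandbox") != 0)
                                                                                                      return 1;

                                                                                                  /* Copy only what the target declared. */
                                                                                                  for (int i = 0; i < n; i++) {
                                                                                                      snprintf(cmd, sizeof(cmd), "cp %s sandbox/", declared_inputs[i]);
                                                                                                      if (system(cmd) != 0)
                                                                                                          return 1;
                                                                                                  }

                                                                                                  /* Build inside the sandbox. If main.c secretly #includes something
                                                                                                   * that was not declared, the compile fails here instead of silently
                                                                                                   * picking it up from the real source tree. */
                                                                                                  return system("cd sandbox && cc -o app main.c util.c") != 0;
                                                                                              }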

                                                                          1. 4

                                                                            This post obliquely hints at two of my pet peeves with ZFS:

                                                                            • The ZFS kernel interface allows you to atomically create a set of snapshots. This is fantastic for package updates if you want, for example, to keep /usr/local/etc and /usr/local/ in separate datasets so that you can back up the config files easily but not bother backing up the whole thing. This feature is so useful that the authors of the zfs command-line tool decided not to expose it to users.
                                                                            • The allow mechanism is incredibly coarse-grained. If I want to say ‘my backup user can create snapshots and delete them, but can’t delete snapshots that it didn’t create’ then I’m out of luck: that’s not something that the permissions model can express. Snapshots don’t come with owners. My backup system should be trusted with preserving the confidentiality of my data (it can, after all, read the data unless I’m backing up encrypted ZFS datasets directly) but it shouldn’t have the ability to destroy the datasets that it’s backing up. Yet that’s not something that zfs allow can express.
                                                                            1. 3

                                                                              Snapshotting /usr/local or /usr/local/etc can sometimes be useful, but ZFS Boot Environments are a whole lot better - they secure the entire system along with both /usr/local and /usr/local/etc - more on that here:

                                                                              https://vermaden.files.wordpress.com/2018/11/nluug-zfs-boot-environments-reloaded-2018-11-15.pdf

                                                                              1. 1

                                                                                Snapshotting /usr/local or /usr/local/etc can sometimes be useful, but ZFS Boot Environments are a whole lot better

                                                                                These solve different problems. BEs clone the base system. I want to snapshot /usr/local so that I can roll back if pkg upgrade goes wrong. I want a BE so that I can roll back if an upgrade of the base system goes wrong (and it’s super useful - I’ve occasionally made changes to libc that work fine in some tests, done a make installworld and after a reboot discovered that I broke something that init needs - being able to switch back to the old BE from loader is fantastic).

                                                                                You’ll note from that presentation that beadm is written in shell and uses zfs snapshot. This means that, as with the /usr/local/etc case, beadm requires zpool/ROOT to be either a single dataset or a tree of datasets. A BE doesn’t contain /usr/local. bectl is part of the base system and is actually written in C as a thin wrapper around libbe (also in the base system). This actually could snapshot multiple independent filesystems atomically (it uses libzfs, which wraps the ZFS ioctls, which take an nvlist of filesystems to snapshot), it just doesn’t.

                                                                                If you do bectl list -a then it will show exactly which filesystems are snapshotted. If you then compare this against the output of mount then you’ll see that there are a lot that are not in the tree of any BE. This is usually a useful feature: you can use BEs to test the same set of packages against different base system versions, you don’t normally want to roll back home directories if you discover problems in an update, and so on.

                                                                                1. 3

                                                                                  These solve different problems. BEs clone the base system. I want to snapshot /usr/local so that I can roll back if pkg upgrade goes wrong. I want a BE so that I can roll back if an upgrade of the base system goes wrong (and it’s super useful - I’ve occasionally made changes to libc that work fine in some tests, done a make installworld and after a reboot discovered that I broke something that init needs - being able to switch back to the old BE from loader is fantastic).

                                                                                  The default FreeBSD installation on ZFS also INCLUDES /usr/local in the BE. This is what I am trying to tell you.

                                                                                  You’ll note from that presentation that beadm is written in shell and uses zfs snapshot. This means that, as with the /usr/local/etc case, beadm requires zpool/ROOT to be either a single dataset or a tree of datasets. A BE doesn’t contain /usr/local. bectl is part of the base system and is actually written in C as a thin wrapper around libbe (also in the base system). This actually could snapshot multiple independent filesystems atomically (it uses libzfs, which wraps the ZFS ioctls, which take an nvlist of filesystems to snapshot), it just doesn’t.

                                                                                  I know because I am the author of beadm(8) command.

                                                                                  Both bectl(8) and beadm(8) require the zpool/ROOT approach … and yes, by default the BE contains the /usr/local directory. The beadm(8) tool uses the zfs snapshot -r command, which means it takes a RECURSIVE snapshot. The same is done in bectl(8). It does not matter that bectl(8) uses libbe - they work the same.

                                                                                  If you do bectl list -a then it will show exactly which filesystems are snapshotted.

                                                                                  Please see the presentation again - especially pages 42/43/44, which tell you exactly the info you need. The /usr/local directory IS INCLUDED in the BE with the default FreeBSD install on ZFS.

                                                                                  1. 1

                                                                                    The default FreeBSD installation on ZFS also INCLUDES /usr/local in the BE. This is what I am trying to tell you.

                                                                                    The default ZFS install doesn’t put /usr/local in a separate ZFS dataset. This is one of the first things that I need to fix on any new FreeBSD install, before I install any packages. In the vast majority of cases, I don’t want /usr/local to be in my BE because if I change something in a package config and then discover I need to roll back to a prior BE then I’d lose that change. In my ideal world, /etc would not contain system-provided rc.d scripts, defaults, or any of the other system-immutable things that have ended up there and /etc would be omitted from the BE as well so that I didn’t lose config changes on BE rollback, but that’s not possible while /etc needs to be mounted before init runs and while it contains a mix of user- and system-provided things.

                                                                              2. 1

                                                                                Newish ZFS user here.

                                                                                What do you mean by:

                                                                                The ZFS kernel interface allows you to atomically create a set of snapshots.

                                                                                Specifically, what is a “set of snapshots”?

                                                                                1. 2

                                                                                  Not really a ZFS thing, actually - more of an ACID or data-integrity concept. Each ZFS dataset is a completely separate entity; it is configured separately, managed separately, and even has its own isolated IO (you have to copy and then delete the entirety of a file to move it to a different dataset, even if it’s on the same zpool).

                                                                                  A regular snapshot doesn’t make any atomicity guarantees with regard to a snapshot of a different ZFS dataset: if your app writes to a ZFS dataset at /var/db/foo.db and logs to a separate ZFS dataset at /var/log/foo, and you snapshot both “regularly” and then restore, you might find that the log references data that isn’t found in the db, because the snapshots weren’t synchronized. An atomic set of snapshots would not run into that.

                                                                                  (But I thought recursive snapshots of / would give you atomic captures of the various child datasets, so it’s exposed in that fashion, albeit in an all-or-nothing approach?)

                                                                                  1. 1

                                                                                    I want to do zfs snapshot zroot/usr/local@1 zroot/usr/local/etc@1 zroot/var/log@1 or similar. It turns out I can do this now. Not sure when it was added, but I’m very happy that it’s there now.

                                                                                  2. 1

                                                                                    This feature is so useful that the authors of the zfs command-line tool decided not to expose it to users.

                                                                                    Is that not what zfs snapshot -r does? (note the -r). I think it’s supposed to create a set of snapshots atomically. Granted, they all have to be descendants of some dataset, but that’s not necessarily a big issue because the hierarchy of ZFS datasets need not correspond to the filesystem hierarchy (you can set the mountpoint property to mount any dataset in whatever path you want).

                                                                                    I think ZFS channel programs also allow you to do that atomically, but with a lot more flexibility (e.g. no need for the snapshots to be descendants of the same dataset, and you can also perform other ZFS administration commands in between the snapshots if you want), since they basically allow you to run your own Lua script at the kernel level, atomically, while ZFS is synchronizing the pools. See zfs-program(8).

                                                                                    1. 1

                                                                                      -r creates a set of snapshots of a complete tree. It doesn’t allow you to atomically create a set of snapshots for datasets that don’t have a strict parent-child relationship. For example, with Boot Environments, / is typically zroot/ROOT/${current_be_name} and /usr/local is zroot/usr/local, so you can’t snapshot both together with the command-line tool.

                                                                                      The ioctl that this uses doesn’t actually do anything special for recursive snapshots. It just takes an nvlist that is a list of dataset names and snapshots them all. When you do zfs snapshot -r, the userspace code collects a set of names of datasets and then passes them to the ioctl. This is actually racy: if a dataset is created in the tree in the middle of the operation then it won’t be captured in the snapshot, so a sequence from another core of ‘create child dataset’ then ‘create symlink in parent to file in child’ can leave the resulting snapshots in an inconsistent state because they’ll capture the symlink but not the target. In practice, this probably doesn’t matter for most uses of -r.

                                                                                      Channel programs do allow this, but they don’t compose well with other ZFS features. In particular, they’re restricted to root (for good reason: it’s generally a bad idea to allow non-root users to run even fairly restricted code in the kernel because they can use it to mount side-channel attacks or inject handy gadgets for code-reuse attacks. This is why eBPF in Linux is so popular with attackers). This means that you can’t use them with ZFS delegated administration. It would be a lot better if channel programs were objects in some namespace so that root could install them and other users could then be authorised to invoke them.

                                                                                      On the other hand, the kernel interface already does exactly what I want and the libzfs_core library provides a convenient C wrapper, I’m just complaining that the zfs command-line tool doesn’t expose this.
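
                                                                                      For the curious, a minimal sketch of what that looks like through libzfs_core (assuming the API as shipped with OpenZFS; the pool, dataset, and snapshot names here are invented). lzc_snapshot() takes an nvlist of full snapshot names, which may belong to unrelated datasets, and creates them all atomically:

                                                                                        /* Build with something like: cc atomic_snap.c -lzfs_core -lnvpair
                                                                                         * (exact flags may vary by platform). */
                                                                                        #include <stdio.h>
                                                                                        #include <libnvpair.h>
                                                                                        #include <libzfs_core.h>

                                                                                        int main(void) {
                                                                                            if (libzfs_core_init() != 0) {
                                                                                                fprintf(stderr, "cannot open /dev/zfs\n");
                                                                                                return 1;
                                                                                            }

                                                                                            /* One entry per snapshot; the datasets need not share a parent. */
                                                                                            nvlist_t *snaps = fnvlist_alloc();
                                                                                            fnvlist_add_boolean(snaps, "zroot/usr/local@backup1");
                                                                                            fnvlist_add_boolean(snaps, "zroot/usr/local/etc@backup1");
                                                                                            fnvlist_add_boolean(snaps, "zroot/var/log@backup1");

                                                                                            nvlist_t *errlist = NULL;
                                                                                            int err = lzc_snapshot(snaps, NULL, &errlist); /* all-or-nothing */
                                                                                            if (err != 0)
                                                                                                fprintf(stderr, "snapshot failed: %d\n", err);

                                                                                            fnvlist_free(snaps);
                                                                                            if (errlist != NULL)
                                                                                                fnvlist_free(errlist);
                                                                                            libzfs_core_fini();
                                                                                            return err != 0;
                                                                                        }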

                                                                                      1. 3

                                                                                        Actually, it looks as if I’m wrong. zfs snapshot can do this now. I wonder when that was added…

                                                                                        1. 1

                                                                                          It would be a lot better if channel programs were objects in some namespace so that root could install them and other users could then be authorised to invoke them.

                                                                                          Can’t you create a script as root and then let other users invoke it with doas or sudo?

                                                                                          1. 1

                                                                                            Not from a jail, no (unless you want to allow a mechanism for privilege elevation from a jail to the host, which sounds like a terrible idea). In general, maybe, if you wanted to put doas or sudo in your TCB (I’d be happier with doas, but sudo’s security record isn’t fantastic). But now, if a program wants to run with delegated administration and provide a channel script, it also needs to provide all of this privilege-elevation machinery, and there are enough fun corner cases that it will probably get it wrong and introduce new security vulnerabilities. Oh, and the channel program code doesn’t know who it’s running as, so you end up needing some complex logic to check the allow properties there to make sure that the channel program is not run by a user who shouldn’t be allowed to use it.

                                                                                            I’d love to see installing and running channel programs completely separated so that only unjailed root could install them (though they could be exposed to jails) and then any user could enumerate them and invoke them with whatever parameters they wanted, but the channel program then runs with only the rights of that user, so they couldn’t do anything with a channel program that delegated administration didn’t let them do anyway.

                                                                                    1. 10

                                                                                      The saddest thing about this post is that it takes as axiomatic, without any thought, the idea that the core will be Linux. Today, GNOME runs well on *BSD and I think it even runs on Hurd these days but there’s a gradual trend towards Linux-only technologies. For example:

                                                                                      For example, Flatpak, love it or hate it, is an easy way to install any app on any platform

                                                                                      Sure, as long as your platform is either built on a Linux kernel or provides a VM that can run a Linux kernel. Otherwise… not so much.

                                                                                      Linux, *BSD, Solaris, and so on are all based on ‘70s minicomputer abstractions that have been gradually stretched to breaking point for modern hardware. If you give up even supporting a handful of kernels with almost identical abstractions then you have absolutely no chance of running on anything that’s designed for modern hardware. Once the Fuchsia + Flutter stack is a bit more mature, I can see it completely eating GNOME’s lunch and GNOME at that point being completely impossible to run on Fuchsia in anything other than a Linux ABI compat layer, where it’s segregated away from any platform-integration services and completely sidelined.

                                                                                      When I started writing Free Software, portability was a badge of code quality in the community. Now it’s seen as too much effort, a mindset led by companies like Red Hat and Google that don’t want developers to put effort into platforms that they don’t control.

                                                                                      1. 2

                                                                                        Totally agree, but this is one of those ‘inherent to human nature’ situations. Most people who run FLOSS operating systems run Linux. I’d wager that the author has VERY little experience with folks running anything but Linux on the desktop.

                                                                                        Therefore, EVERYONE runs Linux, right?

                                                                                        1. 2

                                                                                          You are absolutely right, and I would like to add:

                                                                                          • Portability gives users more choice. If your program is critical to them, and it is portable to all platforms, they can change platforms with impunity. So the author’s very thesis argues against Linux-centrism.
                                                                                          • You alluded to this, but the corollary is that the more software is portable, the more freedom users have to get out of user-hostile platforms. If all programs were portable, Microsoft, Google, and Apple wouldn’t be able to have walled gardens, because if they started being user-hostile, users would just move.

                                                                                          Portability needs to make a comeback as a lauded attribute of software. I could write a blog post about this…

                                                                                        1. 2

                                                                                          I am a wannabe language designer.

                                                                                          This blog post explains to me why Python feels so much more right than other languages. It seems Guido understands, at least intuitively, something I learned the hard way: consistency makes good programming language design.

                                                                                          If a language uses the same keyword in two different places and it does two different things, from the perspective of the user, that is not consistent. A good example is static in C: applied to a function, it means “private to this file”; applied to a variable inside a function, it means “private to this function, but with its value preserved across calls, like a hidden global”.
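
                                                                                          A minimal illustration of those two meanings (a made-up file, just for the sake of the example):

                                                                                            /* file: example.c */
                                                                                            #include <stdio.h>

                                                                                            /* static on a function: the symbol is private to this translation unit. */
                                                                                            static void greet(void) {
                                                                                                printf("hello\n");
                                                                                            }

                                                                                            /* static on a local variable: private to this function, but its value
                                                                                             * persists across calls, as if it were a hidden global. */
                                                                                            int next_id(void) {
                                                                                                static int id = 0;
                                                                                                return ++id;
                                                                                            }

                                                                                            int main(void) {
                                                                                                greet();
                                                                                                printf("%d\n", next_id()); /* 1 */
                                                                                                printf("%d\n", next_id()); /* 2 */
                                                                                                printf("%d\n", next_id()); /* 3 */
                                                                                                return 0;
                                                                                            }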

                                                                                          Another thing he gets absolutely right is this:

                                                                                          Features of a programming language, whether syntactic or semantic, are all part of the language’s user interface.

                                                                                          People don’t think of programming languages as having a user interface, but they do because users use them. In fact, a programming language could be thought of as a user interface to a programming toolbox.

                                                                                          As such, programming language design is not about features (though they are important); it’s about presenting orthogonal, composable tools in an interface that is intuitive and consistent for the user.

                                                                                          1. 1

                                                                                            I could imagine that a competitor to Qt and Electron could win if it had built-in capabilities for GUI automation from the beginning, because not only would that make it easier for users to automate tasks, it would also make it easier to implement completely automatic testing of GUIs.

                                                                                            However, if such a library appears, I would also like it to have:

                                                                                            • Accessibility
                                                                                            • Localization
                                                                                            • Internationalization
                                                                                            • Animation
                                                                                            • Multi-touch
                                                                                            • Pressure (trackpads)
                                                                                            • Input Methods

                                                                                            from the beginning. (See this blog post.)

                                                                                            Looking at that list, I think I know why no competitors have appeared…

                                                                                            1. 3

                                                                                              This is the type of blog post that should belong in some piece of documentation IMHO.

                                                                                              I don’t need this right now, but when I do need it, DuckDuckGo/Google will give me crappy, SEO-ized search results, and I will never find this high-quality piece of content again :(

                                                                                              1. 3

                                                                                                There’s this thing called a ‘bookmark’, it’s built in to your browser and can be synced between machines as well. :-)

                                                                                                1. 2

                                                                                                  I think it would be easier for me to find it in some crappy SEOized search results, than the bookmark mess I have accumulated over the years. :-)

                                                                                                  1. 1

                                                                                                    I understand. I had the same situation about two years back.

                                                                                                    One thing that helped me was putting all of my bookmarks into a text document with titles. I use Markdown, for several reasons:

                                                                                                    • I can open it as a pinned tab and have it display nicely.
                                                                                                    • All of the bookmarks can be links, allowing me to open them from that pinned tab.
                                                                                                    • And most importantly, I can use Markdown headings and subheadings to break my bookmarks down into categories and subcategories.

                                                                                                    I have an OpenBSD subcategory, and that’s what the bookmark for this article went into.

                                                                                                    If your bookmarks are in Chrome (or one of its descendants), you can export your bookmarks as JSON, and from there, you can turn them into a Markdown list.

                                                                                                  2. 1

                                                                                                    Right. But how much of this content wasn’t posted on lobste.rs, and I never stumbled across it, and am therefore missing?

                                                                                                1. 6

                                                                                                  I wonder if this is more or less the end of the road for the librem 5?

                                                                                                  This offers similar openness, same software capability and slightly better specs (I think…). And hardware privacy switches too. For half the price, assuming Chinese assembly meets your requirements. (If you need a US-manufactured device, Librem offers that for $2000-ish, and Pine does not.)

                                                                                                  And given the respective companies’ track records, it seems likely the ’Pro will ship in quantity well before the listed 52-week lead time on the non-US-manufactured Librem 5.

                                                                                                  I was on the fence between replacing my iPhone’s battery or just replacing the whole phone. I think this announcement has pushed me toward replacing the battery and revisiting in 8 - 12 months to see if this has developed into something that could be a daily driver for me.

                                                                                                  1. 14

                                                                                                    Purism is targeting a different market; they’re trying to make an ecosystem, a readymade device, that someone can use out of the box and be satisfied with. I don’t think they’re doing all too well with it, but it’s the intent that counts. What Pine does is make tinker toys for engineers. They save money on the device by punting software engineering to the users. (The battery management part made me queasy.)

                                                                                                    1. 7

                                                                                                      I agree with your characterizations of the two intents. What I meant to do in my comment is question whether, given that the software works as well on the pinephone as it does on the Librem, Pine has backdoored its way into (soon) hitting that “someone can use out of the box and be satisfied with” goal for technical users better than Purism has, even though they were aiming for something else entirely.

                                                                                                    2. 8

                                                                                                      A big difference for me is that the L5 has privacy switches that are usable. That is, I want the camera and microphone off until I’m receiving a call, then I can flip the switch and answer. With the pinephone (and it looks like the pinephone pro) the switches are buried inside the back which make them interesting but not very usable in day-to-day life.

                                                                                                      Another point as mentioned in other comments is that Purism is funding the software development to make the ecosystem. Pinephone gets the benefit of that without much of the cost. I hope both succeed so that there is a gradient of capability from the low end to the high end, and a real move off of the duopoly we have now.

                                                                                                      1. 4

                                                                                                        Interesting point about the switches.

                                                                                                        I think Pine has done better than Purism working to get drivers for their components supported by the upstream kernel. I think they’ve also done better getting help out to the various distributions when it comes to supporting the platform. By not having their own distro but getting hardware into developers’ hands, there is a whole ecosystem now. I think if it had been left to purism, you’d have one distro (PureOS) whose development is mostly done behind closed doors plus a couple nice contributions to mobile GNOME.

                                                                                                        In particular, they seemed to have zero interest in upstreaming the PureOS kernel patchset before Pine came along.

                                                                                                        I also hope both succeed, but I’m glad to see a wide-open development model making more of a play.

                                                                                                        1. 7

                                                                                                          The development of PureOS appears to be done in the open; the contribution of libhandy for GNOME was essential to making most of the apps work well in that form factor, and Purism have been supportive of KDE and UbuntuTouch as well. Not sure where the impression of “zero interest in upstreaming the PureOS kernel patchset” comes from or that the pinephone had an influence on that… my impression was the opposite. It’s never fun to maintain forks of the kernel when it’s not necessary, and resources are already tight and heavily invested in the rest of the software stack.

                                                                                                          Purism has made a lot of missteps around communication especially with respect to shipping devices to backers. I haven’t observed any missteps around their commitment to using completely free software with no binary blobs required and getting RYF certification.

                                                                                                          1. 2

                                                                                                            What I meant when I said that development was behind closed doors was that when I visit

                                                                                                            https://pureos.net/

                                                                                                            I can’t find any source control repos, only a bunch of source packages. Which is fine, and still free, but not IMO open development. (That’s not some kind of moral failing, it’s just less interesting to me.)

                                                                                                            My impression about their interest in mainline kernels came from unresponsiveness to inquiries about just that. There didn’t seem to be much movement that direction until just after Pine started booting mainline kernels :). Lack of resources could masquerade as disinterest, for sure, though and maybe it was just that.

                                                                                                            They certainly do seem 100% committed to complete freedom and blob free operation, and that’s excellent. It’s important to have more hardware in the world that works that way, and I think the level of interest in their operation will only convince more people to try building that.

                                                                                                            1. 6

                                                                                                              pureos.net

                                                                                                              Check out https://source.puri.sm/public

                                                                                                              1. 3

                                                                                                                That is dramatically better than the source link on the front of the PureOS site. Thanks.

                                                                                                      2. 3

                                                                                                        I can relate to the hesitance about making one of these devices your daily driver. What in particular is stopping you? Personally, I’d really want to be sure I can get ample battery life and that all my favorite apps, like Discord and Gmail, can run. Obviously, it also shouldn’t drop calls, fail to receive texts, or anything like that, either.

                                                                                                        1. 4

                                                                                                          Last time I checked in, the call, text and MMS functionality was just not ready for prime time. I know that’s been improving quickly, but I haven’t squinted too hard to see where it is. For me to make it a daily driver, I’d need:

                                                                                                          1. Rock solid phone calls
                                                                                                          2. Extremely reliable SMS/MMS receipt
                                                                                                          3. Good headset support
                                                                                                          4. Mostly reliable SMS/MMS sending
                                                                                                          5. Very good 4G data support
                                                                                                          6. Ability for another device to reliably tether/use the phone as a hotspot
                                                                                                          7. A battery that goes an entire workday without needing a charge when being used for voice calls, SMS/MMS and some light data usage

                                                                                                          I’ve heard 1,2,3 were not quite there. 4 is supposedly there for SMS but not MMS, which last time I looked would keep me from using it on some group threads. I believe 5 is there and suspect 6 is just fine. 7 is probably good enough given the swappable, easy-to-find battery.

                                                                                                          When it comes to apps on the phone itself, GPS would be nice to have, but as long as there’s a browser that is somewhat usable, I could stand to tether an iPad or Android tablet until app coverage was robust. I prefer to work from a laptop or tablet anyway. I’d also like to have a decent camera on my phone, but that’s not a hard requirement for me to daily drive one.

                                                                                                          1. 6

                                                                                                            As someone who has not used the sms, MMS, or voice features of any of my devices in a decade, it’s good to be reminded that some people still use these features.

                                                                                                          2. 2

                                                                                                            Can it run Employer mandated apps? Whether you’re delivering food or an engineer, they’re a thing now. Plus whatever app your local government mandates you put on your phone to check COVID-related restrictions.

                                                                                                            To be honest, I think that for most people, the possibility of not owning a phone running one of the two major platforms is long gone.

                                                                                                            1. 16

                                                                                                              A couple of points. First, many employers are supportive of variations of GNU/Linux; if yours isn’t, then really consider finding one that better aligns with your values.

                                                                                                              When governments mandate apps there must really be a push to say loudly and clearly that proprietary applications are not acceptable. Quietly accepting and using a Google or Apple device means that the line will keep shifting in the wrong direction. For many (most? really all?) there is still the possibility of not owning a phone from Google or Apple and participating fully in society. It won’t stay that way unless people demand it.

                                                                                                              1. 8

                                                                                                                Of course employers are supportive of GNU/Linux - when it powers their servers. When it starts to interfere with their employees’ ability to log in to the network, review their schedule or attend meetings, you will see their support dry up quickly.

                                                                                                                Not owning a Googapple phone is equivalent to not owning a phone as far as policy makers are concerned. Yes, your accessibility is considered, along with that of the elderly, homeless and poor. The notion of an employable person not owning one is increasingly alien to them.

                                                                                                                1. 6

                                                                                                                  This comment should be boosted, especially for the fact that we are getting closer to the world where only Google or Apple is accepted. This is why I want to support Pine, even if their stuff is not ready.

                                                                                                                2. 11

                                                                                                                  Can it run Employer mandated apps?

                                                                                                                  I would strongly recommend refusing to allow any company stuff on your private property. Not only is it likely to be spyware, but like, it is also not your problem to do their IT provisioning for them.

                                                                                                                  1. 3

                                                                                                                    It’s not your problem to provision motor vehicles for your employer either, but for many people, using their private car for work isn’t just normal, it’s the cornerstone of their employability.

                                                                                                                    1. 3

                                                                                                                      At least with cars (except for the commute to/from the workplace) you can generally get reimbursed for the mileage, and a car isn’t as likely to be spyware.

                                                                                                                      But even then, I’d say take advantage of the labor market and start fighting back against them pushing the costs on you.

                                                                                                                  2. 5

                                                                                                                    I’ve never had an employer or government mandate any mobile app. They can’t even usually mandate that you have a mobile device, unless they are providing one.

                                                                                                                    I know lots of people who run various apps that make their employer or government interactions more convenient, but never were they mandatory.

                                                                                                                    1. 4

                                                                                                                      I’ve had an employer mandate that I either, at my option, accept their app on my device or carry their device and use it. I chose to carry two devices, but I understand why my colleagues chose to install the “mandated” apps instead.

                                                                                                                      1. 5

                                                                                                                        Yeah, if they offer me a device I’m always going to take it. No work shit on personal devices ever, but also why would I not take an extra device to have around?

                                                                                                                    2. 3

                                                                                                                      I don’t really have any mandated apps other than OTP authenticators, but there’s a lot I’d miss (e.g. quickly sending a message on Slack, or whatever services I use for pleasure, plus stuff like decent clients for whatever service). I could go without, but it certainly wouldn’t be a daily driver.

                                                                                                                      What I might miss more is the stuff other than the third-party apps/ecosystem: the quality of the phone and the OS itself, and whether they meet my needs. I doubt Pine will make something sized like my 12 mini, or that Plasma Active/phosh will hit the same quality of mouthfeel as iOS (which, as a Windows Phone 7/8 refugee, I think has good mouthfeel since they copied live tiles).

                                                                                                                      1. 1

                                                                                                                        I’m not sure. I remember hearing one of the Linux phones supported Android apps now

                                                                                                                        1. 1

                                                                                                                          I strongly suspect this “Pro” will have enough oomph to run anbox rather nicely. It runs on the current pinephones, but I don’t think it runs particularly well.

                                                                                                                          I don’t know how much of the sensors and other bits (that, say, a ridesharing driver’s app might need) are exposed via Anbox on a pinephone. I also don’t know how much of the google play services stack works in that environment.

                                                                                                                      2. 1

                                                                                                                        The decision to maintain the original PinePhone’s screen resolution of 1440×720 was made early on; higher resolution panels consume more power and increase SoC’s load, resulting in shorter battery life and higher average thermals. A few extra pixels aren’t worth it.

                                                                                                                        Immediately turned me off. High resolution displays are what make these phones satisfactory entertainment devices as well as just communications devices.

                                                                                                                    1. 3

It seems to me that proponents of dynamic linking kind of miss the forest for the trees. The system is incidental; the only thing that matters is the applications. The value of a binary that just works is enormously higher than the value of system-wide security patches and the like.

                                                                                                                      1. 2

                                                                                                                        I agree.

I think following your comment would be a great way to design a user-friendly OS: make things “just work,” as you say, while still having security.

                                                                                                                      1. 6

                                                                                                                        Here to pick some nits:

                                                                                                                        Startup Times Ulrich Drepper also said:

                                                                                                                        With prelinking startup times for dynamically linked code is as good as that of statically linked code. Let’s see how true that is.

                                                                                                                        Emphasis mine. Did you actually use prelinking? Apologies if so and you just weren’t explicit about it. I’ve seen some impressive improvements with it, though in those cases I did not have the option of static vs. dynamic vs. prelink to really compare head to head, just prelinked vs non-prelinked.

                                                                                                                        [re ltrace] The truth is that you could do this with a proper build of a static library that tracked calls for you. In fact, that would give you more control because you could pick and choose which functions to trace at a fine-grained level.

                                                                                                                        For the record ltrace lets you pick and choose which functions to trace at a fine grained level.

                                                                                                                        1. 2

                                                                                                                          Emphasis mine. Did you actually use prelinking?

                                                                                                                          No, I did not. I was under the impression that glibc’s linker automatically did it. I will do it, update the post, and let you know.

                                                                                                                          For the record ltrace lets you pick and choose which functions to trace at a fine grained level.

                                                                                                                          Good to know. I will update the post.

                                                                                                                          1. 2

                                                                                                                            To update: I was not able to get prelinking to work, so I put a note in instead.

                                                                                                                            1. 2

I believe prelinking is now disabled because ASLR requires libraries to be loaded at different addresses on each run. I think OpenBSD has a prelinking-like thing that works with their form of address-space randomisation.

                                                                                                                              Note that static linking means that you are completely opting out of this kind of randomisation. Even if you build a statically linked PIE (mmm, PIE) then the displacements from any function pointer to any gadget are the same. I remain unconvinced that ASLR is anything other than security theatre but it is something you give up with most static linking designs (there are some half-way places where you compile with -ffunction-sections and -fdata-sections and randomise the binary layout in between each launch).

                                                                                                                        1. 5

                                                                                                                          This is an interesting article, but some things that stood out to me:

                                                                                                                          • I’d have liked to see the article engage more with the libc case. The article points out several arguments against statically linking libc, specifically - it’s the only supported interface to the OS on most *nix-ish systems that are not Linux, it’s a very widely used library (so disk/memory savings can be substantial), and as part of the system it should be quite stable. Also, Go eventually went back to (dynamically) linking against libc rather than directly making syscalls [EDIT: on many non-Linux systems]
                                                                                                                          • LLVM IR is platform-specific and not portable. And existing C code using e.g. #ifdef __BIG_ENDIAN__ (or using any system header using such a construction!) is not trivial to compile into platform-independent code. Yes, you can imagine compiling with a whole host of cross-compilers (and system headers!), but at some point just shipping the source code begins looking rather attractive…
                                                                                                                          • exploiting memory bugs is a deep topic, but the article is a bit too simplistic in its treatment of stack smashing. There’s a lot to dislike about ASLR, but ASLR is at least somewhat effective against - to think up a quick example - a dangling pointer in a heap-allocated object being use(-after-free)d to overwrite a function / vtable pointer on the stack.
                                                                                                                          • in general, there’s a lot of prior art that could be discussed; e.g. I believe that Windows randomizes (randomized?) a library’s address system-wide, rather than per-process.
                                                                                                                          1. 3

                                                                                                                            And existing C code using e.g. #ifdef __BIG_ENDIAN__ (or using any system header using such a construction!) is not trivial to compile into platform-independent code.

                                                                                                                            The article addresses this.

                                                                                                                            I would also note as an aside that any code utilising #ifdef __BIG_ENDIAN__ is just plain wrong. Yes, even /usr/include/linux/tcp.h. Just don’t do that. Write the code properly.
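For anyone wondering what “properly” looks like in practice, here is a minimal sketch of my own (names are illustrative, not from the article): decode by byte position with shifts, and the same code is correct on both big- and little-endian hosts, with no #ifdef needed.

    #include <stdint.h>

    /* Read a 16-bit big-endian value from a byte buffer (e.g. a network
       header). Shifts operate on values, not on memory layout, so this is
       correct regardless of the host's byte order. */
    static inline uint16_t read_be16(const uint8_t *p)
    {
        return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
    }

    /* Writing is the mirror image. */
    static inline void write_be16(uint8_t *p, uint16_t v)
    {
        p[0] = (uint8_t)(v >> 8);
        p[1] = (uint8_t)(v & 0xff);
    }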

                                                                                                                            1. 2

                                                                                                                              I’ll bite. I have written a MC6809 emulator that makes the following assumptions of the host system:

                                                                                                                              • A machine with 8-bit chars (in that it does support uint8_t)

                                                                                                                              • A 2’s complement architecture

                                                                                                                              I also do the whole #ifdef __BIG_ENDIAN__ (effectively), check out the header file. How would you modify the code? It’s written that way to a) make it easier on me to understand the code and b) make it a bit more performant.

                                                                                                                              1. 3

                                                                                                                                I would write inline functions which would look like:

                                                                                                                                static inline mc6809byte__t msb(mc6809word__t word) { return (word >> 8) & 0xff; }
                                                                                                                                static inline mc6809byte__t lsb(mc6809word__t word) { return word & 0xff; }
                                                                                                                                

                                                                                                                                And I would use those in place of the current macro trick of replacing something->A with something->d.b[MSB] etc.

I don’t think this would significantly impact readability. Clang seems to produce identical code for both cases, although a benchmark still shows some minor performance impact (10% if your entire workload is just getting data out of the MSB and LSB), which may be an issue with my benchmark. gcc seems to struggle to realize that they are equivalent and keeps the shr, but the performance impact is only 13%.

                                                                                                                                If you need to support writing to the MSB and LSB you would need a couple more inline functions:

static inline void set_msb(mc6809word__t *word, mc6809byte__t msb) { *word = lsb(*word) | ((msb & 0xff) << 8); }
static inline void set_lsb(mc6809word__t *word, mc6809byte__t lsb) { *word = (msb(*word) << 8) | (lsb & 0xff); }
                                                                                                                                

                                                                                                                                I haven’t benchmarked these.

                                                                                                                                I think the point I would make is that you should benchmark your code against these and see whether there is a real noticeable performance impact moving from your version to this version. Through this simple change you can make steps to drop your assumption of CHAR_BIT == 8 and your code no longer relies on type punning which may or may not produce the results you expect depending on what machine you end up on. Even though your current code is not doing any in-place byte swapping, you still risk trap representations.

                                                                                                                                P.S. *_t is reserved for type names by POSIX.

                                                                                                                                1. 2

                                                                                                                                  I would definitely have to benchmark the code, but I can’t see it being better than what I have unless there exists a really magical C compiler that can see through the shifting/masking and replace them with just byte read/writes (which is what I have now). Your set_lsb() function is effectively:

                                                                                                                                  *word = ((*word >> 8) & 0xff) << 8 | lsb & 0xff;
                                                                                                                                  

                                                                                                                                  A 10-13% reduction in performance seems a bit steep to me.

                                                                                                                                  you still risk trap representations.

I have to ask—do you know of any computer sold new today that isn’t byte oriented and 2’s complement? Or hell, any machine sold today that actually has a trap representation? Because I’ve been programming for over 35 years now, and I have yet to come across one machine that a) isn’t byte oriented, b) isn’t 2’s complement, or c) has trap representations. Not once. So I would love to know of any actual, physical machines sold new that break one of these assumptions. I know they exist, but I don’t know of any that have been produced since the late 60s.

                                                                                                                                  1. 2

                                                                                                                                    a really magical C compiler that can see through the shifting/masking and replace them with just byte read/writes

                                                                                                                                    Yes, that’s what clang did when I tested it on godbolt. In fact I can get it to do it in all situations by swapping the order of the masking and the shifting.

                                                                                                                                    Here’s the result:

                                                                                                                                                            output[i].lower = in & 0xff;
                                                                                                                                      4011c1:       88 54 4d 00             mov    %dl,0x0(%rbp,%rcx,2)
                                                                                                                                                            output[i].upper = (in & 0xff00) >> 8;
                                                                                                                                      4011c5:       88 74 4d 01             mov    %dh,0x1(%rbp,%rcx,2)
                                                                                                                                    

You underestimate the power of compilers, although I’m not sure why gcc can’t do it; it’s really a trivial optimisation, all things considered.

I just checked further, and it seems the only reason the clang-compiled mask&shift variant performs differently is different amounts of loop unrolling, and also that the mask&shift code uses the high and low registers instead of multiple movbs. The godbolt code didn’t use movbs; it was identical for both cases in clang.

                                                                                                                                    My point being that in reality you may get 10% (absolute worst case) difference in performance just because the compiler felt like it that day.

                                                                                                                                    I have to ask—do you know of any computer sold new today that isn’t byte oriented and 2’s complement? Or hell, any machine sold today that actually has a trap representation?

                                                                                                                                    I don’t personally keep track of the existence of such machines.

For me it’s not about “any machine” questions; it’s about sticking to the C abstract machine until there is a genuine need to stray outside of it. A maybe-13% (absolute, unrealistic best case) performance improvement is not worth straying outside the definitions of the C abstract machine.

                                                                                                                                    In general I have found this to produce code with fewer subtle bugs. To write code which conforms to the C abstract machine you just have to know exactly what is well defined. To write code which goes past the C abstract machine you have to know with absolute certainty about all the things which are not well defined.

edit: It gets worse. I just did some benchmarking, and I can get swings of ±30% in performance just by disabling loop unrolling. I can get both benchmarks to perform the same by enabling and disabling optimization options.

This is a tight loop doing millions of the same operation. Your codebase has a lot more variation than that. It seems more likely that you’ll get a 10% performance hit/improvement by screwing around with optimisation than by simply making the code more correct.

                                                                                                                            2. 2

                                                                                                                              Wait, Go dynamically links to libc now? Do you have more details? I thought Go binaries have zero dependencies and only the use of something like CGO would change that.

                                                                                                                              1. 15

As the person responsible for making Go able to link to system libraries in the first place (on illumos/Solaris; others later used this technology for OpenBSD, AIX, and other systems), I am baffled why people have trouble understanding this.

Go binaries, just like any other userspace binaries, depend at least on the operating system. “Zero dependency” means binaries don’t require dependencies other than the system itself. It doesn’t mean that the dependency on the system cannot use dynamic linking.

On systems where “the system” is defined by libc or its equivalent shared library, like Solaris, Windows, OpenBSD, and possibly others, the fact that Go binaries are dynamically linked with the system libraries doesn’t make them not “zero dependency”. The system libraries are provided by the system!

Also note that on systems that use a shared library interface, Go doesn’t require the presence of the target shared library at build time; only compiled binaries require it at run time. Cross-compiling works without having access to target system libraries. In other words, all this is an implementation detail with no visible effect on Go users, but somehow many Go users think this is some kind of a problem. It’s not.

                                                                                                                                1. 3

                                                                                                                                  I (clarified) was thinking about e.g. OpenBSD, not Linux; see e.g. cks’ article.

                                                                                                                                  1. 2

                                                                                                                                    Under some circumstances Go will still link to libc. The net and os packages both use libc for some calls, but have fallbacks that are less functional when CGO is disabled.

                                                                                                                                    1. 6

The way this is usually explained is a little bit backwards. On Linux, things like non-DNS name resolution (LDAP, etc.) are under the purview of glibc (not any other libc!), with its NSCD protocol and glibc-specific NSS shared libraries.

Of course, if you want to use glibc-specific NSS, you have to link to glibc, and of course, if you elect not to link with glibc, you don’t get NSS support.

Most explanations of Go’s behavior are of the kind “Go is doing something weird”, while the real weirdness is that on Linux name resolution is not under the purview of the system but of a 3rd-party component, and people accept this sad state of affairs.

                                                                                                                                      1. 3

                                                                                                                                        How is glibc a 3rd party component on, say, Debian? Or is every core component 3rd party since Debian does not develop any of them?

                                                                                                                                        1. 4

                                                                                                                                          Glibc is a 3rd party component because it is not developed by the first party, which is the Linux developers.

Glibc sure likes to pretend it’s first party though. It’s not, a fact attested to simply by the existence of other Linux libc implementations, like musl.

                                                                                                                                          Contrast that with the BSDs, or Solaris, or Windows, where libc (or its equivalent) is a first party component developed by BSDs, Solaris, or Windows developers.

                                                                                                                                          I would hope that Debian Linux would be a particular instance of a Linux system, rather than an abstract system itself, and I could use “Linux software” on it but glibc’s portability quirks and ambitions of pretending to be a 1st party system prevent one from doing exactly that.

Even if you think that Linux+glibc should be an abstract system in itself, distinct from, say, pure Linux or Linux+musl, irrespective of the pain that would inflict on me, the language developer, glibc is unfeasible as an abstract interface because it is not abstract.

                                                                                                                                          1. 3

                                                                                                                                            Wait, how are the Linux developers “first party” to anything but the kernel?

                                                                                                                                            I would hope that Debian Linux would be a particular instance of a Linux system

                                                                                                                                            There’s no such thing as “a Linux system” only “a system that uses Linux as a component”. Debian is a system comparable to FreeBSD, so is RHEL. Now some OS are specifically derived from others, so you might call Ubuntu “a Debian system” and then complain that snap is an incompatible bolt on or something (just an example, not trying to start an argument about snap).

                                                                                                                                            1. 5

Of course there is such a thing as a Linux system: you can download it from kernel.org, it comes with a stable API and ABI, and, with the sole exception of NSS above, it absolutely always does exactly what you want from it.

Distributions might provide some kind of value for users, and because they provide value they overestimate their technical importance with silly statements like “there is no Linux system, just distributions”. No doubt this kind of statement comes from GNU itself, with its GNU+Linux stance, but to a language designer none of this matters at all. All that matters are APIs and ABIs and who provides them. On every normal system, the developer of the system dictates and provides its API and ABI, and Linux is no different: Linux comes with its stable API and ABI, and as a user of Linux I can use it, thank you very much. The fact that on Linux this ABI comes through system calls, while on, say, Solaris it comes from a shared library, is an implementation detail. Glibc, a 3rd-party component, comes with an alternative API and ABI, and for whatever reason some people think that is more canonical than the first-party API and ABI provided by the kernel itself. The audacity of glibc developers claiming authority over such a thing is unbelievable.

                                                                                                                                              As a language developer, and in more general as an engineer, I work with defined systems. A system is whatever has an API and an ABI, not some fuzzy notion defined by some social organization like a distribution, or a hostile organization like GNU.

As an API, glibc is valuable (but so is musl); as an ABI, glibc has negative value both for the language developer and for its users. The fact that in Go we can ignore glibc means not only freedom from distributions’ and glibc’s ABI quirks, but also that I can have systems with absolutely no libc at all: just a Linux kernel and Go binaries, a fact that plenty of embedded people make use of.
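To make the “just a Linux kernel and binaries” point concrete, here is a sketch of my own (x86-64 Linux only, built with something like cc -static -nostdlib; flags and names are mine, not the parent’s) that talks to the kernel’s stable syscall ABI with no libc at all:

    /* Freestanding program using the Linux x86-64 syscall ABI directly:
       syscall number in rax (write = 1, exit = 60), arguments in rdi, rsi,
       rdx; the syscall instruction clobbers rcx and r11. No libc anywhere. */

    static long sys_write(int fd, const void *buf, unsigned long len)
    {
        long ret;
        __asm__ volatile ("syscall"
                          : "=a"(ret)
                          : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                          : "rcx", "r11", "memory");
        return ret;
    }

    static void sys_exit(int code)
    {
        __asm__ volatile ("syscall" : : "a"(60L), "D"((long)code) : "rcx", "r11");
        for (;;) { }                  /* exit never returns */
    }

    void _start(void)
    {
        sys_write(1, "no libc here\n", 13);
        sys_exit(0);
    }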

                                                                                                                                              1. 2

                                                                                                                                                Of course there is such a thing as a Linux system, you can download it from kernel.org,

                                                                                                                                                So is libgpiod part of the system too? You can download that from kernel.org as well. You can even download glibc there.

                                                                                                                                                it comes with a stable API and ABI, and usually, in fact with the exception of NSS above, absolutely always does exactly what you want from it.

                                                                                                                                                Unless you want to do something other than boot :)

                                                                                                                                  2. 2

                                                                                                                                    I’d have liked to see the article engage more with the libc case.

                                                                                                                                    Fair, though I feel like I engaged with it a lot. I even came up with ideas that would make it so OS authors can keep their dynamically-linked libc while not causing problems with ABI and API breaks.

                                                                                                                                    What would you have liked me to add? I’m not really sure.

                                                                                                                                    LLVM IR is platform-specific and not portable.

                                                                                                                                    Agreed. I am actually working on an LLVM IR-like alternative that is portable.

And existing C code using e.g. #ifdef __BIG_ENDIAN__ (or using any system header using such a construction!) is not trivial to compile into platform-independent code.

                                                                                                                                    This is a good point, but I think it can be done. I am going to do my best to get it done in my LLVM alternative.

                                                                                                                                    the article is a bit too simplistic in its treatment of stack smashing.

                                                                                                                                    Fair; it wasn’t the big point of the post.

                                                                                                                                    There’s a lot to dislike about ASLR, but ASLR is at least somewhat effective against - to think up a quick example - a dangling pointer in a heap-allocated object being use(-after-free)d to overwrite a function / vtable pointer on the stack.

                                                                                                                                    I am not sure what your example is. Could you walk me through it?

                                                                                                                                    1. 3

                                                                                                                                      Agreed. I am actually working on an LLVM IR-like alternative that is portable.

                                                                                                                                      This is not possible for C/C++ without either:

                                                                                                                                      • Significantly rearchitecting the front end, or
                                                                                                                                      • Effectively defining a new ABI that is distinct from the target platform’s ABI.

The second of these is possible with LLVM today. This is what pNaCl did, for example. If you want to target the platform ABI then the first is required, because C code is not portable after the preprocessor has run. The article mentions __BIG_ENDIAN__, but that’s actually a pretty unusual corner case. It’s far more common to see things that conditionally compile based on pointer size. UEFI bytecode tried to abstract over this, and attempts to make Clang and GCC target it have been made several times and failed, and that was with a set of headers written to be portable. C has a notion of an integer constant expression that, for various reasons, must be evaluated in the front end. You can make this symbolic in your IR, but existing front ends don’t.
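To illustrate the point about the preprocessor and integer constant expressions, here is a hypothetical snippet of my own (not from the article or the parent comment):

    #include <stdint.h>

    /* After preprocessing, exactly one branch survives, so any IR emitted
       for this code already encodes the target's pointer size. */
    #if UINTPTR_MAX == 0xffffffffu
    typedef uint32_t machine_word;    /* 32-bit target */
    #else
    typedef uint64_t machine_word;    /* 64-bit target */
    #endif

    struct header {
        void        *next;            /* 4 or 8 bytes, target-dependent */
        machine_word flags;
    };

    /* sizeof is an integer constant expression: the front end must fold it
       to a concrete number before it can emit anything for this array. */
    static char scratch[2 * sizeof(struct header)];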

                                                                                                                                      The same is true of C++ templates, where it’s easy to instantiate a template with T = long as the template parameter and then define other types based on sizeof(T) and things like if constexpr (sizeof(T) < sizeof(int)), at which point you need to preserve the entire AST. SFINAE introduces even more corner cases where you actually need to preserve the entire AST and redo template instantiation for each target (which may fail on some platforms).

                                                                                                                                      For languages that are designed to hide ABI details, it’s easy (see: Java or CLR bytecode).

                                                                                                                                      1. 2

                                                                                                                                        I believe you are correct, but I’ll point out that I am attempting to accomplish both of the points you said would need to happen, mostly with a C preprocessor with some tricks up its sleeve.

                                                                                                                                        Regarding C++ templates, I’m not even going to try.

I would love to disregard C and C++ entirely while building my programming language, but I am not disregarding C, at least, because my compiler will generate C. This will make my language usable in the embedded space (there will be a -nostd compiler flag equivalent), and it will allow the compiler to generate its own C source code, making bootstrap easy and fast. Unlike Rust or Haskell, where you have to follow the bootstrap chain from the beginning, my compiler will ship with its own C source, making bootstrap as simple as:

                                                                                                                                        1. Compile C source.
                                                                                                                                        2. Compile Yao source. (Yao is the name of the language.)
                                                                                                                                        3. Recompile Yao source.
                                                                                                                                        4. Ensure the output of 2 and 3 match.

                                                                                                                                        With it that easy, I hope that packagers will find it easy enough to do in their packages.

                                                                                                                                        1. 2

                                                                                                                                          If your input is a language that doesn’t expose any ABI-specific details and your output is C (or C++) code that includes platform-specific things, then this is eminently tractable.

                                                                                                                                          This is basically what Squeak does. The core VM is written in a subset of Smalltalk that can be statically compiled to C. The C code can then be compiled with your platform’s favourite C compiler. The rest of the code is all bytecode that is executed by the interpreter (or JIT with Pharo).

                                                                                                                                      2. 3

                                                                                                                                        Agreed. I am actually working on an LLVM IR-like alternative that is portable.

                                                                                                                                        If you’re looking for prior art, Tendra Distribution Format was an earlier attempt at a UNCOL for C.

                                                                                                                                        1. 1

                                                                                                                                          Thank you for the reference!

                                                                                                                                        2. 3

                                                                                                                                          With respect to libc - indeed, the article engaged quite a bit with libc! That’s why I was waiting for a clear conclusion.

                                                                                                                                          E.g. cosmopolitan clearly picks “all the world’s a x86_64, but may be running different OSes”, musl clearly picks “all the world’s Linux, but may be running on different architectures”. You flirt with both “all the world’s Linux” and something like Go-on-OpenBSD’s static-except-libc. Which are both fine enough.

                                                                                                                                          With respect to ASLR: I agree that this isn’t the main point of your article, and I don’t think I explained what I meant very well. Here’s some example code, where the data from main() is meant to represent hostile input; I mean to point out that merely segregating arrays and other data / data and code doesn’t fix e.g. lifetime issues, and that ASLR at least makes the resulting bug a bit harder to exploit (because the adversary has to guess the address of system()). A cleaned-up example can be found below, (actually-)Works-For-Me code here.

                                                                                                                                          static void buggy(void *user_input) {
                                                                                                                                              uintptr_t on_stack_for_now;
                                                                                                                                              /* Bug here: on_stack_for_now doesn't live long enough! */
                                                                                                                                              scheduler_enqueue(write_what_where, user_input, &on_stack_for_now);
                                                                                                                                          }
                                                                                                                                          
                                                                                                                                          static void victim(const char *user_input) {
                                                                                                                                              void (*function_pointer)() = print_args;
                                                                                                                                          
                                                                                                                                              if (scheduler_run() != 0)
                                                                                                                                                  abort();
                                                                                                                                          
                                                                                                                                              function_pointer(user_input);
                                                                                                                                          }
                                                                                                                                          
                                                                                                                                          int main(void) {
                                                                                                                                              buggy((void *)system);
                                                                                                                                              victim("/bin/sh");
                                                                                                                                          }
                                                                                                                                          
                                                                                                                                          1. 2

                                                                                                                                            With respect to libc - indeed, the article engaged quite a bit with libc! That’s why I was waiting for a clear conclusion.

                                                                                                                                            I see. Do you mean a conclusion as to whether to statically link libc or dynamically link it?

                                                                                                                                            I don’t think there is a conclusion, just a reality. Platforms require programmers to dynamically link libc, and there’s not much we can do to get around that, though I do like the fact that glibc and musl together give us the choice on Linux.

                                                                                                                                            However, I think the conclusion you are looking for might be that it does not matter because both suck with regards to libc!

                                                                                                                                            If you statically link on platforms without a stable syscall ABI, good luck! You can probably only make it work on machines with the same OS version.

                                                                                                                                            If you dynamically link, you’re probably going to face ABI breaks eventually.

                                                                                                                                            So to me, the conclusion is that the new ideas I gave are necessary to make working with libc easier on programmers. Right now, it sucks; with my ideas, it wouldn’t (I hope).

                                                                                                                                            Does that help? Sorry that I didn’t make it more clear in the post.

                                                                                                                                            Regarding your code, I think I get it now. You have a point. I tried to be nuanced, but I should not have been.

                                                                                                                                            I am actually developing a language where lifetimes are taken into account while also having separated stacks. I hope that doing both will either eliminate the possibility of such attacks or make them infeasible.

                                                                                                                                      1. 5

                                                                                                                                        Complaining about ABI stability causing language stagnation and design issues and then complaining about a lack of ABI stability in BSDs (because they don’t want to be held back) forcing the use of dynamic linking is a little interesting.

                                                                                                                                        And default statically-linked binaries is one of the reasons the Go language became popular. I don’t think that’s a coincidence.

                                                                                                                                        An appeal to popularity.

                                                                                                                                        Other than that, the possible solutions section at the end FINALLY addresses actual problems I have always had with statically linking everything with very practical and simple solutions. I applaud this. I’ve read all three related articles now (this one, Dynamic Linking Needs To Die and Static Linking Considered Harmful) and this really is the best of all three.

I would note though that this doesn’t propose a solution for my most annoying problem with static linking in the C world: namespace pollution. My current solution is to give all functions that are not part of the API macros which suffix/prefix their names with something random to minimize the possibility of collisions.
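Roughly like this, I assume (the names and suffix below are made up for illustration): a header of renaming macros so the library’s internal helpers can’t collide with same-named symbols in another statically linked library.

    /* internal.h -- rename non-API symbols; the suffix is arbitrary. */
    #define parse_config  foolib_parse_config_9c41e7
    #define log_message   foolib_log_message_9c41e7

    /* internal.c */
    #include "internal.h"

    void log_message(const char *msg);   /* really foolib_log_message_9c41e7 */

    void parse_config(const char *path)  /* really foolib_parse_config_9c41e7 */
    {
        log_message(path);
    }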

                                                                                                                                        1. 1

                                                                                                                                          Complaining about ABI stability causing language stagnation and design issues and then complaining about a lack of ABI stability in BSDs (because they don’t want to be held back) forcing the use of dynamic linking is a little interesting.

They’re different types of ABIs. But then again, I also present ideas to get around the problems with dynamic linking, allowing the BSDs to keep their lack of ABI stability. At that point, I’d be okay with them not having ABI stability too.

                                                                                                                                          An appeal to popularity.

                                                                                                                                          I understand your sentiment, but I used it because some people want their software to be popular. I’m just saying that this might be something that helps. I want my programming language to be popular, so this appeal to popularity works on me, at least.

                                                                                                                                          But I also had an implied argument there: that static linking is easiest for downstream developers. It’s implied because people usually gravitate to what’s easiest.

                                                                                                                                          Other than that, the possible solutions section at the end FINALLY addresses actual problems I have always had with statically linking everything with very practical and simple solutions.

                                                                                                                                          I apologize that it took so long to get there; that’s how it flowed in my mind. Should I have put them first?

                                                                                                                                          I applaud this. I’ve read all three related articles now (this one, Dynamic Linking Needs To Die and Static Linking Considered Harmful) and this really is the best of all three.

                                                                                                                                          Thank you. :) And I apologize for “Dynamic Linking Needs to Die”.

                                                                                                                                          I would note though that this doesn’t propose a solution for my most annoying problem with static linking in the C world: namespace pollution.

                                                                                                                                          I would like to talk to you more about this because my programming language uses C-like names and could suffer from namespace pollution. It’s helped somewhat because prefixes are automatically added for items in packages (i.e., for the function foo() in the package bar, its C name is bar_foo()), but I have not been able to ensure it avoids all of the problems.

                                                                                                                                          My current solution is to give all functions, that are not part of the API, macros which suffix/prefix their name with something random to minimize the possibility of collisions.

                                                                                                                                          Is this automatic? Should it be automatic?

                                                                                                                                          I kind of think I know what you’re going at here, and I think it’s about the same thing that gcc and other C compilers do for functions with static visibility. So in my language, any functions and types private to a package would be treated as though they have static visibility.

                                                                                                                                          I think that would be sufficient because I think the problem C has is that something does not have static visibility unless you mark it so, leading to a lot of functions that should be static polluting the namespace.
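For reference, a minimal sketch of the C default I’m describing (my own example, not your code):

    /* helpers.c -- one member of a static library */

    /* The default in C is external linkage: this symbol is visible to every
       other object at link time and can collide with any other "helper". */
    int helper(int x) { return x + 1; }

    /* With static, the function has internal linkage: it is invisible outside
       this translation unit and cannot pollute the library's namespace. */
    static int hidden_helper(int x) { return x * 2; }

    int api_entry(int x) { return hidden_helper(helper(x)); }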

                                                                                                                                          I’m not entirely sure, so what do you think?

                                                                                                                                          1. 4

                                                                                                                                            But I also had an implied argument there: that static linking is easiest for downstream developers. It’s implied because people usually gravitate to what’s easiest.

                                                                                                                                            There was an article on here or the other website recently about people’s focus on what is easiest for developers. I personally think that we should be looking at what’s best for users not what’s easiest for developers.

                                                                                                                                            I apologize that it took so long to get there; that’s how it flowed in my mind. Should I have put them first?

                                                                                                                                            No, I just meant I read dozens of these “dynamic linking bad” articles which complain about dynamic linking and tout the benefits of static linking as if there were literally no benefits to dynamic linking (usually after claiming that because 90% of libraries aren’t shared as much as 10% that means we should ditch 100% of dynamic linking). This was also the vibe of your original article (and a similar vibe of the “static linking considered harmful” article except in the opposite direction).

                                                                                                                                            The problem is that while yes, on the face of it you could say that static linking as things currently stand is less of a shit show, nobody seemed to acknowledge that it might be worth trying to solve the problem in a way which keeps some of the key benefits of dynamic linking. Your article is the first I’ve read so far which actually took this into consideration. That’s why I said I finally have read an article which covers these things.

                                                                                                                                            Is this automatic? Should it be automatic?

It’s not so easy to automate these things portably. I have not looked into automating it, but at the same time I almost never write libraries (my opinion is at the other extreme from the npm proponents: don’t write a library unless you’ve already written similar code in a number of places and know what the API should look like from real experience).

                                                                                                                                            I kind of think I know what you’re going at here, and I think it’s about the same thing that gcc and other C compilers do for functions with static visibility. So in my language, any functions and types private to a package would be treated as though they have static visibility.

                                                                                                                                            I think that would be sufficient because I think the problem C has is that something does not have static visibility unless you mark it so, leading to a lot of functions that should be static polluting the namespace.

                                                                                                                                            I’m not entirely sure, so what do you think?

Fundamentally the problem is that in C and C++, to get a static library on Linux you compile your code into object files and then put them into a weird, crusty old archive format. At the end of the day, unlike with a shared object, you can’t mark symbols as being local to the library, because when it comes to linking it’s as if you had added the list of object files directly to the linking command.

                                                                                                                                            Let me give you an example:

                                                                                                                                            ==> build <==
                                                                                                                                            #!/bin/sh -ex
cc -std=c11 -fpic -c lib.c
cc -std=c11 -fpic -c internal.c
ar rc lib.a lib.o internal.o
ranlib lib.a
cc -shared lib.o internal.o -o liblib.so
cc -std=c11 -I. -c prog.c
cc prog.o lib.a -o prog_static
# expected to fail: liblib.so does not export print
cc prog.o -L. -llib -Wl,-rpath=. -o prog_dynamic
                                                                                                                                            
                                                                                                                                            ==> clean <==
                                                                                                                                            #!/bin/sh
                                                                                                                                            rm -f *.o *.a *.so prog_*
                                                                                                                                            
                                                                                                                                            ==> internal.c <==
                                                                                                                                            #include "internal.h"
                                                                                                                                            #include <stdio.h>
                                                                                                                                            void print(const char *message)
                                                                                                                                            {
                                                                                                                                            	puts(message);
                                                                                                                                            }
                                                                                                                                            
                                                                                                                                            ==> internal.h <==
                                                                                                                                            #ifndef INTERNAL_H
                                                                                                                                            #define INTERNAL_H
                                                                                                                                            __attribute__ ((visibility ("hidden")))
                                                                                                                                            void print(const char *message);
                                                                                                                                            #endif
                                                                                                                                            
                                                                                                                                            ==> lib.c <==
                                                                                                                                            #include "lib.h"
                                                                                                                                            #include "internal.h"
                                                                                                                                            __attribute__ ((visibility ("default")))
                                                                                                                                            void api(void)
                                                                                                                                            {
                                                                                                                                            	print("api");
                                                                                                                                            }
                                                                                                                                            
                                                                                                                                            ==> lib.h <==
                                                                                                                                            #ifndef LIB_H
                                                                                                                                            #define LIB_H
                                                                                                                                            __attribute__ ((visibility ("default")))
                                                                                                                                            void api(void);
                                                                                                                                            #endif
                                                                                                                                            
                                                                                                                                            ==> prog.c <==
                                                                                                                                            #include <lib.h>
                                                                                                                                            int main(void)
                                                                                                                                            {
                                                                                                                                            	api();
                                                                                                                                            	extern void print(const char *message);
                                                                                                                                            	print("main");
                                                                                                                                            }
                                                                                                                                            

In the above example, lib.h exposes an API consisting of the function api. internal.h is internal to the library and declares a function print. The visibility has been marked, but from the point of view of the linker, linking prog_static from prog.o and lib.a is equivalent to linking prog.o, lib.o and internal.o. This means the link succeeds and print can be called from main. In the shared-library case, the library’s own link step happens first, the visibilities are honored, and by the time you link against liblib.so the print function is no longer exposed.
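A quick way to see the difference (assuming GNU binutils) is to compare what each artifact exports:

nm -g lib.a        # print shows up as a global "T" symbol in internal.o
nm -gD liblib.so   # print is absent from the dynamic symbol table

(nm -g lists external symbols; -D asks for the shared object's dynamic symbol table.)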

                                                                                                                                            There is no way to hide the print function in this case. And given how internal functions usually don’t have namespaced names, they have a higher chance of colliding. The solution would be to replace internal.h with the following:

                                                                                                                                            #ifndef INTERNAL_H
                                                                                                                                            #define INTERNAL_H
                                                                                                                                            #define print lib_internal_print
                                                                                                                                            __attribute__ ((visibility ("hidden")))
                                                                                                                                            void print(const char *message);
                                                                                                                                            #endif
                                                                                                                                            

                                                                                                                                            (Or something similar like #define print print_3b48214e)

                                                                                                                                            This would namespace the function without having the entire codebase end up having to call the function by the long name.

                                                                                                                                            But this is all hacks, and doesn’t solve the visibility problem.

                                                                                                                                            I hope you see what I mean now.

                                                                                                                                            1. 1

                                                                                                                                              There was an article on here or the other website recently about people’s focus on what is easiest for developers. I personally think that we should be looking at what’s best for users not what’s easiest for developers.

                                                                                                                                              I agree, but I think making certain things easier on developers would reduce mistakes, which would make things easier on users. So whenever I can reduce mistakes by developers by making things easier, I do so.

                                                                                                                                              No, I just meant I read dozens of these “dynamic linking bad” articles which complain about dynamic linking and tout the benefits of static linking as if there were literally no benefits to dynamic linking (usually after claiming that because 90% of libraries aren’t shared as much as 10% that means we should ditch 100% of dynamic linking). This was also the vibe of your original article (and a similar vibe of the “static linking considered harmful” article except in the opposite direction).

                                                                                                                                              Oh, I see. Yeah, sorry again about the first one. It was unprofessional.

It’s not so easy to automate these things portably. I have not looked into automating it, but at the same time I almost never write libraries (my opinion is at the other extreme from the npm proponents: don’t write a library unless you’ve already written similar code in a number of places and know what the API should look like from real experience).

                                                                                                                                              I think this is a great guideline.

                                                                                                                                              Regarding the code, yes, I think I see what you mean.

                                                                                                                                              As much as it pains me, I think this means that I am going to have to “break” ABI with C enough to make such private functions not visible to outside code. (Maybe I’ll add a suffix to them, and the suffix will be the compiler-internal ID of their scope?) But that means that I can make it automatic.

                                                                                                                                              However, I can probably only do this in the compiler for my programming language, not for C itself.

                                                                                                                                              1. 2

My recommendation is: make your own static linking format. But if you want static-linking interop with C, there’s nothing stopping you from exposing the library as a single object file (the issue comes from having multiple object files in the archive).
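For what it’s worth, here is a rough sketch of that single-object route with GNU binutils, reusing the file names from the example above (ld -r does a partial link, and objcopy --localize-hidden then demotes the hidden-visibility symbols to local):

==> build_single <==
#!/bin/sh -ex
cc -std=c11 -c lib.c internal.c
# merge the two objects into a single relocatable object
ld -r lib.o internal.o -o lib_combined.o
# demote hidden-visibility globals (like print) to local symbols
objcopy --localize-hidden lib_combined.o
ar rc lib_single.a lib_combined.o
ranlib lib_single.a

Linking prog.o against lib_single.a should now fail to resolve print, which is exactly the behaviour the shared object gave you.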

                                                                                                                                        1. 8

                                                                                                                                          One part of the analysis that I’m not confidently convinced by is this:

                                                                                                                                          So on average, executables use about 6% of the symbols in shared libraries on my machine.

This whole section uses the number of exported symbols referenced as a proxy for the proportion of code reachable at runtime.

                                                                                                                                          The functions you call can transitively call other functions and expose more code through function pointers and things that resemble eval().

For example, you can exercise a really huge chunk of SQLite with just sqlite3_open(), sqlite3_exec(), sqlite3_free() and sqlite3_close().
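A rough illustration (assuming a system libsqlite3, built with something like cc demo.c -lsqlite3): a single sqlite3_exec() call already drags in the tokenizer, parser, code generator, virtual machine, B-tree and pager layers.

==> demo.c <==
#include <sqlite3.h>
#include <stdio.h>

int main(void)
{
	sqlite3 *db;
	char *err = NULL;

	if (sqlite3_open(":memory:", &db) != SQLITE_OK)
		return 1;

	/* one call, most of the library: SQL is tokenized, parsed, compiled
	   to bytecode and run by the VDBE against the B-tree/pager layer */
	if (sqlite3_exec(db,
	        "CREATE TABLE t(x); INSERT INTO t VALUES (1),(2); SELECT sum(x) FROM t;",
	        NULL, NULL, &err) != SQLITE_OK) {
		fprintf(stderr, "%s\n", err);
		sqlite3_free(err);
	}

	sqlite3_close(db);
	return 0;
}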

                                                                                                                                          1. 1

                                                                                                                                            You have fair criticisms, and your example is correct. For that, you get an upvote.

                                                                                                                                            I adopted that method because it was the method used by Drew DeVault in his post, which I cite in mine.

                                                                                                                                          1. 1

The style of introducing the plots breaks the reading flow. You first share the code that generates the graphs and then show the graphs without axis descriptions. So I need to read the code to understand what it does before I can understand what the graph is telling me. After that I need to reread the argument to check whether the graph supports it and to mentally get back to the actual argument. It would be nice if you helped the reader understand what your graph says. I would also suggest just linking the generation code instead of embedding it, because that reduces the disruption.

                                                                                                                                            1. 1

                                                                                                                                              I think you have good points.

                                                                                                                                              I did it the way I did to 1) make the post self-contained, and 2) to follow the style of Drew DeVault’s post.

                                                                                                                                              I think I could keep them in while still helping people not break flow.

Currently, the little JavaScript I do have on the site is there to expand and collapse the Table of Contents. What if I added a tiny bit of JavaScript to expand the code and charts, showing a short “alt” text when they are collapsed?

                                                                                                                                              Would that be sufficient?

                                                                                                                                            1. 3

                                                                                                                                              Everyone trots out “DLL hell”, but I don’t think I’ve ever seen this on Windows and it’s purely rhetoric. If anything, dynamic linking is how Windows has maintained herculean backwards compatibility for so long.

                                                                                                                                              1. 8

                                                                                                                                                In 16-bit versions of Windows, a DLL is identified by its filename (for example, C:\WINDOWS\SYSTEM\MCI.DLL is identified as “MCI”), and is reference-counted. So if one application starts up and loads the MCI library, then another application starts up and loads the same library, it’s given a handle to the same library that the first application is using… even if that’s a different file on disk than the application would have loaded if it had started first.

                                                                                                                                                Thus, even if your application shipped the exact version of the DLL you wanted to use, and installed it beside the executable so it wouldn’t be clobbered by other applications, when your application loaded the library you could still wind up with any version of the library, with any ABI, without any bugs or bug-fixes your application relied on… and because every library in the universe had to share the 8-character identifier space, collisions with entirely unrelated libraries weren’t unheard of either.

                                                                                                                                                This is truly DLL Hell.

Modern versions of Windows are less prone to these problems: Windows 98 allowed different applications to load different DLLs with the same name, the convention shifted to shipping DLLs beside the executable rather than installing them to C:\WINDOWS, and modern PCs have enough RAM to occasionally load duplicate copies of a library.

                                                                                                                                                1. 3

                                                                                                                                                  On 16- and 32-bit x86, position-independent code was fairly expensive: there’s no PC-relative addressing mode and so the typical way of faking it is to do a PC-relative call (which does exist) to the next instruction and then pop the return address (which is now your PC) from the stack into a register and use that as the base for loads and stores. This burns a general-purpose register (of which you have 6 on 32-bit x86).
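A minimal sketch of that trick as GCC-style inline assembly (it only corresponds to the situation described above when built as 32-bit x86, e.g. with -m32):

==> getpc.c <==
#include <stdio.h>

/* call the next instruction, then pop the pushed return address:
   the register now holds (roughly) the current program counter */
static void *get_pc(void)
{
	void *pc;
	__asm__ volatile ("call 1f\n1:\tpop %0" : "=r"(pc));
	return pc;
}

int main(void)
{
	printf("executing near %p\n", get_pc());
	return 0;
}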

Windows avoided this entirely by statically relocating DLLs. I think on 16-bit Windows each DLL was assigned a segment selector[1] on 286s; on real-mode or 32-bit Windows there was a constraint solver that looked at every .EXE on the system and the set of .DLLs it linked, and tried to find a base address for each DLL that didn’t conflict with other DLLs in any EXE that linked it. If the solver couldn’t find a solution, you’d end up with some DLLs that needed to be resident twice with different relocations applied; I think in some versions of Windows it would just fail to load the EXE.

[1] 16-bit Windows is really 24-bit Windows. The exciting 8086 addressing mode is a 16-bit segment base left-shifted by 4 and added to the 16-bit offset, giving a 20-bit address space; the 286’s protected (standard) mode stretches that to 24 bits via segment descriptors. As I recall, .COM files were true 16-bit programs in a 64KiB segment; Windows provided an ABI where pointers were 16-bit segment-relative things, but far pointers were stored as 32-bit integers that expanded to a 20-, 24- or 32-bit address depending on whether you were on real-mode, standard-mode (286) or 386-enhanced mode (protected mode) Windows.

                                                                                                                                                2. 1

Even if it’s not a thing on Windows (anymore), and it probably isn’t, the problems with dynamic linking are real on Linux.

                                                                                                                                                  1. 3

                                                                                                                                                    I’ve seen DLL hell on windows, never on linux. Package managers are awfully effective at avoiding it in my experience. If you go behind your distro package manager’s back, that’s on you.

                                                                                                                                                    1. 1

                                                                                                                                                      I agree with you, but I’ve seen it with package managers too.

                                                                                                                                                      1. 2

“I’ve seen it” is quite anecdotal. I think the point is, like u/atk said, that it’s very rare on Linux and not a problem most people writing code for Linux would have to deal with.

                                                                                                                                                        1. 1

                                                                                                                                                          That’s fair.

                                                                                                                                                1. 9

                                                                                                                                                  The argument against system call stability is not merely one of numbers, but of the entire shape of the system call interface. This includes things like which calls to even provide, and how to make the calling convention work with respect to register use or stack layout or even which mode switch mechanism to use. There’s a lot of detail in there that provides room for improvement, but only if you’re not committed to the specifics of what is really an internal implementation detail. The proposed solution of just deprecating calls you want to be rid of is not especially empathetic toward users, and if your OS has a strong culture of backwards compatibility for binaries it’s not really an option at all.

                                                                                                                                                  1. 4

                                                                                                                                                    I upvoted you because you have good points. But…

                                                                                                                                                    I would argue that it is still only one of numbers.

                                                                                                                                                    If an OS wants to try a new calling convention, or register use, or stack layout, they can add new syscalls that they define as using that new ABI while the existing syscalls use the old one.

                                                                                                                                                    If it pans out, the old ones can be deprecated. If not, the new ones can be.

In 32 bits, we have 4 billion numbers to use for syscalls. I doubt ABIs are changing often enough to fill that; if they somehow do, use 64 bits.
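To make that concrete, here is a purely hypothetical sketch (not any real kernel’s API, just the shape of the idea): both ABI versions of a call occupy different numbers in the dispatch table until the old one’s deprecation window ends.

==> syscall_sketch.c <==
#include <stdint.h>
#include <stdio.h>

#define NR_SYSCALLS 452

typedef long (*syscall_fn)(uint64_t, uint64_t, uint64_t);

/* v1 of a made-up call, with the old argument layout */
static long sys_fooinfo_v1(uint64_t buf, uint64_t len, uint64_t unused)
{
	(void)buf; (void)len; (void)unused;
	return 0;
}

/* v2: same operation, new layout/calling convention, new number */
static long sys_fooinfo_v2(uint64_t buf, uint64_t len, uint64_t flags)
{
	(void)buf; (void)len; (void)flags;
	return 0;
}

/* old binaries keep calling 42, new binaries are built against 451;
   42 is only retired once its deprecation period expires */
static const syscall_fn syscall_table[NR_SYSCALLS] = {
	[42]  = sys_fooinfo_v1,
	[451] = sys_fooinfo_v2,
};

static long do_syscall(unsigned nr, uint64_t a0, uint64_t a1, uint64_t a2)
{
	return (nr < NR_SYSCALLS && syscall_table[nr])
	    ? syscall_table[nr](a0, a1, a2)
	    : -1;
}

int main(void)
{
	printf("%ld %ld\n", do_syscall(42, 0, 0, 0), do_syscall(451, 0, 0, 0));
	return 0;
}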

In fact, I doubt syscall ABIs really change that much anymore, because there are only so many options and many OSes have already explored the space. We see this at a higher level in the slow convergence of OSes on certain features, and in the convergence of programming languages on certain features.

This (in my opinion) kind of makes your argument moot, though we could certainly debate whether the convergence has gone far enough yet to declare existing syscall ABIs stable.

                                                                                                                                                    The proposed solution of just deprecating calls you want to be rid of is not especially empathetic toward users, and if your OS has a strong culture of backwards compatibility for binaries it’s not really an option at all.

                                                                                                                                                    Even Linux, the paragon of backwards compatibility for syscalls, is removing old syscalls.

                                                                                                                                                    Also, I specifically said that removing the syscall should only happen after a deprecation period. I think this period should be 10-15 years. If software has not been updated in that time, then it really should only be run on old versions of the OS anyway, in my opinion.

But regardless of all of that, I actually laid out some ideas that would allow OSes to get rid of the problems with dynamic system libraries while keeping them. I personally think that using those ideas with dynamic system libraries would be better than our current system with a stable syscall ABI. That’s one reason I put out those ideas: to convince OS authors to adopt them. Implement my ideas, and I’ll lose my desire for a stable syscall ABI.

                                                                                                                                                    So I don’t really agree, but I also think you are correct on some level.

                                                                                                                                                    1. 9

                                                                                                                                                      I would argue that it is still only one of numbers.

                                                                                                                                                      I have been working for a long time on an OS that has some users, though likely not in the millions. It’s pretty rare that anything is as simple an issue as you’re making this out to be.

                                                                                                                                                      I think this period should be 10-15 years. If software has not been updated in that time, then it really should only be run on old versions of the OS anyway, in my opinion.

                                                                                                                                                      Maybe? Our (dynamically linked) ABI has been sufficiently backwards compatible that I can still run binaries from 2006 (15 years ago) without recompiling. I suspect I can probably run binaries from earlier than that as well, but that’s your 15 years and it’s also the oldest binary I can see in $HOME/bin right now. Pushing people to rebuild their software puts them on a treadmill of busy work at some frequency, and that doesn’t feel like it adds a tremendous amount of value. At some point clearly you can draw the line; e.g., it will likely not be possible for existing 32-bit binaries to be 2038 safe in any sense, for the obvious reasons about integer width. Because these binaries are only built against dynamic libraries with stable sonames and library versions (not the same as symbol versions), and don’t directly encode any system call specifics or other internals, we have at least a tractable, if not easy, time of making this promise.

                                                                                                                                                      I have heard the noisy trumpeting of the static linking people for a pretty long time now, and it just doesn’t speak to me at all. It’s not like I hate them, I just don’t really share or even viscerally appreciate the problem they claim I must have, because I link a lot of software dynamically. There are, of course, exceptions!

                                                                                                                                                      I agree that libraries like OpenSSL have, especially in the not-that-distant past, presented a capriciously moving target, both at the ABI level and even at the source compatibility level. That’s not really anything to do with dynamic (or static) linking, though, it’s a question of what the people who work on that project value – and it was, at least historically, not stable interfaces really at all.

                                                                                                                                                      As we do a lot of our “linking” nowadays not merely statically or dynamically through ELF, but increasingly through IPC (e.g., sigh, D-Bus) and RPC (e.g., REST), the question of interface stability is the interesting part to me, rather than the specific calling mechanism. Once I put software together I’d really rather spend the minimum amount of time keeping it working in the future. Many, many software projects today don’t value stability, so they just don’t have it.

                                                                                                                                                      But some projects do value stability – and thus, in my estimation, the time and energy of their consumers and community – and so I try my level best to depend only on those bodies of software. Dynamic linking just isn’t the problem I have.

                                                                                                                                                      1. 7

I get that you don’t have the problem. But Docker containers exist precisely because a majority of software developers do have that problem, and they solved it the only way they could, because nothing was meeting that need except containers.

                                                                                                                                                        1. 7

                                                                                                                                                          IME docker containers largely contain ruby code or python code or JavaScript code: things that have no concept of linking at all.

                                                                                                                                                          1. 5

                                                                                                                                                            They frequently contain C/C++ code as well. They get used for python/ruby/javascript frequently for the same reason though. Those languages have a form of dynamic linking that causes the exact same problem as dynamic linking causes for C code.

                                                                                                                                                        2. 1

                                                                                                                                                          I have been working for a long time on an OS that has some users, though likely not in the millions. It’s pretty rare that anything is as simple an issue as you’re making this out to be.

                                                                                                                                                          Fair, although I talked about why I thought it could be that simple.

                                                                                                                                                          In my opinion, our job as programmers is to make things as simple as we can without hindering our users.

                                                                                                                                                          Maybe? Our (dynamically linked) ABI has been sufficiently backwards compatible that I can still run binaries from 2006 (15 years ago) without recompiling. I suspect I can probably run binaries from earlier than that as well, but that’s your 15 years and it’s also the oldest binary I can see in $HOME/bin right now. Pushing people to rebuild their software puts them on a treadmill of busy work at some frequency, and that doesn’t feel like it adds a tremendous amount of value.

I don’t think it’s fair to call a 15-year deprecation period a “treadmill of busy work,” even if it does mean rebuilding at some frequency.

                                                                                                                                                          At some point clearly you can draw the line; e.g., it will likely not be possible for existing 32-bit binaries to be 2038 safe in any sense, for the obvious reasons about integer width.

                                                                                                                                                          This is true, and this is actually an example of an ABI break.

                                                                                                                                                          Because these binaries are only built against dynamic libraries with stable sonames and library versions (not the same as symbol versions), and don’t directly encode any system call specifics or other internals, we have at least a tractable, if not easy, time of making this promise.

                                                                                                                                                          I don’t believe you. You said that you have one or more 15-year-old binaries. Are you running on Linux or another platform? Regardless, I highly doubt there has not been an ABI break in 15 years, which means there might be some hidden bug(s) in those old binaries.

                                                                                                                                                          Sure, it might work, but if there have been any ABI breaks, you don’t know for sure.

                                                                                                                                                          I have heard the noisy trumpeting of the static linking people for a pretty long time now, and it just doesn’t speak to me at all. It’s not like I hate them, I just don’t really share or even viscerally appreciate the problem they claim I must have, because I link a lot of software dynamically. There are, of course, exceptions!

                                                                                                                                                          This, I understand. I don’t think everyone will have the same problems, and it seems that static linking won’t solve yours. That’s fine. That’s why my post from January is harmful: because it doesn’t leave room for that.

                                                                                                                                                          I agree that libraries like OpenSSL have, especially in the not-that-distant past, presented a capriciously moving target, both at the ABI level and even at the source compatibility level. That’s not really anything to do with dynamic (or static) linking, though, it’s a question of what the people who work on that project value – and it was, at least historically, not stable interfaces really at all.

                                                                                                                                                          I also agree that static linking is not going to help with software that is a moving target.

                                                                                                                                                          1. 6

                                                                                                                                                            I don’t believe you. You said that you have one or more 15-year-old binaries. Are you running on Linux or another platform? Regardless, I highly doubt there has not been an ABI break in 15 years, which means there might be some hidden bug(s) in those old binaries.

It’s common for large organizations using Windows to have a few ancient apps; there’s a reason that Microsoft only recently dropped support for 16-bit Windows and DOS applications. ABI breaks aren’t an issue on Win32; Microsoft goes to heroic lengths to keep even poorly written applications running.

                                                                                                                                                            Here’s one example of Windows backward compatibility from Strategy Letter II: Chicken and Egg Problems

                                                                                                                                                            Jon Ross, who wrote the original version of SimCity for Windows 3.x, told me that he accidentally left a bug in SimCity where he read memory that he had just freed. Yep. It worked fine on Windows 3.x, because the memory never went anywhere. Here’s the amazing part: On beta versions of Windows 95, SimCity wasn’t working in testing. Microsoft tracked down the bug and added specific code to Windows 95 that looks for SimCity. If it finds SimCity running, it runs the memory allocator in a special mode that doesn’t free memory right away.

                                                                                                                                                            1. 1

                                                                                                                                                              That’s a thing on Windows.

Also, reading about how Windows managed to do that, I would argue that the complexity required makes it not worth it. At a certain point, you have to let buggy code remain buggy and not help it keep running.

                                                                                                                                                              1. 10

                                                                                                                                                                Disclaimer: I’m a Microsoft employee.

                                                                                                                                                                reading about how Windows managed to do that, I would argue that the complexity required makes it not worth it

                                                                                                                                                                Can I ask what you’re referring to with this? From what I’ve seen, most of the issue is just about following practices that make a maintainable ABI, and with that done, the complexity is not really that much. Stories like Simcity stand out specifically because they’re exceptional.

                                                                                                                                                                The issue in open source is that a lot of developers don’t know or don’t care to make a stable ABI, and once any common package takes that position, it forces downstream developers to work around it in some form, whether static linking, containers, whatever. Some are actively opposed to the concept of ABI stability because they believe in source code availability, so API stability is all that they’re aiming for. It’s gotten to the point where people equate “ABI” and “syscall” as equivalent concepts, but a stable ABI can exist at any layer, and it just so happens that Linus is aware of the need for it and capable of delivering it, so that layer has a stable ABI. (Frankly, it’d be very hard to build a non-ABI compatible kernel, because every kernel change would require rebuilding usermode to test the kernel; ain’t nobody got time for that.)

                                                                                                                                                                What we see with Linux distributions today is a pattern where upstream developers might make ABI incompatible changes in their development, then fix a security problem, then distributions take that patch and apply it to an older version in a way that doesn’t break the ABI, then offer 10 years of LTS support. The distributions can’t issue a security update that breaks the ABI, since they’re not Gentoo and don’t want to rebuild every program, and yet they issue updates on a regular cadence. Unfortunately this code flow means the learnings and practices of the distributions don’t necessarily flow back up to the original authors.

As far as I can tell, ABI compatibility is basically a necessity today. If you have a platform, applications run on it, and you want to be able to ship security updates every month (or similar), then patches need to be able to be made without disturbing applications. And if you have a system that’s designed to do that every month, 15 years just starts to look like a large number of non-breaking changes. IMHO, the more this happens the easier it gets, because it forces patterns that allow non-breaking changes to be made repeatedly and reliably.

                                                                                                                                                                1. 4

                                                                                                                                                                  Stories like Simcity stand out specifically because they’re exceptional.

                                                                                                                                                                  Oh yeah! Stories like Simcity stand out, and not just because they’re exceptional, but also because they seemingly solve a trivial problem – all that work “just” to keep a “stupid game” running. It’s puzzling and challenges you to think a bit (and realize that supporting a popular game was really a significant problem in terms of adoption!), which is why it makes such a good story.

But it’s crazy just how many of the wheels that keep the real world spinning survive on ABI (and, to some degree, API) compatibility, and I think the FOSS community draws the wrong lessons from this.

Lots of people in the Linux space think about it in terms of just how much effort goes into maintaining compatibility with legacy apps just because megacorps hate progress, lazy programmers won’t switch to the new and better versions of everything, evil companies would rather suck money out of their customers’ pockets than improve their products, and so on.

                                                                                                                                                                  Absurdly few people think of it in terms of just how much time that makes available to solve more important problems and add functionality that helps people do more things with their computers.

                                                                                                                                                                  Yes, it looks stupid that so much effort went into keeping Sim City running – but that’s also one of the reasons why Sim City is the impressive franchise it is today, and why modern Sim City games are light years ahead of their old selves (nostalgia/personal preferences notwithstanding, I also like Sim City 2000 the most because that’s what I played when I was young): because Will Wright & co. could spend time working on new Sim City games, instead of perpetually impedance-matching Sim City to make it work on the latest DOS and Windows machines, the way we do with Linux software today.

                                                                                                                                                                  1. 2

The other part is that if someone’s favourite game didn’t work, they’d just blame it on Windows, even if it’s clearly the application’s fault. (Or if it was an actually critical business application.)

                                                                                                                                                                  2. 1

                                                                                                                                                                    Can I ask what you’re referring to with this? From what I’ve seen, most of the issue is just about following practices that make a maintainable ABI, and with that done, the complexity is not really that much. Stories like Simcity stand out specifically because they’re exceptional.

                                                                                                                                                                    I’m referring to Tales of Application Compatibility.

                                                                                                                                                                    For the record, I applaud the work that Microsoft does to maintain backwards compatibility. Of my many complaints about Microsoft, that is emphatically not one of them.

                                                                                                                                                                    The issue in open source is that a lot of developers don’t know or don’t care to make a stable ABI, and once any common package takes that position, it forces downstream developers to work around it in some form, whether static linking, containers, whatever. Some are actively opposed to the concept of ABI stability because they believe in source code availability, so API stability is all that they’re aiming for.

                                                                                                                                                                    I agree. Upstream developers rarely have any empathy for their users, who are downstream developers.
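
                                                                                                                                                                     To make the API-versus-ABI distinction concrete, here’s a minimal sketch; “libfoo” and its options struct are hypothetical, not any real library:

                                                                                                                                                                         /* A hypothetical libfoo public struct, shown in two versions. */
                                                                                                                                                                         #include <stdio.h>

                                                                                                                                                                         /* v1: the caller allocates this struct and passes it to the library. */
                                                                                                                                                                         struct foo_options_v1 {
                                                                                                                                                                             int verbosity;
                                                                                                                                                                         };

                                                                                                                                                                         /* v2: a field is appended. Callers that only mention .verbosity still
                                                                                                                                                                            compile unchanged, so the API is "stable"... */
                                                                                                                                                                         struct foo_options_v2 {
                                                                                                                                                                             int verbosity;
                                                                                                                                                                             int log_to_syslog;
                                                                                                                                                                         };

                                                                                                                                                                         int main(void) {
                                                                                                                                                                             /* ...but the binary layout changed: an application built against v1
                                                                                                                                                                                hands a v2 library an object smaller than the library now expects,
                                                                                                                                                                                so the library reads and writes past its end. That is an ABI break,
                                                                                                                                                                                even though no source code needed editing. */
                                                                                                                                                                             printf("v1 size: %zu bytes\n", sizeof(struct foo_options_v1));
                                                                                                                                                                             printf("v2 size: %zu bytes\n", sizeof(struct foo_options_v2));
                                                                                                                                                                             return 0;
                                                                                                                                                                         }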

                                                                                                                                                                     It’s gotten to the point where people treat “ABI” and “syscall” as equivalent concepts, but a stable ABI can exist at any layer; it just so happens that Linus is aware of the need for one and capable of delivering it, so that layer has a stable ABI.

                                                                                                                                                                    Agreed; I just think that the syscall layer is the best place to put the stability.
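
                                                                                                                                                                     To make the “layers” point concrete, here’s a minimal Linux/glibc sketch (my own example, not anything from the post) of the same operation reached through two different stable ABIs:

                                                                                                                                                                         /* Build with: cc layers.c -o layers   (Linux + glibc assumed) */
                                                                                                                                                                         #define _GNU_SOURCE
                                                                                                                                                                         #include <stdio.h>
                                                                                                                                                                         #include <sys/syscall.h>   /* SYS_getpid: the kernel's syscall ABI */
                                                                                                                                                                         #include <unistd.h>        /* getpid() and syscall(): libc's function ABI */

                                                                                                                                                                         int main(void) {
                                                                                                                                                                             /* Layer 1: the C library's ABI. The symbol name, calling convention
                                                                                                                                                                                and return type are promises made by libc and its versioned symbols. */
                                                                                                                                                                             printf("via libc wrapper: %ld\n", (long)getpid());

                                                                                                                                                                             /* Layer 2: the kernel's syscall ABI. The syscall number and register
                                                                                                                                                                                convention are promises kept by the kernel's "don't break userspace"
                                                                                                                                                                                rule, independent of whichever libc (if any) sits on top. */
                                                                                                                                                                             printf("via raw syscall:  %ld\n", syscall(SYS_getpid));
                                                                                                                                                                             return 0;
                                                                                                                                                                         }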

                                                                                                                                                                     As far as I can tell, ABI compatibility is basically a necessity today. If you have a platform, applications run on it, and you want to be able to ship security updates every month (or similar), then it follows that patches must be possible without disturbing applications. And if you have a system that’s designed to do that every month, 15 years just starts to look like a large number of non-breaking changes. IMHO, the more this happens the easier it gets, because it forces patterns that allow non-breaking changes to be made repeatedly and reliably.

                                                                                                                                                                     I agree that ABI compatibility is basically a necessity today. The ideas I put in my post are there to make it unnecessary tomorrow, because even if you are extremely reliable about avoiding breaking changes, 15 years is long enough to assume a break happened somewhere.

                                                                                                                                                                    I guess, in essence, I agree with you, and you got an upvote; I just want to push us all in a direction where we don’t have to worry about such things.

                                                                                                                                                              2. 5

                                                                                                                                                                I don’t believe you. You said that you have one or more 15-year-old binaries. Are you running on Linux or another platform? Regardless, I highly doubt there has not been an ABI break in 15 years, which means there might be some hidden bug(s) in those old binaries.

                                                                                                                                                                 This is on an illumos system now. Some of the oldest binaries were probably built on Solaris systems before we parted ways. In my experience, outside of perhaps RHEL, which has traditionally sought to supplant systems like Solaris in enterprises with an understandably low tolerance for churn, there aren’t many Linux-based systems that have had as strong a focus on keeping old binaries working. The Linux kernel project has a strong focus on system call stability, but this ethos does not appear to extend to many distributions.

                                                                                                                                                                I can’t unequivocally state that the ABI hasn’t changed, but I can say that if we are made aware of a problem we’ll fix it – and that when we extend the system we are careful to avoid breakage. Some examples of ABI-related issues that I can recall include:

                                                                                                                                                                 • The addition of open_memstream(3C) and related stdio functionality; the person who did this work wrote fairly extensively on some of the compatibility mechanisms in our stdio implementation (see the usage sketch after this list).

                                                                                                                                                                 • We found that there had been a shift over time in certain alignment decisions made by the compiler, and some sigsetjmp(3C) calls were causing crashes; this was fixed so that we accept now-incorrectly aligned sigjmp_buf arguments, ensuring that things continue to work for both old and new binaries.
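
                                                                                                                                                                 For anyone who hasn’t used the interface, here is a small usage sketch of open_memstream(3C); nothing illumos-specific is shown, so it should build on any POSIX.1-2008 system:

                                                                                                                                                                     #include <stdio.h>
                                                                                                                                                                     #include <stdlib.h>

                                                                                                                                                                     int main(void) {
                                                                                                                                                                         char *buf = NULL;
                                                                                                                                                                         size_t len = 0;

                                                                                                                                                                         /* Returns a FILE * whose output is captured in a growing,
                                                                                                                                                                            heap-allocated buffer instead of a file descriptor. */
                                                                                                                                                                         FILE *fp = open_memstream(&buf, &len);
                                                                                                                                                                         if (fp == NULL) {
                                                                                                                                                                             perror("open_memstream");
                                                                                                                                                                             return 1;
                                                                                                                                                                         }

                                                                                                                                                                         fprintf(fp, "hello from a memory stream");
                                                                                                                                                                         fclose(fp);    /* finalizes buf and len */

                                                                                                                                                                         printf("captured %zu bytes: \"%s\"\n", len, buf);
                                                                                                                                                                         free(buf);     /* the caller owns the buffer */
                                                                                                                                                                         return 0;
                                                                                                                                                                     }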

                                                                                                                                                                1. 2

                                                                                                                                                                   In my experience, outside of perhaps RHEL, which has traditionally sought to supplant systems like Solaris in enterprises with an understandably low tolerance for churn, there aren’t many Linux-based systems that have had as strong a focus on keeping old binaries working. The Linux kernel project has a strong focus on system call stability, but this ethos does not appear to extend to many distributions.

                                                                                                                                                                  I agree with this, and it makes me sad! Linux should be the best at this, but distros ruined it.

                                                                                                                                                                  I can’t unequivocally state that the ABI hasn’t changed, but I can say that if we are made aware of a problem we’ll fix it

                                                                                                                                                                  This is what I am afraid of. I hope there are not any unfixed issues in your systems, but there is no guarantee of that.

                                                                                                                                                                  and that when we extend the system we are careful to avoid breakage.

                                                                                                                                                                   This is good, but it is also the target of one section of my post: “ABI Breaks Impede Progress”. There is always going to be some place where ABI breaks cause problems. In fact, in your story about the compiler’s alignment decisions, it seems the compiler was defining your ABI and shifting it out from under you.

                                                                                                                                                                  To minimize that, people reduce the progress they make, which is why the fear of ABI breaks impedes progress.
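
                                                                                                                                                                   To make the “compiler defines your ABI” point concrete, here is a generic sketch (it doesn’t reproduce the sigjmp_buf case, just the idea that the size and alignment of public types are ABI facts the compiler chooses):

                                                                                                                                                                       #include <stdalign.h>
                                                                                                                                                                       #include <stdio.h>

                                                                                                                                                                       /* Stand-in for a public, caller-allocated buffer type like sigjmp_buf. */
                                                                                                                                                                       struct jmp_like {
                                                                                                                                                                           long double state[4];
                                                                                                                                                                       };

                                                                                                                                                                       int main(void) {
                                                                                                                                                                           /* These numbers are part of the platform ABI even though no header
                                                                                                                                                                              spells them out. If a newer compiler picked a different alignment
                                                                                                                                                                              for long double, old binaries and a freshly built library would
                                                                                                                                                                              silently disagree about this layout, an ABI break that nobody ever
                                                                                                                                                                              edited into the source. */
                                                                                                                                                                           printf("sizeof(struct jmp_like)  = %zu\n", sizeof(struct jmp_like));
                                                                                                                                                                           printf("alignof(struct jmp_like) = %zu\n", (size_t)alignof(struct jmp_like));
                                                                                                                                                                           return 0;
                                                                                                                                                                       }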

                                                                                                                                                                   The point I was trying to make regarding syscalls was that I believe they are the best place to guarantee an ABI, because you can always replace a syscall that has a bad ABI with a new syscall that has a good one, whereas you can’t really replace a bad dynamic library ABI without a lot of pain.
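
                                                                                                                                                                   One well-known Linux example of that replacement strategy (my example, not anything from the post): pipe(2) could not grow a flags argument without changing its ABI, so pipe2(2) was added next to it, and old binaries keep calling pipe() undisturbed:

                                                                                                                                                                       /* Build with: cc pipes.c -o pipes   (Linux + glibc assumed) */
                                                                                                                                                                       #define _GNU_SOURCE
                                                                                                                                                                       #include <fcntl.h>
                                                                                                                                                                       #include <stdio.h>
                                                                                                                                                                       #include <unistd.h>

                                                                                                                                                                       int main(void) {
                                                                                                                                                                           int old_fds[2];
                                                                                                                                                                           int new_fds[2];

                                                                                                                                                                           /* The original syscall: still present, still behaving as it always has. */
                                                                                                                                                                           if (pipe(old_fds) != 0)
                                                                                                                                                                               perror("pipe");

                                                                                                                                                                           /* The newer syscall with the better interface: flags such as O_CLOEXEC
                                                                                                                                                                              can be applied atomically instead of via a separate fcntl() call. */
                                                                                                                                                                           if (pipe2(new_fds, O_CLOEXEC) != 0)
                                                                                                                                                                               perror("pipe2");

                                                                                                                                                                           printf("pipe:  fds %d,%d\n", old_fds[0], old_fds[1]);
                                                                                                                                                                           printf("pipe2: fds %d,%d\n", new_fds[0], new_fds[1]);
                                                                                                                                                                           return 0;
                                                                                                                                                                       }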

                                                                                                                                                        1. 3

                                                                                                                                                          If you like static linking and LLVM IR then you’ll love ALLVM.

                                                                                                                                                          1. 1

                                                                                                                                                            That does look like the sort of thing I am talking about.