1.  

    I’d love to see something generic like this. Any open port at the moment is subject to a load of attempts and it just takes one exploitable vulnerability in a server that holds any private data for everything to be lost. A mobile app that took an OTP key and did port knocking for a particular server would make it fairly easy to disable all inbound ports except for users who ran the right sequence.

    That said, port knocking is only really interesting because it lets something simple and (hopefully) easy to secure sit in front of the server process. Port knocking is about the simplest protocol that you can create and secure: the attacker either connects to a given port or doesn’t. To do it well, you need to listen on a lot of ports and block any remote that connects to the wrong one in the sequence. For example, reserve ports 1024 - 2048, listen on all of them, encode a one-time password as a sequence of 10-bit digits, where each port is one digit, and if a remote IP connects to any of the ports other than the next one in the sequence then block them. This is really hard to get right if two computers on the same IP try at the same time: you need to make sure that you track the remote IP and port (and hope that there aren’t any stateless NATs in the way that make sequential packets appear to come from different ports).
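
    Something like this minimal sketch captures the tracking that implies (names and constants are illustrative; a real implementation would also need timeouts, concurrency handling, and firewall integration):

    #include <stdint.h>

    #define BASE_PORT 1024
    #define SEQ_LEN   6             /* 10-bit digits in the one-time password */

    /* The current OTP, as offsets from BASE_PORT. */
    static uint16_t expected[SEQ_LEN];

    /* Per-remote progress, keyed by (IP, source port). */
    struct knocker {
        uint32_t ip;
        uint16_t sport;
        int      next;              /* index of the next expected digit */
    };

    /* Called for every connection to the knock range: returns 1 to open
       the real service port for this remote, -1 to block it, 0 to wait. */
    static int on_knock(struct knocker *k, uint16_t dport)
    {
        if (dport - BASE_PORT != expected[k->next])
            return -1;              /* wrong port in the sequence: block */
        if (++k->next == SEQ_LEN)
            return 1;               /* full sequence seen: open up */
        return 0;
    }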

    Even then, a successful knock opens things up a lot: any attacker on the same NAT’d network as the legitimate user (which can be a lot of users with carrier-grade NAT) can then attack your server. The real solution is probably to:

    • Aggressively privilege-separate your server applications, so nothing that an attacker can compromise can do anything bad.
    • Use strong credentials (SSH keys, client certs, WebAuthn, whatever) for client authentication, so that an attacker can’t brute-force client credentials.

    The second of these is increasingly possible. Most devices now come with an isolated key store that offers signing services: a TPM on most x86 PCs, the Secure Enclave on Apple devices, and either an isolated store or a TrustZone enclave (which is at least protected from the main OS) on Android devices. These allow a live attack with a compromised client / client OS, but don’t let anyone extract the keys.

    Beyond that, Facebook is doing some interesting research on pattern recognition to ramp up security (I think others are too, now that they’ve published their initial work). For Facebook, ‘logged in’ is not a binary state. For low-value things, they let you in with just a password or cookie from an old login, but as you get to more sensitive bits of access they require more signal that you’re really you (which includes all of the creepy stalkerware that you’d expect from Facebook, such as ‘have you visited the same sites you normally do?’). I’d love to see something in Dovecot / NextCloud / whatever that would ramp up security if you were doing unusual searches or trying to download a lot of data, and then require two-factor auth. None of these systems really have a good story for protecting users against a compromised client device.

    1. 9

      My main thought is that we should stop using corporation-friendly licenses. We don’t have to be so easily exploited.

      A surprising amount of OSS is made by former big tech developers. They can afford to subsist on meagre revenue—for a time—because their pay and stock options have left them free of debt and with well-stocked savings accounts. … Scratch away at the surface of pretty much any active OSS project that has no discernible revenue, and you either get a burnout waiting to happen, or you’ll find a formerly well-paid dev coasting on savings.

      Why not both? Google’s interference in my FLOSS projects and spare-time activities burned me out, but at the same time they had made me a financial offer I couldn’t refuse, and so they ended up funding my work even as they tried to prevent me from releasing it.

      1. 7

        Why not both? Google’s interference in my FLOSS projects and spare-time activities burned me out, but at the same time they had made me a financial offer I couldn’t refuse, and so they ended up funding my work even as they tried to prevent me from releasing it.

        Can you share the details of this interference, and throffer?

        1.  

          Yes, but I don’t like talking about it, so I’m going to only give an abbreviated summary.

          • I was a starving-artist university student working on university campus
          • Google offered me an unreasonably good salary if I were to drop out
          • After I’d worked at Google for a year, they demanded that I relicense repositories like Bravo and Typhon and assign copyright to them, and threatened me with legal action
          • The resulting stress accelerated my burnout and I eventually left Google
        2. 5

          My main thought is that we should exclude non-people from licenses. I’ve added a clause to the GPL for my software that only allows the software to be run and distributed by and on behalf of natural persons.

          If you want limited liability for whatever it is you’re doing with it, you need to pay me.

          1. 9

            Which makes it neither Free Software nor Open Source.

            I very much doubt the FSF appreciates you using their license as a base for yours.

            1.  

              The FSF also holds the copyright to the text of their licenses, and does not allow modification. From the GPLv3:

              Copyright © 2007 Free Software Foundation, Inc. https://fsf.org/

              Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

              1. 10

                You can legally use the GPL terms (possibly modified) in another license provided that you call your license by another name and do not include the GPL preamble, and provided you modify the instructions-for-use at the end enough to make it clearly different in wording and not mention GNU (though the actual procedure you describe may be similar).

                https://www.gnu.org/licenses/gpl-faq.html#ModifyGPL

              2.  

                And yet it’s both for people. Raytheon is welcome to ask for a license.

                1. 5

                  If the license does not allow them to use the code to begin with, it is neither an Open Source nor Free Software license. Not even freeware.

                  You’re free to use whatever license you choose (as long as you respect the licenses of any code you use, of course). But, like that, it will amount to yet another form of shareware, and nothing more.

                  I suspect your intent would be better served by the Affero license, but ultimately I’m not you and don’t know what your intent is; I can only guess.

                  1. 5

                    My intent is to give power to people and not corporations. That it seems impossible for a huge number of people to understand why someone who values the freedom of people might not value the freedom of corporations would be worrying if it weren’t completely expected, given the amount of corporate propaganda we are exposed to.

                    My license is open source for people. That it isn’t for corporations is a feature, not a bug.

                    1.  

                      Your license is not Open Source. Stop using these words.

                      Other than that, you’re free to use whatever shareware terms you’ve come up with.

                      1.  

                        It’s open source for people 😇

                        1.  

                          Open Source has a definition - https://opensource.org/osd

                          Free Software has a definition - https://www.gnu.org/philosophy/free-sw.en.html

                          A license that restricts who can use the program is not a Free nor Open Source software license.

                          1. 6

                            “Open Source” was in use long before OSI came up with the term; a quick Usenet search will find usage of it throughout the 90s. Besides, I reject the notion that any organisation or person has the authority to single-handedly define language in the first place. I’m sure that if Google or Apple were to try the same thing, people would be up in arms about it.

                            1.  

                              Neither Google nor Apple is a charity or a community initiative? It’s not like it’s the Technical Dictionary of Dianetics and Scientology; they’re just explicitly defining the terms “Open Source” and “Free Software” to avoid confusion. The reason the OSD was put together is to end the confusion about what the term actually means - otherwise the MIT license and “ethical” source licenses are in the same category, when they’re designed to achieve completely different things.

                              1. 8

                                Neither Google nor Apple is a charity or a community initiative?

                                So? Why does that matter?

                                This reminds me of the time when I asked the Dutch cancer foundation how they got my phone number (which they are legally obliged to tell me), as I was sure I never gave it to them yet they called me anyway. Their explanation was “we’re a charity”. Good for you, but that doesn’t mean the law doesn’t apply to you and that you can harvest/buy my data from unknown sources (I never did find out where they got it from) or engage in other dubious behaviour.

                                Besides, the OSD was written by a single person in 1997 (or 1998? I forgot), and the Free Software definition was written by a single person as well (who is also notoriously impervious to feedback, I might add). The reality of the matter is that both terms are, and always have been, frequently used in ways that were not the intention of those authors: both are a common adjective + common noun, a combination which was independently coined dozens of times, if not more, before the so-called “official definition” was written down. If they wanted to “avoid confusion” they should have picked something which isn’t confusing. It’s nice that a community formed around these terms, but that doesn’t mean you get the authority to decide how ~8 billion people use the language.

                                otherwise the MIT license and “ethical” source licenses are in the same category

                                I don’t see a problem with that.

                                1.  

                                  The whole charity anecdote is a red herring, as this isn’t a legal issue or one of “dubious behaviour”. Within software they’re both generally well-supported definitions, as demonstrated by people’s willingness to describe their licenses as Free Software or Open Source respectively.

                                  otherwise the MIT license and “ethical” source licenses are in the same category

                                  I don’t see a problem with that.

                                  You don’t see a problem with the MIT license being bunched in with the Cooperative Non-Violent Public License, two completely different licensing models entirely?

                                  1.  

                                    My point was that being a charity isn’t important and is, well, a red herring. Red herrings for everyone! 🐟

                                    Why would that be a problem? For a lot of people “open source” just means “access to the source code”. That this means something different to you is okay, but the fact that discussions like this are held on a regular basis on e.g. HN demonstrates that this definition is very far from universal.

                                    The licensing obligations between e.g. MIT and GPL are vastly different as well, and even between fairly similar licenses there can be details that differ and have a large impact. Basically, you need to read the license text (or a summary thereof) anyway to know what you can and can’t do.

                                    At any rate, I already wrote an article about this entire thing last year which has a Lobsters discussion, so I won’t repeat it all here.

                          2.  

                            Your “license” sounds like one of those ethical source licenses. It is neither free nor open source — don’t misuse those terms.

                            1.  

                              Your “license” sounds like one of those ethical source licenses.

                              This license is based on the BSD 3-clause license, but with specific exclusions for using licensed code to promote or profit from:

                              violence, hate and division,

                              environmental destruction,

                              abuse of human rights,

                              the destruction of people’s physical and mental health

                              Yeah, no.

                              As I said, it’s open source for people 😇

                              1.  

                                I meant it’s in the same vein as ethical source licenses.

                                As I said, it’s open source for people 😇

                                Okay, assuming you aren’t trolling by incessantly repeating yourself, aren’t corporations made up of people? They aren’t like, some mechatronic evil beast or something. How will your license hold up legally?

                                1.  

                                  I am made of cells, yet I’m not a cell. The law still has the distinction between a natural and a legal person.

                                  How will your license hold up legally?

                                  The same way the GPL did.

                                  If they manage to win a court case that shows it’s not legally binding, then they have been using software without a license and will need to negotiate for one going forward. I have no idea why so many programmers think that if you ‘hack’ the legal system you magically remove all protections it provides.

                                  1.  

                                    There’s precedent in the USA that you don’t lose rights through association. If I, as an individual, have a right then I don’t lose that right by being a member of a company. What does it mean if a license grants rights to a person, and to all employees of a company, but not to the company as a whole?

                                    I presume since you want to be like open source, you’re talking about licenses that cover distribution and modification, not end-user license agreements (EULAs)? If I can modify your code and distribute the result as an individual, a company can pay me to modify and distribute your code. If you want to place restrictions on use then that’s a very different matter. There have been a load of ‘free for non-commercial use’ licenses and they’re always a bit tricky because the boundary between commercial and non-commercial use is difficult to define. Is someone working for a charity but paid a salary engaged in commerce? Is someone who writes a blog and sticks ads on to cover some of the hosting costs engaged in commerce? Am I engaged in commerce when I update an open source project that counts some companies among its downstream users?

                                    1.  

                                      All the questions you ask here are quite easily answered by looking at which assets are owned by whom on a balance sheet. This is not esoteric gnostic scripture open to interpretation; it is a basic document provided by any half-decent accountant.

                                      If you are using a piece of software on your corporate laptop, during your work day, doing work that is being charged to the corporate account, then you are clearly not acting on behalf of a person.

                                      You can still provide services to a corporation by running the software for them on your own machine, one owned by John Doe as clearly stated on his tax form, since you are then providing them a service and not distributing the software to them.

                                      If you are redistributing the software, it is quite easy to tell whether you may by looking at who owns the copyright on it. If it is “John Doe, Person” then knock yourself out; if it is “Doe Corp Inc.” then it’s a corporation and needs to negotiate a license.

                                      Your tax rates very much depend on exactly which entity is doing what, and as you can expect there is a very rich set of legal precedents for anything you can imagine.

                                      There have been a load of ‘free for non-commercial use’ licenses and they’re always a bit tricky because the boundary between commercial and non-commercial use is difficult to define.

                                      Which is why I’m making the distinction between natural persons and legal ones - a distinction that we have been able to make for close to 500 years and one that is at the basis of our tax law. It is something you have to declare to the government each year, it is trivial to check, and it greatly impacts the tax rate you pay.

                2.  

                  Liability is the key concept here. A corporation sheds liability from its employees and managers, which empowers them to act beyond the typical ethical context of a human. (There’s not really a better explanation for how the lowest-level employees of a corporation can directly commit crimes on a regular basis; the story of Uber is instructive.) Because these folks are not necessarily operating according to societal norms, we should be careful about extending the legal fiction of personhood to their employer.

                  The sibling argument is an excellent example of how this entire philosophical exploration can be easily preempted if we allow ourselves to agree with corporations that corporations are just like ordinary humans. There is some powerful memetic blindness inflicted by corporate propaganda.

                  The main reason that Free Software cannot exclude corporations, in the USA, is because of corporate personhood in the context of copyleft as a strategy for ensuring that Free Software stays free. Non-copylefted Free Software is still Free Software, but can be made unfree by selfish folks; copyleft is a legal option for preventing those folks from acting. However, since corporate personhood is a legal fiction, it is enforced by the same system as copyleft, which requires copyleft licenses to consider corporations as people.

                  1.  

                    However, since corporate personhood is a legal fiction, it is enforced by the same system as copyleft, which requires copyleft licenses to consider corporations as people.

                    I’m not sure if you’re being descriptive of current popular licenses, in which case you’re correct, or prescriptive, in which case you’re not. A software license can easily be formulated to only apply to natural persons. Since biological existence, or lack thereof, is not a protected group, you can enforce contracts that exclude legal persons from your license.

                    I largely agree with the rest of your post.

              1. 3

                I’m not entirely convinced a new model is needed. We already have memory-mapped files in all the major operating systems. And file pages can already be as small as 4 KiB, which is tiny compared to common file sizes these days. Perhaps it would make sense to have even smaller pages for something like Optane, but do we really need to rethink everything? What would we gain?
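
                For reference, this is roughly what that existing mechanism looks like from C (POSIX mmap; error handling mostly omitted, and the file name is hypothetical):

                #include <fcntl.h>
                #include <string.h>
                #include <sys/mman.h>
                #include <unistd.h>

                int main(void)
                {
                    int fd = open("data.bin", O_RDWR);      /* an existing 4 KiB file */

                    /* Map the file into the address space: plain loads and stores
                       now read and write the file's contents directly. */
                    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
                    if (p == MAP_FAILED)
                        return 1;

                    memcpy(p, "hello", 5);                  /* 'writes' the file */
                    msync(p, 4096, MS_SYNC);                /* force it to disk */
                    munmap(p, 4096);
                    close(fd);
                    return 0;
                }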

                1. 4

                  What we’d gain is eliminating 50+ years of technical debt.

                  I recommend the Twizzler presentation mentioned a few comments down. It explains some of the concepts much better than I can. These people have really dug into the technical implications far deeper than me.

                  The thing is this: persistent memory blows apart the computing model that has prevailed for some 60+ years now. This is not the Von Neumann model or anything like that; it’s much simpler.

                  There are, in all computers since about the late 1950s, a minimum of two types of storage:

                  • primary storage, which the processor can access directly – it’s on the CPUs’ memory bus. Small, fast, volatile.
                  • secondary storage, which is big, slow, and persistent. It is not on the memory bus and not in the memory map. It is held in blocks, and the processor must send a message to the disk controller, ask for a particular block, and wait for it to be loaded from 2y store and placed into 1y store.

                  The processor can only work on data in 1y store, but everything lives in 2y store, so it must be fetched, worked on, and put back.

                  This is profoundly limiting. It’s slow. It doesn’t matter how fast the storage is, it’s slow.

                  PMEM changes that. You have RAM, only RAM, but some of your RAM keeps its contents when the power is off.

                  Files are legacy baggage. When all your data is in RAM all the time, you don’t need files. Files are what filesystems hold; filesystems are an abstraction method for indexing blocks of secondary storage. With no secondary storage, you don’t need filesystems any more.

                  1. 7

                    I feel like there are a bunch of things conflated here:

                    Filesystems and file abstractions provide a global per-device namespace. That is not a great abstraction today, where you often want a truly global namespace (i.e. one shared between all of your devices) or something a lot more restrictive. I’d love to see more of the historical capability systems research resurrected here: for typical mobile-device UI abstractions, you really want a capability-based filesystem. Persistent memory doesn’t solve any of the problems of naming and access. It makes some of them more complicated: If you have a file on a server somewhere, it’s quite easy to expose remote read and write operations, it’s very hard to expose a remote mmap - trying to run a cache coherency protocol over the Internet does not lead to good programming models.

                    Persistence is an attribute of files, but in a very complicated way. On *NIX, the canonical way of doing an atomic operation on a file is to copy the file, make your changes, and then move the new file over the top of the old one. This isn’t great and it would be really nice if you could have transactional updates over ranges of files (annoyingly, ZFS actually implements all of the machinery for this, it just doesn’t expose it at the ZPL). With persistent memory, atomicity is hard. On current implementations, atomic operations with respect to CPU cache coherency and atomic operations with respect to committing data to persistent storage are completely different things. Getting any kind of decent performance out of something that directly uses persistent memory and is resilient in the presence of failure is an open research problem.
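
                    For comparison, a sketch of that canonical copy-and-rename dance (hypothetical helper name; error paths abbreviated):

                    #include <fcntl.h>
                    #include <unistd.h>

                    /* Write the new contents to a temporary file, force them to
                       stable storage, then atomically replace the original: readers
                       see either the old file or the new one, never a torn mixture. */
                    static int atomic_replace(const char *path, const char *tmp,
                                              const void *buf, size_t len)
                    {
                        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
                        if (fd < 0)
                            return -1;
                        if (write(fd, buf, len) != (ssize_t)len ||
                            fsync(fd) != 0) {   /* assumes the disk isn't lying */
                            close(fd);
                            return -1;
                        }
                        close(fd);
                        return rename(tmp, path);   /* the atomic step */
                    }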

                    Really using persistent memory in this way also requires memory safety. As one of The Machine’s developers told me when we were discussing CHERI: with persistent memory, your memory-safety bugs last forever. You’ve now turned your filesystem abstractions into a concurrent GC problem.

                    1. 1

                      Excellent points; thank you.

                      May I ask, are you the same David Chisnall of “C is not a low-level language” paper? That is probably my single most-commonly cited paper. My compliments on it.

                      Your points are entirely valid, and that is why I have been emphasizing the “just for fun” angle of it. I do not have answers to some of these hard questions, but I think that at first, what is needed is some kind of proof of concept. Something that demonstrates the core point: that we can have a complex, rich, capable environment that is able to do real, interesting work, which in some ways exceeds the traditional *nix model for a programmer, which runs entirely in a hybrid DRAM/PMEM system, on existing hardware that can be built today.

                      Once this point has been made by demonstration, then perhaps it will be possible to tackle much more sophisticated systems, which provide reliability, redundancy, resiliency, and all that nice stuff that enterprises will pay lots of money for.

                      There is a common accusation, not entirely unjust, that the FOSS community is very good at imitating and incrementally improving existing implementations, but not so good at creating wholly new things. I am not here to fight that battle. What I was trying to come up with was a proposal to use some existing open technology – things that are already FOSS, already out there, and not new and untested and immature, but solid, time-proven tools that have survived despite decades in obscurity – and assemble them into something that can be used to explore new and largely uncharted territory.

                      ISTM, based on really very little evidence at all, that HPE got carried away with the potential of something that came out of their labs. It takes decades to go from a new type of component to large-scale, highly-integrated mass production. Techies know that; marketing people do not. We may not have competitive memristor storage until the 2030s at the earliest, and HPE wanted to start building enterprise solutions out of it. Too much, too young.

                      Linux didn’t spring fully-formed from Torvalds’ brow ready to defeat AIX, HP-UX and Solaris in battle. It needed decades to grow up.

                      The Machine didn’t get decades.

                      Smalltalk has already had decades.

                      1.  

                        Reply notifications are working again, so I just saw this:

                        May I ask, are you the same David Chisnall of “C is not a low-level language” paper? That is probably my single most-commonly cited paper. My compliments on it.

                        That’s me, thanks! I’m currently working on a language that aims to address a lot of my criticisms of the C abstract machine.

                        Something that demonstrates the core point: that we can have a complex, rich, capable environment that is able to do real, interesting work, which in some ways exceeds the traditional *nix model for a programmer, which runs entirely in a hybrid DRAM/PMEM system, on existing hardware that can be built today.

                        I do agree with the ‘make it work, make it correct, make it fast’ model, but I suspect that you’ll find with a lot of these things that the step from ‘make it work’ to ‘make it correct’ is really hard. A lot of academic OS work fails to make it from research to production because they focus on making something that works for some common cases and miss the bits that are really important in deployment. For persistent memory systems, how you handle failure is probably the most important thing.

                        With a file abstraction, there’s an explicit ‘write state for recovery’ step and a clear distinction in the abstract machine between volatile and non-volatile storage. I can quite easily do two-phase commit to a POSIX filesystem (unless my disk is lying about sync) and end up with something that leaves my program in a recoverable state if the power goes out at any point. I may lose uncommitted data, but I don’t lose committed data. Doing the same thing with a single-level store is much harder because caches are (as their name suggests) hidden. Data that’s written back to persistent memory is safe, data in caches isn’t. I have to ensure that, independent of the order that things are evicted from cache, my persistent storage is in a consistent state. This is made much harder on current systems by the fact that atomic with respect to other cores is done via the cache coherency protocol, whereas atomic with respect to main memory (persistent or otherwise) is done via cache evictions and so guaranteeing that you have a consistent view of your data structures with respect to both other cores and persistent storage is incredibly hard.
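
                        To make that concrete, the ordered-update discipline looks roughly like this on x86 with CLFLUSHOPT (compile with -mclflushopt; the structure and function names here are mine, not from any particular PMEM library):

                        #include <immintrin.h>   /* _mm_clflushopt, _mm_sfence */
                        #include <stdint.h>
                        #include <string.h>

                        /* A 64-byte persistent record: payload plus a flag that publishes it. */
                        struct record {
                            char     payload[56];
                            uint64_t valid;          /* nonzero => payload is committed */
                        };

                        /* Write back every cache line covering [p, p+len). */
                        static void flush_range(const void *p, size_t len)
                        {
                            char *cl = (char *)((uintptr_t)p & ~(uintptr_t)63);
                            for (; cl < (char *)p + len; cl += 64)
                                _mm_clflushopt(cl);
                            _mm_sfence();            /* order the flushes before later stores */
                        }

                        static void commit(struct record *r, const char *data)
                        {
                            memcpy(r->payload, data, sizeof r->payload);
                            flush_range(r->payload, sizeof r->payload);  /* payload durable first */
                            r->valid = 1;
                            flush_range(&r->valid, sizeof r->valid);     /* then publish the flag */
                        }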

                        The only systems that I’ve seen do this successfully segregated persistent and volatile memory and provided managed abstractions for interacting with it. I particularly like the FaRM project from some folks downstairs.

                        There is a common accusation, not entirely unjust, that the FOSS community is very good at imitating and incrementally improving existing implementations, but not so good at creating wholly new things.

                        I think there’s some truth to that accusation, though I’m biased from having tried to do something very different in an open source project. It’s difficult to get traction for anything different because you start from a position of unfamiliarity when trying to explain to people what the benefits are. Unless it’s solving a problem that they’ve hit repeatedly, it’s hard to get the message across. This is true everywhere, but in projects that depend on volunteers it is particularly problematic.

                        ISTM, based on really very little evidence at all, that HPE got carried away with the potential of something that came out of their labs. It takes decades to go from a new type of component to large-scale, highly-integrated mass production. Techies know that; marketing people do not. We may not have competitive memristor storage until the 2030s at the earliest, and HPE wanted to start building enterprise solutions out of it. Too much, too young.

                        That’s not an entirely unfair characterisation. The Machine didn’t depend on memristors, though; it was intended to work with the kind of single-level store that you can build today and to be ready to adopt memristor-based memory when it became available. It suffered a bit from the same thing that a lot of novel OS projects do: they wanted to build a Linux compat layer to make migration easy, but once they had a Linux compat layer it was just a slow way of running Linux software. One of my colleagues likes to point out that a POSIX compatibility layer tends to be the last piece of native software written for any interesting OS.

                    2. 4

                      I think files are more than just an abstraction over block storage; they’re an abstraction over any storage. They’re a crucial part of the UX as well. Consider directories… Directories are not necessary for file systems to operate (it could all just be flat files) but they exist purely for usability and organisation. I think even in the era of PMEM users will demand some way to organise information, and it’ll probably end up looking like files and directories.

                      1. 2

                        Most mobile operating systems don’t expose files and directories and they are extremely popular.

                        1. 3

                          True, but those operating systems still expose filesystems to developers. Users don’t necessarily need to be end users. iOS and Android also do expose files and directories to end users now, although I know iOS didn’t for a long time.

                          1. 3

                            iOS also provides Core Data, which would be a better interface in the PMEM world anyway.

                            1. 1

                              True, but those operating systems still expose filesystems to developers.

                              Not all of them do, no.

                              NewtonOS didn’t. PalmOS didn’t. The reason being that they didn’t have filesystems.

                              iOS is just UNIX. iOS and Android devices are tiny Unix machines in your pocket. They have all the complexity of a desktop workstation – millions of lines of code in a dozen languages, multiuser support, all that – it’s just hidden.

                              I’m proposing not just hiding it. I am proposing throwing the whole lot away and putting something genuinely simple in its place. Not hidden complexity: eliminating the complexity.

                            2. 2

                              They tried. Really hard. But in the end, even Apple had to give up and provide the Files app.

                              Files are an extremely useful abstraction, which is why they were invented in the first place. And why they get reinvented every time someone tries to get rid of them.

                              1. 4

                                Files (as a UX and data interchange abstraction) are not the same thing as a filesystem. You don’t need a filesystem to provide a document abstraction. Smalltalk-80 had none. (It didn’t have documents itself, but I was on a team that added documents and other applications to it.) And filesystems tend to lack stuff you want for documents, like metadata and smart links and robust support for updating them safely.

                                1. 1

                                  I’m pretty sure the vast majority of iOS users don’t know Files exist.

                                  I do, but I almost never use it.

                                2. 1

                                  And extremely limiting.

                          1. 18

                            A lot of this boils down to the same economic motivation for any kind of outsourcing: if something is not your core competency and is not part of your competitive advantage then you can reduce costs by sharing the fixed overheads with other companies in your market. This is especially true for zero-marginal-cost goods such as software where the only costs are the fixed overheads. The cost of developing a program that has a single user is the same as the cost of developing a program that has a million identical users. For something like a game engine, where the users aren’t identical and have some different requirements, there are some incremental costs, but they’re far lower than the cost of developing the core technology.

                            This composes with the other broad trend towards complexity in the tech industry. The game engine for Wolfenstein 3D was developed by a couple of people. The team that developed the Quake engine was (I think) about half a dozen people. A modern game engine is many hundreds of person years of development work. Id probably recouped the cost of developing the Quake engine with Quake itself, any other licensing royalties were free money on top. A fairly successful modern game like The Witcher 3, for example, almost certainly didn’t make enough money to fund the development of an in-house game engine and so companies that don’t make the decision to outsource either make massively profitable games or go out of business.

                            I think it’s going to be interesting to see how things like Godot change the economics here, because it may well be cheaper to pay a couple of in-house developers to add features to an open source game engine (and upstream them, if you don’t want to pay the cost of maintaining a fork) than to pay for an Unreal license. Unreal is trying quite hard to counter this with a pricing model that is very cheap for small studios, which then get locked in (and even when they’re paying more, it’s not a large fraction of their revenue).

                            1. 3

                              I thought CD Projekt Red made their own engine

                              1. 2

                                They did, but according to Wikipedia it was first used for The Witcher 2, which was released in 2011, which means they probably started engine development back in 2005ish (you hear “it takes 5+ years to make a game engine” a whole lot on e.g. /r/gamedev). They also used a ton of middleware.

                                https://en.wikipedia.org/wiki/CD_Projekt#REDengine

                                1. 2

                                  I worked at the game studio which developed Snowdrop - I can confirm that it’s about 5-7 years for an engine.

                              2. 2

                                See, this is the thing. Quake was only 3 programmers, Quake 2 was 3 programmers, the Build engine was one guy (with 1 additional programmer credited on Duke 3D)… and The Witcher 3 was an in-house engine.

                                Modern game engines are easier than you think.

                                The tricky part is when you want to get into tooling and things to support your artists and designers.

                                1. 5

                                  See, this is the thing. Quake was only 3 programmers, Quake 2 was 3 programmers, the Build engine was one guy (with 1 additional programmer credited on Duke 3D)… and The Witcher 3 was an in-house engine.

                                  Modern game engines are easier than you think.

                                  I’m not sure how that supports your thesis, considering all but The Witcher 3 are games from the ’90s, and id’s games were mostly tech demos post-Doom.

                                  1. 3

                                    The Witness is a custom engine, Source and Source 2 were custom engines, Unreal itself was originally a custom engine, Minecraft was a custom engine, Payday and Payday 2 were custom engines…like, it’s a whole thing.

                                    1. 2

                                      The Witness is a small-scale project with less time pressure.

                                      Source 2 is derived from Source, which is derived from GoldSrc, which is derived from Quake. They didn’t start from scratch.

                                      Minecraft is a bit of an outlier; again, single person, but it’s something you couldn’t really do with Unreal. (But it’s also Java and notorious for its technical flaws, so….)

                                      I believe Payday’s engine was used for other games by the studio/publisher, which amortized costs.

                                2. 1

                                  Godot is likely to fail every time you want to target a console platform :( Or do they have support for any of that yet?

                                  1.  

                                    It has been able to target the Xbox for a while. That’s the only console I own, so I haven’t paid attention to whether it can support anything else.

                                1. 19

                                  I really dislike the containers == performance meme. Containers on Linux have never been about performance. They were designed for packing as many jobs into each machine as possible. With enough machines, reducing overhead is worth the performance hit.

                                  VMs can’t share physical memory without invasive changes in the guest OS. And the guest OS itself has its own memory overhead. Containers solve these problems, at the expense of stressing scalability of the host OS internal data structures and mechanisms.

                                  Unless job packing saves you enough resources to hire at least one engineer to manage whatever orchestrator you choose, containers just add operational overhead.

                                  If containers make sense for your developer productivity, that’s great. Modern tooling affords plenty of conveniences. But they offer most people nothing for performance or scalability.

                                  1. 5

                                    I really dislike the containers == performance meme. Containers on Linux have never been about performance. They were designed for packing as many jobs into each machine as possible. With enough machines, reducing overhead is worth the performance hit.

                                    FWIW, I’ve never heard of this meme, and this idea runs contrary to almost every performance test we’ve done at $WORK while containerizing our microservices. Yes, putting multiple docker containers on a single box does provide a nice benefit (though there’s nothing stopping you from shipping multiple static binaries and managing them through a service manager), but this is just a bit of an extension to Amdahl’s Law, where you’re still limited by other factors on the box.

                                    At least with the folks I’ve talked to, containers have always been an ergonomic benefit, one often taken at the expense of some performance. At $WORK, we thoroughly performance test containers to see if the degradation is actually worth it for a given microservice.

                                    1. 4

                                      Containers are wonderful from an ops perspective where you can focus on putting software “somewhere” because it’s nicely contained.

                                      1. 20

                                        The big problem is that ‘containers’ overload a bunch of loosely-related things:

                                        • The abstract idea of a program, its configuration, and its dependencies all packaged together.
                                        • The concrete representation of that abstraction as a distribution format created from a bunch of composed layers.
                                        • A mechanism for building things in that format.
                                        • A set of isolation mechanisms for running things distributed in that format.

                                        The last bit is particularly bad because it’s completely different on different platforms. On Linux, it uses namespaces + cgroups + seccomp + string + duct tape + prayer. On macOS it uses xhyve on top of the Hypervisor framework. On Windows it uses Hyper-V. On FreeBSD it uses jails or bhyve. These have very different performance characteristics. The original motivation for jails was that there would be less overhead from being able to share kernel services than from starting an entire new copy of the kernel, but the most recent versions of jails support having a separate copy of the network stack per jail because contention in the network stack was making it slower to run large numbers of jails than to run separate VMs. With memory ballooning / hot-plug, it’s quite feasible to spin up a load of VMs and have them dynamically adapt the amount of memory that they have. With modern container-optimised VM systems, each VM is actually a copy-on-write snapshot of a single image and so shares a load of kernel memory that doesn’t change between runs.
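
                                        As a small illustration of how ad hoc the Linux side is, the namespace ingredient alone is just a couple of system calls (a toy sketch; needs root, and real runtimes combine this with cgroups, seccomp, pivot_root, and more):

                                        #define _GNU_SOURCE
                                        #include <sched.h>
                                        #include <stdio.h>
                                        #include <sys/wait.h>
                                        #include <unistd.h>

                                        int main(void)
                                        {
                                            pid_t pid = fork();
                                            if (pid == 0) {
                                                /* Give the child its own UTS (hostname) namespace: one
                                                   of the mechanisms bundled under the name 'container'. */
                                                if (unshare(CLONE_NEWUTS) != 0) {
                                                    perror("unshare");
                                                    return 1;
                                                }
                                                sethostname("toy-container", 13);
                                                execlp("hostname", "hostname", (char *)NULL);
                                                return 1;           /* exec failed */
                                            }
                                            waitpid(pid, NULL, 0);
                                            return 0;               /* the parent's hostname is untouched */
                                        }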

                                        The building mechanism is a somewhat unfortunate conflation because people often mean ‘Docker’, which is an awful imperative build system that makes all of the non-trivial things in software packaging (reproducible builds, provenance auditing, and so on) harder than they should be.

                                        The distribution format isn’t great either, because it’s based on tarballs and so isn’t naturally amenable to any kind of live validation. You can check the hash matches for the tarball itself, but for reusable read-only layers you really want something like dm-verity that is able to check every block against tampering as you use it.

                                        1. 2

                                          The building mechanism is a somewhat unfortunate conflation because people often mean ‘Docker’, which is an awful imperative build system that makes all of the non-trivial things in software packaging (reproducible builds, provenance auditing, and so on) harder than they should be.

                                          Practically speaking, how many folks don’t use Docker when deploying? Much like Linux is the de-facto OS for cloud services, Docker containers are the understood context for “containers”, and so the bullet points you list already have choices made for them.

                                          1.  

                                            To the best of my knowledge most large-scale deployments don’t use Docker. They use something like Kubernetes + containerd + runc, rather than anything using the Docker daemon or the Docker containerd shim. Docker is commonly used as the build tool, but other things are gaining popularity. In terms of deployment, VM-based isolation is increasingly common relative to the pile of namespace + cgroup + seccomp hacks, though Google is still pushing gVisor.

                                          2. 1

                                            Is there some reason these container-based systems aren’t using shared memory for data transfer between machine-local components?

                                            Are they relying on the network stack for security & process isolation?

                                            1.  

                                              Using the network stack isn’t actually that bad for jail-to-jail communication: messages over the loopback interface can bypass a lot of the stack and have a bunch of fast paths. The contention issues are more to do with network-facing services. Whether you use fine-grained locking, RCU, or anything else, there are bits of the network stack that become contention points. With the VNET work, the only shared state between two jails’ network stacks is at the ethernet layer and that’s just a simple set of ring buffers, so scales much better than anything above it. Now that SSDs are common, containerised environments are starting to see the same problems with the storage stack: anything that’s giving ordering guarantees between bits of a global view of a filesystem can introduce contention and you’ll often see better performance from giving each container a separate block device and private filesystem.

                                          3. 8

                                            Just like an uberjar or a static binary. Ohh wait, those don’t require an additional million-line runtime.

                                        1. 1

                                          I was just reading K&R C the other day. It seems in the first version of C, declarations were optional. The object model in C is surprisingly elegant. If data and methods are separate then they can be evolved separately - only C allows this. On a trivial note, copy paste is better than inheritance because the copied code can evolve separately instead of changing every time the base class changes.

                                          In terms of generality,

                                          Pointers > Lexical Scope

                                          Function Pointers > Closures, Virtual Methods

                                          Gotos > Exceptions

                                          Arrays, Structs > Objects

                                          Co-routines > Monads

                                          C with namespaces, pattern matching, garbage collection, generics, nested functions and defer is the C++ that I wish had happened. Go is good, but I miss the syntax of C. I recently came across the Pike scripting language, which looks surprisingly clean.

                                          1. 6

                                            It seems in the first version of C, declarations were optional.

                                            Yup, which sucked. It combined the lack of compiler checks of a dynamic language with the data-corruption bugs of native code. For instance, what happens when you pass a long as the third argument to a function whose implementation takes an int for that parameter? 😱

                                            Object model in C is surprisingly elegant. If data and methods are separate then they can be evolved separately - only C allows this.

                                            Maybe I’m unsure what you’re getting at, but many languages including Objective-C, Swift and Rust allow methods to be declared separately from the data, including adding more methods afterwards, even in separate binaries.

                                            copy paste is better than Inheritance because the copied code can evolve separately instead of changing every time the base class changes.

                                            But it’s worse than inheritance because, when you fix a bug in the copied code, you have to remember to also fix it every place it was pasted. I had a terrible time of this in an earlier job where I maintained a codebase written by an unrepentant copy/paster. This is the kind of nightmare that led to the DRY principle.

                                            1. 2

                                              For instance, what happens when you pass a long as the third argument to a function whose implementation takes an int for that parameter? 😱

                                              Usually nothing, or rather, exactly what you would want 😀. Last I checked, K&R C requires function parameters to be converted to the largest matching integral type, so long and int get passed the same way. All floating-point parameters get passed as double. In fact, I remember when ANSI C came out, one of the consequences was that you could now have actual float parameters. Pointers are the same size anyway, and there were no struct-by-value parameters.

                                              It still wasn’t all roses: messing up argument order or forgetting a parameter. Oops. So function prototypes: 👍😎

                                              #include <stdio.h>
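                                              /* K&R-style ("old style") definition below: no prototype, so
                                                 the compiler never checks the caller's arguments against
                                                 the parameter types. */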
                                              
                                              int a( a, b )
                                              int a;
                                              int b;
                                              {
                                                   return a+b;
                                              }
                                              
                                              
                                              int main()
                                              {
                                                  long c=12;
                                                  int b=3;
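                                                  /* c is a long, but the definition of a() takes an int */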
                                                  printf("%d\n",a(c,b));
                                              }
                                              
                                              [/tmp]cc -Wall hi.c
                                              [/tmp]./a.out 
                                              15
                                              
                                              1. 2

                                                Usually nothing, or rather, exactly what you would want 😀.

                                                Except, of course, when the sizes differed.

                                                1.  

                                                  No. The sizes do differ in the example. Once again: arguments are passed (and received) as the largest matching integral type.

                                                  I changed the printf() of the example to show this:

                                                  	printf("sizeof int: %ld sizeof long: %ld result: %d\n",sizeof b, sizeof c,a(c,b));
                                                  

                                                  Result:

                                                   sizeof int: 4 sizeof long: 8 result: 15
                                                  
                                              2. 1

                                                 I don’t mean copy paste everything, use functions for DRY of course … just that to get the effect of inheritance, copy paste is better. Inheritance, far from the notions of biology or taxonomy, is similar to a lawyer’s contract that states all changes of A will be available to B, just like land inheritance. Every time some maintainer changes a class in React, Angular, Ruby, Java, C++, Rust, or Python frameworks and libraries, everyone has to change their code. If for every release of a framework you have to rewrite your entire code, calling that code reuse is wrong and fraudulent. If we add any method, rename any method, or change any implementation of any method in a way that is not a trivial fix, we should create a new class instead of asking millions of developers to change their code.

                                                when you fix a bug in the copied code, you have to remember to also fix it every place it was pasted.

                                                 If instead we used copy paste, there would be no inheritance hierarchy but just flattened code, if that makes sense, and you could modify it without affecting other developers. If we want to add new functionality to an existing class we should use something like plugins/delegation/mixins but never modify the base class … but absolutely no one uses or understands this pattern, and everyone prefers to diddle with the base class.

                                                 In C such massive rewrites won’t happen, because everything is manually wired instead of automatically inherited. You can always define new methods without worrying about breaking someone’s precious interface. You can always nest structs and cast them to reuse code written for the previous struct. Combined with judicious use of function pointers and vtables, you will never need to group data and code in classes.
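
                                                 A sketch of that style, with a hand-rolled vtable and struct nesting (all names made up):

                                                 #include <stdio.h>

                                                 /* Behaviour lives in a table of function pointers,
                                                    data in plain structs. */
                                                 struct shape_ops {
                                                     double (*area)(const void *self);
                                                 };

                                                 struct shape {
                                                     const struct shape_ops *ops;   /* manual dispatch */
                                                 };

                                                 /* 'Subclass' by nesting the base struct first, so a
                                                    circle pointer can be cast to a shape pointer. */
                                                 struct circle {
                                                     struct shape base;
                                                     double radius;
                                                 };

                                                 static double circle_area(const void *self)
                                                 {
                                                     const struct circle *c = self;
                                                     return 3.14159 * c->radius * c->radius;
                                                 }

                                                 static const struct shape_ops circle_ops = { circle_area };

                                                 int main(void)
                                                 {
                                                     struct circle c = { { &circle_ops }, 2.0 };
                                                     struct shape *s = (struct shape *)&c;  /* reuse via casting */
                                                     printf("area: %f\n", s->ops->area(s));
                                                     return 0;
                                                 }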

                                                1. 6

                                                   Every time some maintainer changes a class in React, Angular, Ruby, Java, C++, Rust, or Python frameworks and libraries, everyone has to change their code.

                                                  That is simply not true. There are a lot of changes you can make to a class without requiring changes in subclasses. As a large-scale example, macOS and iOS frameworks (Objective-C and Swift) change in every single OS update, and the Apple engineers are very careful not to make changes that require client code to change, since end users expect that an OS update will not break their apps. This includes changes to classes that are ubiquitously subclassed by apps, like NSView, NSDocument, UIViewController, etc. I could say exactly the same thing about .NET or other Windows system libraries that use OOP.

                                                  I’m sure that in many open source projects the maintainers are sloppy about preserving source compatibility (let alone binary), because their ‘customers’ are themselves developers, so it’s easy to say “it’s easier to change the signature of this method and tell people to update their code”. But that’s more laziness (or “move fast and break stuff”) than a defining feature of inheritance.

                                                  In C such massive rewrites won’t happen

                                                  Yes, because everyone’s terrified of touching the code for fear of breaking stuff. I’ve used code like that.

                                                  1. 1

                                                    That is simply not true.

                                                     How?

                                                     In C you would just create a new function; rightfully, touching working code except for bug fixes is taboo. I can probably point to kernel drivers that use C vtables that haven’t been touched in 10 years. If you want to create an extensible function, use a function pointer. How many times has the sort function been reused?

                                                     OO programmers claim that the average joe can write reusable code by simply using classes. If even the most well-paid, professional programmers can’t write reusable code, and writing OO code requires extensive training, then we shouldn’t lie about OO being for the average programmer. Even if you hire highly trained programmers, code reuse is fragile, requiring constant vigilance over the base classes and interfaces. Why bother with fragile base classes at all?

                                                    Technically you can avoid this problem by never touching the base class and always adding new classes and interfaces. I think classes should have a version suffix but I don’t think it will be a popular idea and requires too much discipline. OO programmers on average prefer adding a fly method to a fish class as a quick fix to creating a bird class and thats just a disaster waiting to happen.

                                                    1. 5

                                                      I don’t understand why you posted that link. Apple release notes describe new features, and sometimes deprecations of APIs that they plan to remove in a year or two. They apply only to developers, of course; compiled apps continue to work unchanged.

                                                      OO is not trivial, but it’s much better than resorting to flat procedural APIs. Zillions of developers use it on iOS, Mac, .NET, and other platforms.

                                                      1. 1

                                                        My conclusion: OO is fragile and needs constant rewrites by the developers who use OO code, while procedural APIs are resilient.

                                                        1. 9

                                                          Your conclusion is not supported by evidence. Look at a big, widely used, C library, such as ICU or libavcodec. You will have API deprecations and removals. Both of these projects do it nicely so you have foo2(), foo3() and so on. In OO APIs, the same thing happens, you add new methods and deprecate the old ones over time. For things like glib or gtk, the churn is even more pronounced.

                                                          OO covers a variety of different implementation strategies. C++ is a thin wrapper around C: with the exception of exceptions, everything in C++ can be translated to C (in the case of templates, a lot more C) and so C++ objects are exactly like C structs. If a C/C++ struct is exposed in a header then you can’t add or remove fields without affecting consumers because in both languages a struct can be embedded in another and the size and offsets are compiled into the binary.

                                                          In C, you use the opaque pointers idiom to avoid this. In C++ you use the pImpl pattern, where you have a public class and a pointer to an implementation. Both of these require an extra indirection. You can also avoid this in C++ by making the constructor for your class private and having factory methods. If you do this, then only removing fields modifies your ABI, because nothing outside of your library can allocate it. This lets you put fast paths in the header that directly access fields, without imposing an ABI / API contract that prevents adding fields.
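
                                                          A minimal sketch of the opaque-pointer idiom, using a hypothetical widget library (in real code the two halves would live in separate files):

                                                          /* widget.h: consumers only ever see an incomplete type, so fields
                                                             can be added or removed without breaking their ABI */
                                                          struct widget;
                                                          struct widget *widget_create(int size);
                                                          int widget_size(const struct widget *w);
                                                          void widget_destroy(struct widget *w);

                                                          /* widget.c: the layout lives here, invisible to consumers */
                                                          #include <stdlib.h>
                                                          struct widget {
                                                              int size;
                                                              /* new fields can be appended in later versions */
                                                          };
                                                          struct widget *widget_create(int size)
                                                          {
                                                              struct widget *w = (struct widget *)malloc(sizeof *w);
                                                              if (w) w->size = size;
                                                              return w;
                                                          }
                                                          int widget_size(const struct widget *w) { return w->size; }
                                                          void widget_destroy(struct widget *w) { free(w); }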

                                                          In C++, virtual methods are looked up by vtable offset, so you can’t remove virtual functions and you can’t add virtual functions if your class is subclassed. You also can’t change the signature of any existing virtual methods. You can, however, add non-virtual methods, because these do not take part in dynamic dispatch and so are exactly the same as C functions that take the object pointer as the first parameter.

                                                          In a more rigid discipline, such as COM, the object model doesn’t allow directly exposing fields and freezes interfaces after creation. This is how most OO APIs are exposed on Windows and we (Microsoft) have been able to maintain source and binary compatibility with programs using these APIs for almost three decades.

                                                          In Objective-C, fields (instance variables) are looked up via an indirection layer. Roughly speaking, for each field there’s a global variable that tells you its offset. If you declare a field as having package visibility then the offset variable is not exposed from your library and so can’t be named. Methods are looked up via a dynamic dispatch mechanism that doesn’t use fixed vtable offsets and so you are able to add both fields and methods without changing your downstream ABI. This is also true for anything that uses JIT or install-time compilation (Java, .NET).

                                                          You raise the problem of behaviour being automatically inherited, but this is an issue related to the underlying problem, not with the OO framing. If you are just consuming types from a library then this isn’t an issue. If you are providing types to a library (e.g. a way of representing a string that’s efficient for your use, or a new kind of control in a GUI), then the library will need to perform operations on that type. A new version of the library may need to perform more operations on that type. If your code doesn’t provide them, then it needs to provide some kind of default. In C, you’d do this with a struct containing callback function pointers that carried its size (or a version cookie) in the first field, so that you could dispatch to some generic code in your functions if the library consumer didn’t provide an implementation. If you’re writing in an OO language then you’ll just provide a default implementation in the superclass.
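
                                                          A sketch of that C idiom with made-up names; the size field doubles as the version cookie that tells the library which callbacks an older consumer could have known about:

                                                          #include <stddef.h>

                                                          struct str_ops {
                                                              size_t struct_size;  /* consumer sets this to the sizeof it was compiled against */
                                                              size_t (*length)(const void *s);
                                                              int (*char_at)(const void *s, size_t i);
                                                              int (*compare)(const void *a, const void *b);  /* added in v2 */
                                                          };

                                                          #define HAS_FIELD(ops, field) \
                                                              ((ops)->struct_size > offsetof(struct str_ops, field))

                                                          int lib_compare(const void *a, const void *b, const struct str_ops *ops)
                                                          {
                                                              if (HAS_FIELD(ops, compare) && ops->compare)
                                                                  return ops->compare(a, b);  /* consumer-provided fast path */
                                                              /* generic fallback built only from the v1 callbacks */
                                                              size_t la = ops->length(a), lb = ops->length(b);
                                                              size_t i, n = la < lb ? la : lb;
                                                              for (i = 0; i < n; i++) {
                                                                  int d = ops->char_at(a, i) - ops->char_at(b, i);
                                                                  if (d != 0)
                                                                      return d;
                                                              }
                                                              return la < lb ? -1 : (la > lb ? 1 : 0);
                                                          }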

                                                          Oh, and you don’t say which kernel you’re referring to. I can point to code in Linux that has needed to be rewritten between minor revisions of the kernel because a C API changed. I can point to C++ code in the XNU kernel that hasn’t changed since the first macOS release, when it was rewritten from Objective-C to C++. Good software engineering is hard. OO is not a magic bullet, but going back to ’70s-style designs doesn’t avoid the problems unless you’re also willing to avoid writing anything beyond the complexity of things that were possible in the ’70s. Software is now a lot more complex than it was back then. The Version 6 UNIX release was only about 83KLoC: individual components of clang are larger than that today.

                                                          1. 0

                                                            Your conclusion is not supported by evidence.

                                                            It absolutely is. Please reuse code from an earlier version of any framework released in the last 50 years. OO was sold as the magic bullet that would solve all reuse and software engineering problems.

                                                            Do you think homeopathy is medicine just because people dress up and play the role of doctors doing science?

                                                            How many times has the sort function been reused by using function pointers? Washing machines don’t make clothes dirtier than the clothes you put in.

                                                            Both of these projects do it nicely so you have foo2(), foo3() and so on.

                                                            If they are doing it that way, then that’s the way to go. Function signatures are the only stable interface you need. Don’t use fragile interfaces and classes and force developers to rewrite every time a new framework is released because someone renamed a method.

                                                            For the rest of your arguments, why even bother with someone else’s vtables when you can build your own, trivially.

                                                            My point is simply this: how is rewriting code, code reuse?

                                                            1. 5

                                                              It absolutely is. Please reuse code from an earlier version of any framework released in the last 50 years.

                                                              This is what Windows and Mac OS programmers do every day. My experience with COM is that the Windows APIs built on it have great API/ABI stability.

                                                              1. 1

                                                                I don’t know much about COM, but if it provides API/ABI stability then that’s great, and that’s exactly what I am asking for here. It seems to be an IPC mechanism of sorts; how would it compare to REST, which can be implemented on top of basic functions?

                                                                1. 5

                                                                  COM is a language-agnostic ABI for exposing object-oriented interfaces. It has been used to provide stable ABIs for object-oriented Windows APIs for around 30 years. It is not an IPC mechanism, it is a binary representation. It is a strong counterexample to your claim that OO APIs cannot be made stable (and one that I mentioned already in the other thread).
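
                                                                  Concretely, a COM interface boils down to a frozen vtable layout. A simplified sketch (real COM adds GUIDs, HRESULTs and calling-convention details):

                                                                  /* an interface pointer points at a struct whose only field points
                                                                     at a fixed table of function pointers */
                                                                  typedef struct ICounter ICounter;

                                                                  struct ICounterVtbl {
                                                                      /* the three IUnknown slots always come first, in this order */
                                                                      long (*QueryInterface)(ICounter *self, const void *iid, void **out);
                                                                      unsigned long (*AddRef)(ICounter *self);
                                                                      unsigned long (*Release)(ICounter *self);
                                                                      /* interface-specific methods follow and are then frozen forever;
                                                                         new methods go into a new interface, never into this one */
                                                                      int (*Next)(ICounter *self);
                                                                  };

                                                                  struct ICounter {
                                                                      const struct ICounterVtbl *lpVtbl;
                                                                  };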

                                                                  1. 4

                                                                    I’m not sure about the IPC parts (there is a degree of “hosting”); however, DCOM provides RPC with COM.

                                                                2. 6

                                                                  It absolutely is. Please reuse code from an earlier version of any framework released in the last 50 years. OO was sold as the magic bullet that would solve all reuse and software engineering problems.

                                                                  I’ve reused code written in C, C++, and Objective-C over multiple decades. Of these, Objective-C is by a very large margin the one that caused the fewest problems. Your argument is ‘OO was oversold, so let’s use the approach that was used back when people found the problems that motivated the introduction of OO’.

                                                                  How many times has the sort function been reused by using function pointers? Washing machines don’t make clothes dirtier than the clothes you put in.

                                                                  I don’t know what this means. Are you trying to claim that C standard library qsort is the pinnacle of API design? It provides a compare function, but not a swap function, so if your structures require anything beyond a byte-by-byte copy then it’s a problem. How do you reuse C’s qsort with a data type that isn’t a contiguous buffer? With C++’s std::sort (which doesn’t use function pointers), you can sort any data structure that supports random access iteration.
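
                                                                  To make that concrete, a trivial sketch: the comparison callback is qsort’s only customisation point, and element moves are opaque byte copies:

                                                                  #include <stdlib.h>

                                                                  static int cmp_int(const void *a, const void *b)
                                                                  {
                                                                      int x = *(const int *)a, y = *(const int *)b;
                                                                      return (x > y) - (x < y);
                                                                  }

                                                                  int main(void)
                                                                  {
                                                                      int v[] = { 3, 1, 4, 1, 5 };
                                                                      /* fine for plain ints, but if these elements were referenced by
                                                                         an external index, there is no move/swap hook to keep it valid */
                                                                      qsort(v, sizeof v / sizeof v[0], sizeof v[0], cmp_int);
                                                                      return 0;
                                                                  }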

                                                                  If they are doing it that way, then that’s the way to go. Function signatures are the only stable interface you need.

                                                                  That’s true, if your library is producing types but not consuming them. If code in your library needs to call into code provided by library consumers, then this is not the case. Purely procedural C interfaces are easy to keep backwards compatible if they are not doing very much. The zlib interface, for example, is pretty trivial: consume a buffer, produce a buffer. The more complex a library is, the harder it is to maintain a stable API. OO gives you some tools that help, but it doesn’t solve the problem magically.

                                                                  Don’t use fragile interfaces and classes and force developers to rewrite every time a new framework is released because someone renamed a method.

                                                                  Absolutely none of that is intrinsic to OO. If you rename a C struct field or a function, people will need to rewrite their code. The set of things that you can break without breaking compatibility is strictly larger in an OO language than in a purely procedural language.

                                                                  For the rest of your arguments, why even bother with someone else’s vtables when you can build your own, trivially.

                                                                  Why use any language feature when you can just roll your own in macro assembly?

                                                                  • Compilers are aware of the semantics and so can perform better optimisations.
                                                                  • Compilers are aware of the semantics and so can give better error messages.
                                                                  • Compilers are aware of the semantics and so can do better type checking.
                                                                  • Consistency across implementations: C library X and C library Y use different idioms for vtables (e.g. compare ICU and glib: two completely different vtable models). Library users need to learn each one, increasing their cognitive burden. Any two libraries in the same OO language will use the same dispatch mechanism.

                                                                  My point is simply this: how is rewriting code, code reuse?

                                                                  Far better in OO languages (and far better in hybrid languages that provide OO and generic abstractions) than in purely procedural ones. This isn’t the ’80s anymore. No one is claiming that OO is a magic bullet that solves all of your problems.

                                                                  1. 1

                                                                    Are you trying to claim that C standard library qsort is the pinnacle of API design?

                                                                    Personal attacks are not welcome in this forum or any forum. If you can’t use technical arguments to debate, you are never going to win.

                                                                    It is an example of code reuse that absolutely doesn’t break.

                                                                    Absolutely none of that is intrinsic to OO. If you rename a C struct field or a function, people will need to rewrite their code.

                                                                    It is absolutely intrinsic to OO because interfaces and classes are multiple levels deep. It is a fractal of bad design. Change one thing and everything breaks.

                                                                    There is a strong culture of not breaking interfaces in C and using versioning, but the opposite is true for OO, where changing the base class and interface happens for every release. Do you actually have fun rewriting code between every new release of an MVC framework?

                                                                    Why use any language feature when you can just roll your own in macro assembly?

                                                                    Again, personal attacks are not welcome in this forum or any forum.

                                                                    Vtables are trivial. They are not a new feature. All your optimizations can equally apply to vtables.

                                                                    This isn’t the ’80s anymore.

                                                                    Lies don’t become truths just because time has passed.

                                                                    If code in your library needs to call into code provided by library consumers, then this is not the case.

                                                                    Use function pointers to provide hooks, or am I missing something?

                                                                    OO is fragile. Procedural code is resilient.

                                                                    1. 6

                                                                      Are you trying to claim that C standard library qsort is the pinnacle of API design?

                                                                      Personal attacks are not welcome in this forum or any forum. If you can’t use technical arguments to debate, you are never going to win.

                                                                      That was not an ad hominem; it was an attempt to clarify your claims. It was unclear what you were claiming with references to a sort function. An ad hominem attack looks more like this:

                                                                      Do you think homeopathy is medicine just because people dress up and play the role of doctors doing science?

                                                                      This is an ad hominem attack and one that I ignored when you made it, because I’m attempting to have a discussion on technical aspects.

                                                                      It is an example of code reuse that absolutely doesn’t break.

                                                                      It’s also an example of an interface with trivial semantics (it’s covered in the first term of most undergraduate computer science courses) and whose requirements have been stable for longer than C has been around. The C++ std::sort template is also stable and defaults to using OO interfaces for defining the comparison (overloads of the compare operators). The Objective-C -sort family of methods on the standard collection classes are also unchanged since they were standardised in 1992. The Smalltalk equivalents have remained stable since 1980.

                                                                      You have successfully demonstrated that it’s possible to write stable APIs in situations where the requirements are stable. That’s orthogonal to OO vs procedural. If you want to produce a compelling example, please present something where a C library has changed the semantics of how it interacts with a type provided by the library consumer (for example a plug-in filter to a video processing library, a custom view in a GUI, or similar) and an OO library making the same change has required more code modification.

                                                                      Absolutely none of that is intrinsic to OO. If you rename a C struct field or a function, people will need to rewrite their code.

                                                                      It is absolutely intrinsic to OO because interfaces and classes are multiple levels deep. It is a fractal of bad design. Change one thing and everything breaks.

                                                                      This is an assertion, but it is not supported by evidence. I have provided examples of the same kinds of breaking changes being required in widely used C libraries that do non-trivial things. You have made a few claims here:

                                                                      • Something about interfaces. I’m not sure what this is, but COM objects are defined in terms of interfaces and Microsoft is still able to support the same interfaces in 2021 that we were shipping for Windows 3.1 (though since we no longer support 16-bit binaries these required a recompile at some point between 1995 and now).
                                                                      • Classes are multiple levels deep. This is something that OO enables, but not something that it requires. The original GoF design patterns book recommended favouring composition over inheritance and some OO languages don’t even support inheritance. Most modern C++ style guides favour composition with templates over inheritance. Inheritance is useful when you want to define a subtype relationship with code reuse.
                                                                      • Something (OO in general? A specific set of OO patterns? Some OO library that you don’t like?) is a fractal of bad design. This is an emotive and subjective claim, not one that you have supported. Compare your posts with the article that I believe coined that phrase; it contains dozens of examples of features in PHP that compose poorly.

                                                                      There is a strong culture of not breaking interfaces in C and using versioning, but the opposite is true for OO, where changing the base class and interface happens for every release. Do you actually have fun rewriting code between every new release of an MVC framework?

                                                                      You’re comparing culture, not language features. You can write code today against the OpenStep specification from 1992 that will compile and run fine on modern macOS with Cocoa (I know of some code that has been through this process). That’s an OO MVC API that’s retained source compatibility for almost 30 years. The only breaking changes were the switch from int to NSInteger for better support for 64/32-bit compatibility and these changes also affected the purely procedural APIs. They were not breaking changes for code targeting 32-bit platforms. The changes over the ’90s in the Classic MacOS Toolbox (C APIs) were far more invasive.

                                                                      A lot of JavaScript frameworks and pretty much everything from Google make breaking API changes every few months but that’s an issue of developer culture, not one of the language abstractions.

                                                                      Why use any language feature when you can just roll your own in macro assembly?

                                                                      Again, personal attacks are not welcome in this forum or any forum.

                                                                      This is not a personal attack. It is your point. You are saying that you should not use a feature of a language because you can implement it in a lower-level language. Why stop at vtables?

                                                                      Vtables are trivial. They are not a new feature. All your optimizations can equally apply to vtables.

                                                                      No they can’t. It is undefined behaviour to write to the vtable pointer in a C++ object during the lifetime of the object. Modern C++ compilers rely on this guarantee for devirtualisation. If the concrete type of a C++ object is known at compile time (after inlining) then calls to virtual functions can be replaced with direct calls.

                                                                      Here is a reduced example. The C version with custom vtables is called in the function can_not_inline; the C++ version using C++ vtables is called in the function can_inline. In both cases, the object is passed to a function that the compiler can’t see before the call. In the C case, the language semantics allow this call to modify the vtable pointer; in the C++ case they do not. This means that the C++ version knows that the foo call has a specific target, while the C version must be conservative. The C++ version can then inline the call, which doesn’t do anything in this trivial example, and so elides it completely.
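
                                                                      In case the example suffers link rot, here is a sketch of roughly what it looks like (can_not_inline and can_inline are the names mentioned above; everything else is assumed):

                                                                      /* the C-style object: nothing stops 'escape' from overwriting the
                                                                         vtable pointer, so the compiler must keep the indirect call */
                                                                      struct CObj;
                                                                      struct CVtbl { void (*foo)(struct CObj *); };
                                                                      struct CObj  { const struct CVtbl *vtbl; };

                                                                      static void c_foo(struct CObj *o) { (void)o; }
                                                                      static const struct CVtbl g_vtbl = { c_foo };

                                                                      void escape(void *p);  /* defined elsewhere; opaque to the optimiser */

                                                                      void can_not_inline(void)
                                                                      {
                                                                          struct CObj o = { &g_vtbl };
                                                                          escape(&o);       /* may legally store a different vtable in o */
                                                                          o.vtbl->foo(&o);  /* so this must stay an indirect call */
                                                                      }

                                                                      /* the C++ object: changing the vtable pointer during o's lifetime
                                                                         is UB, so after inlining the compiler may devirtualise and, since
                                                                         foo's body is empty, elide the call entirely */
                                                                      struct CxxObj {
                                                                          virtual void foo() {}
                                                                      };

                                                                      void can_inline(void)
                                                                      {
                                                                          CxxObj o;
                                                                          escape(&o);  /* cannot legally change o's dynamic type */
                                                                          o.foo();     /* direct call after devirtualisation */
                                                                      }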

                                                                      This isn’t the ’80s anymore.

                                                                      Lies don’t become truths just because time has passed.

                                                                      No, but claims that were believed to be true and were debunked are no longer claimed. In the ’80s, OO was claimed to be a panacea that solved all problems. That turned out to be untrue. Like many other things in programming, it is a set of useful tools that can be applied to make things better or worse.

                                                                      If code in your library needs to call into code provided by library consumers, then this is not the case.

                                                                      Use function pointers to provide hooks, or am I missing something?

                                                                      You are missing a lot of detail. Yes, you can provide function pointers as hooks. Now what happens when a new version of your library needs to add a new hook? What happens when that hook interacts in subtle ways with the others? These are the kinds of problems that make OO APIs fragile, but they also make procedural APIs fragile.

                                                                      OO is fragile. Procedural code is resilient.

                                                                      Assertions are not evidence. Assertions that contradict the experience of folks who have been working with these APIs for decades need strong evidence.

                                                                      1. 0

                                                                        The only breaking changes were the switch from int to NSInteger for better support for 64/32-bit compatibility and these changes also affected the purely procedural APIs.

                                                                        And that doesn’t count as evidence. Please read what I wrote. OO programmers constantly rename things to break backwards compatibility for no good reason at all. Code rewrite is not code reuse, by definition. Do C programmers do this?

                                                                        We are discussing how C does things and maintains backwards compatibility, not COM. You say COM and I say POSIX / libc, which is older. The fact that you cite COM is in itself proof that objects are insufficient.

                                                                        In Python 3 … print was made into a function and almost overnight 100% of code was made useless. This is the daily life of OO programmers for the release of every major version of a framework.

                                                                        In a database, how many times do you change the schema? Well, structs and classes are like a schema. Inheritance changes the schema. Interface renames change the schema. Changing method names is like changing a column name. Just like in database design, you should not change the schema but use foreign keys to extend the tables with additional data. Perhaps OO needs a new “View” layer like SQL.

                                                                        No, but claims that were believed to be true and were debunked are no longer claimed …. like many other things in programming, it is a set of useful tools that can be applied to make things better or worse.

                                                                        The key word is “debunked”, like snake oil.

                                                                        I propose a mandatory version suffix for all classes to avoid this. The compiler creates a new class for every change made to a class, no matter how small. If you are changing the class substantially, create a completely new name; don’t ship it by the same name and break all code. For ABI do something like COM if that worked.

                                                                        These are the kinds of problems that make OO APIs fragile, but they also make procedural APIs fragile.

                                                                        You are right. They make procedural APIs using vtables fragile, not to mention slow. So use it sparingly? 99% of code should be procedural. I only see vtables being useful in creating bags of event handlers.

                                                                        1. 7

                                                                          The only breaking changes were the switch from int to NSInteger for better support for 64/32-bit compatibility and these changes also affected the purely procedural APIs.

                                                                          And that doesn’t count as evidence. Please read what I wrote. OO programmers constantly rename things to break backwards compatibility for no good reason at all. Code rewrite is not code reuse, by definition. Do C programmers do this?

                                                                          You’ve now changed your argument. You were saying that OO is fragile, now you’re saying that OO programmers (which OO programmers?) rename things and that breaks things. Okay, but if procedural programmers rename things that also breaks things. So now you’re not talking about OO in general, you’re talking about some specific examples of OO (but you’re not naming them). You’ve been given examples of widely used rich OO APIs that have retained huge degrees of backwards compatibility, so your argument seems now to be nothing to do with OO in general but an attack on some unspecified people that you don’t like who write bad code.

                                                                          We are discussing how C does things and maintains backwards compatibility, not COM. You say COM and I say POSIX / libc, which is older. The fact that you cite COM is in itself proof that objects are insufficient.

                                                                          Huh? COM is a standard for representing objects that can be shared across different languages. I also cited OpenStep / Cocoa (the latter is an implementation of the former), which uses the Objective-C object model.

                                                                          POSIX provides a much simpler set of abstractions than either of these. If you want to compare something equivalent, how about GTK? It’s a C library that’s a bit newer than POSIX but that lets you do roughly the same set of things as OpenStep. How many GTK applications from even 10 years ago work with a modern version of GTK without modification? GTK 1 to GTK 2 and GTK 2 to GTK 3 both introduced significant backwards compatibility breaks.

                                                                          In Python 3 … print was made into a function and almost overnight 100% of code was made useless. This is the daily life of OO programmers for the release of every major version of a framework.

                                                                          Wait, so your argument is that a procedural API, in a multi-paradigm language changed, which broke everything, and that’s a reason why OO is bad?

                                                                          In a database, how many times do you change the schema? Well, structs and classes are like a schema. Inheritance changes the schema. Interface renames change the schema. Changing method names is like changing a column name. Just like in database design, you should not change the schema but use foreign keys to extend the tables with additional data. Perhaps OO needs a new “View” layer like SQL.

                                                                          I don’t even know where to go with that. OO provides a way of expressing the schema. The schema doesn’t change because of OO; the schema changes because the requirements change. OO provides mechanisms for constraining the impact of that change.

                                                                          Again, your argument seems to be:

                                                                          1. There exists a set of things in OO that, if modified, break backwards compatibility.
                                                                          2. People who write OO code will change these things.
                                                                          3. OO is bad.

                                                                          But it’s also possible to say the same thing with OO replaced with procedural, functional, generic, or any other style of programming. If you want to make this point convincingly then you need to demonstrate that the set of things that break backwards compatibility in OO are more likely to be changed than in another style. So far, you have made a lot of assertions, but where I have presented examples of OO APIs with a long history of backwards compatibility and procedural APIs performing equivalent things with weaker guarantees, you have failed to present any examples.

                                                                          I propose a mandatory version suffix for all classes to avoid this.

                                                                          So, like COM?

                                                                          The compiler creates a new class for every change made to a class, no matter how small. If you are changing the class substantially, create a completely new name; don’t ship it by the same name and break all code.

                                                                          So, like COM?

                                                                          For ABI do something like COM if that worked.

                                                                          So, you want COM? But you want COM without OO? In spite of the fact that COM is an OO standard?

                                                                          These are the kinds of problems that make OO APIs fragile, but they also make procedural APIs fragile.

                                                                          You are right. They make procedural APIs using vtables fragile, not to mention slow. So use it sparingly? 99% of code should be procedural. I only see vtables being useful in creating bags of event handlers.

                                                                          It’s not just about vtables, it’s about any kind of rich abstraction that introduces coupling between the producers and consumers of an interface.

                                                                          Let’s go back to the C sort function that you liked. There’s a C standard qsort. Let’s say you want to sort an array of strings by their locale-aware order. It has a callback, so you can define a comparison function. Now you want to sort an array that has an external indexing structure for quickly finding the first entry with a particular prefix. Oops, qsort doesn’t have any kind of hook for defining how to do the move or for receiving a notification when things are moved, so you can’t keep the data structure up to date; you need to recalculate it after the sort. After a while, you realise that resizing the array is expensive and so you replace it with a skip list. Oh dear, qsort can’t sort anything other than an array, so you now have to implement your own sorting function.

                                                                          Compare that to C++’s std::sort. It is given two random-access iterators. These are objects that define how to access the start and end of some collection. If I need to update some other data structure when entries in the list move, then I overload their copy or move constructors to do this. The iterators know how to move through the collection, so when I move to a skip list I don’t even have to modify the call to std::sort, I just modify the begin() and end() methods on my data structure.
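
                                                                          As a small illustration (Entry is a made-up type): the same std::sort call works on any random-access container, which qsort cannot handle at all:

                                                                          #include <algorithm>
                                                                          #include <deque>
                                                                          #include <iostream>
                                                                          #include <string>

                                                                          struct Entry {
                                                                              std::string key;
                                                                              int payload;
                                                                          };

                                                                          int main()
                                                                          {
                                                                              /* a deque is not one contiguous buffer, and std::string cannot
                                                                                 safely be moved by raw byte copies, so qsort is out twice over */
                                                                              std::deque<Entry> entries{ { "b", 2 }, { "a", 1 }, { "c", 3 } };

                                                                              std::sort(entries.begin(), entries.end(),
                                                                                        [](const Entry &l, const Entry &r) { return l.key < r.key; });

                                                                              for (const Entry &e : entries)
                                                                                  std::cout << e.key << ' ' << e.payload << '\n';
                                                                          }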

                                                                          I am lazy. I regularly work on projects with millions of lines of code. I want to write the smallest amount of code possible to achieve my goal and I want to have to modify the smallest amount of code when the requirements change. Object orientation gives me some great tools for this. So does generic programming. Pure procedural programming would make my life much harder and I don’t like inflicting pain on myself, so I avoid it where possible.

                                                                          1. 5

                                                                            You have the patience of a saint to continue arguing with this person as they continue to disregard your experience. I certainly don’t have the stamina for it, but despite the bizarreness of the slapfight, your replies are really insightful when it comes to API design.

                                                                            1.  

                                                                              I had a lot of the same misconceptions (and complete conviction that I was right) in my early 20s, and I am very grateful to the folks who had the patience to educate me. In hindsight, I’m astonished that they put up with me.

                                                                            2. 1

                                                                              This page lists all the changes in Objective-C over the last 10 years. Plenty of renames.

                                                                              I think more languages could benefit from COM’s techniques, but I don’t think it is a part of the C++ core. I would use a minimal and flexible version of it, but it seems to be doing way too many Win32-specific things.

                                                                              1. 4

                                                                                As @david_chisnall has pointed out many times already, this has nothing to do with OO. GTK has exhibited the exact same thing. GCC has done something similar with its internals. Renaming things such that code that relies on the API has to change has nothing at all to do with any specific programming paradigm.

                                                                                Please stop your screed on this topic. It’s pretty clear from the discussion you are not grasping what is being said. I urge you to spend some time and study the replies above.

                                                                                1.  

                                                                                  Fine. I would compare GUI development with Tk, which is more idiomatic in C.

                                                                                  As I have pointed out, if people used versioning for interfaces, things wouldn’t break every time an architecture astronaut or an undisciplined programmer changes a name, amplifying code rewrites. It is clear that the problem applies to vtables as well, and to naming in general, and is not solved within OO, which exacerbates the effects of simple changes.

                                                                3. 6

                                                                  You can conclude whatever you like, but after taking a look at your blog, I’m going to back away slowly from this discussion and find a better use for my time. Best of luck with your jihad.

                                                                  1. 2

                                                                    Glad you discovered my blog. I’d recommend you start with Simula the Misunderstood. The language is a bit coarse, though. The entire discussion has, however, inspired me to write “Interfaces: a fractal of bad design”. I see myself more like James Randi exposing homeopathy, superstitions, faith healers and fortune telling.

                                                    1. 3

                                                      A very welcome development.

                                                      I am hopeful this will translate to support for the imminent first wave of very cheap SBCs based on the Allwinner D1.

                                                      These will not have much RAM, and the likes of Linux will crawl on them. If alternatives work any better (as should be the case), this will help build understanding that Linux is not the “be-all and end-all” of OS design, a silly mentality that’s nonetheless widespread.

                                                      1. 3

                                                        I suspect that attitude has a lot to do with software (mostly browsers, but other things as well) being available on Linux. Last I checked, Haiku is stuck on some ancient version of Firefox. To me, and most other people, I suspect, that relegates it to “toy operating system” status. It’s a very impressive and fascinating toy, but a toy nonetheless.

                                                        1. 1

                                                          Favorite piece of software not available

                                                          “toy operating system” status

                                                          I disagree, but that’s fair. We don’t all see the world the same way. That is a good thing.

                                                          1. 4

                                                            Characterising a modern web browser as ‘someone’s favourite piece of software’ is a bit misleading. For a desktop operating system, you can work around the lack of pretty much everything except a modern browser.

                                                            1. 2

                                                              Sure, but it’s not the OS developer’s fault that web standards got so complicated or that those who make browsers do not care about operating systems that aren’t already mainstream.

                                                              Branding these unfavored OSs with Toy status isn’t just insulting to these efforts to advance the field, it’s also defeatist: a VM, or a remote desktop or VNC client connected somewhere with a web browser, will make do. I am no stranger to doing this, and it ends up being quite survivable. This is how web browsing was done on e.g. Genode, until they got a port of a webkit-based browser running.

                                                              For that matter, you’d have to give up on RISC-V too (and call it a Toy ISA): even if you were to run Linux on a RISC-V processor, Chrome and Firefox don’t run on it yet.

                                                        2. 2

                                                          I’ve had access to a 1.0 GHz D1 Eval Board with 512 MB for a couple of weeks. With just people ssh’ing into it, there is plenty of free memory:

                                                          Tasks:  70 total,   1 running,  69 sleeping,   0 stopped,   0 zombie
                                                          %Cpu(s): 40.9 us, 59.1 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
                                                          MiB Mem :    488.3 total,    199.0 free,     36.7 used,    252.7 buff/cache
                                                          MiB Swap:      0.0 total,      0.0 free,      0.0 used.    440.4 avail Mem 
                                                          
                                                            PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND   
                                                            729 root      20   0  199984   2496   2088 S  84.2   0.5   2845:18 tt        
                                                          23793 sipeed    20   0    5152   2544   2004 R  10.5   0.5   0:00.05 top       
                                                              1 root      20   0    2092    600    212 S   0.0   0.1   0:07.89 procd     
                                                          
                                                          sipeed@MaixLinux:~$ uptime 
                                                           07:35:09 up 2 days, 22:52,  2 users,  load average: 2.16, 2.15, 2.10
                                                          

                                                          For command-line use it’s fine with 512 MB. I expect basic X with emacs, xterms, etc. would be fine too. Of course a real web browser would eat RAM and you’d want a $50 board with 2 GB (my guess), not a $10 or $12 board with 256 MB or 512 MB. Those will be excellent competitors to the Pi Zero.

                                                          1. 1

                                                            Those will be excellent competitors to the Pi Zero.

                                                            Depending on power efficiency, they should compete well against Pi 1 and Pi 2, which I understand are still being made.

                                                            These boards are the better Raspberries in many regards, such as their reliability (they are very mature and well understood by now) and power draw.

                                                        1. 8

                                                          When I first learned about ZFS snapshots, it really bugged me that I can’t atomically snapshot a set of filesystems and have to do it one at a time (unless they are all children of a single parent). I recently learned that this is purely a limitation of the userspace tools. The underlying ioctl takes an nvlist of snapshots and will atomically create them all at once. I have no idea why this wasn’t exposed in the tooling. The advice on FreeBSD is to have everything in /usr/local as a single ZFS dataset for exactly this reason: you want to atomically snapshot it before and after upgrades.
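
                                                          For reference, an untested sketch of what this looks like through libzfs_core, which wraps that ioctl as lzc_snapshot() (pool and dataset names are made up; assumes libzfs_core_init() has already been called):

                                                          #include <libzfs_core.h>
                                                          #include <libnvpair.h>

                                                          int snapshot_set(void)
                                                          {
                                                              nvlist_t *snaps = fnvlist_alloc();
                                                              nvlist_t *errs = NULL;
                                                              int err;

                                                              /* any mix of datasets, as long as they are in the same pool */
                                                              fnvlist_add_boolean(snaps, "tank/usr/local@pre-upgrade");
                                                              fnvlist_add_boolean(snaps, "tank/var/db@pre-upgrade");

                                                              err = lzc_snapshot(snaps, NULL, &errs);  /* all created, or none */

                                                              fnvlist_free(snaps);
                                                              if (errs != NULL)
                                                                  fnvlist_free(errs);
                                                              return err;
                                                          }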

                                                          1.  

                                                            Would a recursive snapshot followed by a delete of the unwanted snapshots do the same?

                                                          1. 1

                                                            I think using plain char for something that is not text was the first mistake. If you use unsigned char you can also cast the value to the size you need and shift.

                                                            #include <stdint.h>

                                                            static uint32_t read32be(const unsigned char *p)
                                                            {
                                                            	return ((uint32_t) p[0] << 24)
                                                            	     | ((uint32_t) p[1] << 16)
                                                            	     | ((uint32_t) p[2] << 8)
                                                            	     | ((uint32_t) p[3]);
                                                            }
                                                            

                                                            Interestingly (unlike Clang and GCC) MSVC does not appear to be able to recognize these patterns and generate the bswap.
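
                                                            If MSVC matters to you, one workaround is to call its byte-swap intrinsic directly. A sketch, assuming a little-endian target (which all current MSVC targets are):

                                                            #include <stdint.h>
                                                            #include <string.h>
                                                            #include <stdlib.h>  /* MSVC's _byteswap_ulong lives here */

                                                            static uint32_t read32be_msvc(const unsigned char *p)
                                                            {
                                                                uint32_t v;
                                                                memcpy(&v, p, sizeof v);    /* native (little-endian) load */
                                                                return _byteswap_ulong(v);  /* swap to interpret as big-endian */
                                                            }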

                                                            1. 2

                                                              If you want bytes, use uint8_t, not unsigned char. See, the width of char is not fully specified in C: sizeof(char) is always 1, but CHAR_BIT only has to be at least 8. Some actual architectures in current use (DSPs) do not support byte-level addressing, and on those machines char can actually be 32 bits wide. (Of course, on those machines uint8_t would not even compile, but that’s kind of the point: if you can’t have bytes, you need to rethink your serialization code.)

                                                              1. 1

                                                                While I agree in theory, I believe the standard does not guarantee that uint8_t is a character type, which means you could get in trouble with strict aliasing if a compiler vendor goes crazy. For storing bytes uint8_t is great, but for accessing bytes (like in the function above), unsigned char is safer. You can always check if CHAR_BIT is 8.
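
                                                                For example:

                                                                #include <limits.h>

                                                                #if CHAR_BIT != 8
                                                                #error "this code assumes 8-bit chars"
                                                                #endif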

                                                                1. 3

                                                                  I believe the standard does not guarantee that uint8_t is a character type,

                                                                  It indeed does not guarantee that, and in practice sanitisers do warn me about careless casting from uint8_t.

                                                                  which means you could get in trouble with strict aliasing if a compiler vendor goes crazy.

                                                                  It can indeed be a problem if we do something like this:

                                                                  void transform(uint32_t *out, const uint8_t *in); // compiler may assume out and in don't alias

                                                                  uint8_t data[32];
                                                                  read(file, data, 32);
                                                                  transform((uint32_t *)data, data); // strict aliasing violation!!
                                                                  

                                                                  To get to that however, we’d have to be a little dirty. And to be honest, as much as I hate having way too much undefined behaviour in C, I do like the performance improvements that come with strict aliasing. Besides, while we could turn off strict aliasing by using unsigned char here, there’s no way we could turn it off in a case like this:

                                                                  void transform2(struct foo *out, const uint32_t *in);
                                                                  

                                                                  Now some C user might indeed be surprised by the fact that strict aliasing applies to uint8_t, even though it invariably has the same representation as unsigned char (at least on 2’s complement machines, which comprise every single machine in active use). That is indeed unfortunate. An API designer, however, may still set those expectations right:

                                                                  void transform(uint32_t *out, const uint8_t * restrict in);
                                                                  
                                                                  1. 1

                                                                    Where is that written? “A typedef declaration does not introduce a new type” and is “for syntactic convenience only”, quoth ANSI X3.159-1988. The uint8_t type isn’t the uint_least8_t type, so if it’s available then it must be char, unless your environment defines char as fewer than 8 bits and defines either short int or long to be 8 bits, which is about as likely as your code being compiled on a Setun.

                                                                    1. 2

                                                                      You’d have to guarantee that uint8_t comes from a typedef in the first place, and the standard provides no such guarantee. Yes, in practice this will be a typedef, but that typedef is defined in a standard header, so I’m not sure that actually counts. As far as I know, compilers are allowed to special-case this type and pretend it does not come from a typedef, so they can enable strict aliasing.

                                                                      1. 1

                                                                        Where is that written?

                                                                        1. 1

                                                                          It’s not, that I know of. And with the C standard, if it’s not written, it’s not guaranteed.

                                                                          1. 1

                                                                            How would you know? You’re only speaking for yourself.

                                                                            1. 1

                                                                              You go find that place in the standard that says uint8_t is a character type. I’m not going to copy & paste 700 pages to show you it’s not there. You wouldn’t read them even if I could. You, on the other hand, could easily disprove my claim with a couple of short citations. Please take the effort to do so.

                                                                              Ninja Edit: what do you know, it looks like we can disprove my claim after all. From the C standard, §7.20 ¶4 (C11 numbering; the same wording is §7.18 ¶4 in C99):

                                                                              For each type described herein that the implementation provides, <stdint.h> shall declare that typedef name and define the associated macros. […]

                                                                              That seems to mean that we have to use a typedef to represent a uint8_t, and your reasoning that it then has to be unsigned char seems sound as far as I can tell. I’ve tested the following under all the sanitizers I could find (including the TIS interpreter); they find nothing wrong with it:

                                                                              #include <stdio.h>
                                                                              #include <string.h>
                                                                              #include <inttypes.h>
                                                                              
                                                                              int main()
                                                                              {
                                                                                  uint32_t x = 42;
                                                                                  uint32_t y;
                                                                                  uint8_t t8[4];
                                                                                  memcpy(t8, &x, sizeof(uint32_t));
                                                                                  memcpy(&y, t8, sizeof(uint32_t));
                                                                                  printf("x = %" PRIu32 "\n", x);
                                                                                  printf("y = %" PRIu32 "\n", y);
                                                                                  return 0;
                                                                              }
                                                                              

                                                                              (Now my problem is that they find nothing wrong with it even if I replace the uint8_t buffer with a uint16_t buffer.)

                                                                              I stand corrected, my apologies.

                                                                              1. 2

                                                                                Thanks for researching that. I did a bit more research and I think uint8_t being non-char is unlikely for different reasons now. The standard requires char to be at least 8 bits (CHAR_BIT ≥ 8; the character-set requirements quoted below would on their own only force 7), and short/int/long can be any size greater than or equal to char, but their sizes must be whole multiples of char’s. Therefore uint8_t can only be defined in an environment where char is exactly eight bits: if char were, say, 9 bits, then no type could be exactly 8 bits wide. The only legal way for uint8_t to be short would be if the environment defined both char and short as being 8 bits and the C library author then chose to use short when defining the typedef, to torture us. Here is the relevant text from the standard:

                                                                                 * Byte --- the unit of data storage in the execution environment
                                                                                   large enough to hold any member of the basic character set of the
                                                                                   execution environment.
                                                                                ...
                                                                                   Both the basic source and basic execution character sets shall
                                                                                have at least the following members: the 26 upper-case letters of
                                                                                the English alphabet [...] the 26 lower-case letters of the English
                                                                                alphabet [...] the 10 decimal digits [...] the following 29 graphic
                                                                                characters [...] the space character, and control characters
                                                                                representing horizontal tab, vertical tab, and form feed. In both
                                                                                the source and execution basic character sets, the value of each
                                                                                character after 0 in the above list of decimal digits shall be one
                                                                                greater than the value of the previous. [...] In the execution
                                                                                character set, there shall be control characters representing alert,
                                                                                backspace, carriage return, and new line.
                                                                                ...
                                                                                 * Character --- a single byte representing a member of the basic
                                                                                   character set of either the source or the execution environment.
                                                                                ...
                                                                                   There are four signed integer types, designated as signed char,
                                                                                short int, int, and long int.
                                                                                ...
                                                                                In the list of signed integer types above, the range of values of
                                                                                each type is a subrange of the values of the next type in the list.
                                                                                ...
                                                                                   For each of the signed integer types, there is a corresponding (but
                                                                                different) unsigned integer type (designated with the keyword unsigned)
                                                                                that uses the same amount of storage (including sign information)
                                                                                ...
                                                                                2 The sizeof operator yields the size (in bytes) of its operand, which may
                                                                                be an expression or the parenthesized name of a type. The size is
                                                                                ...
                                                                                3 When [sizeof is applied] to an operand that has type char, unsigned
                                                                                char, or signed char, (or a qualified version thereof) the result is 1.
                                                                                ...
                                                                                requirement that objects of a particular type be located on storage
                                                                                boundaries with addresses that are particular multiples of a byte address
                                                                                

                                                                                [1] char must be 7+ bits because the standard specifies exactly 100 values which it says need to be representable in char. Fun fact: that set of legal characters per ANSI X3.159-1988 is basically everything you’d expect from ASCII except $@`, which the standard defines as undefined behavior, lol. Maybe C20 or whatever the next one is should use those for bsr, bsf, and popcnt.

                                                                                Edit: It makes sense that @` weren’t required, since their positions in the ASCII table kept being redefined between the ASA X3.4-1963 and USAS X3.4-1967 standards. Not sure what the rationale is for dollar. The ANSI C89 standard also has text saying that dollar may be used in identifiers, along with anything else, but it isn’t mandatory. GNU lets us use dollar identifiers, which is cool, although I wish they let us use Unicode symbols too.

                                                                                1. 2

                                                                                  Note that, although it has to be a typedef, it doesn’t have to be a typedef of a standard type. For example, in CHERI C, intptr_t is a typedef of the built-in type __intcap_t. This is permitted by the standard (as far as we could tell), in the same way that it’s permitted for intmax_t to be __int128_t or __int256_t or whatever on systems that expose these as non-standard types.

                                                                                  1. 1

                                                                                    Shit, so that means we could have a built-in __unsigned_octet_t type that’s not unsigned char, and alias uint8_t to that?

                                                                                    That would invalidate the whole aliasing assumption.

                                                                                    1. 1

                                                                                      intmax_t being 64-bit in GNU System V environments always seemed to me like the biggest contradiction with the wording of the standard. Cosmopolitan libc defines intmax_t as __int128 for that reason, but I’ve often wondered if that’s wise. Do you know offhand if there are any other environments doing that?

                                                                                      1. 1

                                                                                        intmax_t is defined as int64_t because __int128 didn’t exist in the ‘90s (when most things were 32-bit and a lot of platforms that GNU and BSD systems supported couldn’t even do 64-bit arithmetic without calling out to a software implementation), and changing it now is an ABI-breaking change. It’s a shame that it exists at all, because it’s predicated on the assumption that your library will never be linked into a program compiled for a newer system that supports a wider integer type. On a modern x86 system with AVX-512 you could store a 512-bit integer in a register and write a fairly fast set of operations on it, so should intmax_t be 512 bits?

                                                                                        1. 1

                                                                                          __int128 is a recycling of the 32-bit code for doing 64-bit arithmetic; why throw away all that effort after the move to 64-bit systems? As for AVX-512, as far as I know SSE and AVX do not provide arithmetic types wider than 64 bits.

                                                                                          1. 1

                                                                                            Most 64-bit ABIs were defined before __int128 came along. AVX-512 doesn’t natively support 512-bit integers, but it does support 512-bit data in registers. You can implement addition by doing vector addition and then applying the carry bits. You can implement in-register multiply in a similar way. This makes a 512-bit integer a more realistic machine type than __int128, which is generally stored in a pair of registers (if you’re going to have an integer type that doesn’t fit in one register, why stop at two registers? Why not have a type split between four or more integer registers?).
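
                                                                                            To make “do the vector addition and then apply the carry bits” concrete, here is a minimal scalar sketch of the same idea (u512 and add_u512 are names I just made up): a vector unit would do all eight limb additions in one instruction, and the carry fix-up below is the part that stays sequential:

                                                                                                #include <stdint.h>

                                                                                                /* 512-bit unsigned integer as eight 64-bit limbs, least significant first. */
                                                                                                typedef struct { uint64_t limb[8]; } u512;

                                                                                                /* Limb-wise addition followed by carry propagation. */
                                                                                                u512 add_u512(u512 a, u512 b) {
                                                                                                    u512 r;
                                                                                                    uint64_t carry = 0;
                                                                                                    for (int i = 0; i < 8; i++) {
                                                                                                        uint64_t sum = a.limb[i] + b.limb[i]; /* may wrap */
                                                                                                        uint64_t c1 = sum < a.limb[i];        /* carry out of the limb add */
                                                                                                        r.limb[i] = sum + carry;
                                                                                                        uint64_t c2 = r.limb[i] < sum;        /* carry from adding the carry-in */
                                                                                                        carry = c1 | c2;                      /* at most one of these can be set */
                                                                                                    }
                                                                                                    return r;
                                                                                                }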

                                                                                            1. 1

                                                                                              Could you teach me how to add the carry bits in SSE vectors? I know how to do it with VPTERNLOGD, but it sounds like you know a more general approach than I do.

                                                                    2. 1

                                                                      See, sizeof(char) is not fully specified in C.

                                                                      I think what you mean is that it’s CHAR_BIT (the number of bits in a byte) that is not fully specified. sizeof(char)==1 by C11 §6.5.3.4p4:

                                                                      When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.
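
                                                                      A two-line program makes the distinction visible (output is platform-dependent; on a DSP with 32-bit chars the second line could legitimately print 1):

                                                                          #include <limits.h>
                                                                          #include <stdint.h>
                                                                          #include <stdio.h>

                                                                          int main(void)
                                                                          {
                                                                              /* sizeof counts in units of char, so sizeof(char) is 1 by definition;
                                                                                 CHAR_BIT says how many bits that unit holds (at least 8). */
                                                                              printf("sizeof(char) = %zu, CHAR_BIT = %d\n", sizeof(char), CHAR_BIT);
                                                                              /* On a machine with 32-bit chars, this could legitimately print 1. */
                                                                              printf("sizeof(uint32_t) = %zu\n", sizeof(uint32_t));
                                                                              return 0;
                                                                          }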

                                                                      1. 1

                                                                        Whoops, my bad. Makes sense. Oh my, I guess that means sizeof(uint32_t) might be like 1. Goodness.

                                                                      2. 1

                                                                        Changing jibsen’s function to use uint8_t* instead will simply make the code refuse to compile in those kinds of environments. That’s why the blog post recommends mask+shift. The linked macros would work on those DSPs provided you store one octet per word.
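
                                                                        For illustration, the mask+shift style looks something like this (my own minimal sketch, not the macros linked above; it assumes each array element holds one octet, i.e. a value below 256):

                                                                            #include <stdint.h>

                                                                            /* Read/write a 32-bit value as four octets, little-endian, using only
                                                                             * shifts and masks. No pointer casting, so no aliasing or alignment
                                                                             * concerns, and no dependence on the host's byte order or byte width. */
                                                                            uint32_t load32_le(const unsigned char *p) {
                                                                                return (uint32_t)p[0]
                                                                                     | (uint32_t)p[1] << 8
                                                                                     | (uint32_t)p[2] << 16
                                                                                     | (uint32_t)p[3] << 24;
                                                                            }

                                                                            void store32_le(unsigned char *p, uint32_t x) {
                                                                                p[0] = (unsigned char)(x & 0xff);
                                                                                p[1] = (unsigned char)((x >> 8) & 0xff);
                                                                                p[2] = (unsigned char)((x >> 16) & 0xff);
                                                                                p[3] = (unsigned char)((x >> 24) & 0xff);
                                                                            }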

                                                                        1. 2

                                                                          As I said in italics, refusing to compile was the point.

                                                                          1. 1

                                                                            That sort of attitude leads to about a third of all GitHub issues for C projects like STB, last time I checked. There’s always some finger-wagger who adds a compiler warning that breaks code over things like unused parameters because they feel it’s time for us to rethink things. If it’s possible and legal, it should be permitted.

                                                                            1. 1

                                                                              Monocypher uses uint8_t for every buffer, and many (possibly most) of its users are in the embedded space.

                                                                              I don’t recall having even a single complaint about it.

                                                                              1. 1

                                                                                Yeah, if you’re writing crypto libraries I can see the reluctance to accommodate weird architectures. Is valiant wolf your legal name? It makes a sheep like me a bit afraid to trust a library like that.

                                                                                1. 1

                                                                                  It is my legal name, believe it or not. And if you don’t trust my work, trust its audit.

                                                                                  1. 1

                                                                                    Carry on then. I looked through your code and it looked like you’re doing things the right way.

                                                                        2. 1

                                                                          See, sizeof(char) is not fully specified in C.

                                                                          So wrong. N1570 §6.5.3.4p4. sizeof (char) and sizeof (unsigned char) are defined to be 1.

                                                                          Of course, on those machines uint8_t would not even compile, but that’s kind of the point: if you can’t have 8-bit bytes, you need to rethink your serialization code.

                                                                          People generally don’t run their code on DSPs, but let’s say that a popular machine architecture came out with 9-bit bytes. It would be incredibly unusual if that architecture exposed data streams coming over the internet by spreading a sequence of nine 8-bit bytes across eight 9-bit bytes. It’s more likely that this architecture would put the nine 8-bit bytes in nine 9-bit bytes with the MSB unset. It’s entirely possible to write code which handles this correctly and portably.

                                                                          That being said, if you’re of the opinion that it’s not worth worrying about machines where uint8_t is not defined, then you probably don’t care about this hypothetical scenario, in which case your entire point about using uint8_t over unsigned char is moot, since it won’t matter anyway.

                                                                          1. 1

                                                                            N1570 §6.5.3.4p4. sizeof (char) and sizeof (unsigned char) are defined to be 1.

                                                                            Yeah, I was confusing sizeof and the width of bytes. (An interesting consequence is that sizeof(uint32_t) is one on machines with 32-bit bytes.)

                                                                            People generally don’t run their code on DSPs

                                                                            The reason I’ve even heard of DSPs with 32-bit bytes is that a colleague of mine had to write a C program for one, and he ran into all sorts of interesting problems because of that unusual byte size. Sure, the purpose of the chip was probably to do some simple and very fast signal processing, but if you can get away with cramming more general-purpose code in there as well, you can lower the manufacturing costs.

                                                                            It would be incredibly unusual if that architecture exposed data streams coming over the internet by spreading a sequence of 9 8 bit bytes across 8 9 bit bytes.

                                                                            It would indeed. I was more thinking of the (real) machines that have 32-bit bytes. It makes more sense for them to pack 4 network octets into a single 32-bit byte.

                                                                            That being said, if you’re of the opinion that it’s not worth worrying about machines where uint8_t is not defined

                                                                            I’m of the opinion that we should worry about them, which is why I advocate explicit exclusion by using uint8_t.

                                                                      1. 7

                                                                        As long as we’re generalizing past the specific example of Django documentation in Italian, I think that it’s worth reflecting on the active maintenance required to keep any sort of diversity effort in effect. Maintaining multiple different ports for different architectures and operating systems, for example, has all of the same dynamics, due to the need to find maintainers who “speak Windows”, “speak ARM”, etc.

                                                                        1. 2

                                                                          Maintaining multiple different ports for different architectures and operating systems, for example, has all of the same dynamics, due to the need to find maintainers who “speak Windows”, “speak ARM”, etc.

                                                                          I think that’s a great analogy, because supporting different architectures and platforms also has the benefit of finding bugs that are hidden or difficult to trigger on some platforms. When I started writing open source code, the recommendation was to make sure it ran on both an i386 BSD variant and SPARC64 Solaris and then it would run anywhere: you had 32/64-bit, big/little endian, strong/weak memory order, strong/weak alignment requirements, BSD/SysV userland and kernel APIs covered. Now it’s much harder to find a pair of platforms that have that kind of coverage.

                                                                          Similarly, if you have translators then they will be the ones most likely to pick up on ambiguous phrasing or poor explanations. The best feedback that I’ve received on any of my books came from the Japanese translator. I was very fortunate in that he was writing the Japanese translation based on my near-final draft, so I was able to fix everything he found in the final print version. He did a far better job than any of the copyeditors because he wasn’t just reading the words passively: he was actively trying to express the same ideas, and so couldn’t just skim past things that didn’t make sense.

                                                                          You need some solid infrastructure for this to be easy. In particular, you need to be able to pull out each sentence from your docs so that translators can just fix things that have changed. A style of one sentence per line in TeX / Markdown / whatever sources works well for this, but there are some tools that make it better.

                                                                        1. 3

                                                                          I’ve seen register used and even used correctly. It doesn’t mean what people think it means though. The register keyword just means ‘this variable may not alias, taking the address of it is ill-formed’. The compiler ignores it after the front end (unless you also use the GNU extension to place the value in a specific register), but the non-aliasing definition is important.

                                                                          It sounds as if volatile is deprecated for use in any situations other than the ones that volatile was intended for, which is great. I wonder what this will do for C interoperability though. C11 introduced _Atomic as both a type qualifier and a type specifier but then did something monumentally stupid and made all of the functions in stdatomic.h take a volatile (not _Atomic) qualified pointer. This was a disaster for ABIs because it meant that _Atomic(T) had to have the same representation as T, whereas C++‘s std::atomic<T> is explicitly allowed to have a different representation (so that it can use an inline lock for types that don’t have atomic hardware operations, without that you have to use a lock pool and an _Atomic(T) / std::atomic<T> in shared memory may not actually be atomic). Maybe C2x will fix this.
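
                                                                          To make the representation point concrete, here is a small C++ sketch (Big is a made-up stand-in for any type with no atomic hardware operation):

                                                                              #include <atomic>
                                                                              #include <cstdio>

                                                                              struct Big { char bytes[64]; }; // stand-in: no atomic hardware op this wide

                                                                              int main()
                                                                              {
                                                                                  std::atomic<int> i{0};
                                                                                  std::atomic<Big> b{};
                                                                                  // is_lock_free() reports whether hardware atomics are used. When they
                                                                                  // are not, the lock lives either inline (sizeof grows) or in a global
                                                                                  // lock pool (sizeof stays the same); the pool case is the one that
                                                                                  // breaks atomics placed in shared memory.
                                                                                  std::printf("atomic<int>: lock-free=%d, %zu vs %zu bytes\n",
                                                                                              (int)i.is_lock_free(), sizeof i, sizeof(int));
                                                                                  std::printf("atomic<Big>: lock-free=%d, %zu vs %zu bytes\n",
                                                                                              (int)b.is_lock_free(), sizeof b, sizeof(Big));
                                                                                  return 0;
                                                                              }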

                                                                          1. 2

                                                                            All of their uses of const std::string & look like things that should be std::string_view in modern C++. I’m curious why they refactored their code to use C++98 idioms in 2021.

                                                                            1. 9

                                                                              std::string_view does not guarantee a null-terminated string; std::string does. We have to use a few C libraries, so constructing a temporary string each time, instead of calling .c_str() on a const reference (to get a const char*), would require more changes and thus more testing, whereas we tried to keep this refactoring change as small and scoped as possible, not changing behavior unless absolutely required. That will come in a next round of refactoring. Go read the part on the Big Redesign In The Sky for our refactoring workflow.

                                                                              Why are we doing this right now and not way earlier? We only just got an updated compiler for the specific hardware we use that supports modern C++17; before that we had a half-baked C++11 with big parts either missing or not finished. Now we have a newer compiler, thus we can take advantage of newer features.

                                                                              1. 3

                                                                                Note that std::string, like std::string_view, may contain null characters in the middle of the string. It’s therefore quite dangerous to rely on c_str() in any string that may contain attacker-controlled data: anything doing comparisons on the std::string will see the null byte, anything looking at the C string will see only the characters up to the first null.
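
                                                                                A small sketch of the hazard (the literal here is contrived, but any attacker-controlled bytes could do the same):

                                                                                    #include <cstdio>
                                                                                    #include <cstring>
                                                                                    #include <string>

                                                                                    int main()
                                                                                    {
                                                                                        using namespace std::string_literals;
                                                                                        std::string s = "admin\0#evil"s;  // 11 characters, NUL in the middle
                                                                                        // std::string operations see all 11 characters...
                                                                                        std::printf("size: %zu, equals \"admin\": %d\n",
                                                                                                    s.size(), (int)(s == "admin"));
                                                                                        // ...but a C API handed c_str() stops at the first NUL: it sees "admin".
                                                                                        std::printf("strlen(c_str()): %zu\n", std::strlen(s.c_str()));
                                                                                        return 0;
                                                                                    }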

                                                                                If I were doing this refactoring, I’d start by pulling out the operations that folks are doing on C strings and turning them into things that took a pair of CharT iterators, then incrementally rewrite the code that takes C strings to use the new versions, then switch from const char* to std::string_view. LLVM actually went through this exact refactoring about a decade ago, when we still built with a C++98 compiler, so the lack of C++17 support wasn’t an obstacle. The flow was:

                                                                                • Introducing llvm::StringRef (before std::string_view existed), as a generic wrapper around some non-owned contiguous range of characters.
                                                                                • Introducing operations that we needed on StringRef, including adding an llvm::Twine type for intermediate results of string modifications. A twine is a sequence of string refs, so you can cheaply concatenate a bunch of strings in a Twine and do a single allocation for the final result if you need it in a contiguous buffer (often you don’t - many of the common operations on strings are defined for twines).
                                                                                • Refactor APIs to use StringRef instead of const std::string &, which meant that the same API could be called with a std::string or a char* (or anything else that can construct a StringRef, including a substring view from the middle of another buffer).

                                                                                I’m confused because you seem to be, in 2021, refactoring your codebase to look like ours was at the start of our refactoring, around 2008.
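
                                                                                For comparison, the std::string_view version of that last StringRef step looks something like this today (a sketch; has_scheme is a made-up example function):

                                                                                    #include <cstdio>
                                                                                    #include <string>
                                                                                    #include <string_view>

                                                                                    // One signature accepts std::string, char*, and substring views alike,
                                                                                    // with no temporary std::string constructed at the call site.
                                                                                    bool has_scheme(std::string_view url) {
                                                                                        return url.find("://") != std::string_view::npos;
                                                                                    }

                                                                                    int main()
                                                                                    {
                                                                                        std::string owned = "https://example.com";
                                                                                        const char *raw = "ftp://example.com";
                                                                                        std::printf("%d %d %d\n",
                                                                                                    (int)has_scheme(owned),
                                                                                                    (int)has_scheme(raw),
                                                                                                    (int)has_scheme(std::string_view(owned).substr(0, 5)));
                                                                                        return 0;
                                                                                    }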

                                                                              2. 7

                                                                                Also, string_view is annoyingly limited compared to string. Ideally every read-only operation on string should be available on string_view, but in reality it seems the C++ committee lost interest along the way and only added the most common ones.

                                                                                I had high hopes for refactoring to use string_view, but I kept running into places where I had to create a temporary string for no reason except to get to some const method.

                                                                                (Sorry I can’t list the missing methods here; I don’t remember them offhand and I’m not near a compiler right now.)

                                                                                1. 1

                                                                                  I believe this is because string and string_view are meant to represent storage, with some guarantees on the contents. Operations on strings live in <algorithm> so that other string representations can be added easily.

                                                                              1. 3

                                                                                From the comparison page of the linked compiler, it looks like the free edition can only be used to create GPL-licensed software. Is there a good free Ada compiler that does not restrict how you license what you write?

                                                                                1. 6

                                                                                  You can use GCC. GNAT Pro is not GCC.

                                                                                  Edit: Explanation of how this works with the GPL’s “no further restrictions”. The GNAT standard library is copyrighted by the FSF and distributed under the GPL with a linking exception. Thanks to the linking exception, there is no restriction on linking against the standard library. AdaCore redistributes the FSF-copyrighted GNAT standard library under the GPL without the linking exception. Just as with any other software licensed under the GPL, your application must be GPL if you link against the AdaCore-distributed GNAT standard library.

                                                                                  The GPL forbids further restrictions, but it has no problem whatsoever with dropping exceptions. You get the same code with the same copyright, but only the FSF gives you the exception; AdaCore does not give you the exception unless you pay. Instead of paying, you could get the exception from the FSF, but while AdaCore spends effort on advertising, the FSF does not. In particular, AdaCore does not advertise that you can get the same exception from the FSF without paying AdaCore. As you have shown, apparently this “revenue by obscurity” is very effective in practice.

                                                                                  Edit 2: Explanation of how this works with the GPL’s “you must show users these terms so they know their rights”. The GPL requires you to include the GPL in any redistribution, but even if you received the code with exceptions, there is no requirement whatsoever that you let your users know about the exceptions. GPL exceptions are not rights, so they are not covered by the “know their rights” provision.

                                                                                  1. 13

                                                                                    As you have shown, apparently this “revenue by obscurity” is very effective in practice.

                                                                                    Disclaimer: I work at AdaCore.

                                                                                    The reason companies pay AdaCore is that they want support, not that they want to be able to write proprietary software. According to the people in our sales team, “Why should we pay you instead of using the FSF’s GNAT?” is a question that often comes up during negotiations, and the answer is always “Because if you don’t, you won’t get access to experts who can answer your questions, fix your bugs in a timely manner, and provide insurance in case something goes wrong”. Companies choose to pay because in the safety-critical world you can’t afford not to have support for the tools you rely on.

                                                                                    It’s true that the license situation is confusing, though. As far as I understand, this is why AdaCore is planning on discontinuing GNAT community and instead will start helping Linux distributions ship up-to-date versions of its tools.

                                                                                    1. 2

                                                                                      AdaCore is planning on discontinuing GNAT community

                                                                                      Don’t forget the Windows users. Given the current dependency on Makefiles, support already isn’t super great, and this would probably incentivize me to go back to writing Rust or C++17.

                                                                                      Companies choose to pay because in the safety-critical world you can’t afford not to have support for the tools you rely on.

                                                                                      This is true in all software, not just safety-critical. I’ve seen this happen many times, and companies don’t understand that it’s better to not burn $50-100+/hr per developer on a team when tools don’t work as intended.

                                                                                      1. 2

                                                                                        AdaCore is planning on discontinuing GNAT community

                                                                                        Don’t forget the Windows users.

                                                                                        First, please don’t take my word as absolute truth - I am not involved in any of the circles that actually decide what should happen regarding AdaCore’s involvement with the Ada community, and I may have misunderstood some of the things I heard. In the end, I don’t really know what’s going to happen wrt GNAT Community.

                                                                                        If what I understood is correct, the plan for Windows users would be to support them with MinGW (or maybe another distribution of Linux tools for Windows whose name escapes me). I remember that one of the other alternatives discussed was to rely on Alire for toolchain management (kind of like rustup/cargo). The other tools (gprbuild, GNAT Studio) that won’t be shipped with MinGW would still be available from AdaCore’s website. I think one of the other goals (aside from clearing the license confusion) of this move is to have the Ada community be more self-reliant, so that people would stop seeing AdaCore as “owning” the ecosystem and instead see it as just one of its actors.

                                                                                        1. 3

                                                                                          Disclaimer: I don’t work for AdaCore, or anyone Ada-related. I’m a C++ grunt.

                                                                                          aside from clearing the license confusion

                                                                                          This is a great move since one of my major gripes is the usage of GPL, of which many companies like to steer well clear.

                                                                                          so that people would stop seeing AdaCore as “owning” the ecosystem and instead see it as just one of its actors.

                                                                                          I think this is a really good goal. Alire is neat. MinGW doesn’t really cut it though; WSL is OK, but native support would be best. Yeah, I get that it’s a lot of toolchain work. It’d be nice if Microsoft built it into Visual Studio, which would be a legitimate option since there’s a formal spec and the ACATS test suite.

                                                                                          I wouldn’t be working in Ada at all if not for the groundwork for an “Ada Renaissance”, so to speak, laid by AdaCore in the last few years (the LLVM compiler, libadalang, the Ada language server, the learning site, and quality YouTube videos).

                                                                                          Anyone who felt like they missed the bus on getting involved on the ground floor of a language (like on Alire) definitely has huge opportunities in Ada.

                                                                                          1. 1

                                                                                            What’s wrong with MinGW?

                                                                                        2. 1

                                                                                          companies don’t understand that it’s better to not burn $50-100+/hr per developer on a team when tools don’t work as intended.

                                                                                          I think this is somewhat situational. Having good support for a tool can indeed save a lot of developer time. But paying for support doesn’t guarantee that the support will be useful when you need it.

                                                                                          Low-quality support is one problem, but response time can be an issue too: as a developer, if a tool issue is blocking me from getting my work done, it’s often the case that I can dig into it and figure out a workaround or a fix in less time than it takes to get an initial response, let alone a resolution, from a vendor’s support people. I’m costing the company just as much money when I’m twiddling my thumbs waiting for vendor support as I would if I were digging into the problem myself.

                                                                                          Obviously that depends hugely on which tools we’re talking about and on my level of expertise; it won’t be true for all tools and all developers.

                                                                                          That said, it’s been my experience that tool issues generally fall into three buckets: things I can figure out on my own in a reasonable amount of time, things I can’t figure out on my own but the vendor could solve in a reasonable amount of time, and things that would take the vendor a long time to solve. And the middle bucket is almost always much smaller than the other two.

                                                                                          None of which is to say that paying for support is a waste of money. But I think depending on the situation, it can also be rational to decide that the net cost is lower if developers figure out tool issues on their own.

                                                                                        3. 1

                                                                                          Thank you for your comment, it put things a bit in perspective for me.

                                                                                          I don’t purport to know if getting free tools in the hands of as many potential developers as possible is a good business decision or not, but I fear that if you manage to scare away even a percentage of potential new users, it may hurt the chances for Ada, the language, to grow.

                                                                                        4. 2

                                                                                          The GPL forbids further restrictions, but it has no problem whatsoever with dropping exceptions

                                                                                          This is not true. The GPL explicitly forbids redistributing with weaker restrictions. If this were not the case, you could combine a GPL’d file into a BSDL library and distribute the result under the BSDL, then incorporate this in a proprietary product, defeating the point of the GPL. You can, however, create a new license that is the GPL + some exceptions. This is what the FSF does with the GCC linking exception, for example[1].

                                                                                          This distinction is important. Your phrasing suggests that it would be possible to take an existing GPL’d file and incorporate it into the GNAT standard library. Doing so would violate the GPL (if you distributed the result) unless the copyright holder agreed to the relicensing. The FSF works around this by requiring copyright assignment.

                                                                                          [1] Note that you have to do this by adding exceptions to the end of the GPL, rather than modifying the GPL, because, somewhat ironically, the text of the GPL is itself copyrighted by the FSF, and distribution with modifications is not permitted by the license of the license text.

                                                                                        5. 4

                                                                                          The GNAT version from the FSF (which your distro ships) allows writing software under any license. The GNAT version from AdaCore and the one from the FSF are basically the same.

                                                                                        1. 5

                                                                                          Ada is missing: A concept of “move”.

                                                                                          Isn’t that what SPARK’s support for pointers provides?

                                                                                          1. 2

                                                                                            That list is based on my current understanding.

                                                                                            Move is different from borrowing; iirc move is just the default behavior in Rust for non-copyable types. In C++ it’s done via r-value references and the type system; std::move has no behavior other than forcing something to be treated as an r-value reference so that the overload system dispatches the right behavior. The “move” concept is used to transfer resources efficiently in well-defined ways, such as in move assignment/construction, and is a complicated subject.

                                                                                            If Ada implemented “move” the same way, A := B (or some equivalent A := Move(B)) would transfer the contents of B into A and would work even for limited (uncopyable) types. E.g. for std::vector in C++, move transfers the ownership of the internal contents without a copy, but it looks like Ada.Containers.Vectors.Move is just an array assign and hence a copy.
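
                                                                                            For example, a sketch of what the C++ side does:

                                                                                                #include <cstdio>
                                                                                                #include <utility>
                                                                                                #include <vector>

                                                                                                int main()
                                                                                                {
                                                                                                    std::vector<int> a = {1, 2, 3};
                                                                                                    // The move steals a's heap buffer rather than copying it; a is
                                                                                                    // left valid but unspecified (in practice empty for std::vector).
                                                                                                    std::vector<int> b = std::move(a);
                                                                                                    std::printf("b.size() = %zu, a.size() = %zu\n", b.size(), a.size());
                                                                                                    return 0;
                                                                                                }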

                                                                                            1. 5

                                                                                              https://blog.adacore.com/using-pointers-in-spark

                                                                                              The main idea used to enforce single ownership for pointers is the move semantics of assignments. When a pointer is copied through an assignment statement, the ownership of the pointer is transferred to the left hand side of the assignment. As a result, the right hand side loses the ownership of the object, and therefore loses the right to access it, both for writing and reading.

                                                                                              1. 2

                                                                                                That works for the trivial case. At least in C++, the idea of move semantics is to handle the general case. For (a fairly simple) example, moving one string to another means that the new string takes ownership of the internal buffer and the old string is left in a valid but unspecified state. Ideally, you would have static verification that no uses of the old object other than deallocation occur.

                                                                                                Linear ownership of pointers is one of the key building blocks for implementing move semantics (when you move from object A to object B, you must transfer ownership of all pointers in A to B), but it’s not sufficient by itself.

                                                                                                C++ actually implements these the other way around: std::unique_ptr provides linear ownership for pointers (with an unsafe escape hatch via get()) and is built on top of move semantics by defining move constructors and move-assignment operators. An object whose fields are all move-constructible gets a default move constructor that move constructs all of its fields, so if an object has only std::unique_ptr and primitive value types as fields then it will get a default move constructor that works in the same way as the string example above. One of the biggest gotchas in C++ is that bare pointers are move constructible as a copy operation. You need to be very careful to avoid having bare pointers that need explicit memory management held by anything other than a smart pointer class that does the right thing.

                                                                                                I believe Rust has something similar. In Rust, the use of traits to advertise (and infer) compliance with specific properties makes this kind of thing easier.
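
                                                                                                Here is a small sketch of the default-move composition described above (Node is a made-up example type):

                                                                                                    #include <memory>
                                                                                                    #include <string>
                                                                                                    #include <utility>

                                                                                                    // Every field is movable, so Node gets a default move constructor
                                                                                                    // that moves each field in turn; the unique_ptr makes the ownership
                                                                                                    // transfer explicit. A bare int* field would silently be copied.
                                                                                                    struct Node {
                                                                                                        std::unique_ptr<int> payload;
                                                                                                        std::string name;
                                                                                                    };

                                                                                                    int main()
                                                                                                    {
                                                                                                        Node a{std::make_unique<int>(42), "a"};
                                                                                                        Node b = std::move(a); // implicit move ctor: moves payload and name
                                                                                                        // a.payload is now null; doing anything with a other than destroying
                                                                                                        // or reassigning it is the kind of bug linear ownership would catch.
                                                                                                        return (b.payload && !a.payload) ? 0 : 1;
                                                                                                    }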

                                                                                          1. 8

                                                                                             TL;DR: The LLVM build system generates Ninja files where almost every step can be executed in parallel; the serial steps are links, and if you’re using LLD then even linking uses multiple threads. If you have an embarrassingly parallel workload then a machine with a load of cores (160 here) can run it very fast.

                                                                                            1. 3

                                                                                              Can someone explain what’s going on step by step? I don’t get how it works.

                                                                                              1. 5

                                                                                                 I was also a bit confused. My confusion was around how strong the word “bootstrap” is, I suppose. This process lets them bootstrap an OCaml compiler from gcc without needing the OCaml build toolchain, but they do still need an existing OCaml runtime that can run OCaml bytecode as a step in the process.

                                                                                                The steps, as I understand them:

                                                                                                1. There is a compiler, written in Guile Scheme, that compiles a subset of OCaml (called miniml) to OCaml bytecode.

                                                                                                2. There is an interpreter, written in miniml, that can interpret the full OCaml language. The Scheme compiler in step 1 can therefore compile a full OCaml interpreter from miniml source code to OCaml bytecode.

                                                                                                 3. [This step requires an OCaml runtime.] The interpreter produced in the previous steps can now be used to (very slowly) run the full/regular OCaml compiler. This interpreted compiler is fed its own source code as input and compiles itself, producing a compiled OCaml compiler, which completes the bootstrap.

                                                                                                1. 3

                                                                                                   Thanks. The bit that I was missing was step 3: using these tools to run the OCaml compiler that’s written in OCaml. Without that, it just sounds like an OCaml implementation that isn’t written in OCaml.

                                                                                                  For bootstrapping, I really like the way that Squeak does it. The core of the Squeak VM is written in a subset of Smalltalk that can be statically translated to C. To bring up on a new platform, you translate the core to C, and that gives you enough to run the rest (which is Smalltalk bytecode). You generate the image (the bytecode) on another machine and then just bring it across. I believe Pharo can use a JIT compiler to replace some of the bits that were translated through C, but once you’ve got a working Smalltalk VM you’ve done the bootstrap process and can use that to run any other Smalltalk tools.

                                                                                                  Bootstrapping is far less important if you have a good cross compiler. Clang, for example, can compile itself targeting any platform that it can target as a native compiler. This means that, as long as I have a VM image somewhere that contains any clang version, I can always use that to bootstrap a modern clang for any supported target.

                                                                                                  1. 1

                                                                                                     The OCaml runtime is written in C, so that makes sense.

                                                                                                1. 2

                                                                                                  How does ninja shave a minute off of the build time? What can it do that make can’t?

                                                                                                  1. 4

                                                                                                     I think part of this is down to the way that CMake uses them. The generated Ninja files rely entirely on Ninja for the build; the generated Makefiles invoke CMake bits to print useful messages. Ninja does the right thing for output by default: it buffers the output from every command, prints it atomically for any command that produces output, and shows the build command only for build steps that fail (unless you pass -v). With make, the build steps all inherit the controlling TTY and so their output is interleaved. It’s been years since I used make with CMake[1], but as I recall it wraps each compile command in a CMake invocation that captures the output and then tries to write it atomically. The CMake-generated Makefiles also do extra work to produce the kind of pretty output that Ninja gives you by default.

                                                                                                     In addition, it’s not about what Ninja can do that Make doesn’t, it’s about what Ninja doesn’t do. Ninja is not intended as a general-purpose scripting language. It doesn’t have macros, variables that are evaluated by running shell scripts, and so on; it is just a declarative description of build dependencies. It is designed with speed as its only goal, delegating all of the input complexity to a pre-build tool. Make is a far more flexible tool (particularly a modern Make such as GNU Make or BMake) that is designed to be usable without a separate configuration step, even if most uses do add one.

                                                                                                     This makes Ninja simpler to parse, so it gets to the build step faster. For example, Ninja doesn’t have suffix rules; CMake is responsible for providing the full set of flags and build commands for every individual rule. I also think Ninja is a little bit more clever about arranging the dependency tree to maximise parallelism. Not relevant for this, but important on more resource-constrained systems: Ninja is also able to have different-sized pools for different jobs, so if link steps take more memory then it can reduce the degree of parallelism during link steps.
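
                                                                                                     Pools look like this in a Ninja file (a hand-written sketch; CMake can generate the equivalent from its job-pool settings):

                                                                                                         # At most two link jobs run at once, even when the rest of the
                                                                                                         # build is running at full parallelism.
                                                                                                         pool link_pool
                                                                                                           depth = 2

                                                                                                         rule link_cxx
                                                                                                           command = c++ -o $out $in
                                                                                                           pool = link_pool

                                                                                                         build app: link_cxx main.o util.o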

                                                                                                    [1] I have this in a file sourced from my .bashrc so CMake automatically uses Ninja:

                                                                                                    # Set the default generator to Ninja.
                                                                                                    export CMAKE_GENERATOR=Ninja
                                                                                                    # Always generate a compile-commands JSON file:
                                                                                                    export CMAKE_EXPORT_COMPILE_COMMANDS=true
                                                                                                    # Give Ninja a more verbose progress indicator
                                                                                                    export NINJA_STATUS="%p [%f:%s/%t] %o/s, %es "
                                                                                                    
                                                                                                    1. 3

                                                                                                      Ninja’s scheduling is really very impressive.

                                                                                                      Many years ago, I patched Jam to compile faster. My approach was different: When I had a choice of which job to start, I’d prefer the one that depended on the most recent source file. This produced amazingly fast compile times if there were errors. Builds that failed would generally fail very quickly, often in the first second. You can imagine what that does to digressions, and I thought it achieved that without any cost to successful builds.

                                                                                                       Ninja was the first build tool to prove me wrong. The order in which it starts jobs is better than mine in some/many cases. Drat.

                                                                                                  1. 10

                                                                                                    Good to see that when incremental builds are broken, it is a bug!

                                                                                                     I have fond memories of recursive Make (where correct incremental builds were practically unattainable): a few odd times per year, a bunch of us developers would rush together, standing almost on top of each other in someone’s office, to add the missing dependency to the top-level Makefile, of course. Like a ceremony. Then we would wish each other good luck and go back to work. At least the bug was always in our own hands.

                                                                                                     Even CMake, I think, is flawed when it comes to cached variables: there is no cache invalidation – they take precedence over new default values. Every time I get a nonsense error and deleting the build directory fixes it, I get so sad.

                                                                                                    1. 2

                                                                                                       Even CMake, I think, is flawed when it comes to cached variables: there is no cache invalidation – they take precedence over new default values. Every time I get a nonsense error and deleting the build directory fixes it, I get so sad.

                                                                                                      The hardest problem in CMake is what happens when you change the default: CMake’s cache doesn’t differentiate between variable-was-set-to-default-value-X and var-was-set-to-X. This tripped me up recently when LLVM switched to enabling the new pass manager by default. My builds had cached the disabled value because that was the old default.
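
                                                                                                       The workaround I know of, short of deleting the whole build directory, is to drop the stale cache entry so the new default takes effect on the next configure; the variable glob here is illustrative:

                                                                                                           # Remove matching stale entries from the CMake cache, then re-run
                                                                                                           # the configure step so the new default value is picked up.
                                                                                                           cmake -U '*NEW_PASS_MANAGER*' .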

                                                                                                    1. 6

                                                                                                       I am against this code of ethics, for the reason detailed in the Ethics for Programmers post.

                                                                                                      1. 1

                                                                                                        Can you elaborate on what conflict you see between the two? I remember the Ethics for Programmers post from many years ago, and have not read the ACM code carefully yet.

                                                                                                        1. 7

                                                                                                          Ethics for Programmers says programming is for the benefit of users. “It is a strong ethical obligation on the part of the programmer to make sure that programs do, always, and only, what the user asks them to.”

                                                                                                          ACM code says programming is for the public good. “The Code includes principles formulated as statements of responsibility, based on the understanding that the public good is always the primary consideration.”

                                                                                                           I strongly reject and condemn the ACM code, because users come before the public good. Consider an ethics code for lawyers that said the public good is the primary consideration and that you shouldn’t defend murderers to the best of your ability. That would be absurd. The ACM code is similarly absurd.

                                                                                                          1. 3

                                                                                                            Lawyer is an interesting comparison. The ACM code seems to most strongly parallel engineering ethics codes (example), where public good is commonly taken to be the primary obligation. Maybe not that surprisingly, since many active ACM members see themselves as engineers, and/or wish the field were more like “real” engineering (put in extra scare quotes). But it’s an interesting question whether that’s the best comparison.

                                                                                                            I agree that a lot of other professional ethics codes recognize primary responsibility to someone more proximal than the public at large. Doctor is another example: doctors do have a general obligation to public health, but this is usually not taken to override their more direct obligation to their patients (i.e. they aren’t supposed to take action to harm the patient even if there were a way to know with high confidence that doing so would benefit the general public overall).

                                                                                                            1. 3

                                                                                                              But it’s actually in the public’s best interest for anyone charged with a crime to receive a robust defense and a fair trial from a competent attorney. The outcome only becomes a “public good” when someone either is found guilty beyond a reasonable doubt, and therefore faces justice, or is found not guilty and set free, after a fair trial.

                                                                                                              Locking up everyone you merely suspect is guilty actually damages society as a whole.

                                                                                                              1. 2

                                                                                                                No. You are supposed to be able to tell your attorney you did it, and she’s still supposed to do everything she can to get you off. It’s up to the prosecutor to convince the judge and jury, and for them to decide.

                                                                                                                1. 4

                                                                                                                  You are supposed to be able to tell your attorney you did it, and she’s still supposed to do everything she can to get you off

                                                                                                                  I don’t believe that’s the case in the UK or USA. If you have confessed, then because of attorney-client privilege they are not required to report this, but they are required to advise you to enter a guilty plea, and to decline to represent you if you refuse. They are required to continue to act in your interest in terms of plea bargaining, advocating for extenuating circumstances (e.g. diminished responsibility), and so on.

                                                                                                                  Note that there’s a difference between confessing to your attorney and your attorney thinking that you did it. If you maintain your innocence but your attorney thinks you’re lying, they are still required to act in a way that represents your best interests on the presumption of innocence.

                                                                                                                  1. 1

                                                                                                                    If you tell your attorney you are guilty, and can provide credible information that you are not lying, 99.9% of attorneys will recommend you plead guilty, because the likelihood that you are found not guilty is extremely low. In that scenario you’re only getting off on some technicality that your lawyer can exploit (which is itself risky), and even that works towards the public good, since those procedural protections exist to safeguard everyone’s civil liberties. Allowing rules to be broken (e.g. evidence mishandling) because you’re really guilty works against the public good.

                                                                                                                    Also: in the United States, attorneys will actually recommend that their clients plead GUILTY when they’re NOT GUILTY if the client is unable to make bail, because a guilty plea often gets you probation, whereas waiting for a trial can take up to 90 days, all of which you will spend in jail.

                                                                                                          1. 3

                                                                                                            Great article, though it missed one place for a bug in section 4: bugs in the compiler. Unless you’re compiling your strlen function with CompCert, the compiler may modify it in ways that violate your assumptions. This is closely related to ‘bug in the logical encoding of the C programming language semantics’: it’s not so much that there is a bug in the encoding of the C semantics as that there is a difference between the semantics you use for the proofs and the semantics the compiler uses. Our C semantics is parameterised and attempts to capture a strict interpretation of the standard, as well as variations on the de facto standard as understood by C programmers and compilers.
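                                                                                                            As a concrete sketch of the kind of transformation that can bite (my example, not the article’s): at higher optimisation levels, GCC and Clang both perform loop-idiom recognition, which can replace a hand-written loop with a call to the equivalent libc routine.

                                                                                                            ```c
                                                                                                            #include <stddef.h>

                                                                                                            /* Suppose this is the function you verified: the proof reasons about
                                                                                                             * this exact byte-clearing loop, e.g. to show that a secret is erased. */
                                                                                                            void zero_bytes(unsigned char *p, size_t n) {
                                                                                                                for (size_t i = 0; i < n; i++) {
                                                                                                                    p[i] = 0;
                                                                                                                }
                                                                                                            }

                                                                                                            /* Loop-idiom recognition is liable to compile the loop above into a
                                                                                                             * call to memset().  If zero_bytes() is itself your environment's
                                                                                                             * memset (as in a libc or kernel), that call is infinite recursion;
                                                                                                             * either way, the emitted machine code no longer matches the loop the
                                                                                                             * proof reasoned about.  Building with -fno-builtin or -ffreestanding
                                                                                                             * suppresses the rewrite. */
                                                                                                            ```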

                                                                                                            Current verification tooling suffers from a gap in language support. If I want to write code with a low probability of bugs, I’ll use a higher-level language (even C++ is preferable to C), but the proof tools tend to generate C, so the boundary between verified and unverified code is expressed in interfaces that are themselves sources of bugs. For example, F* lowers enumeration-like constructs in the verified code to an int in C, with some #defines for the values. It’s easy to accidentally switch the order of a couple of these in a call from unverified to verified code, and the compiler’s type checker won’t warn at all.
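                                                                                                            A sketch of that failure mode; the names are invented for illustration, not actual F* output:

                                                                                                            ```c
                                                                                                            #include <stdio.h>

                                                                                                            /* In the verified source these are two distinct enumeration types; the
                                                                                                             * extracted C collapses both to int plus #defines. */
                                                                                                            #define CIPHER_AES    0
                                                                                                            #define CIPHER_CHACHA 1
                                                                                                            #define DIR_ENCRYPT   0
                                                                                                            #define DIR_DECRYPT   1

                                                                                                            /* Stub standing in for the extracted, verified function. */
                                                                                                            static int process(int cipher, int direction) {
                                                                                                                printf("cipher=%d direction=%d\n", cipher, direction);
                                                                                                                return 0;
                                                                                                            }

                                                                                                            int main(void) {
                                                                                                                /* Arguments swapped: this call passes a direction where a cipher
                                                                                                                 * is expected, but since both parameters are int, the type checker
                                                                                                                 * has nothing to object to. */
                                                                                                                return process(DIR_DECRYPT, CIPHER_AES);
                                                                                                            }
                                                                                                            ```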

                                                                                                            The RustBelt project is interesting in this regard, trying to come up with verified implementations of the uses of unsafe that are required to implement the Rust standard library. That alone would let you write an entire nontrivial program in safe Rust (currently impossible: all I/O bottoms out in unsafe, for example), but it would also potentially be a building block for verification of entire crates.

                                                                                                            Partial verification is also something that we’re interested in for Verona. We provide very strong type-safety and data-race-freedom guarantees, and so we can offer boxes to put verified code in, where that code can (correctly) assume some very strong properties of the calling code.