1. 1

    Is it just me, or is unveil a terrible choice of name? It normally means “remove a veil”, “disclose” or “reveal”. Its function is almost exactly the opposite - it removes access to things! As the author says:

    Let’s start with unveil. Initially a process has access to the whole file system with the usual restrictions. On the first call to unveil it’s immediately restricted to some subset of the tree.

    Reading the first line of the man page I can see how it might make sense in some original context, but this is the opposite of the kind of naming you want for security functions…

    1. 3

      Is it just me, or is unveil a terrible choice of name? It normally means “remove a veil”, “disclose” or “reveal”. Its function is almost exactly the opposite - it removes access to things!

      It explicitly grants access to a list of things, starting from the empty set. If it’s not called, everything is unveiled by default.

      1. 3

        I am not a native speaker, so I cannot comment if the verb itself is a good choice or not :)

        As a programmer who uses unveil() in his own programs, the name makes total sense. You basically unveil selected paths to the program. If you then change your code to work with other files, you also have to unveil these files to your program.
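
        For illustration, here is a minimal sketch (OpenBSD-specific; the paths are made up) of the semantics described above: the first unveil() call hides everything else, and each further call unveils one more path.

          #include <unistd.h>   // unveil(2), pledge(2)
          #include <err.h>

          int main() {
              // Before the first call, the whole filesystem is visible as usual.
              if (unveil("/var/www/htdocs", "r") == -1)    // now only this subtree is visible, read-only
                  err(1, "unveil");
              if (unveil("/var/www/logs", "rwc") == -1)    // unveil a second path: read, write, create
                  err(1, "unveil");
              if (unveil(nullptr, nullptr) == -1)          // lock the list: no further unveil() calls allowed
                  err(1, "unveil");
              // From here on, access outside the two unveiled paths fails.
              return 0;
          }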

        1. 2

          OK, I understand - it’s only the first call that actually restricts (while also immediately unveiling its path); after that, each further call just unveils more.

        2. 2

          “Veiling” is not a standard idea in capability theory, but borrowed from legal practice. A veiled fact or object is ambient, but access to it is still explicit and tamed. Ideally, filesystems would be veiled by default, and programs would have to statically register which paths they intend to access without further permission. (Dynamic access would be delegated by the user as usual.)

          I think that the main problem is that pledges and unveiling are performed as syscalls after a process has started, but there is no corresponding phase before the process starts where pledges are loaded from the process’s binary and the filesystem is veiled.

          1. 1

            Doing it as part of normal execution implements separate phases of pledge/unveil boundaries in a flexible way. The article gives the example of opening a log file, and then pledging away your ability to open files, and it’s easy to imagine a similar process for, say, a file server unveiling only the public root directory in between loading its configuration and opening a listen socket.

            1. 1

              I think that the main problem is that pledges and unveiling are performed as syscalls after a process has started, but there is no corresponding phase before the process starts where pledges are loaded from the process’s binary and the filesystem is veiled.

              Well, the process comes from somewhere. Having a chain-loader process/executable that sanitises the inherited environment and sets things up for the next one fits well with the established execution model. pledge() explicitly prepares for this with its second argument, execpromises.
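
              A hedged sketch of that chain-loader idea (the promise strings and usage are illustrative, not taken from any real loader): the wrapper pledges on behalf of the program it execs via the second argument, execpromises, so the target starts out restricted without containing any pledge() calls of its own.

                #include <unistd.h>
                #include <err.h>

                int main(int argc, char *argv[]) {
                    if (argc < 2)
                        errx(1, "usage: chainload /path/to/program [args...]");

                    // "stdio exec" is what this wrapper itself still needs;
                    // the second string becomes the pledge of the exec'd image.
                    if (pledge("stdio exec", "stdio rpath inet") == -1)
                        err(1, "pledge");

                    execv(argv[1], argv + 1);
                    err(1, "execv");
                }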

              1. 2

                You could put it in e.g. an elf header, or fs-level metadata (like suid). Which also fits well with the existing execution model.

                Suid is a good comparison, despite being such an abomination, because under that model the same mechanism can double as a sandbox.

                The chain-loader approach is good, but the complexity becomes harder to wrangle with explicit pledges if you want to do djb-style setups with many communicating processes. On the other hand, file permissions are distant from the code, and have no answer for ‘I need to wait until runtime to figure out what permissions I need’.

                1. 1

                  Not going too far into the static/dynamic swamp shenanigans (say, setting a different PT_INTERP and dlsym:ing out a __constructor pledge/unveil) - there are two immediate reasons why I’d prefer not to see it as a file-meta property.

                  1. Filesystem legacy is not pretty, and accidentally stripping the metadata on a move to an incompatible filesystem would be fail-silent-dangerous (stripping suid is not dangerous; stripping the pledge setup is).
                  2. Pledge violations go kaboom; then you need to know that this was what happened (dmesg etc.) and you land in core_pattern-like setups. The chain-loader, meanwhile, takes on the responsibility of attribution/communication, so X11 gets its dialog or whatever, isatty() gets an fprintf, and others get a syslog, and so on.
            2. 1

              Like Linux’s unshare

            1. 2

              Scany adds some quality-of-life improvement on top of pgx by eliminating the need to scan into every field of a struct. However, the desired field names must still be listed out in a SELECT … statement, so it only reduces boilerplate by half.

              I personally don’t feel like that counts as boilerplate. But maybe someone with more expertise in SQL queries can tell me why using a SELECT * would be better than listing the fields. Or maybe I just don’t mind taking a little extra time to be explicit because I prefer it.

              And I don’t see the difference between writing the SQL statements in my Go code as strings and writing them in a SQL file and using code generation to generate Go code with the queries in…a string. I dunno, maybe I’m more allergic to code generation than the author.

              1. 2

                SELECT * means you’re getting an unbounded mess of all the columns, while listing them individually means you both get only what you need and your query breaks loudly if the table structure changes, instead of silently returning a different set of columns. Listing them is generally better, according to the SQL book I read recently.

              1. 11

                It’s mentioned, but I feel like the article undersells the incredible synergy of at least the A and P of the LAMP stack. Like cgi-bin before it, PHP was stupidly easy to deploy onto a web server. Just drop in your files and go. No managing extra servers for web apps or anything like that, really. This meant it was stupidly easy to allow clients/users/etc. to run dynamic content with PHP.

                With basically every “modern” web platform outside of PHP, you’re either

                • relying on more servers you run that get proxied to (e.g. dynamic applications in Python, Go, Node, etc.)
                • building static content and relying on 3rd parties if you want to inject any dynamicness

                so it’s definitely more of a process. It’s interesting seeing companies like Heroku or Fly.io appear, offering something close to the old web-hosting-company “fire and forget” experience for modern apps that need some kind of server component, but I still wonder how in the long run it’ll affect newer developers who, outside of this handful of services, need to basically become full sysops to build out their own stack on a VPS, since you can’t just dump your content on a web host and call it a day anymore.

                1. 4

                  Unfortunately, the mod_php model was a security nightmare for shared hosting, which is where some of the LAMP-hatred comes from. mod_php was in-process for Apache and so ran as whatever user httpd ran as. This user had to have, at least, read-only access to every user’s public_html directory. It didn’t matter for static hosting that Apache could read everyone’s public_html directory because (almost) everything there was expected to be shared with the Internet. Unfortunately, with PHP, you ended up with secrets in your public_html: in particular in a LAMP stack you’d have the password for the database. Another user could write a .php file that just read your .php file and dumped the output and Apache would happily then tell everyone your database password. Even if no one on the same system did this maliciously, a load of off-the-shelf PHP things came with bugs that allowed someone who sent the right HTTP request to dump the contents of an arbitrary file on the filesystem that Apache could read.

                  This was made even worse by the fact that Apache ran as root until the early 2000s. I saw a few systems have their shadow password file leaked because a PHP file allowed arbitrary filesystem reads and Apache was running as root.

                  1. 4

                    Right, I remember deploying applications felt a lot easier than now, where there’s some kind of prior knowledge you have to have with e.g. Docker, versus “ftp it and follow the setup for the database”. Plus it’s dead simple for adding quick little scripts or interactivity to existing sites. I think PHP isn’t as good for complex stuff, but that’s the curse of being able to get going quickly.

                    1. 4

                      Exactly. I don’t know how/why Apache decided to bake PHP in, but that was a kingmaker that made PHP ubiquitous despite it being IMHO a deeply terrible language. Even I got sucked in by the simplicity and used PHP for about a year circa 2005 before giving up in disgust. (In fairness, some of the blame lay with the crappy quality of libraries. I remember a supposedly-complete WebDAV library I had to basically gut and rewrite.)

                      1. 4

                        Apache didn’t, they just provided an extension system to allow people to do all sorts of stuff, like embed language runtimes. mod_php was always written and provided by PHP themselves.

                        1. 2

                          mod_perl and mod_python were the main contemporary competitors that I remember.

                          1. 1

                            TIL mod_php wasn’t actually an Apache-produced thing. Neat.

                            If it wasn’t specifically Apache’s holy solution, I do wonder why it picked up so much steam over the alternatives though?

                            1. 5

                              With mod_php you renamed an HTML file to .php, added <?php tags, and went from there. The problems with that show up if you try to color outside the lines, but for a lot of applications, you don’t need to do that.

                      1. 1

                        I’m surprised at how often GPS threatens to break time. It would be nice if the protocols could be updated to use less human-centric time scales than weeks, but updating satellites is quite a brittle thing.

                        1. 6

                          The week number was already extended from 10 bits to 13 bits. The first satellites supporting it were launched in 2005 and the new format signal has been broadcast since 2014. For capable receivers, there is no rollover until 2137 (at which point, if GPS is still flying and hasn’t been further upgraded, the 157-year ambiguity will presumably be much easier to resolve than a 19-year one).

                          This is more of a “gpsd (not GPS) tried to be clever and failed” issue. They could have simply had their logic for 10-bit week numbers resolve the ambiguity in such a way that it always returns the first matching time after the build date of the copy of gpsd itself. That would have been 100% reliable for anyone who upgrades their software at least once every two decades. If that’s not good enough then maybe they could think about a ratchet. What they ended up with instead… wasn’t really smart, it just looked that way.
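
                          For what it’s worth, that build-date approach is only a few lines. A hedged sketch (not gpsd’s actual code; the names are illustrative and leap seconds are ignored):

                            #include <cstdint>

                            constexpr int64_t SECS_PER_WEEK  = 7 * 24 * 3600;
                            constexpr int64_t GPS_EPOCH_UNIX = 315964800;   // 1980-01-06 00:00:00 UTC
                            constexpr int64_t ROLLOVER       = 1024;        // 10-bit week counter wraps here

                            // week10: week number as broadcast (0..1023); tow: seconds into the week;
                            // build_time: Unix timestamp baked in at compile time (e.g. derived from __DATE__).
                            int64_t gps_to_unix(int week10, int tow, int64_t build_time) {
                                int64_t build_week = (build_time - GPS_EPOCH_UNIX) / SECS_PER_WEEK;
                                int64_t week = week10;
                                while (week < build_week)       // first full week number at or after the build date
                                    week += ROLLOVER;
                                return GPS_EPOCH_UNIX + week * SECS_PER_WEEK + tow;
                            }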

                          1. 1

                            Do note that the exact same bug would have happened in 2003, as it is not the week-counter rollover that caused it but the fact that there hasn’t been a leap second for 256 weeks, which is the modulus in which the week of the next leap second is given. Who knows how many similar footguns there are, but I’d think it would be best to utilize the fact that we now have better equipment and rework the signals to be harder to misinterpret.

                            1. 2

                              Do note that the exact same bug would have happened in 2003

                              Yes, because it’s bad code. The solution is to have less bad code. Involving leap seconds in this was a mistake.

                              but I’d think it would be best to utilize the fact that we now have better equipment and rework the signals to be harder to misinterpret.

                              Again, already done. Not in the exact way you’re asking for, but in a way that works just fine in the real world. This is gpsd attempting to do its best with receivers relying on the old format (and fumbling it).

                              1. 1

                                A better solution is to make it harder to write bad code. GPS-handling code will be written another thousand times. Do we need to keep open the possibility of it having the exact same bug yet again?

                                But I do have to agree on the fact that it has been fixed. The week on which the rollover will happen is now specified with the same modulo as the week so such bugs should hopefully no longer occur with new receivers.

                                1. 1

                                  The week on which the rollover will happen is now specified with the same modulo as the week

                                  This statement is nonsense.

                          2. 1

                            It’s not just satellites, it’s every receiver deployed in the last 40 years, many in neglected but safety- or life-critical use.

                            1. 1

                              It’s possible to add extra signals; it has already been done. Having one with a 40-bit seconds counter should definitely be possible without interfering with existing signals, and it would decrease the fragility of GPS time tracking massively.

                              1. 1

                                How exactly does that improve the situation of the receivers existing in the wild?

                                1. 1

                                  It improves the situation of the receivers that will exist in the wild. This exact error happened 17 years ago to some Motorola receivers. Such occurrences are rare enough that the knowledge isn’t common even amongst those who implement GPS, but impactful enough that letting such errors repeat is asking for trouble.

                          1. 3

                            Wonderful post!

                            I’ve heard about some places making edicts about config files needing to use properly-terminated formats like JSON instead of something like YAML, to avoid truncated documents being valid. Would that also be a useful consideration in canonicalization?

                            1. 3

                              YAML in general worries me (especially the Norway Problem), and its susceptibility to truncation is noteworthy, but this problem is strictly about how you feed data into your MAC (or equivalent) function rather than a general problem with data truncation.

                              I’m sure there are other, cleverer attacks possible than the simple one I highlighted.

                              1. 2

                                OK yeah, I’ve properly woken up now: the issue with scooting data from the encrypted portion to the additional (authenticated) data doesn’t get magically fixed if you bound that data. Thanks for humoring me :)

                              2. 1

                                A simple way to handle the truncation issue you raised is to ensure that the data being hashed ends with a 0x0 byte (which cannot occur anywhere within the string, so it does not need to be escaped). Then the format itself (JSON, YAML, etc.) does not matter.
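
                                One way to read that suggestion, sketched below with made-up names (the result would then be fed to a real MAC such as HMAC): terminate each NUL-free field with a 0x00 byte, so field boundaries and the end of the data are unambiguous.

                                  #include <string>
                                  #include <vector>

                                  // Join NUL-free text fields into an unambiguous byte string for MACing.
                                  std::string mac_input(const std::vector<std::string>& fields) {
                                      std::string out;
                                      for (const auto& f : fields) {
                                          out += f;
                                          out.push_back('\0');   // ("ab","c") can no longer collide with ("a","bc")
                                      }
                                      return out;                // feed this to HMAC(key, out)
                                  }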

                              1. 3

                                Not sure why he calls out Apple specifically.

                                Apple has exactly 1 type of USB-C cable that they sell, so if you have an Apple USB-C cable then it’s that one.

                                1. 5

                                  That’s not true.

                                  https://www.apple.com/shop/mac/accessories/power-cables shows two lengths of “USB-C Charge Cable”, both for $20, a “Thunderbolt 3 (USB-C) Cable” for $40, and a “Thunderbolt 3 Pro Cable” for $130. In addition to the Apple-branded cables, they sell a Belkin TB3 cable, a Belkin USB-C charge cable, and an “Only at Apple” Mophie USB-C cable.

                                  1. 3

                                    And the thunderbolt one is branded (with the thunderbolt logo) https://www.apple.com/shop/product/MQ4H2AM/A/thunderbolt-3-usb%E2%80%91c-cable-08-m

                                1. 9

                                  Bet you a dollar that professional musicians are even more obsessive about their tooling than we are

                                  1. 5

                                    Eric Clapton’s Magic Wood Block™ is being sold for $21.

                                    Then you have the whole discussion whether or not it matters which type of wood your electric guitar’s body uses, and if so, which type of wood is the “best”. You think Vim vs. Emacs is a religious war? Hah! Kid’s stuff.

                                    1. 2

                                      Fun aside: for me, it’s €17.91.. plus €8,103.70 shipping.

                                      1. 4

                                        For that price I’d expect Eric Clapton himself to deliver it to me.

                                    2. 4

                                      Professional drivers vary from gearhead to just rolling down the windows.

                                      Doctors have minor holy wars over details of and entire plans for procedures.

                                      Builders have strong brand opinions and stronger tool choice. “Young people may use miter saws but I was always faster and cleaner with a circular saw.”

                                      Seamstresses have very strong ideas, but tend to agree about the stuff that matters more.

                                      Stock traders. Do you save up for the Bloomberg keyboard or do you rough it?

                                      Don’t even try to get started on the various fine arts.

                                      1. 1

                                        I definitely know a musician who did a doctoral thesis on the acoustics of the material used in a single part of their instrument of choice.

                                      1. 1

                                        I am not sure about the limit, but I am assuming most hosted providers have limits on a) the number of records and b) the rate of API requests. It could work better if you are hosting your own NS.

                                        1. 3

                                          Usual caveats of my hobby projects definitely apply:

                                          scale and usefulness in the real world are an afterthought

                                          1. 2

                                            Route 53 has a “please tell us whyyyyyyyy” soft cap at 10,000 records, see https://aws.amazon.com/route53/pricing/

                                            They limit you to 5 requests/second, and it looks like they can’t be concurrent requests, https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/DNSLimitations.html#limits-api-requests-route-53

                                          1. 12

                                            Ah, the world where storing a string vs a Boolean has “negligible” overhead. Whereas my inner bean-counter immediately starts hopping up and down yelling about heap allocations and locality of reference, and even if you store the time stamp as a double that’s 8 bytes vs 1 and do you know the size of a cache line…?

                                            Not saying one world is truer or better, of course! I wasn’t the one stating a practice that’s “almost always” right. I’m just amused at how people think their domains are all there is (like the web-dev folks who can’t believe SQLite is more widely deployed than Postgres.)

                                            1. 7

                                              If you are working with those constraints then it’s pretty obvious that this won’t work for you, but I think it’s very obvious that this post wasn’t aimed at you.

                                              1. 4

                                                Yeah. “Web development” features in the very first sentence of the post, and I’m sure the blog author knows their audience. Comments like the one you replied to don’t elevate the discussion whatsoever. (cf. @dbove’s, which is germane and an interesting discussion point.)

                                                1. 2

                                                  there are tons of other web development languages where people would cringe at:

                                                  is_published = new Date();
                                                  if (is_published)
                                                  

                                                  that’s… not how types should work, really (I know that in Python it could also be None, but in this case I find this a horrible combo of naming+type)

                                                  1. 2

                                                    Sure, say if published_at != nil or whatever the equivalent idiomatic truth test in your language of choice is.

                                                    The article maybe spends too much time on the ergonomics in JavaScript with the large code block, drawing attention away from its whole point: that you’re better off storing a nullable timestamp than a boolean in most cases.

                                                    How you then test that field is kind of secondary to the data modelling concern. (“And you can pretty this up with a helper function called isPublished() which checks the published_at attribute.”)
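
                                                    A sketch of the same idea in a typed setting (C++ purely for illustration; std::optional stands in for the nullable column, and the names are made up):

                                                      #include <chrono>
                                                      #include <optional>

                                                      struct Post {
                                                          // Store the fact *and* the time it became true; empty means "not published".
                                                          std::optional<std::chrono::system_clock::time_point> published_at;

                                                          bool isPublished() const { return published_at.has_value(); }
                                                      };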

                                                2. 4

                                                  Mea culpa — by the time I reached the end I must’ve forgotten the first sentence, “ In my 15+ years of web development…” 😫

                                                3. 2

                                                  A string? Is that a typo?

                                                  1. 3

                                                    If you’re using sqlite3, it is. https://www.sqlite.org/datatype3.html explains:

                                                    2.2. Date and Time Datatype

                                                    SQLite does not have a storage class set aside for storing dates and/or times. Instead, the built-in Date And Time Functions of SQLite are capable of storing dates and times as TEXT, REAL, or INTEGER values:

                                                    • TEXT as ISO8601 strings (“YYYY-MM-DD HH:MM:SS.SSS”).
                                                    • REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
                                                    • INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.

                                                    Applications can chose to store dates and times in any of these formats and freely convert between formats using the built-in date and time functions.

                                                    And sure enough:

                                                    > sqlite3 tmp.sqlite3
                                                    SQLite version 3.32.3 2020-06-18 14:16:19
                                                    Enter ".help" for usage hints.
                                                    sqlite> create table asdf (id integer primary key, created_at timestamp);
                                                    sqlite> insert into asdf (id, created_at) values (1, CURRENT_TIMESTAMP);
                                                    sqlite> ^D
                                                    > strings tmp.sqlite3
                                                    SQLite format 3
                                                    tableasdfasdf
                                                    CREATE TABLE asdf (id integer primary key, created_at timestamp)
                                                    32021-04-25 03:57:55
                                                    > xxd tmp.sqlite3 | tail -2
                                                    00001fe0: 0000 0000 0000 0000 1601 0300 3332 3032  ............3202
                                                    00001ff0: 312d 3034 2d32 3520 3033 3a35 373a 3535  1-04-25 03:57:55
                                                    
                                                1. 5

                                                  I wonder what GUI apps people really want on Windows. There’s a few I like, but not enough to justify the pain of actually running them in WSL, and they almost always have superior alternatives elsewhere.

                                                  1. 7

                                                    I want emacs running next to node & rails projects without having to go through SMB or NTFS.

                                                    1. 1

                                                      Can you have something like ext4 on WSL?

                                                    2. 5

                                                      I teach occasionally and I would love to be able to have my students run an IDE in WSL to avoid dealing with a bunch of Windows “stuff” that doesn’t matter. That’s a pretty minor use-case, though, to be fair.

                                                      1. 3

                                                        I use gitg pretty often on Windows because I tend to do git checkouts in WSL to avoid any line-ending issues and generally use the command line, but sometimes run gitg when I want to add a subset of my changes to a commit. I used to use Konsole until the Windows Terminal got better.

                                                        The sound bit might be interesting, but vcXsrv from Chocolatey makes it very easy to run X11 apps in WSL or in a VM. It’s interesting that this is using a separate VM, because that implies that it probably isn’t specific to Linux: I’d expect a FreeBSD VM to be able to talk to the Wayland and PulseAudio server just as easily.

                                                      1. 8

                                                        I am desperate for someone to tell me what makes a font “cute”. This isn’t a joke, I’m not making fun, I’m being entirely sincere. I have been baffled by emotional reactions to fonts for a long time and I really want to understand.

                                                        1. 17

                                                          I’m not a typesetting expert by any means, but it’s the feeling evoked by looking at the font.

                                                          If a font has sharp, jagged edges, it looks industrial like some sort of equipment.

                                                          A font with thin strokes and wide serifs has a formal feeling to it like old style text in an old newspaper.

                                                          A cute font is generally going to be rounded, without serifs, and with uniform thickness across each stroke, with some optional flourishing. These fonts remind the viewer of a small cute animal with its rounded edges.

                                                          Aside from the lettering looking like some sort of small animal, the lettering is a much cleaner version of the type you would expect a child to make while writing with a marker, crayon, simple paintbrush or some other wide tipped implement and may remind the viewer of their childhood.

                                                          These are the things I see. Maybe someone else can offer some more enlightenment.

                                                          1. 2

                                                            I don’t know that I would have put anything this way, before reading your comment, and “cuteness” isn’t really a thing I generally experience from fonts (unless we’re talking about emoji), but I want to voice my support for letters that look like some sort of small animal.

                                                          2. 8
                                                            1. Babies are the baseline standard for adorability. They have big round eyes. They have big heads relative to their bodies. Adorable babies are also a bit chubby with gentle curves that have a large radius of curvature.
                                                            2. To make a font more adorable, you need visual cues which trigger the adorability reflex. If you don’t have that reflex, I can still describe some analytical measures that work (okay, they work for me). Give it a large x-height relative to the caps height. That triggers the “babies have big heads” pattern. The lower case letters should contain big round circles, when there is a choice of letter form, which evokes the “babies have big round eyes” pattern. For the lower case ‘a’, you want the “single story” version, often associated with italic and san-serif fonts (but not exclusively), as opposed to the “two-story” version. Similarly for lower case “g”, you want the simpler version that has the bigger “head”. Also, san-serif is more adorable than serif.
                                                            1. 14

                                                              Babies are the baseline standard for adorability. They have big round eyes. They have big heads relative to their bodies. Adorable babies are also a bit chubby with gentle curves that have a large radius of curvature.

                                                              Not gonna lie, I had to giggle because this is the first time I’ve seen someone attempt to come up with an objective measure of cuteness. How about we give it a unit of measure? I propose it be named an uwu. “The font was at least 30 centiuwus.”

                                                              1. 1

                                                                Hmm, I know babies are the cuteness standard but I never thought about that removed from the features themselves. Thanks!

                                                              2. 2

                                                                A few things make me think Fantasque Sans Mono is cute. It’s got relatively chunky strokes (the “m” has lots of “ink” to it). I put a sample up at https://m.bonzoesc.net/@bonzoesc/106105499791357051

                                                                There’s enough ornamentation to not be austerely geometric like Futura or Avenir, but there’s an imprecision and lack of uniformity to it. The top of the “k” is looped, and the loop is more tadpole-shaped than it is round, just like the void in the bottom of the two-story “a”. The middle of the “e” is gradually sloped, although that’s really subtle. The “g” is a fun two-story thing, with a bigger void in the basement than the ground floor. The leg on the “R” is at a funky angle. The “3” is bottom-heavy.

                                                                I guess to an extent it might be a feeling of irregularity, lots of things that are purposefully at conflicting angles, terminals on strokes being a bit suggestive of the hypothetical pen starting at an angle before aligning with the grid. Stuff that really only works on retina screens and in a way that an OS vendor wouldn’t pay for.

                                                                1. 1

                                                                  Thanks! That makes a lot of sense. I guess I’ve never really carefully looked at individual glyphs.

                                                                2. 2

                                                                  for me it’s how it’s rounded with a wide radius. kind of like those fridge magnets kids play with.

                                                                1. 7

                                                                  I love Fantasque Sans Mono; it’s so damn cheerful and twee every time I look at a terminal or editor.

                                                                  1. 5

                                                                    l and I look identical in that font :(

                                                                    1. 26

                                                                      As in the font used on lobste.rs which made your comment a bit hard to parse ;)

                                                                      1. 1

                                                                        yeah, fonts shouldn’t introduce ambiguity by displaying different characters the same way.

                                                                      2. 6

                                                                        l and I

                                                                        Perhaps I’m missing something, but if I type them in the code sample input box on Compute Cuter (selecting Fantasque Sans Mono) they look different to me?

                                                                        1. 3

                                                                          I also see clearly identifiable glyphs for each when I try that. The I has top and bottom serifs, the l has a leftward head and rightward tail (don’t know what you call em), and only the | is just a line.

                                                                        2. 1

                                                                          Honestly when is that ever a real issue? You’ve got syntax highlighting, spellcheck, reference check, even a bad typer wouldn’t accidentally press the wrong key, you know to use mostly meaningful variable names and you’ve never used L as an index variable… So maybe if you’re copying base64 data manually but why?

                                                                          1. 9

                                                                            My friend whose name is Iurii started spelling his name with all-lowercase letters because people called him Lurii. Fonts that make those indistinguishable even in lowercase would strip him of his last-resort measure to get people to read his name correctly. (Of course, spelling it Yuriy solves the issue, but Iurii is how his name is written in his ID documents, so it’s not always an option)

                                                                            1. 2

                                                                              It could be, and it’s not just limited to I, l, and 1. That’s why in C, when I have a long integer literal, I also postfix it with ‘L’: 1234L. Doing it that way makes it stand out easier than 1234l. And if I have to do an unsigned long literal, I use a lower case ‘u’: 5123123545uL. That way, the ‘u’ does stand out, compared to 5123123545UL or 5123123545ul.

                                                                            2. 1

                                                                              cf

                                                                          1. 8

                                                                            In the Arduino world, everything is done in C++, a language which is almost never used on 8-bit microcontrollers outside of this setting because it adds significant complexity to the toolchain and overhead to the compiled code.

                                                                            I don’t buy this. C++ is C with extra features available on the principle that you only pay for what you use. (The exception [sic] being exceptions, which you pay for unless you disable them, which a lot of projects do.)

                                                                            The main feature is classes, and those are pretty damn useful; they’re about the only C++ feature Arduino exposes. There is zero overhead to using classes unless you start also using virtual methods.

                                                                            The C++ library classes will most definitely bloat your code — templates are known for that — but again, you don’t have to use any of them.

                                                                              (Aside: can someone explain why anyone’s still using 8-bit MCUs? There are so many dirt cheap and low-power 32-bit SoCs now, what advantage do the old 8-bit ones still have?)

                                                                            1. 9

                                                                                (Aside: can someone explain why anyone’s still using 8-bit MCUs? There are so many dirt cheap and low-power 32-bit SoCs now, what advantage do the old 8-bit ones still have?)

                                                                                They’re significantly cheaper and easier to design with (and thus less demanding in terms of layout, power supply parameters, fabrication and so on). All of these are extremely significant factors for consumer products, where margins are extremely small and fabrication batches are large.

                                                                              Edit: as for C++, I’m with the post’s author here – I’ve seen it used on 8-bit MCUs maybe two or three times in the last 15 years, and I could never understand why it was used. If you’re going to use C++ without any of the ++ features except for classes, and even then you still have to be careful not to do whatever you shouldn’t do with classes in C++ this year, you might as well use C.

                                                                              1. 3
                                                                                • RAII is a huge help in ensuring cleanup of resources, like freeing memory.
                                                                                • Utilities like unique_ptr help prevent memory errors.
                                                                                • References (&) aren’t a cure-all for null-pointer bugs, but they do help.
                                                                                • The organizational and naming benefits of classes, parameter overloading and default parameters are significant IMO. stream->close() vs having to remember IOWriteStreamClose(stream, true, kDefaultIOWriteStreamCloseMode).
                                                                                • As @david_chisnall says, templates can be used (carefully!) to produce super optimized type-safe abstractions, and to move some work to compile-time.
                                                                                • Something I only recently learned is that for (x : collection) even works with C arrays, saving you from having to figure out the size of the array in more-or-less fragile ways.
                                                                                • Forward references to functions work inside class declarations.

                                                                                I could probably keep coming up with benefits for another hour if I tried. Any time I’m forced to write in C it’s like being given those blunt scissors they use in kindergarten.
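
                                                                                  A small sketch of two of the points above (range-for over a plain C array, and RAII cleanup); hosted C++ assumed, names made up:

                                                                                    #include <cstdio>
                                                                                    #include <memory>

                                                                                    struct Sensor {
                                                                                        ~Sensor() { std::puts("sensor released"); }   // RAII: runs on every exit path
                                                                                    };

                                                                                    int main() {
                                                                                        int samples[] = {3, 1, 4, 1, 5};
                                                                                        int sum = 0;
                                                                                        for (int s : samples)        // range-for works on C arrays: the compiler knows the bound
                                                                                            sum += s;
                                                                                        std::printf("sum=%d\n", sum);

                                                                                        auto sensor = std::make_unique<Sensor>();   // no delete to forget
                                                                                        return 0;                                   // Sensor's destructor still runs here
                                                                                    }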

                                                                                1. 2

                                                                                  The memory safety/RAII arguments are excellent generic arguments but there are extremely few scenarios in which embedded firmware running on an 8-bit MCU would be allocating memory in the first place, let alone freeing it! At this level RAII is usually done by allocating everything statically and releasing resources by catching fire, and not because of performance reasons (edit: to be clear, I’ve worked on several projects where no code that malloc-ed memory would pass the linter, let alone get to a code review – where it definitely wouldn’t have passed). Consequently, you also rarely have to figure out the size of an array in “more-or-less fragile ways”, and it’s pretty hard to pass null pointers, too.

                                                                                  The organisational and naming benefits of classes & co. are definitely a good non-generic argument and I’ve definitely seen a lot of embedded code that could benefit from that. However, they also hinge primarily on programmer discipline. Someone who ends up with IOWriteStreamClose(stream, true, kDefaultIOWriteStreamCloseMode) rather than stream_close(stream) is unlikely to end up with stream->close(), either. Also, code that generic is pretty uncommon per se. The kind of code that runs in 8-16 KB of ROM and 1-2 KB of RAM is rarely so general-purpose as to need an abstraction like an IOWriteStream.

                                                                                  1. 2

                                                                                    I agree that you don’t often allocate memory in a low-end MCU, but RAII is about resources, not just memory. For example, I wrote some C++ code for controlling an LED strip from a Cortex M0 and used RAII to send the start and stop messages, so by construction there was no way for me to send a start message and not send an end message in the same scope.
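
                                                                                      A hedged sketch of that pattern (send_frame() is a made-up stand-in for the real SPI routine, not code from that project):

                                                                                        #include <cstdio>

                                                                                        // Stand-in for the real "send start/end frame over SPI" routine.
                                                                                        static void send_frame(const char* what) { std::printf("frame: %s\n", what); }

                                                                                        class LedFrame {
                                                                                        public:
                                                                                            LedFrame()  { send_frame("start"); }
                                                                                            ~LedFrame() { send_frame("end"); }       // runs on every exit path of the scope
                                                                                            LedFrame(const LedFrame&) = delete;      // one frame per scope; no copies
                                                                                            LedFrame& operator=(const LedFrame&) = delete;
                                                                                        };

                                                                                        void update_strip() {
                                                                                            LedFrame frame;     // start frame sent here
                                                                                            // ... push pixel data ...
                                                                                        }                       // end frame sent here, even on an early return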

                                                                                    1. 1

                                                                                      That’s one of the neater things that C++ allows for and I liked it a lot back in my C++ fanboy days (and it’s one of the reasons why I didn’t get why C++ wasn’t more popular for these things 15+ years ago, too). I realise this is more in “personal preferences” land so I hope this doesn’t come across as obtuse (I’ve redrafted this comment 3 times to make sure it doesn’t but you never know…)

                                                                                      In my experience, and speaking many years after C++-11 happened and I’m no longer as enthusiastic about it, using language features to manage hardware contexts is awesome right up until it’s not. For example, enforcing things like timing constraints in your destructors, so that they do the right thing when they’re automatically called at the end of the current scope no matter what happens inside the scope, is pretty hairy (e.g. some ADC needs to get the “sleep” command at least 50 uS after the last command, unless that command was a one-shot conversion because it ignores commands while it converts, in which case you have to wait for a successful conversion, or a conversion timeout (in which case you have to clear the conversion flag manually) before sending a new command). This is just one example but there are many other pitfalls (communication over bus multiplexers, finalisation that has to be coordinated across several hardware peripherals etc.)

                                                                                      As soon as you meet hardware that wasn’t designed so that it’s easy to code against in this particular fashion, there’s often a bigger chance that you’ll screw up code that’s supposed to implicitly do the right thing in case you forget to “release” resources correctly than that you’ll forget to release the resources in the first place. Your destructors end up being 10% releasing resources and 90% examining internal state to figure out how to release them – even though you already “know” everything about that in the scope at the end of which the destructor is implicitly called. It’s bug-prone code that’s difficult to review and test, which is supposed to protect you against things that are quite easily caught both at review and during testing.

                                                                                      Also, even when it’s well-intentioned, “implicit behaviour” (as in code that does more things than the statements in the scope you’re examining tell you it does) of any kind is really unpleasant to deal with. It’s hard to review and compare against data sheets/application notes/reference manuals, logic analyser outputs and so on.

                                                                                      FWIW, I don’t think this is a language failure as in “C++ sucks”. I’ve long come to my senses and I think it does but I don’t know of any language that easily gets these things right. General-purpose programming languages are built to coordinate instruction execution on a CPU, I don’t know of any language that allows you to say “call the code in this destructor 50us after the scope is destroyed”.

                                                                              2. 7

                                                                                  While you can of course put a 32-bit SoC on everything, in many cases 8-bitters are simpler to integrate into the hardware design. A very practical point is that many 8-bitters are still available in DIP, which leads to easier assembly of smaller runs.

                                                                                1. 5

                                                                                    Aside: can someone explain why anyone’s still using 8-bit MCUs? There are so many dirt cheap and low-power 32-bit SoCs now, what advantage do the old 8-bit ones still have?

                                                                                  They’re dirt cheaper and lower power. 30 cents each isn’t an unreasonable price.

                                                                                  1. 3

                                                                                    You can get Cortex M0 MCUs for about a dollar, so the price difference isn’t huge. Depending on how many units you’re going to produce, it might be insignificant.

                                                                                    It’s probably a question of what you’re used to, but at least for me working with a 32 bit device is a lot easier and quicker. Those development hours saved pay for the fancier MCUs, at least until the number of produced units gets large. Fortunately most of our products are in the thousands of units…

                                                                                    1. 9

                                                                                      a 3x increase in price is huge if you’re buying lots of them for some product you’re making.

                                                                                      1. 4

                                                                                        Sure, but how many people buying in bulk are using an Arduino (the original point of comparison)?

                                                                                        1. 2

                                                                                          I mean, the example they gave was prototyping for a product..

                                                                                      2. 6

                                                                                        If you’re making a million devices (imagine a phone charger sold at every gas station, corner store, and pharmacy in the civilized world), that $700k could’ve bought a lot of engineer hours, and the extra power consumption adds up with that many devices too.

                                                                                      3. 2

                                                                                          The license fee for a Cortex M0 is 1¢ per device. The area is about the size of a pad on a cheap process, so the cost of both licensing and fabrication is pretty much as close as you can get to the minimum cost of producing any IC.

                                                                                        1. 1

                                                                                          The license fee for a Cortex M0 is 1¢ per device.

                                                                                          This (ARM licensing cost) is an interesting datapoint I have been trying to get for a while. What’s your source?

                                                                                          1. 2

                                                                                            A quick look at the Arm web site tells me I’m out of data. This was from Arm’s press release at the launch of the Cortex M0.

                                                                                            1. 1

                                                                                              Damn. Figures.

                                                                                        2. 1

                                                                                          Could you name a couple of “good” 8-bit MCUs? I realized it’s been a while since I looked at them, and it would be interesting to compare my preferred choices to what the 8-bit world has to offer.

                                                                                        3. 2

                                                                                          you only pay for what you use

                                                                                          Unfortunately many arduino libraries do use these features - often at significant cost.

                                                                                          1. 2

                                                                                            I’ve not used Arduino, but I’ve played with C++ for embedded development on a Cortex M0 board with 16 KiB of RAM and had no problem producing binaries that used less than half of this. If you’re writing C++ for an embedded system, the biggest benefits are being able to use templates that provide type-safe abstractions but are all inlined at compile time and end up giving tiny amounts of code. Even outside of the embedded space, we use C++ templates extensively in snmalloc, yet in spite of being highly generic code and using multiple classes to provide the malloc implementation, the fast path compiles down to around 15 x86 instructions.

                                                                                          1. 6

                                                                                            The problem with this argument is that it is entirely possible to do that with a non-proof-of-work system as well. In fact, a blockchain may not be necessary at all.

                                                                                            I don’t think anyone would deny that centralized databases are more performant than distributed ones, pretty much across the board. The key tradeoff Bitcoin makes here is trustless immutability. Gold was our previous trustless money, and over the past couple hundred years all credit monies and fiat currencies have massively depreciated against it.

                                                                                            Centralized ledgers do work, but they cannot provide an ironclad guarantee that the rules of the game will remain fixed into the future. We don’t even know what the supply of dollars will be six months from now. Bitcoin’s supply is predictable decades into the future.

                                                                                            The root problem with conventional currency is all the trust that’s required to make it work. The central bank must be trusted not to debase the currency, but the history of fiat currencies is full of breaches of that trust.

                                                                                            — Satoshi Nakamoto

                                                                                            1. 9

                                                                                              Gold was our previous trustless money, and over the past couple hundred years all credit monies and fiat currencies have massively depreciated against it.

                                                                                              I recently read Valerie Hansen’s The Silk Road and she makes the point that gold was not the trustless money. It was more accepted than coinage, but you need steady, reliable trading partners to make it useful as currency. The real universal currency in the oasis kingdoms was bolts of cloth.

                                                                                              1. 4

                                                                                                The real universal currency in the oasis kingdoms was bolts of cloth.

                                                                                                Bolts of what cloth, from which producer, what quality, at what time was it produced?

                                                                                                Meanwhile a chunk of gold is a chunk of gold.

                                                                                                1. 2

                                                                                                  [Warning: speculation ahead]

                                                                                                  I’m imagining the bolts to be silk.

                                                                                                  Gold can be alloyed with base metals in ways that are hard to detect using technology known to the merchants of the Silk Road. Silk can be more easily assayed.

                                                                                                  1. 1

                                                                                                    I’m sure the oasis kingdoms would have been convinced by your brilliant analysis.

                                                                                                  2. 2

                                                                                                    Nice book, I will have to add that to my list.

                                                                                                    Certainly, different commodities have served as trustless money at different times. Shells and pelts are two other examples. Gold eventually won out for global trade, but it wasn’t universal until fairly late in history.

                                                                                                  3. 2

                                                                                                    You’re falsely equating the space of “not proof-of-work” and “not blockchain” with “centralized.”

                                                                                                    The claim is that one can achieve similar decentralized feats (depending on the goal) without requiring the planet killing compute power of a proof-of-work blockchain.

                                                                                                    1. 3

                                                                                                      Hmm, well it comes up twice. First they claim that FedWire can provide properties such as transaction finality and Sybil resistance. Which is true, it can!

                                                                                                      This entire kludge is negated in FedWire because all participants are known: it is permissioned.

                                                                                                      With 25 core nodes FedWire has a degree of replication, but it is definitely not permissionless. Most importantly, it can’t provide the guarantee I highlighted about the rules remaining fixed.

                                                                                                      Second, near the end of the article they mention proof-of-stake, but it’s a bit of a throwaway line.

                                                                                                      Through the usage of either permissioned systems (like an RTGS) or a proof-of-stake chain, the energy consumed by PoW chains did not need to take place at all. In fact, PoS chains can provide the same types of utility that PoW chains do, but without the negative environmental externalities.

                                                                                                      They mention that the “transition to proof-of-stake is beyond the scope of this article” and don’t really dive into how PoS achieves any of these goals.

                                                                                                      The fatal flaw with proof-of-stake is that 51% attacks are unrecoverable. If one entity ever manages to get more than half the PoS coins, they forever control the rules of the network. PoS networks can be decentralized, but they can never be permissionless. In order to get new coins, you have to buy them from someone who already owns them.

                                                                                                      In contrast, PoW is both decentralized and permissionless. Anyone can participate in the mining process without a prior investment. 51% attacks can temporarily interrupt a PoW chain, but an attack is ultimately recoverable.

                                                                                                      So to clarify my position I would add that it’s not just decentralization which is important, but permissionlessness.

                                                                                                    2. 2

                                                                                                      Centralized ledgers do work, but they cannot provide an ironclad guarantee that the rules of the game will remain fixed into the future. We don’t even know what the supply of dollars will be six months from now. Bitcoin’s supply is predictable decades into the future.

                                                                                                    Being able to reinterpret or change the rules is a feature, since it makes it possible to fix mistakes that were made at the inception of the rules. Generally speaking, if you have a traditional contract where a random participant can just set every other participant’s stake on fire, you can probably convince a legal entity that this wasn’t the intention and roll back the contract without affecting everyone else using that currency. If the use of the system goes from “currency you can use to buy pizza, drugs, fake IDs, or murder” to “thawing the tundra and flooding my neighborhood so a few really rich guys get even richer”, maybe the rules should change, and in a way that doesn’t require the few really rich guys’ consent.

                                                                                                    1. 4

                                                                                                      One of the most noncontroversial good practices in the industry is automated tests. We don’t write a lot of these because our code doesn’t follow standard decoupling practices; while those principles make for easy to maintain code for a team, they add extra steps during runtime, and allocate more memory. It’s not much on any given transaction, but over thousands per second, it adds up.

                                                                                                      Excuse me? I’ve never heard of running the test suite on every request in production lmao

                                                                                                      1. 3

                                                                                                        It’s not talking about running the test suite on every request, but about the abstraction layers you might need in order to allow testing. Say you have a function that uses service X. You will mock X during testing, so you add an abstraction layer and inject a mock object X’ in place of the real X. This indirection adds overhead at runtime.
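
                                                                                                        For illustration, a minimal sketch (not from the article, and in Rust rather than the Java they use; all names here are hypothetical): a handler written against a trait so tests can inject a fake pays for that flexibility with a boxed trait object and dynamic dispatch on every production call.

                                                                                                            trait PaymentService {
                                                                                                                fn charge(&self, cents: u64) -> bool;
                                                                                                            }
                                                                                                            
                                                                                                            struct RealService;
                                                                                                            
                                                                                                            impl PaymentService for RealService {
                                                                                                                fn charge(&self, _cents: u64) -> bool {
                                                                                                                    true // imagine a network call to the real service here
                                                                                                                }
                                                                                                            }
                                                                                                            
                                                                                                            // The indirection exists purely so tests can substitute a fake:
                                                                                                            // every call now goes through a vtable, and the service is heap-allocated.
                                                                                                            struct Handler {
                                                                                                                service: Box<dyn PaymentService>,
                                                                                                            }
                                                                                                            
                                                                                                            impl Handler {
                                                                                                                fn handle(&self, cents: u64) -> bool {
                                                                                                                    self.service.charge(cents)
                                                                                                                }
                                                                                                            }
                                                                                                            
                                                                                                            struct FakeService; // deterministic stand-in used only by tests
                                                                                                            
                                                                                                            impl PaymentService for FakeService {
                                                                                                                fn charge(&self, _cents: u64) -> bool {
                                                                                                                    false
                                                                                                                }
                                                                                                            }
                                                                                                            
                                                                                                            fn main() {
                                                                                                                let prod = Handler { service: Box::new(RealService) };
                                                                                                                let test = Handler { service: Box::new(FakeService) };
                                                                                                                assert!(prod.handle(100));
                                                                                                                assert!(!test.handle(100));
                                                                                                            }

                                                                                                        A direct, concrete call would avoid both the allocation and the dynamic dispatch, which is roughly the overhead the quoted passage is talking about.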

                                                                                                      1. 3

                                                                                                        The impact gets magnified because we develop about 200 new Java services per year.

                                                                                                        Do they wear out and need replacement? Is this an Uber thing where they just choose to implement business rules for every country as a separate microservice?

                                                                                                        1. 3

                                                                                                          This is yet another manifestation of Conway’s law. There are trade-offs; it is not necessarily a bad thing.

                                                                                                          1. 1

                                                                                                            That sentence stood out for me as well. Why do they need so many new services? Doesn’t that overcomplicate their own system? With microservices, it seems it’s always about adding, never about removing or consolidating services…

                                                                                                            1. 3

                                                                                                              In my experience what happens is that the DevOps team makes it suuuper easy for someone to make a new service (create a new github repo and it has all the stuff templated out for you so all you do is write code and everything else just works). This tends to make teams a little microservice trigger happy… eventually things become a mess and someone decides to “consolidate”.

                                                                                                              Reducing the barrier to whipping up new services that much will generally result in this behavior. I think it also depends on the engineering organization’s size and structure (which teams own which features, how good they are at working cross-team, etc.)

                                                                                                              1. 3

                                                                                                                That sentence stood out for me as well. Why do they need so many new services?

                                                                                                                Depends on your product offerings and scale. A company like Twilio would probably have many products and many teams supporting those products. Monoliths scale quite badly in those situations, unless you have an entire team devoted to managing your monolith like Google or FB does.

                                                                                                            1. 4

                                                                                                              A couple of notes from the comments explain some of the reasoning for denormalizing time changes into the files…

                                                                                                              Jon R says: 3 years ago at 4:13 am

                                                                                                              It seems to me that the iCalendar working group had two choices: include the time zone definition in the event description, which carries the risk that the software reading the event may have better knowledge of the time zone than the software creating it, or don’t include the time zone definition in the event description, which carries the risk that the software creating the event may have better knowledge of the time zone than the software reading it.

                                                                                                              Given that the person creating the event is in a better position to judge whether the event is being created at the correct time than the person receiving it, I would suggest that the decision they made is less “batshit insane” and more “practical and sensible”. Also, if the zone is described in the event and there are multiple attendees then at least they will still all turn up at the same time, rather than potentially having different ideas about when it is.

                                                                                                              It seems possible as well that the tzdata file was less ubiquitous in 1997 when they were creating the iCalendar file format. Unfortunately, the IETF Calendaring and Scheduling Working Group’s mail archives appear to be lost in the mists of time.

                                                                                                              At least Apple does use the tzdata zone names, e.g. “Europe/London”, when referring to the time zones - unlike Microsoft who tend to say things like “GMT Standard Time”, by which, in the British summer, they mean not GMT but GMT+1. Now that’s batshit insane.

                                                                                                              … and that you can just use a time zone reference, with the caveat that of course some file consumers are going to handle that differently:

                                                                                                              farktronix says: 3 years ago at 6:22 pm

                                                                                                              They added a way to specify timezones by reference back in 2016 with RFC 7809. You’ll love this quote from that RFC:

                                                                                                              “Observation and experiments have shown that, in the vast majority of cases, CalDAV clients have typically ignored time zone definitions in data received from servers, and instead make use of their own “built-in” definitions for the corresponding time zone identifier.”

                                                                                                              [elided post]

                                                                                                              farktronix says: 3 years ago at 6:26 pm

                                                                                                              Yep, but that comment from the RFC means you can probably leave out the TIMEZONE section in your standalone iCalendar files as long as you’re using Olson names because most clients don’t use the TIMEZONE section as the source of timezone information anyway.

                                                                                                              jwz says: 3 years ago at 6:33 pm

                                                                                                              Yeah, I had been doing that for years, and it seemed to be working fine. The only reason I even noticed this was that the ICS validator I had been using went offline, and the new one I switched to complains about it.
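
                                                                                                              For concreteness, “time zone by reference” just means the event names an Olson/tzdata zone in a TZID parameter and ships no VTIMEZONE component at all; strictly, RFC 5545 wants a matching VTIMEZONE in the same object, and RFC 7809 plus the client behaviour quoted above are what make omitting it workable in practice. A minimal hand-written sketch (not from the post, details invented):

                                                                                                                  BEGIN:VCALENDAR
                                                                                                                  VERSION:2.0
                                                                                                                  PRODID:-//example//sketch//EN
                                                                                                                  BEGIN:VEVENT
                                                                                                                  UID:20220601-0900-example@example.com
                                                                                                                  DTSTAMP:20220101T000000Z
                                                                                                                  DTSTART;TZID=Europe/London:20220601T090000
                                                                                                                  DTEND;TZID=Europe/London:20220601T100000
                                                                                                                  SUMMARY:Meeting
                                                                                                                  END:VEVENT
                                                                                                                  END:VCALENDAR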

                                                                                                              1. 74

                                                                                                                First, their argument for Rust (and against C) because of memory safety implies that they have not done due diligence in finding and fixing such bugs. […] And with my bc, I did my due diligence with memory safety. I fuzzed my bc and eliminated all of the bugs.

                                                                                                                This seems like such a short-sighted, limited view. Software is not bug-free. And ruling out a class of bugs by choice of technology as a measure of improving overall robustness won’t fix everything, but at the very least it’s a trade-off that deserves more thorough analysis than this empty dismissal.

                                                                                                                1. 21

                                                                                                                  I think he is probably right (after all, he wrote it) when he says rewriting his bc in Rust would make it more buggy. I disagree this is an empty dismissal, since it is backed by his personal experience.

                                                                                                                  For the same reason, I think cryptography developers are probably right (after all, they wrote it) when they say rewriting their software in Rust would make it less buggy. So the author is wrong about this. His argument doesn’t convincingly show why he knows better than those developers.

                                                                                                                  1. 15

                                                                                                                    I think there’s a big difference between programs and libraries with stable requirements and those that evolve here. The bc utility is basically doing the same thing that it did 20 years ago. It has a spec defined by POSIX and a few extensions. There is little need to modify it other than to fix bugs. It occasionally gets new features, but they’re small incremental changes.

                                                                                                                    Any decision to rewrite a project is a trade off between the benefits from fixing the accumulated technical debt and the cost of doing and validating the rewrite. For something stable with little need of future changes, that trade is easy to see: the cost of the rewrite is high, the benefit is low. In terms of rewriting in a memory-safe language, there’s an additional trade between the cost of a memory safety vulnerability and the cost of the rewrite. The cost of Heartbleed in OpenSSL was phenomenal, significantly higher than the cost of rewriting the crypto library. In the case of bc, the cost of a memory safety bug is pretty negligible.

                                                                                                                    Data from Microsoft’s Security Response Center and Google’s Project Zero agree that around 70-75% of vulnerabilities are caused by memory safety bugs. Choosing a language that avoids those by construction means that you can focus your attention on the remaining 25-30% of security-related bugs. The author talks about fuzzing, address sanitiser, and so on. These are great tools. They’re also completely unnecessary in a memory-safe language because they try to find classes of bugs that you cannot introduce in the first place in a memory-safe language (and they do so probabilistically, never guaranteeing that they’ve found them all).

                                                                                                                    If you’re starting a new project, then you need a really good reason to start it in C and pay the cost of all of that fuzzing.

                                                                                                                    1. 17

                                                                                                                      Data from Microsoft’s Security Response Center and Google’s Project Zero agree that around 70-75% of vulnerabilities are caused by memory safety bugs. Choosing a language that avoids those by construction means that you can focus your attention on the remaining 25-30% of security-related bugs.

                                                                                                                      There’s an implied assumption here that if a language is memory safe, those memory safety bugs will simply go away. In my experience, that is not quite true. Sometimes those memory safety bugs will turn into logic bugs.

                                                                                                                      Not to pick on Rust here, but in Rust it is very common to put values into an array and use array indices instead of pointers when you have some kind of self-referential data structure that’s impossible to express otherwise under Rust’s move semantics. If you simply do such a naive transformation of your C algorithm, your code will be memory safe, but all your bugs (use-after-free, etc.) will still be there. You have just lifted them to logic bugs.

                                                                                                                      Rust has no good abstractions to deal with this problem; there are some attempts, but they all have various practical problems.
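
                                                                                                                      To make the index-as-pointer pattern concrete, here is a minimal sketch (a hypothetical arena, entirely safe Rust) of how a stale index behaves exactly like a use-after-free, just without the memory unsafety:

                                                                                                                          // Hypothetical arena: nodes live in a Vec and are referred to by index.
                                                                                                                          // "Freeing" a node only pushes its index onto a free list; nothing stops
                                                                                                                          // a stale index from being used after the slot has been handed out again.
                                                                                                                          struct Arena {
                                                                                                                              nodes: Vec<i64>,
                                                                                                                              free_list: Vec<usize>,
                                                                                                                          }
                                                                                                                          
                                                                                                                          impl Arena {
                                                                                                                              fn alloc(&mut self, value: i64) -> usize {
                                                                                                                                  match self.free_list.pop() {
                                                                                                                                      Some(i) => { self.nodes[i] = value; i }
                                                                                                                                      None => { self.nodes.push(value); self.nodes.len() - 1 }
                                                                                                                                  }
                                                                                                                              }
                                                                                                                          
                                                                                                                              fn free(&mut self, i: usize) {
                                                                                                                                  self.free_list.push(i); // the slot may be reused later
                                                                                                                              }
                                                                                                                          }
                                                                                                                          
                                                                                                                          fn main() {
                                                                                                                              let mut arena = Arena { nodes: Vec::new(), free_list: Vec::new() };
                                                                                                                              let a = arena.alloc(1);
                                                                                                                              arena.free(a);          // `a` is now a dangling *index*
                                                                                                                              let b = arena.alloc(2); // reuses the same slot
                                                                                                                              arena.nodes[a] = 99;    // "use after free": silently clobbers `b`
                                                                                                                              assert_eq!(arena.nodes[b], 99); // no panic, just corrupted logic
                                                                                                                          }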

                                                                                                                      Other languages like ATS and F* have abstractions to help with this problem directly, as well as other problems of logical soundness.

                                                                                                                      1. 13

                                                                                                                        Right - but in lifting these from memory bugs to logic bugs, you get a runtime panic/abort instead of a jump to a (likely-attacker-controllable) address. That’s a very different kind of impact!

                                                                                                                        1. 9

                                                                                                                          You don’t get a panic if you access the “wrong” array index. The index is still a valid index for the array. Its meaning (allocated slot, free slot, etc.) is lost to the type system, though in a more advanced language it need not be. This later leads to data corruption, etc., just like in C.

                                                                                                                          1. 7

                                                                                                                            It leads to a much safer variant of data corruption, though. Instead of corrupting arbitrary memory as in C or C++ (like a function pointer, vtable, or return address), you are only corrupting a single variable’s value in allocated, valid, aligned memory (like a single int).

                                                                                                                            You would get a panic in Rust for every memory corruption bug that could cause arbitrary code execution, which is what matters.
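
                                                                                                                            For contrast, the case that does panic is the plain out-of-range index (a trivial sketch): the bounds check turns what would be an out-of-bounds read or write in C into an abort.

                                                                                                                                fn main() {
                                                                                                                                    let v = vec![1, 2, 3];
                                                                                                                                    let i = 10; // imagine this came from attacker-controlled input
                                                                                                                                    // Safe Rust bounds-checks the access: this panics with
                                                                                                                                    // "index out of bounds" instead of touching memory past the allocation.
                                                                                                                                    println!("{}", v[i]);
                                                                                                                                }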

                                                                                                                            1. 1

                                                                                                                              This later leads to data corruption, etc, just like in C.

                                                                                                                              Can you expand on this? I had expected the behaviour in Rust to be significantly safer than C here. In C, the data corruption caused by use-after free often allows an attacker to execute arbitrary code.

                                                                                                                              I totally see your point about logical corruption (including things like exposing critical secrets), but I don’t follow that all the way to “just like C”. How would an array index error be exploited in Rust to execute arbitrary code?

                                                                                                                              1. 11

                                                                                                                                I once wrote a bytecode interpreter in C++, for a garbage collected scripting language. I implemented my own two-space garbage collector. For performance reasons, I didn’t use malloc() directly, but instead allocated a big enough byte array to host all my things. If I overflowed that array, Valgrind could see it. But if I messed up its internal structure, no dice. That heap of mine was full of indices and sizes, and I made many mistakes that caused them to be corrupted, or somehow not quite right. And I had no way to tell.

                                                                                                                                I solved this by writing my own custom heap analyser, which examined the byte array and told me what was in there. If I saw all my “allocated” objects in order, all was well. Often, I would see something was amiss, and I could go and fix the bug. Had I written it in Rust instead, I would have had to write the exact same custom heap analyser. Because Rust wouldn’t have prevented me from putting the wrong values inside my array. It’s perfectly “safe”, after all, to write gibberish in that array as long as I don’t overflow it.

                                                                                                                                Now could this particular bug lead to arbitrary code execution? Well, not quite. It would generate wrong results, but it would only execute what my C++/Rust program would normally execute. In this case however, I was implementing a freaking scripting language. The code an attacker could execute wasn’t quite arbitrary, but it came pretty damn close.

                                                                                                                                1. 6

                                                                                                                                  The effects of data corruption depend on what the code does with the data. This often means arbitrary code execution, but not always. It’s not a property of C, it’s a property of the code. This doesn’t change when you change the implementation language.

                                                                                                                                  Fundamentally, there is no semantic difference between a pointer into a C heap and an array index into a Rust array. In fact, some sophisticated blog authors who explain this array technique point out that the two compile to the exact same assembly code. It’s what the code does with the data that leads to exploitation (or not).

                                                                                                                                  Of course Rust has many additional safety advantages compared to C: buffer overflows don’t smash the stack, etc., and using references in Rust, if you can, is safe. And when using references, there’s a great deal of correlation between Rust’s notion of memory safety and true logic safety. This is good! But many people don’t realise that this safety is predicated on the lack of aliasing. The borrow checker is only a mechanism to enforce that invariant; it’s not an operative abstraction. It’s the lack of aliasing that gets you the safety, not the borrow checker itself. When you reintroduce aliasing (as the index technique does), you lose a lot of what Rust can do for you. Virtually everybody understands that introducing unsafe pointers gives up safety, but fewer people seem to understand that introducing aliasing via otherwise safe mechanisms has the same effect. Of course, the program continues to be memory safe in Rust terms, but you lose the strong correlation between memory safety and logic safety that you used to have.

                                                                                                                                  Not that there’s anything wrong with this, mind you; it’s just something people need to be aware of, just as they are already aware of the trade-offs they make when using unsafe. It does make it harder to project how many bugs Rust can prevent in practice, though.

                                                                                                                                  1. 6

                                                                                                                                    I think this is incorrect. Arbitrary code execution does not mean “can execute an arbitrary part of my program due to a logic bug”, it means “can execute arbitrary code on the host, beyond the code in my program”. Even a Rust aliasing logic bug does not open up this kind of arbitrary code execution exposure, because you can’t alias an int with a function pointer, a vtable, or a return address on the stack, like you can in C or C++. You can only alias an int with an int in safe Rust, which is an order of magnitude safer and really does eliminate an entire class of vulnerabilities.

                                                                                                                                    1. 6

                                                                                                                                      I think this is incorrect. Arbitrary code execution does not mean “can execute an arbitrary part of my program due to a logic bug”, it means “can execute arbitrary code on the host, beyond the code in my program”.

                                                                                                                                      In the security research world, we usually treat control of the program counter (the aptly named rip on x86-64) as “arbitrary code execution.” You can do a surprising amount of programming using only code that’s already in the process, without sending any code of your own, via return-oriented programming.

                                                                                                                                      1. 3

                                                                                                                                        But does Rust let you do that here? What does a snippet of Rust code look like that allows attacker-controlled indexing into an array escalate to controlling the program counter?

                                                                                                                                        1. 2

                                                                                                                                          Surely you agree that “variables changing underfoot” implies “program flow becomes different from what I expect”. That’s why we use variables: to hold the Turing machine state which influences the next state. A logical use-after-free means “variables changing underfoot”. You don’t expect a freed array slot’s value (perhaps now reallocated) to change based on some remote code, but it does.

                                                                                                                                          1. 3

                                                                                                                                            Right, but “program flow becomes different from what I expect, but it still must flow only to instruction sequences that the original program encoded” is much much safer than “program flow can be pointed at arbitrary memory, which might not even contain instructions, or might contain user-supplied data”.

                                                                                                                                            1. 2

                                                                                                                                              With ROP, the program flow only goes through “instruction sequences that the original program encoded”, and yet ROP is pretty much fatal.

                                                                                                                                              1. 7

                                                                                                                                                ROP is not possible when you index an array wrong in rust, what is your point?

                                                                                                                                                1. 6

                                                                                                                                                  And you can’t do ROP in safe Rust.

                                                                                                                                                  1. 2

                                                                                                                                                    Maybe not directly within the native code of the program itself, but I think (at least part of) 4ad’s point is that that’s not the only level of abstraction that matters (the memory bug vs. logic bug distinction).

                                                                                                                                                    As an example, consider a CPU emulator written entirely in safe Rust that indexes into a u8 array to perform its emulated memory accesses. If you compile an unsafe program to whatever ISA you’re emulating and execute it on your emulator, a bad input could still lead to arbitrary code execution – it’s at the next semantic level up and not at the level of your program itself, but how much does that ultimately matter? (It’s not really terribly different than ROP – attacker-controlled inputs determining what parts of your program get executed.)

                                                                                                                                                    That’s admittedly a somewhat “extreme” case, but I don’t think the distinction between programs that do fall into that category and those that don’t is terribly clear. Nearly any program can, if you squint a bit, be viewed essentially as a specialized interpreter for the language of its config file (or command-line flags or whatever else).
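
                                                                                                                                                    A toy version of that emulator point, in entirely safe Rust (opcodes and layout invented for illustration): the host program has no native memory unsafety, yet the attacker-supplied “program” fully controls what happens at the emulated level.

                                                                                                                                                        // Emulated memory is just a Vec<u8>; the "machine" does whatever the
                                                                                                                                                        // input program says, even though the host code is 100% safe Rust.
                                                                                                                                                        fn run(program: &[u8]) -> Vec<u8> {
                                                                                                                                                            let mut mem = vec![0u8; 256];
                                                                                                                                                            let mut pc = 0;
                                                                                                                                                            while pc + 2 < program.len() {
                                                                                                                                                                match program[pc] {
                                                                                                                                                                    // opcode 1: store an immediate value at an emulated address
                                                                                                                                                                    1 => mem[program[pc + 1] as usize] = program[pc + 2],
                                                                                                                                                                    // opcode 2: copy one emulated cell to another
                                                                                                                                                                    2 => mem[program[pc + 1] as usize] = mem[program[pc + 2] as usize],
                                                                                                                                                                    _ => break,
                                                                                                                                                                }
                                                                                                                                                                pc += 3;
                                                                                                                                                            }
                                                                                                                                                            mem
                                                                                                                                                        }
                                                                                                                                                        
                                                                                                                                                        fn main() {
                                                                                                                                                            let mem = run(&[1, 0, 42, 2, 1, 0]);
                                                                                                                                                            assert_eq!((mem[0], mem[1]), (42, 42));
                                                                                                                                                        }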

                                                                                                                                                    1. 2

                                                                                                                                                      There’s no distinction here. If your program implements a CPU emulator, then your program can run with no arbitrary code execution at all and still emulate arbitrary code execution on the virtual CPU. If you don’t want the emulated program to be able to execute arbitrary virtual instructions, you need to generate the virtual program’s instructions using a safe language too.

                                                                                                                                                      In most cases, arbitrary virtual code execution is less dangerous than arbitrary native code execution, though that’s beside the point.

                                                                                                                                                      1. 2

                                                                                                                                                        So…we agree? My point was basically that attacker-controlled arbitrary code execution can happen at multiple semantic levels – in the emulator or in the emulated program (in my example), and writing the emulator in a safe language only protects against the former, while the latter can really be just as bad.

                                                                                                                                                        Though I realize now my example was poorly chosen, so a hopefully better one: even if both the emulator and the emulated program are written in memory-safe languages, if the emulator has a bug due to an array-index use-after-free that causes it to misbehave and incorrectly change the value of some byte of emulated memory, that destroys the safety guarantees of the emulated program and we’re back in arbitrary-badness-land.

                                                                                                                                                        1. 1

                                                                                                                                                          Sure, but this is about as meaningful as talking about a CPU hardware bug that might cause a safe native program to run amok. Technically true, but not very useful when evaluating the safe programming language.

                                                                                                                                          2. 3

                                                                                                                                            Right, I agree, and the safe-Rust aliasing the GP described does not make it possible to control the program counter arbitrarily.

                                                                                                                                          3. 4

                                                                                                                                            Yeah exactly, this is the part I thought @4ad was arguing was possible. E.g. in C, use-after-free often allows me to make the program start interpreting attacker-provided data as machine code. I thought this was what 4ad was saying was also possible in Rust, but I don’t think that’s what they are claiming now.

                                                                                                                                            To me, that’s a big difference. Restricting the possible actions of a program to only those APIs and activities the original code includes, vs C where any machine code can be injected in this same scenario, is a major reduction in attack surface, to me.

                                                                                                                                            1. 4

                                                                                                                                              One thing to note is that code is data and data is code, in a true, hard-mathematical sense.

                                                                                                                                              The set of

                                                                                                                                              the possible actions of a program to only those APIs and activities the original code includes,

                                                                                                                                              and

                                                                                                                                              C where any machine code can be injected in this same scenario

                                                                                                                                              is exactly the same (unbounded!). Of course it is much easier in practice to effect desired behavior when you can inject shell code into programs, but that’s hardly required. You don’t need to inject code with ROP either (of course ROP itself is not possible in Rust because of other mitigations, this is just an example).

                                                                                                                                              Please note that in no way I am suggesting that Rust is doing anything bad here. Rust is raising the bar, which is great. I want the bar raised even higher, and we know for a fact that this is possible today both in theory and practice. Until we raise the bar, I want people to understand why we need to raise the bar.

                                                                                                                                              At the end of the day you either are type safe or you aren’t. Of course the specifics of what happens when you aren’t type safe depend on the language!

                                                                                                                                              PS: arrays can contain things other than integers, e.g. they can contain function pointers. Of course you can’t confuse an int with a function pointer, but using the wrong function pointer is pretty catastrophic.

                                                                                                                                              1. 3

                                                                                                                                                is exactly the same (unbounded!).

                                                                                                                                                I guess this is what I don’t understand, sorry for being dense. Can you show a concrete code example?

                                                                                                                                                In my mind I see a program like this:

                                                                                                                                                
                                                                                                                                                enum Action {
                                                                                                                                                    GENERATE_USER_WEEKLY_REPORT,
                                                                                                                                                    GENERATE_USER_DAILY_REPORT,
                                                                                                                                                    LAUNCH_NUCLEAR_MISSILES,
                                                                                                                                                }
                                                                                                                                                
                                                                                                                                                impl Action {
                                                                                                                                                    pub fn run(&self) {
                                                                                                                                                        // ... perform the selected action ...
                                                                                                                                                    }
                                                                                                                                                }
                                                                                                                                                
                                                                                                                                                // Remember to remove the nuclear missile action before calling!
                                                                                                                                                fn exploitable(my_actions: &Vec<Box<Action>>, user_controlled: usize) {
                                                                                                                                                    my_actions[user_controlled].run();
                                                                                                                                                }
                                                                                                                                                
                                                                                                                                                

                                                                                                                                                In my mind, there are two differences between this code in Rust and similar code in C:

                                                                                                                                                1. This only allows the user to launch nuclear missiles; it does not allow them to, say, write to the hard drive or make network calls (unless one of the actions contained code that did that, of course); in C, I’d likely be able to make something like this call any system function I wanted to, whether machine code to do that was present in the original binary or not.

                                                                                                                                                2. In Rust, this doesn’t allow arbitrary control flow: I can’t make it jump to any function in the binary, I can only trick it into running the wrong Action; in C, I can call run on any arbitrary object anywhere in the heap.

                                                                                                                                                I.e. in C this would let me execute anything in the binary, while in Rust it still has to abide by the control flow of the original program. That was my understanding, anyway.

                                                                                                                                                I think you’re saying this is wrong, can you explain how/why and maybe show a code example if you can spare the time?

                                                                                                                                                1. 4

                                                                                                                                                  This is correct and 4ad is mistaken. I’m not sure why 4ad believes the two are equivalent; they aren’t.

                                                                                                                                                2. 3

                                                                                                                                                  “is exactly the same”

                                                                                                                                                  It simply isn’t, and I’m not sure why you think it is.

                                                                                                                                            2. 1

                                                                                                                                              In fact some sophisticated blog authors that explain this array technique often point out they compile to the exact same assembly code.

                                                                                                                                              Do you have any links on this that you recommend?

                                                                                                                                    2. 2

                                                                                                                                      Good analysis. You didn’t use the words, but this is a great description of the distinction between stocks and flows: https://en.wikipedia.org/wiki/Stock_and_flow. I wish more people talking about software paid attention to it.

                                                                                                                                      1. 2

                                                                                                                                        Author here.

                                                                                                                                        I would also argue that crypto should not change often, like bc. You might add ciphers, or deprecate old ones, but once a cipher is written and tested, there should be very little need for it to change. In my opinion.

                                                                                                                                      2. 8

                                                                                                                                        For the same reason, I think cryptography developers are probably right (after all, they wrote it) when they say rewriting their software in Rust would make it less buggy.

                                                                                                                                        Have they actually rewritten anything? Or have they instead selected a different crypto library that they trust more than the previous one? On the one hand, Rust has no advantage over C in this particular context. On the other hand, they may have other reasons to trust the Rust library more than the C one. Maybe it’s better tested, or more widely used, or audited by more reputable companies.

                                                                                                                                        If I take your word for it, however, I have to disagree. Rewriting a cryptographic library in Rust is more likely to introduce new bugs than it is to fix bugs that haven’t already been found and fixed in the C code. I do think, however, that the risk is slim if they take care to also port the entire test suite.

                                                                                                                                        1. 7

                                                                                                                                          In the Cryptography case, isn’t the Rust addition some ASN.1 parsing code? That is cryptography-adjacent, but very much not the kind of code your point about cryptography covers. Parsing code, unless it is very trivial (and maybe not even then), tends to be some of the more dangerous code you can write. In this particular case Rust is likely a large improvement in both ergonomics for the parsing as well as safety.

                                                                                                                                          1. 1

                                                                                                                                            You’ve got a point. I can weaken it somewhat, but not entirely eliminate it.

                                                                                                                                            I don’t consider ASN.1 “modern”. It’s overcomplicated for no good reason. Certificates can be much, much simpler than that: at each level, you have a public key, an ID & expiration date, a certificate of the CA, and a signature from the CA. Just put them all in binary blobs, and the only things left to parse are the ID & expiration date, which can be left to the application. And if the ID is a URL, and the expiration date is a 64-bit int representing seconds from the epoch, there won’t be much parsing to do… Simply put, parsing certificates can be “very trivial”.
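
                                                                                                                                            As a sketch of the kind of flat layout being described (field names and sizes invented here, not a real format):

                                                                                                                                                /// Hypothetical flat certificate: opaque binary blobs plus two
                                                                                                                                                /// trivially parsed fields, roughly in the spirit of the comment above.
                                                                                                                                                struct SimpleCert {
                                                                                                                                                    public_key: [u8; 32], // raw public key bytes
                                                                                                                                                    id: String,           // e.g. a URL, interpreted by the application
                                                                                                                                                    expires: u64,         // seconds since the Unix epoch
                                                                                                                                                    ca_cert: Vec<u8>,     // the CA's own certificate, as an opaque blob
                                                                                                                                                    signature: [u8; 64],  // CA signature over the fields above
                                                                                                                                                }
                                                                                                                                                
                                                                                                                                                fn main() {
                                                                                                                                                    let cert = SimpleCert {
                                                                                                                                                        public_key: [0; 32],
                                                                                                                                                        id: "https://example.com/alice".to_string(),
                                                                                                                                                        expires: 1_900_000_000,
                                                                                                                                                        ca_cert: Vec::new(),
                                                                                                                                                        signature: [0; 64],
                                                                                                                                                    };
                                                                                                                                                    println!("cert for {} expires at {}", cert.id, cert.expires);
                                                                                                                                                }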

                                                                                                                                            Another angle is that if you need ASN.1 certificates, then you are almost certainly using TLS, so you’re probably in a context where you can afford the reduced portability of a safer language. Do use the safer language in this case.

                                                                                                                                            Yet another angle is that, in practice, we can separate the parsing code from the rest of the cryptographic library. In my opinion, parsing of certificate formats does not belong in a low-level cryptographic library. In general, I believe the whole thing should be organised in tiers:

                                                                                                                                            • At the lowest level, you have the implementation of the cryptographic primitives.
                                                                                                                                            • Just above that, you have constructions: authenticated encryption, authenticated key exchange, PAKE…
                                                                                                                                            • Higher up still, you have file formats, network packet formats, and certificates. They can (and should) still be trivial enough that even C can be trusted with them. They can still be implemented with zero dependencies, so C’s portability can still be a win. Though at that level, you probably have an idea of the target platforms, making portability less of a problem.
                                                                                                                                            • Higher up still is interfacing with the actual system: getting random numbers, talking to the file system, actually sending & receiving network packets… At that level, you definitely know which set of platforms you are targeting, and memory management & concurrency start becoming real issues. At that point you should seriously consider switching to a non-C, safer language.
                                                                                                                                            • At the highest level (the application), you should have switched away from C in almost all cases.
                                                                                                                                        2. 2

                                                                                                                                          For the same reason, I think cryptography developers are probably right (after all, they wrote it) when they say rewriting their software in Rust would make it less buggy. So the author is wrong about this. His argument is not convincing why he knows better than developers.

                                                                                                                                          This is a fair point. When it comes down to it, whether I am right or wrong about it will only be seen in the consequences of the decision that they made.

                                                                                                                                        3. 14

                                                                                                                                          Here’s the more thorough analysis you’re asking for: this is cryptographic code we’re talking about. Many assumptions that would be reasonable for application code simply do not apply here:

                                                                                                                                          • Cryptographic code is pathologically straight-line, with very few branches.
                                                                                                                                          • Cryptographic code has pathologically simple allocation patterns. It often avoids heap allocation altogether.
                                                                                                                                          • Cryptographic code is pathologically easy to test, because it is generally constant time: we can test all code paths by covering all possible input & output lengths (a sketch of such a test follows this list). If it passes the sanitizers & Valgrind under those conditions, it is almost certainly correct (with very few exceptions).
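
                                                                                                                                          As a sketch of what “covering all input & output lengths” can look like (the xor_stream below is just a stand-in for a real primitive, not actual crypto): iterate over every length up to a bound and round-trip; when the implementation is C, run the same loop under the sanitizers or Valgrind.

                                                                                                                                              // Stand-in "cipher": XOR with a repeating key. Real primitives would be
                                                                                                                                              // tested the same way; the point is the shape of the loop, not the crypto.
                                                                                                                                              fn xor_stream(key: &[u8; 32], data: &[u8]) -> Vec<u8> {
                                                                                                                                                  data.iter()
                                                                                                                                                      .enumerate()
                                                                                                                                                      .map(|(i, b)| *b ^ key[i % key.len()])
                                                                                                                                                      .collect()
                                                                                                                                              }
                                                                                                                                              
                                                                                                                                              fn main() {
                                                                                                                                                  let key = [0x55u8; 32];
                                                                                                                                                  for len in 0..=256usize {
                                                                                                                                                      let input = vec![0xAAu8; len];
                                                                                                                                                      let cipher = xor_stream(&key, &input);
                                                                                                                                                      let plain = xor_stream(&key, &cipher); // XOR is its own inverse
                                                                                                                                                      assert_eq!(plain, input, "round-trip failed at length {}", len);
                                                                                                                                                  }
                                                                                                                                                  println!("all lengths 0..=256 round-trip correctly");
                                                                                                                                              }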

                                                                                                                                          I wrote a crypto library, and the worst bug it ever had wasn’t caused by C, but by a logic error that would have happened even in Haskell. What little undefined behaviour it did have didn’t have any visible effect on the generated code.

                                                                                                                                          Assuming you have a proper test suite (that tests all input & output lengths), and run that test suite with sanitisers & Valgrind, the kind of bug Rust fixes won’t occur in your cryptographic C code to begin with. There is therefore no practical advantage, in this particular case to using Rust over C. Especially when the target language is Python: you have to write bindings anyway, so you can’t really take advantage of Rust’s better APIs.

                                                                                                                                          1. 2

                                                                                                                                            These bugs still occur in critical software frequently. It is more difficult and time consuming to do all of the things you proposed than it is to use a safer language (in my opinion), and the safer language guarantees more than your suggestions would. And there’s also no risk of someone forgetting to run those things.

                                                                                                                                            1. 6

                                                                                                                                              These bugs still occur in critical software frequently.

                                                                                                                                              Yes they do. I was specifically talking about one particular kind of critical software: cryptographic code. It’s a very narrow niche.

                                                                                                                                              It is more difficult and time consuming to do all of the things you proposed than it is to use a safer language (in my opinion)

In my 4 years of first-hand experience writing cryptographic code, it’s really not. Rust needs the same test suite as C does, and turning the sanitizers (or Valgrind) loose on that test suite is a command line away. The real advantage of Rust lies in its safer API (where you can pass bounded buffers instead of raw pointers). Also, the rest of the application will almost certainly be much safer if it’s written in Rust instead of C.
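
For what it’s worth, here is a rough sketch of what “run the whole suite over every length, under the sanitizers” can look like. The hash_under_test / hash_reference names are hypothetical placeholders with trivial stand-in bodies so the sketch compiles; they are not any library’s real API:

    /* Build, for example, with:
     *   cc -g -fsanitize=address,undefined test.c && ./a.out
     * or run the binary under Valgrind:
     *   cc -g test.c && valgrind ./a.out                      */
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define MAX_LEN 1024

    /* Trivial stand-ins so this compiles; a real suite would pit the
     * primitive under test against an independent reference. */
    static void hash_under_test(uint8_t out[32], const uint8_t *in, size_t len)
    {
        memset(out, 0, 32);
        for (size_t i = 0; i < len; i++) out[i % 32] ^= in[i];
    }

    static void hash_reference(uint8_t out[32], const uint8_t *in, size_t len)
    {
        hash_under_test(out, in, len);   /* placeholder only */
    }

    int main(void)
    {
        static uint8_t in[MAX_LEN];
        for (size_t i = 0; i < MAX_LEN; i++) in[i] = (uint8_t)i;

        /* Every input length gets exercised; the sanitizers (or Valgrind)
         * flag any out-of-bounds access or use of uninitialised memory. */
        for (size_t len = 0; len <= MAX_LEN; len++) {
            uint8_t a[32], b[32];
            hash_under_test(a, in, len);
            hash_reference(b, in, len);
            assert(memcmp(a, b, 32) == 0);
        }
        return 0;
    }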

                                                                                                                                              And there’s also no risk of someone forgetting to run those things.

Someone who might forget those things has no business writing cryptographic code at all yet, be it in C or in Rust. (Note: when I started out, I had no business writing cryptographic code either. It took over 6 months of people finding bugs and me learning to write a better test suite before I could reasonably say my code was “production worthy”.)

                                                                                                                                              1. 6

Rust’s advantage goes much further than the API boundary, but again the discussion should be around how to get safer languages more widely used (ergonomics, platform support) and not around “super careful programmers who have perfect test suites and flawless build pipelines don’t need safer languages”. To me it is like saying “super careful contractors with perfect tools don’t need safety gear”, except if you make a mistake in crypto code, you hurt more than just yourself. Why leave that up to human fallibility?

                                                                                                                                                1. 4

Rust’s advantage goes much further than the API boundary

                                                                                                                                                  Yes it does. In almost all domains. I’m talking about modern cryptographic code.

                                                                                                                                                  again the discussion should be around how to get safer languages more widely used (ergonomics, platform support)

                                                                                                                                                  Write a spec. A formal one if possible. Then implement that spec for more platforms. Convincing projects to Rewrite It In Rust may work as a way to coerce people into supporting more platforms, but it also antagonises users who just get non-working software; such a strategy may not be optimal.

                                                                                                                                                  not around “super careful programmers who have perfect test suites and flawless build pipelines don’t need safer languages”.

                                                                                                                                                  You’re not hearing me. I’m not talking in general, I’m talking about the specific case of cryptographic code (I know, I’m repeating myself.)

                                                                                                                                                  • In this specific case, the amount of care required to write correct C code is the same as the amount of care required to write Rust code.
                                                                                                                                                  • In this specific case, Rust is not safer.
                                                                                                                                                  • In this specific case, you need that perfect test suite. In either language.
                                                                                                                                                  • In this specific case, you can write that perfect test suite. In either language.

except if you make a mistake in crypto code, you hurt more than just yourself. Why leave that up to human fallibility?

                                                                                                                                                  I really don’t. I root out potential mistakes by expanding my test suite as soon as I learn about a new class of bugs. And as it happens, I am painfully aware of the mistakes I made. One of them was even a critical vulnerability. And you know what? Rust wouldn’t have saved me.

                                                                                                                                                  Here are the bugs that Rust would have prevented:

• An integer overflow that makes elliptic curves unusable on 16-bit platforms (the general kind of trap is sketched after this list). Inconvenient, but (i) it’s not a vulnerability, and (ii) Monocypher’s elliptic curve code is poorly suited to 16-bit platforms (where I recommend C25519 instead).
• An instance of undefined behaviour the sanitizers didn’t catch, which generated correct code on the compilers I could test. (Note that TweetNaCl itself also has a couple of instances of undefined behaviour, which to my knowledge have never caused anyone any problem so far. Undefined behaviour is unclean, but it’s not always a death sentence.)
                                                                                                                                                  • A failure to compile code that relied on conditional compilation. I expect Rust has better ways than #ifdef, though I don’t actually know.
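
Here is the promised generic illustration of the kind of 16-bit portability trap the first item alludes to; it is not Monocypher’s actual bug. C’s int is only guaranteed to be 16 bits wide, so arithmetic that silently relies on a 32-bit int works on desktop compilers and breaks on 16-bit targets:

    #include <stdint.h>

    /* Fine wherever int is 32 bits wide; on a 16-bit platform the shift
     * exceeds the width of int, which is undefined behaviour, and the
     * mask comes out wrong. */
    uint32_t low_20_bits_wrong(uint32_t x)
    {
        return x & ((1 << 20) - 1);            /* 1 has type int: 16 bits there */
    }

    /* Portable version: force the constant to be at least 32 bits wide
     * before shifting. */
    uint32_t low_20_bits_right(uint32_t x)
    {
        return x & ((UINT32_C(1) << 20) - 1);  /* shift happens in a 32-bit type */
    }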

                                                                                                                                                  Here are the bugs that Rust would not have prevented:

• Failure to wipe internal buffers (a “best effort” attempt to erase secrets from the computer’s RAM); see the sketch after this list.
                                                                                                                                                  • A critical vulnerability where fake signatures are accepted as if they were genuine.
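
As for the buffer-wiping item, here is a sketch of why it is invisible to a memory-safety checker: wiping a secret is a fight with the optimiser, not with memory safety. A store to a buffer that is never read again is a dead store the compiler may legally delete, in any language. The volatile-pointer workaround below is a generic C mitigation; real code would prefer platform facilities such as explicit_bzero or memset_s where they exist:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Naive wipe: if the compiler can prove the buffer is never read again
     * (e.g. after inlining, when it is a local about to go out of scope),
     * it may remove this memset as a dead store, leaving key material in RAM. */
    void wipe_naive(uint8_t *secret, size_t len)
    {
        memset(secret, 0, len);
    }

    /* Best-effort wipe: volatile-qualified stores must be performed, so the
     * zeroes actually land in memory. Nothing here is a memory-safety bug,
     * which is why a borrow checker has nothing to say about it. */
    void wipe_best_effort(uint8_t *secret, size_t len)
    {
        volatile uint8_t *p = secret;
        for (size_t i = 0; i < len; i++) {
            p[i] = 0;
        }
    }

Whether even the volatile trick survives aggressive link-time optimisation is its own rabbit hole, which is exactly why the item above calls it a “best effort” measure.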

                                                                                                                                                  Lesson learned: in this specific case, Rust would have prevented the unimportant bugs, and would have let the important ones slip through the cracks.

                                                                                                                                                  1. 8

                                                                                                                                                    I’m talking about modern cryptographic code.

In this discussion, I think it is important to remember that the cryptography developers are explicitly and intentionally not writing modern cryptographic code. One thing they want to use Rust for is ASN.1 parsing. Modern cryptographic practice says you shouldn’t use ASN.1, and that’s right. Implementing ASN.1 in Rust is also right.

                                                                                                                                                    1. 4

                                                                                                                                                      I’m talking about modern cryptographic code.

                                                                                                                                                      So am I.

                                                                                                                                                      In this specific case, the amount of care required to write correct C code is the same as the amount of care required to write Rust code.

                                                                                                                                                      I disagree.

                                                                                                                                                      In this specific case, Rust is not safer.

                                                                                                                                                      I disagree here too.

                                                                                                                                                      In this specific case, you need that perfect test suite. In either language.

                                                                                                                                                      I partially agree. There is no such thing as a perfect test suite. A good crypto implementation should have a comprehensive test suite, of course, no matter the language. But that still isn’t as good as preventing these classes of bugs at compile time.

                                                                                                                                                      Rust wouldn’t have saved me.

                                                                                                                                                      Not really the point. Regardless of how lucky or skilled you are that there are no known critical vulnerabilities in these categories in your code, that disregards both unknown vulnerabilities in your code, and vulnerabilities in other people’s code as well. A safe language catches all three and scales; your method catches only one and doesn’t scale.

                                                                                                                                                      1. 1

Note that I did go the extra mile, a bit further than Valgrind & the sanitisers: I also run Monocypher’s test suite under the TIS interpreter, and more recently TIS-CI (from TrustInSoft). Those tools are guaranteed to catch any and all undefined behaviour, and they found a couple of bugs the sanitisers didn’t.

                                                                                                                                                        that disregards both unknown vulnerabilities in your code

                                                                                                                                                        After that level of testing and a successful third party audit, I am confident there are none left.

                                                                                                                                                        and vulnerabilities in other people’s code as well

                                                                                                                                                        There is no such code. I have zero dependencies. Not even the standard library. The only thing I have to fear now is a compiler bug.

                                                                                                                                                        your method catches only one and doesn’t scale.

I went out of my way not to scale. Yet another peculiarity of modern cryptographic code is that I don’t have to scale.

                                                                                                                                                        1. 1

                                                                                                                                                          There is no such code.

                                                                                                                                                          Sure there is. Other people write cryptographic code too. Unless you are here just arguing against safe languages for only this single project? Because it seemed like a broader statement originally.

                                                                                                                                                          I went out of my way not to scale.

                                                                                                                                                          I mean scale as in other developers also writing cryptographic software, not scale as in your software scaling up.

                                                                                                                                                          1. 1

                                                                                                                                                            Sure there is. Other people write cryptographic code too. Unless you are here just arguing against safe languages for only this single project

                                                                                                                                                            I was talking about Monocypher specifically. Other projects do have dependencies, and any project that would use Monocypher almost certainly has dependencies, starting with system calls.

                                                                                                                                                            I mean scale as in other developers also writing cryptographic software, not scale as in your software scaling up.

Fair enough. I was thinking from the project’s point of view: a given project only needs one crypto library. A greenfield project can ditch backward compatibility and use a modern crypto library, which can be very small (or formally verified).

                                                                                                                                                            Yes, other people write cryptographic code. I myself added my own to this ever growing pile because I was unsatisfied with what we had (not even Libsodium was enough for me: too big, not easy to deploy). And the number of bugs in Monocypher + Libsodium is certainly higher than the number of bugs in Libsodium alone. No doubt about that.

Another reason why crypto libraries written in unsafe languages don’t scale is the reputation game: it doesn’t matter how rigorously tested or verified my library is if you don’t know it. And know it you cannot, unless you’re more knowledgeable than I am and bother to audit my work yourself, which is prohibitively expensive. So in practice, you have to fall back on reputation and external signs: what other people say, the state of documentation, the security track record, issues from the bug tracker…

                                                                                                                                            2. 8

                                                                                                                                              This made me twitch!

                                                                                                                                              Why make a choice which prevents an entire class of bugs when you could simply put in extra time and effort to make sure you catch and fix them all?

                                                                                                                                              Why lock your doors when you can simply stand guard in front of them all night with a baseball bat?

While I personally would back the cryptography devs’ decision here, I think there is a legitimate discussion to be had about whether breaking compatibility for some long-standing users is the right thing to do. This post isn’t contributing well to that discussion.

                                                                                                                                            1. 25

                                                                                                                                              Go gives you the tools for concurrency, but doesn’t make them terribly easy - except to misuse.

                                                                                                                                              Compare Erlang, which does make concurrency an essential part of its worldview, and makes building those systems less hassle-prone than Go.

                                                                                                                                              1. 9

                                                                                                                                                They use different models for concurrency, with different tradeoffs and different expressive power, but I don’t think Erlang or Go is generally better or worse than the other with regards to concurrency.

                                                                                                                                                1. 5

                                                                                                                                                  Could you give us an example? I’m not familiar with Erlang

                                                                                                                                                  1. 5

Erlang concurrency comes in two models: a low-level process (a green thread with no shared state) that can send and receive messages using imperative code, and one of several high-level behaviors that encapsulate the error-prone sending and receiving semantics, so the developer only has to implement a handful of callbacks that handle incoming messages, which may or may not change the server’s state and may or may not expect a reply.

                                                                                                                                                    In a chat server, you’d model a connection as a gen_server or gen_statem that receives outgoing messages from TCP and incoming messages as a cast from a channel, and a channel as a gen_server that relays incoming messages to channel members.

                                                                                                                                                    http://erlang.org/doc/design_principles/gen_server_concepts.html has some specific information about behaviors.

                                                                                                                                                    https://learnyousomeerlang.com/the-hitchhikers-guide-to-concurrency is the first of several chapters in Fred Hébert’s excellent book about the low-level concurrency, and https://learnyousomeerlang.com/what-is-otp is the first of several about the high-level behavior-based model that I’m most familiar with.

                                                                                                                                                1. 5

                                                                                                                                                  This is really cool! I’ve watched the video linked at the bottom and find the reversing-process itself to be very interesting.

Some reversers take the effort to try to generate byte-by-byte-matching binaries, but that doesn’t seem to be the goal here. I wonder if it makes sense to reproduce an exactly matching binary first and then apply the bugfixes as a set of patches. Granted, this makes the general process harder, but it would provide more insight from the outside into which bugs were originally present in the engine, and make it possible for people to do “speed runs” on confirmed vanilla engines, just to give one idea.

                                                                                                                                                  Let’s hope Take-Two appreciates the fan-driven effort instead of trying to shut it down. I don’t think there’s anything in the engine one would still consider a trade-secret or bleeding-edge-development.

Reverse-engineering is the only way to keep old games alive, because you can’t compile or execute “intellectual property” on your computer once the old binaries stop working. Thus, in my opinion, IP shouldn’t be valued so highly in such obvious cases, comparable to how I think patents shouldn’t be retainable by those who don’t make use of them.

                                                                                                                                                  1. 5

                                                                                                                                                    Game companies sometimes don’t want to keep old games alive, because if people are playing the old games they aren’t playing/buying the new ones, or they lose the opportunity to sell them again as cheap ports to new platforms (see Super Mario 3D All-Stars). I really hope that logic will not apply here!

                                                                                                                                                    1. 3

Since you need to buy the assets anyway, I absolutely can’t see how that logic could ever apply. Every alternative platform engine is a net gain from this perspective, since it gives people who would never have bought the game otherwise a reason to buy it for its assets.

                                                                                                                                                      1. 3

Totally agree. Now please explain that to Nintendo, Take-Two, Activision-Blizzard and friends, with their mountains of dead fan ports in the backyard ;_;

In the last decade even major strides by the modding community were killed off, even though modding is an “obvious” net benefit to a game’s value as well…

                                                                                                                                                        1. 3

Someone buying the PC port for $10 (or, more likely, pirating it and throwing away the DRM’d executable) is unlikely to spend $30 or more on an official port to a next-generation system. id (now Bethesda, now Microsoft) isn’t likely to see me buy their new port of Doom 1 or 2, because I’ve spent so much time with free source ports and the asset files I bought for $3 a decade ago.

                                                                                                                                                          1. 2

Let’s take Mario 64, for instance. You already own it on the N64, but your N64 is sitting unplugged on a shelf since it won’t even work on your new TV, and your controller is broken. The N64 decompilation project comes along; you paid for the assets 25 years ago, so you can happily play the game on any platform you want. Why would you pay Nintendo again for the Switch emulator version? You already own the game, and you can now play it comfortably at 4K. Or you might be less inclined to buy a newly released game or console, since you are already busy playing these old games, with their new mods and all.

                                                                                                                                                            In a sense, videogame sales compete against other new games but also against all past existing games, unless those past games are unavailable due to system obsolescence.

                                                                                                                                                          2. 2

                                                                                                                                                            It’s a bit more complicated than that.

A successful game title is valuable intellectual property. It would be irresponsible towards the IP’s once and future owners to let part of it “get away” and maybe be used in a way that’s harmful to the parent company. Obviously it’s not a big deal for Rockstar if someone makes a lewd version of GTA (but see the Hot Coffee mod!), but for companies like Nintendo it’s unthinkable.

                                                                                                                                                            1. 1

                                                                                                                                                              I think that’s why GP wrote

                                                                                                                                                              IP shouldn’t be valued so highly

                                                                                                                                                              because, yeah, perhaps people shouldn’t have to care what Nintendo thinks after all.

                                                                                                                                                              1. 1

                                                                                                                                                                I was simply replying to the statement that “companies don’t free old games because they’d sell less new ones”.

                                                                                                                                                                I’m all for comprehensive IP reform personally, and hopefully efforts like this will work towards that.