Threads for JoachimSchipper

    1. 7

      I look forward to reading comments explaining why Gregory is wrong because the Python packaging system is the best of all possible worlds, and all you have to do is know exactly which five tools still work and how to use them.

      1. 15

        I don’t think I’ve ever read comments indicating Python packaging is good? This strawman doesn’t hold up to a slight breeze.

        1. 4

          I’ve heard it from Python fanatics myself.

          1. 4

            Okay? I’m not defending packaging being good. I’m saying the OP is essentially complaining that they haven’t kept up to date in their own working knowledge, which is true. distutils was deprecated three years ago.

            1. 3

              Leibniz wasn’t arguing the world was actually good. He was arguing that it couldn’t be any better given that God was constrained by certain axioms like logical consistency, the principle of sufficient reason, absolute divine simplicity, etc.

              Defending Python packaging isn’t always about saying it’s good. It’s more about “well it can’t be any better because X.” So below, Paul Moore says it can’t be any better because of lack of resources. Elsewhere you see people say it can’t be better because Python is trying to do a more complicated thing than other languages. In your comment, the argument is that to improve required a lot of churn, so it’s on Gregory for not keeping up with the churn.

              I wouldn’t have linked to it if the comment was by someone else, but it was too perfect to link to a comment by you as refutation of your point. :-P

              1. 5

                Except it’s not a refutation, as I’ve never made the argument that the status quo is good, nor that it can’t be improved. To improve is to require change. I do think most of their complaints are due to them not keeping up even a cursory amount. This is a language that has new major versions every year, and we can see they’re putting more effort into cleaning house in the last few releases (the “removing dead batteries” PEP, for example).

                1. 9

                  I think it’s at least a little absurd that Python has had to iterate so much for so long to get where it is today. This is a solved problem for many other languages. While you’re not saying the situation is good, it does feel like a little bit of Stockholm syndrome when you suggest it’s on the user for not keeping up with the packaging tool of the moment.

                  1. 4

                    That’s the thing, it’s not even tooling of the moment. These are standards that have been in the works for over 7 years now (the PEP for pyproject.toml and build infrastructure is from 2016!). We’re talking about the same tools that have been standard since Python 2 here, pip and setuptools.

                    I think it is somewhat on the user for using something that was deprecated for several years and then being suddenly surprised when it’s finally removed after that much notice (setuptools added a warning in Oct ’21).

                    1. 5

                      pip and setuptools have certainly been around for a long time but so have lots of other things (pipenv and poetry?). I remember around ~2017 (?) the pip dependency resolver would still leave you with broken dependencies in some cases so those other tools were definitely serving a purpose.

                      pyproject.toml is from 2016, yes, but it definitely wasn’t usable from the get-go. distutils was only deprecated in 2020.

                      To be clear, I totally agree that pip and setuptools have been the Right Way for a long time. But I don’t think that has been the consensus in the community, and to someone whose day job is not python it can be hard to discover that.

                      1. 2

                        only deprecated in 2020.

                        I think three years is plenty of time to go from deprecation to removal.

                        But I don’t think that has been the consensus in the community

                        I’m not sure it’s the consensus even now, and I still explore other options every few months to see if things have improved. It feels like Poetry is doing its own thing, Pipenv was only revived for name recognition and hasn’t improved, and pip-tools has difficult-to-use workflows, as things still boil down to requirements.txt in most cases. I learned about Hatch from this blogpost and haven’t explored it yet. I’m kind of excited for Ruff’s next idea of tackling this problem.

                        I really just want a unified, “this fits most use-cases” project manager that supports both the library and application workflows, and bakes in good opinions according to the practices of the day (like src/ layout packages, pyproject.toml, etc.).

      2. 7

        In that case, you may find this response from a CPython core dev & pip maintainer to the OP surprising.

        1. 12

          Interesting.

          I do think, though, that the biggest issue we have here is a lack of resources.

          I completely disagree with that. The biggest issue is lack of leadership. Someone needs to say that all of the million little flowers blossoming in the PyPA garden are actually weeds and begin the process of pulling them up. Start with the easy cases but have a road map to get to the hard cases. No more layering one tool on top of another. Start from the bottom and work up. Have a vision for what the whole thing looks like and move towards it in deliberate steps. It does not actually take a large team. If anything, the problem with PyPA is that too many people are working in opposing directions. You want there to be between 2 and 5 people working in the same direction. That’s more than enough!

          Really what needs to happen is GvR needs to care about packaging, come out of retirement long enough to declare someone the Package Sub-Emperor for the Indefinite Crisis Period and then go back to retirement. Alas.

          1. 6

            Really what needs to happen is GvR needs to care about packaging, come out of retirement long enough to declare someone the Package Sub-Emperor for the Indefinite Crisis Period and then go back to retirement. Alas.

            The traditional name for this was “BDFL-delegate” for a particular area. And for packaging there are two such delegates: Brett Cannon, and Paul Moore (who is the author of the post you were linked to).

            1. 3

              Two dictators? Bless.

              1. 9

                I’m not sure what your intent was with that comment, but the reason is that they’re both active in proposing things in the packaging world, which raises the question of what happens if a packaging PEP is proposed by the person whose job is to make acceptance decisions on packaging PEPs. So there are two people with delegated authority to approve packaging PEPs – Brett and Paul – and whenever one of them writes a PEP the other is the one who makes the decision on whether to approve it.

      3. 3

        I don’t think Python packaging is in a good place, but I think only people already using relatively complex or long-standing Python projects bump into it as a notable problem. If you start a new Python project as an apprentice, you use Poetry. If you start a new Python project as a journeyman, you might use Poetry for an application, or Hatch/PDM for an application or library because they respect pyproject.toml. If you’re a master craftsman or greybeard, you can do anything you want except the thing you were doing - this is where the quality of documentation seems to drop off a cliff, and google-fu will paradoxically give you unintuitive results.

        1. 7

          I have started basic scripts in Python as a project, and threw in the towel when we tried to install the dependencies on a coworker’s machine and it never managed to install or find them.

          It is broken at every level. At this point we write bash. It is atrocious to read, nearly unmaintainable, cannot have tests, and packages are nonexistent.

          But at least it runs.

          1. 4

            Were you using virtual environments?

            Every time I help someone on-board with Python for the first time I spend an uncomfortable amount of time talking to them about virtual environments, and telling them “if something doesn’t work, the first question to ask yourself is whether or not you remembered to ‘activate your virtual environment’”.

            I really wish that workflow was more closely baked into the standard path for using Python. Node.js has similar mechanisms but they’re part of the default “npm install” path so they seem to bite people a lot less.

            1. 3

              Node.js has similar mechanisms but they’re part of the default “npm install” path so they seem to bite people a lot less.

              They’re not - they’re baked into the default module resolution algorithm. The fact that the core Node runtime behaves like this, and has pretty much always behaved like this, is why there aren’t 5 competing venv mechanisms and why there isn’t a spectrum of how integrated the venv stuff is with package managers (npm/yarn/pnpm/etc.). It all Just Works™ if you write a new package manager.

              Python, on the other hand, has a legacy problem. Though I’m not sure it’s an insurmountable one.

            2. 3

              We tried. It was yelling at us. Couldn’t figure out how to debug a venv, how to consistently use the right one or make it interact with other commands.

              At this point, after a decade of trying to make venv work, I consider any answer to a Python problem that assumes venv works or solves anything to be irrelevant.

              The first step toward any solution for Python packaging is to ditch venv. Design solutions that do not need them and actively kill them. They are a constant source of problems and users, universally, hate them.

              Virtual environments are a failed UX experiment. Time to accept reality.

              1. 4

                I’m convinced virtual environments are the right underlying implementation: being able to spin up separate projects with potentially conflicting dependencies feels right to me.

                The problem is how to provide a really clear user interface over the top.

                In particular, that UI needs to consider the unhappy paths. Venv on the happy path works just fine - but evidently the moment someone steps off that happy path (forgets to run one very specific command at the right moment, renames a crucial folder, installs an upgraded Python version) things can get very difficult to figure out.

                This is a really hard design problem!

              2. 2

                I use venv every day. VS Code and PyCharm find them. I debug into them, both by stepping from my code into the venv packages and by setting breakpoints in the venv packages. I use pyenv with virtualenvwrapper to get seamless changes of venvs as I change directories in a shell. Pipenv and Poetry both work sanely with venvs. I’ve been using venv like this since 2018.

                I bring this up as a counterpoint to your strong assertions. I understand a decade of difficulties with a technology can sour anyone. I have several technologies that I feel that way towards.

          2. 2

            A heresy that has worked for me in the past, and that may work for you: for scripts where you need a bit more than the standard library but a lot less than full PyPI, telling your coworkers which (e.g.) Ubuntu packages to install in a README.txt works.

            This strategy does limit you to Python packages which are fairly widely used (and thus packaged), and does mean you get updates when the distribution maintainers push out package updates (or, more realistically, when you move to a new version of your distribution). Those are arguably the wrong tradeoffs for a SaaS application that the whole company is focused on, but that’s not the only case where one might want to use Python…

        2. 5

          I’m finding that you don’t need to use Poetry or Hatch or PDM to use pyproject.toml.

          https://til.simonwillison.net/python/pyproject

          1. 1

            I wouldn’t recommend someone start a new project without a package & dependency management tool like PDM (which supports lockfiles) or Hatch.

            1. 2

              Because of the need for lock files?

              Most of my new projects are more libraries than end-user applications, so I find range-based dependencies in pyproject.toml are fine.

              I haven’t actually selected a lockfile tool I like best yet - I’m still mostly using pip-tools for that but I’m ready to upgrade if there’s a clear single winner I can switch to: https://til.simonwillison.net/python/pip-tools

        3. 1

          What would you recommend for a Python webapp on NixOS? Asking for a friend, who spent a couple of weeks trying to wrangle those two things together with only partial success. I don’t know what was worse: the fragmentation, or the lack of tooling.

          1. 2

            I would use Poetry/Hatch, which work perfectly and as expected, if you set up your development environment with a Nix flake. I think you need to configure poetry with in-project venvs (poetry config virtualenvs.in-project true). e.g. https://gist.github.com/offsetcyan/fd9352a4d00d3c6c940187c8bf341452

            But on NixOS in particular, I understand the frustrations in figuring this kind of thing out.

    2. 8

      Does anyone have a strong feeling about this actually improving security? I’m seriously skeptical, but I haven’t given it a ton of thought. Any exploit devs care to comment?

      1. 8

        Well, the obvious answer is that it would lower the surface area a little bit, right? Instead of worrying about libc and syscall(2) you just worry about libc. Whether that improves security I don’t know but considering the resources OpenBSD has, it is one less thing to worry about or deal with.

      2. 7

        I feel the security benefits are theatre, but it enables better backwards compatibility in the long run - Windows and Solaris for example, have had good dynamic library based compatibility for years. Of course, OpenBSD doesn’t care about that part…

      3. 2

        If you can’t run software written in one of the most popular memory safe languages anymore, then maybe that would be bad for security.

        1. 7

          The Go port will be fixed as it was before. How do you draw the conclusion that Go isn’t going to be supported? You didn’t read the post.

        2. 1

          Right but I’m trying to understand if this mitigation would actually make exploitation of an existing vulnerability difficult. It feels like a mitigation without a threat model.

          1. 3

            Go read up on ROP and stack pivots, especially on amd64 and other variable-length instruction architectures that make it impossible to completely remove ROP gadgets. There are very clear threat models already defined based on arbitrary code execution, especially remotely. Reducing the syscall surface area as much as possible minimizes the success probability.

            1. 4

              I’m surprised no one has yet decided to use a separate stack for data, especially on x86-64 with more registers, a register parameter-passing paradigm and a larger memory space: leave RSP for CALL/RET and use another segment of memory for the “stack frame”. That way, overwrites of the data stack won’t affect the return address stack at all. Given how fast 32-bit systems have largely disappeared from the Internet, I think such an approach would be easier (and faster) than all the address randomization, relinking, stack canaries, et al.

              Or (as painful as this is for me to say), just stop using C! Has anyone managed to successfully exploit a program in Rust? Go?

              1. 4

                A similar feature is called “shadow stacks” - return addresses get pushed both to the standard stack and to a separate return stack, and the addresses are checked to match in the function epilogue. It’s supported in all the big C compilers. I can’t speak to how often it’s actually used.
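
                As a rough illustration of the mechanism (a minimal sketch only; real compilers emit this in each function’s prologue/epilogue, and every name here is made up):

                #include <assert.h>
                #include <stddef.h>

                /* Separate region holding only return addresses. */
                static void *shadow_stack[1024];
                static size_t shadow_top;

                /* "Prologue": record the return address on the shadow stack. */
                static void shadow_push(void *ret_addr) {
                    shadow_stack[shadow_top++] = ret_addr;
                }

                /* "Epilogue": a smashed return address on the ordinary stack no
                   longer matches the shadow copy, so abort instead of returning
                   into attacker-chosen code. */
                static void shadow_check(void *ret_addr) {
                    shadow_top--;
                    assert(shadow_stack[shadow_top] == ret_addr);
                }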

                Further afield, Forth also exposes fully separate data and return stacks. So it’s been done.

                As far as performance goes, you’re losing an extra register for the other stack, which can be significant in some cases, and also memory locality. Cost varies but has been measured around 10%.

              2. 1

                In addition to the safe stack / shadow stack work, it’s worth pointing out SPARC. SPARC had a model of register windows arranged in a circle. You had 8 private registers, 8 shared with the caller and 8 shared with the callee (you could reuse any caller-shared one you weren’t using for return, and all callee-shared ones between calls). The first S in SPARC stood for ‘scalable’ because the number of windows was not architecturally specified. When you ran out, you’d trap and spill the oldest one (you should do this asynchronously, but I don’t believe the implementations that did ever shipped). This meant that the register spill region had to be separate from the stack. This gave complete protection from stack buffer overflows turning into ROP gadgets.

                Pure software variants have been tricky to adopt because they’re incredibly ABI disruptive. Anything that creates stacks needs to allocate space. Anything that generates code needs to preserve an extra register (not just across calls but also when calling other functions). Anything that does stack unwinding needs to know about them.

                If you’re compiling everything together and static linking, it’s feasible.

                1. 1

                  I know about the register windows on the SPARC, but I never really dove into how it interacted with the operating system with regards to context switches (process or threads)—it seems like it could be expensive.

                  1. 1

                    Switching threads was very expensive. A large SPARC core could have 8+ register windows, so needed to save at least 64 registers. That’s fairly small in comparison with a modern vector extension, but still large.

                    On later superscalar designs, it actually wasn’t that bad. Modern processors allocate L1 lines on store, so spilling a full cache line is quite cheap. If you’re doing this often, you can even skip the store buffer and just write directly from registers to the cache line. I think switching from a thread required spilling all used register windows, but resuming a thread just required reading back the top and then the others could be faulted in later. SPARC had very lightweight traps for this kind of thing (and TLB fills - 32-bit SPARC had a software-managed TLB, though later SPARCs were spending 50% of total CPU time in that trap handler so they added some hardware assist with 64-bit versions).

                    I think the biggest mistake that SPARC made was making the register window spill synchronous. When you ran out of windows, you took a synchronous fault and spilled the oldest one(s). They should have made this fully asynchronous. Even on the microarchitectures of the early SPARCs, spilling could have reused unused cycles in the load-store pipeline. On newer ones with register renaming, you can shunt values directly from the rename unit to a spill FIFO and reduce rename register pressure. I think that’s what Rock did, but it was cancelled.

            2. 1

              ROP isn’t a statistical attack, so this talk of probability is confusing.

              1. 4

                Have a look at Blind ROP: https://en.wikipedia.org/wiki/Blind_return_oriented_programming

                When you don’t have complete information of the running program, these automated techniques will operate with a probability of success or failure.

                1. 1

                  But nothing about this mitigation is unknown or randomized further, as far as I can tell. I don’t see how brop is important here or how it would be impacted by this. Maybe the attacker needs to be a bit pickier with their gadgets?

                  1. 3

                    Any ROP technique needs to find and assemble gadgets. This would remove one possible type of gadget, making it harder to achieve arbitrary syscall execution especially in light of other mitigations like pledge(2) or pinsyscall(2).

      4. 1

        Assuming they do this to all interesting syscalls, it would make shellcode writing a bit more painful, as now you actually have to deal with ASLR to find the libc versions. That said, ASLR isn’t a significant barrier in 99% of cases, so it’s not going to stop targeted or skilled attackers. However, it seems it would also disallow statically linking libc, which is a huge con for such minor gain, IMO.

        Disclaimer: It’s been almost a decade since I’ve attacked an OpenBSD machine on the job, so there may be additional protections I’m not aware of that make this change a more valuable protection.

        1. 2

          FWIW, OpenBSD relinks libc (and OpenSSH, and the kernel) on each boot. So defeating ASLR on OpenBSD may require more than finding one offset.

    3. 1

      How is this any different from just buying preconfigured server rack(s) from any manufacturer that offers them (practically all)? Many will even install private cloud software from the factory (OpenStack, various proprietary ones). Even if your preferred manufacturer won’t, all major manufacturers will PXE boot out of the box/crate, which makes it easy to install/run whatever you want.

      1. 19

        I think Oxide would point out that you can do that, and then you’ll discover that there is a bizarre bug that causes rare 100ms stalls, and the server vendor will point at the SSD vendor will point at the OS vendor will point at the motherboard vendor will point at…

        Oxide instead owns and is responsible for the entire stack.

        (Bryan Cantrill and several Oxide folks are ex-Joyent, so they have a lot of stories on the “joy” of running a cloud on off-the-shelf hardware. [Update: you may be interested in this comment.])

    4. 1

      One valuable part of the paper is the “categories definition” in par. 4.1:

      • contribution flow
      • choose a task
      • talk to the community
      • build local workspace
      • deal with the code - “may include code conventions, descriptions of the source code, and guidelines on how to write code for the project”
      • submit the changes
    5. 3

      You don’t need something as high-overhead as valgrind; there is a simple static analysis that will find all realloc-related bugs: grep your code for realloc, and if you are using it anywhere, you are doing something wrong.

      1. 1

        Could you expand on your reasoning a bit? Do you mean “resources should be statically allocated / not oversubscribed”? Or are you reacting to “realloc(…, 0) is undefined (in C23)”, as discussed in https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf and https://queue.acm.org/detail.cfm?id=3588242 ? Or…?

        1. 2

          At best, realloc has surprising performance characteristics (will it copy? Who knows, and it may change between libc versions). It has confusing semantics (comparing the result against the old pointer to detect reallocation is actually UB), so your compiler may be doing things you don’t expect. At worst, it nondeterministically invalidates some pointers in your program. It also has a really badly designed API: if it fails, the old pointer is still valid, so you have to free it.

          To avoid UB, you have to treat realloc as a malloc, memcpy, free sequence. To avoid surprising performance anomalies, you should use malloc, memcpy, free sequences. If you want to avoid leaking memory, you have to keep the old pointer and do a little dance, which makes it more code to use correctly than a malloc, memcpy, free sequence.
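
          To make the dance concrete, here’s a minimal sketch (grow and grow_explicit are made-up helper names, not anyone’s real API):

          #include <stdlib.h>
          #include <string.h>

          /* The "little dance": on failure realloc returns NULL but does NOT
             free the old block, so the old pointer must be kept around. */
          char *grow(char *buf, size_t new_size) {
              char *tmp = realloc(buf, new_size);
              if (tmp == NULL) {
                  free(buf); /* forgetting this line is the classic leak */
                  return NULL;
              }
              return tmp;
          }

          /* The UB-free mental model: treat realloc as malloc + memcpy + free. */
          char *grow_explicit(char *buf, size_t old_size, size_t new_size) {
              char *tmp = malloc(new_size);
              if (tmp == NULL) {
                  free(buf);
                  return NULL;
              }
              memcpy(tmp, buf, old_size < new_size ? old_size : new_size);
              free(buf);
              return tmp;
          }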

          With some allocators, for some sizes, sometimes, realloc can improve performance (though often at the cost of greater fragmentation). For most modern high-performance allocators, it simply hides the fact that you’re doing a free and a copy. It is hard to use correctly and, even if you do manage to, it is rarely going to give the expected performance characteristics.

          If you see realloc in code, at best it is probably a source of performance problems, at worst it’s the reason your code is crashing.

    6. 3

      “Developers get in fights with people”, “docs are out-of-date”, and “the tool is someone’s master’s thesis” are signs of poor stewardship.

      I think that last one is a shot at Nix.

      1. 1

        That wasn’t my first thought? Hillel is a formal methods expert, and there are a lot of formal methods tools that… work well enough to get the paper out the door. (Which is itself a valid choice by the authors!)

    7. 3

      I work on a trust store project at my job and it is complicated, annoying, easy to break, and offers no room for big successes. X509 is just broken beyond repair.

      1. 4

        fractally broken, even. WebPKI is broken because PKCS#11 is broken, because X509 is broken, because DER is broken because ASN1 is broken.

        1. 5

          DER and ASN.1 aren’t actually too bad, as formats; in modern terms, ASN.1 is basically protobuf plus a few extraneous types. (Both ASN.1 and protobuf are tag-length-value encodings with an external schema, with a variable-length encoding for the length and other integers.)
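
          To illustrate how simple the wire format itself is, here’s a minimal sketch of reading one DER tag-length-value header (der_read_header is a made-up name, and a real parser needs more care, e.g. around multi-byte tags):

          #include <stddef.h>
          #include <stdint.h>

          /* Reads one DER header: single-byte tags only, definite lengths only
             (DER forbids the indefinite form). Returns 0 on success. */
          int der_read_header(const uint8_t *p, size_t len,
                              uint8_t *tag, size_t *vlen, size_t *hdr) {
              if (len < 2) return -1;
              *tag = p[0];
              if (p[1] < 0x80) { /* short form: length fits in one byte */
                  *vlen = p[1];
                  *hdr = 2;
              } else { /* long form: low 7 bits give the number of length bytes */
                  size_t n = p[1] & 0x7f;
                  /* reject indefinite form and lengths we cannot represent */
                  if (n == 0 || n >= sizeof(size_t) || len < 2 + n) return -1;
                  size_t v = 0;
                  for (size_t i = 0; i < n; i++) v = (v << 8) | p[2 + i];
                  *vlen = v;
                  *hdr = 2 + n;
              }
              return *vlen <= len - *hdr ? 0 : -1; /* value must fit in buffer */
          }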

          The issue with ASN.1 is that most ASN.1-consuming software was written and effectively abandoned before a lot of modern practices for secure software development.

          The issue with X.509 is not really ASN.1, but that X.509’s semantics are very complicated, inconsistently implemented, and not always the best fit for the actual problem being solved. Encoding X.509 in protobuf or XML wouldn’t really solve the problem.

          1. 2

            Both ASN.1 and protobuf are tag-length-value encodings with an external schema

            … with the major difference that SEQUENCEs are not delimited with a length. Both formats support extensions, but ASN.1 requires you to explicitly make space for the extensions in the SEQUENCE, while protobuf wants you to declare space in the tag ranges but will accept unknown fields anywhere.

            The complex semantics of X.509 are exacerbated by the schema-mandatory parsing of ASN.1.

    8. 9

      Only tangentially related, but… Running a social networking service for people in information security seems like it would be a total nightmare.

      1. 18

        Doesn’t seem too bad? You’re going to get some vulnerability reports, but at least the reporters are well-meaning professionals. E.g. hosting a cryptocurrency exchange would be a lot more exciting…

    9. 18

      It would be good in general if people were more aware of the political considerations of choosing a TLD and the dangers they might pose to a registration.

      I’ve seen people using .dev and .app a lot, it’s worth considering these are Google-controlled TLDs. What really rubbed me the wrong way about these TLDs is Google’s decision to make HSTS mandatory for the entire TLD, forcing HTTPS for any website using them. I’m sure some people will consider this a feature but for Google to arbitrarily impose this policy on an entire TLD felt off to me. No telling what they’ll do in the future.

      1. 12

        .app and .dev aren’t comparable to ccTLDs like .sh and .up, however. gTLDs like .app and .dev have to stick to ICANN policies; ccTLDs don’t, and you’re at the mercy of the registry and national law for the country in question.

        1. 11

          I was actually just discussing this fact with someone, but interestingly, we were discussing it as a positive, not a negative.

          All of the new TLDs are under ICANN’s dominion, and have to play by ICANN’s rules, so they don’t provide independence from ICANN’s influence. Whereas the ccTLDs are essentially unconditional handouts which ICANN can’t exert influence over. So there’s a tradeoff here depending on whom you distrust more: ICANN, or the specific country whose TLD you’ve chosen.

      2. 10

        HSTS preload for the entire TLD is a brilliant idea, and I think every TLD going forward should have it.

        Defaulting to insecure HTTP URLs is a legacy problem that creates a hole in the web’s security (it doesn’t matter what’s on insecure-HTTP sites; their mere existence is an entry point for MITM attacks against browser traffic). TOFU HSTS is only a partial band-aid, and a per-domain preload list is not scalable.

        1. 1

          Does HTTPS really count as TOFU? Every cert is ultimately checked against a known list of CAs.

          1. 4

            The Trust-On-First-Use aspect is that HSTS is remembered by the browser only after the browser has loaded the site once; this leaves first-time visitors willing to connect over unencrypted HTTP.

            (Well, except for the per-domain preload list mentioned by kornel.)

            1. 2

              Sure, but HSTS is strictly a hint that HTTPS is supported, and browsers should use that instead, right? There is no actual trust there, because the TLS certificate is still authenticated as normal.

              Compare this to SSH, which actually is TOFU in most cases.

              1. 3

                Not quite - HSTS prevents connection over plaintext HTTP and prevents users from creating exceptions to ignore invalid certificates. It does more than be a hint, it changes how the browser works for that domain going forward. The TOFU part is that it won’t apply to a user’s first connection - they could still connect over plaintext HTTP, which means that a suitably positioned attacker could respond on the server’s behalf with messages that don’t include the HSTS header (if the attacker is fast enough). This works even if the site itself isn’t serving anything over HTTP or redirects immediately to HTTPS.

                Calling it TOFU is admittedly a bit of a semantic stretch as I’m not sure what the specific act of trust is (arguably HSTS tells your browser to be less trustful), but the security properties are similar in that it only has the desired effect if the initial connection is trustworthy.

                1. 1

                  Okay, I see the point about first-time connections, but that wouldn’t change regardless of the presence or absence of HSTS. So why single that header out? It seems to me that having HSTS is strictly better than not having one.

                  1. 2

                    The discussion was about HSTS preload, which avoids the first-connection problem just explained by pre-populating HSTS enforcement settings for specific domains directly in the browser distribution. There is no risk of that first-connection hijack scenario, because the browser acts as if it had already received the header even if it has never actually connected before.

                    Normally this is something you would opt-in to and request for your own domain after you registered it, if desired… but Google preloaded HSTS for the entire TLDs in question, so you don’t have the option to make the decision yourself. If you register a domain under that TLD then Chrome will effectively refuse to ever connect via http to anything under that domain (and to my knowledge every other major browser uses the preload list from Chrome.)

                    It’s this lack of choice that has some people upset, though it seems somewhat overblown, as Google was always very upfront that this was a requirement, so it shouldn’t have been a surprise to anyone. There is also some real concern that there’s a conflict of interest in Google’s being effectively in total control of both the TLDs and the preload list for all browsers.

                    1. 1

                      The discussion was about HSTS preload, which avoids the first-connection problem just explained by pre-populating HSTS enforcement settings for specific domains directly in the browser distribution. There is no risk of that first-connection hijack scenario, because the browser acts as if it had already received the header even if it has never actually connected before.

                      Ahh, THIS is the context I was missing here. In which case, @kornel’s original comment about this being a non-scalable bandaid solution is correct IMO. It’s a useful mitigation, but probably only Google could realistically do it like this.

                      I think the more annoying thing about .dev is that a bunch of local development DNS systems like puma-dev and pow used .dev, and then Google took it away and made us all change our dev environments.

                      1. 2

                        I think the more annoying thing about .dev is that a bunch of local development DNS systems like puma-dev and pow used .dev, and then Google took it away and made us all change our dev environments.

                        That seems unfortunate, but a not terribly surprising consequence of ignoring the names that were specifically reserved for this purpose and making up their own thing instead.

          2. 1

            I mean a user typing a “site.example.com” URL in their browser’s address bar. If the URL isn’t in the HSTS preload list, then it is assumed to be an HTTP URL, and the HTTPS upgrade is TOFU-like (the first use is vulnerable to HTTPS-stripping). There are also plenty of http:// links on the web that haven’t been changed to https://, because HTTP->HTTPS redirects keep them working seamlessly, but they’re also a weak link if not HSTS-ed.

      3. 5

        uh! I chose .app (unaware, stupid me) for a software project that discarded the Go toolchain for this very reason. Have to reconsider, thx!

      4. 3

        I have no idea where to even start to research this stuff. I use .dev in my websites but I didn’t know it was controlled by Google. I legitimately thought these were all controlled by some central entity.

        1. 2

          I have no idea where to even start to research this stuff.

          It is not really that hard. You can start with https://en.wikipedia.org/wiki/.dev

          If you are going to rent property (domain names) for your www home, and you are going to let your content live in that home for many years, it pays off to research where you are renting the property from.

        2. 1

          .test is untainted.

          1. 6

            Huh? There’s no registry for .test; it’s just for private test/debug domains.

    10. 3

      This is interesting, thanks for writing it!

      Some thoughts.

      (1) This is essentially fallible dynamic memory allocation (from an arena), right? You fail the function app::run() when said function requests more memory than was passed in.
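
      (As a reference point for what I mean, a minimal bump-allocator sketch in C, since the idea is language-independent; the names are made up and alignment handling is omitted:)

      #include <stddef.h>

      /* Fallible allocation from a fixed, caller-provided buffer: the
         allocator never grows, it just says "no" once the budget is spent. */
      typedef struct {
          char  *base;
          size_t cap;
          size_t used;
      } arena;

      void *arena_alloc(arena *a, size_t n) {
          if (a->cap - a->used < n)
              return NULL; /* out of budget: fail the request */
          void *p = a->base + a->used;
          a->used += n;
          return p;
      }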

      This seems to result in a system which reliably does not run out of memory, but for any particular input it is (consistent, but) hard to predict whether the system will render said input. To me, “static allocation of [memory] at startup” suggests making the opposite choice that you made; instead of

      The scenes we are going to render are fundamentally dynamically sized. They can contain arbitrary number of objects. So we can’t just statically allocate all the memory up-front.

      I’d expect the system to limit the number of objects, and guarantee success; or maybe app::run() can require the caller to pass in e.g. 12 bytes per triangle. (I think the only allocation in your program that isn’t just N times the number of triangles or whatever is the bounding volume hierarchy?)

      (2) You allow the user to configure the amount of memory at runtime. If you’d drop that requirement, you could just stack-allocate (a const generic amount of?) scratch space and bypass quite a bit of complexity. Indeed, stack-allocation seems fairly common in memory-limited (embedded) C?

      (3) I may be mistaken, but the bounding volume hierarchy construction does not appear to have an obvious maximum number of recursive calls (i.e. an obvious limit to the amount of stack space consumed)? Using an explicit stack may fit best with the “fallible dynamic memory allocation” strategy of the rest of this code?

      Again, thanks for writing this! And please do excuse me if I misunderstood something; I’m a little worried that there isn’t already a comment along the above lines here or in r/rust.

      1. 2

        1

        I think this is fairly close, but not quite; see this comment.

        2

        Yeah, I’d say I used the word “static” completely wrong – this is definitely not static, as you can render bigger scenes on bigger machines and smaller scenes on smaller machines. Allocating just everything in statics, if you can, is of course the best solution, but that does limit you to working with an essentially constant amount of data, without the ability to scale down or up without a rebuild. Which is often fine!

        3

        The depth is guaranteed to be logarithmic in the input size, as at each step we divide the triangles into equal halves. So, by the time you run out of stack, you’ve run out of heap as well (but yeah, the stack is another thing which prevents truly reliable programs. In user space, there are usually few guarantees about how much stack you have, and how much stack is required for various outside-of-your-control library functions).

        Traversal can be worst-case linear, and that’s one of the reasons it’s written in a non-recursive fashion. It also uses the “constant amount of data” approach for the workqueue of items.

        Ideally, you’d allocate that from a scratch space and pass that in, but that requires extending the in_parallel API to allow some per-thread setup. This is not too hard, but I ran out of steam by that point :-)

        1. 1

          Thanks for your response!

          (1) Yes, I think we mean the same thing there - the C developer in me doesn’t expect an arena allocator to be type-safe, but just to hand out void *. ;-)

          (Sorry, I didn’t check for new comments posted after I started writing. Thanks for the reference!)

          (3) I see, thanks for the clarification and extra information! I agree that being logarithmic in the input is probably good enough in practice.

    11. 10

      I thought this was going to be along the lines of “What color is your function” but it’s really its own deal. I find the colors a little hard to keep straight. In the end, the author concludes that there’s a spectrum, which is part of why it’s hard to keep straight: the different kinds of markup all bleed into each other. Still this is a really interesting line of thought, and I’m interested in reading what’s next in the series.

      One thought is that things like JSX show that there’s interest in describing applications as XML-like trees of markup.

      Another thought is that when you do the CSS for a webpage, there are basically two different categories of CSS: the content well vs everything else. For most of the page, you try to use semantic tags and style things with classes, so a call to action might have an H2 that gets styled as small text or whatever. But the content well is totally different, and now you can’t use classes and you’re stuck with a handful of basic HTML rich-text tags that you’re styling. It’s a pretty severe shift, and I think it definitely relates to switching from one “color” of markup to another at the boundary.

      1. 5

        Thanks for the kind words :)

        I agree the colors are slippery. I had a hard time making a fairly abstract discussion a little more visual/tangible. This was like draft 5 of this color idea, and this color idea was itself part of the third broad framework for approaching the whole discussion. I had to make peace with shipping it, but it’d be satisfying to see someone synthesize it better.

        When I outlined the posts in this framework back in Spring, I thought it would be the 5th post (covering just 3 colors–and structurally hewing much closer to “What color is your function”) in a series that started with https://lobste.rs/s/rgqg1v/what_functions_why_functions. Alas, writing the series and the first few drafts of this post made me want to call out and discuss more than just the first 3 types (and restructure the post because I think I stretched the comparison too thin).

        The main reason I’ve held on to the color conceit despite this trouble is that I can imagine highlighting tools that help humans ~see these distinctions in their markup, which feels like a good first step to drilling down on problems with reusing something like a documentation corpus.

        I agree that describing user interfaces with XML-like markup has been fruitful despite the separation-of-concerns problems created by the specifics of how we’ve stumbled into it (at least in the web stack).

        1. 4

          … in the web stack.

          I, too, thought the article was interesting.

          I do hope you’ll also discuss other ecosystems; e.g. reStructuredText can be used as a “cooler” (and extensible) Markdown, and LaTeX has a separate document class for presentations… plus a mix of pure-layout and nearly-pure-structure commands (e.g. \textbf vs. \label). Something like Photoshop is quite far towards a “warm” extreme (but keeping e.g. layers separate means that it’s not as “warm” as e.g. Paint). Etc.

          1. 4

            I dithered a bit about how many examples I needed to annotate to illustrate the concept, but the next few on my list are reST, mdoc, and LaTeX. (Though my LaTeX knowledge is mainly looking in from the outside; would you be open to a DM for thoughts or feedback at some point?)

            Interesting point about Photoshop. I’ve mulled both imperative local style and labeled style approaches in Word and InDesign, but I guess my text/documentation-centric focus has shaped how I have been extending these ideas out into multiple media without really reflecting on more traditional graphic design/illustration/etc. (even though they’re obviously germane).

            1. 6

              LaTeX is a bit weird because it’s not really a distinct thing. TeX is a Turing-complete imperative language that is designed for producing typeset output. LaTeX is an attempt to build an eDSL on top of TeX that gives semantic markup. There are several problems derived from the fact that TeX is a programming language, not a markup language. They also have exciting scoping rules, where some things affect the current ambient state, rather than describing their arguments. You can write something in italics as either \textit{this is italic} or {\itshape this is italic}. The definitions of the markup language are inline and can be interleaved with the programming language and so you get various things that you might call red markup in your taxonomy but are actually infrared (or something): they’re not markup describing how something appears, they’re commands describing how to make something appear a specific way. These are interleaved with the clean semantic markup.

              This means that it’s impossible to parse LaTeX as a markup language, in the general case. For example, if you see \foo{bar}, this might be:

              • Semantic markup saying that bar has some property.
              • Presentation markup saying that bar should be typeset in some way.
              • A marker that you should start a bar-typed region that requires some explicit markup to terminate it.
              • A macro that redefines another macro that is used later on.
              • Something that invokes some arbitrary code.

              It’s possible to be disciplined and separate your yellow and blue markup from other things in LaTeX. I’ve done this in a couple of books, where the only LaTeX macros that I use in the .tex files for each chapter describe semantics (e.g. this is a heading, this is a keyword, this is an abbreviation and its expansion, this is a code listing pulling in lines n to m from file x). Doing this let me also parse my markup with something that generated semantic XHTML markup for ePub versions. That isn’t really writing LaTeX though, it’s writing a custom markup language in TeX that happens to include a subset of LaTeX.

              1. 2

                Great context. This sounds hard-won :)

                There’s a lot of “you can do this with existing languages if you’re willing to make up a custom language, use just a subset of an existing one, or both” in this space.

                The definitions of the markup language are inline and can be interleaved with the programming language and so you get various things that you might call red markup in your taxonomy but are actually infrared (or something): they’re not markup describing how something appears, they’re commands describing how to make something appear a specific way. These are interleaved with the clean semantic markup.

                I guess this corresponds with the “procedural” type in the Coombs et al taxonomy. I’ll have to chew on whether to fold it in or call it out-of-scope.

                1. 3

                  I wish SILE had more traction. It implements a load of the algorithms from the TeX papers for typesetting (including the ones that the authors of TeX wanted to implement but which required more than 1 MiB of RAM and so were totally infeasible back then). It has a much cleaner separation here. It consumes some markup (this is pluggable and it has a couple of options out of the box), which is a pure markup language. It then uses Lua for defining how these are mapped to typesetting concepts. Both the input and output are better than TeX, but LaTeX has had decades of people building useful features on top. TikZ is my favourite example here: it’s a graphics package built entirely in TeX, which is a completely ludicrous substrate for it, but it’s so useful. Porting it over to SILE would be almost as much work as writing SILE was in the first place (probably less work than implementing it in TeX in the first place, but that’s sunk cost).

            2. 3

              would you be open to a DM for thoughts or feedback at some point?

              I’m afraid that wouldn’t be a very good idea, since I’m rather intermittently available for the next couple of weeks (months?). I’m sorry - I do very much want to read what you have to say, but I can’t realistically commit to anything. Sorry!

              1. 3

                No worries–exactly the kind of reason I asked ahead for :)

    12. 5

      I’ve been wanting to write publicly for years, but haven’t gotten around to it until now.

      Independent of the content, I would greatly appreciate any feedback on writing style, voice, the included diagrams, etc!

      1. 2

        Congratulations! I think you make your point well.

        Since you requested some feedback: while reading the piece, I did notice that you’re sometimes spending more sentences than necessary, sometimes on discussing what you’re not discussing.

        • E.g. in the section “Design”, you might consider mostly deleting the first three paragraphs and the image, leaving only “when working with striped development, I often like to think of it as a three-step process: (…) begun to stabilize”.
        • E.g. in the section “Implementation”, you could simply delete “As this isn’t meant to be a post about how to write code, there isn’t all that much to say about the implementation phase.”

        Nonetheless, I do - again - think you make your point well, and I hope you continue to submit relevant posts here. ;-)

        1. 2

          Thank you! This is exactly the type of feedback I’m looking for. Greatly appreciated.

      2. 2

        FYI, on smaller screens, the top and bottom halves of the side navigation crash into each other and cause a layout issue.

        1. 1

          Interesting, thank you. I checked it on mobile but not a small display. I’ll have to fix this tonight!

          Edit: top half of the sidebar has been disabled until I can fix this, thanks again!

      3. 2

        I especially like the “Stripes are completed left to right” diagram. It makes it really clear what this is all about.

    13. 4

      The first part of the post, up to Maud, reads to me like more condemnation of Markdown as a format unsuitable for any real content. If you want definition lists, and not some garbage front matter that isn’t standardized across implementations or codified in the spec and doesn’t render properly in the basic parsing tools, then don’t use Markdown—and you’ll get more features like figures, admonitions, detail/summary/collapsible syntax, blockquotes with attribution, file imports, section numbers, table of contents, ya know, stuff for actually posting rich text content for books and blogs. A lot of the hurdles encountered could have been circumvented for a long time with Asciidoctor alone. Slap on a standalone static binary tool like Soupault and you can do a lot of ‘dynamic’-looking things by pre/post-processing and augmenting that output by parsing metadata that already exists in elements on the page, without resorting to all of these custom front matter, custom element, shortcode, et al. syntax hacks that lock you into specific static site generators and break the lightweight markup spec.

      I’m not calling out the author specifically for having gone down this route—I have too—but we as a community need to start choosing the right tool for the job, and Markdown ain’t it.

      But in the end the post endorses Nix in lieu of more complex tools (Kubernetes, Heroku/Dokku), so it’s okay.

      1. 3

        I mean, markdown is suitable for “real content”, but I needed to extend it as things got more complicated. I assume I’d have to do this with any other template language too.

        1. 4

          I do think it’s a valid criticism of Markdown that it doesn’t have clean, well-defined extension points. E.g. reStructuredText / LaTeX / HTML all allow you to add some additional tag-like things without leaving the format, to an extent that Markdown doesn’t.

        2. 1

          It always needs extensions once you try to write a blog or book… or even a semifancy README. The base language does not give you enough for these tasks and the extensions collide with other implementations pretty quickly because of how Spartan the basics are.

      2. 2

        I’ve been flailing a bit on it as a ~project, but I’ve been trying to write about this rough part of the map lately. I may poke some of the people in this subthread when I get the next post sorted…

        Resisting the urge to vomit much context here, but a little: over the past ~decade, I’ve had multiple projects drag me through the pain of trying to automate things that deal with soft/idiomatic use of presentational markup.

        Last year I started prototyping a ~toolchain for documentation single-sourcing on top of a very flexible markup language I’d wanted to try out for a while, D★Mark.

        Both building it and figuring out how to write about it have made me wrestle with markup a lot. I have a nagging self-doubt that they’re all trite observations that everyone else knows, but it’s helped me make sense of a lot of things that felt unrelated. TLDR: everything sucks (at least, outside of narrow use-cases) but I have more empathy as I get a better handle on why :)

      3. 1

        I use a custom markup language that is an unholy mix of Markdown, Org Mode and my own custom quirks but the messages aren’t stored in that format, but in the rendered HTML. That way, I’m not stuck with any one Markdown (or other) format. And it’s from the HTML format that I re-render the output for gopher and Gemini.

        1. 2

          To each his own at the end of the day, but sticking to a spec takes a lot of burden off of yourself and lets you participate in a larger community. I’m curious what was missing from Org Mode for you? It seems it lacks some of the features of AsciiDoc or reStructuredText.

          1. 2

            My main point was not to write custom markup languages, but rather to use whatever markup language you want; just don’t store the blog post in that markup language, store the final HTML output.

            As to your question, my custom markup was inspired by OrgMode [1] (for the block level elements) and Markdown [2] (more for the inline elements). And I wasn’t trying to use a format to replace HTML, just make it easier to write HTML [3].

            I am also able to gear my custom markup to what I write. I know of no other markup language (admittedly, I don’t know many) that allows one to mark up acronyms. Or email. The table format I came up with is (in my opinion) lightweight enough to be tolerable for generating an HTML table (again, just to support what I need). I’ve also included some bits from TeX (using two grave accents (ASCII 96) to get real double quotes, for example), as well as other characters (like ‘(C)’, which will turn into the Unicode copyright character). And a way to redact information (as can be seen here).

            Standards are good but they fall short in supporting what I need. And there is no way I would want to force my blogging quirks onto other people.

            [1] I don’t use Emacs, nor any other tool that support it.

            [2] There are too many different Markdown standards to pick just one.

            [3] Which is an element I think many people miss about Markdown—it wasn’t developed to replace HTML, but to make it easier for John Gruber to write his blog posts, which is why you can type HTML in his version of Markdown.

        2. 1

          Interesting! Do you edit the HTML directly or re-render it after changes?

          1. 1

            If I need to update an entry after it’s posted, I will edit the HTML directly.

    14. 3

      Ok (if often-repeated) point, bad title.

      What I’d personally find interesting to read about is applying such concepts to scenarios that are not “we run large web services on some cloud platform”, which seems to be the more or less implicit assumption most of the time.

      1. 2

        Something like logging commands / requests to a (shared-memory?) ring buffer, and dumping those to disk on a crash?
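
        As a minimal sketch of that idea in C (names made up; a real version would live in shared memory and handle concurrency and partial writes):

        #include <stddef.h>

        /* Fixed-size in-memory ring of log bytes; a crash handler can dump
           the whole array to disk to recover the most recent activity. */
        #define RING_SIZE 4096

        static char   ring[RING_SIZE];
        static size_t ring_head;

        void ring_log(const char *msg, size_t len) {
            for (size_t i = 0; i < len; i++)
                ring[(ring_head + i) % RING_SIZE] = msg[i];
            ring_head = (ring_head + len) % RING_SIZE; /* oldest data is overwritten */
        }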

        Systems like that definitely exist. I like - partly for other reasons - apenwarr’s writeup; the part relevant to this discussion starts at “Userspace and kernel messages, in a single stream”. (Note that that particular implementation is for embedded Linux; you may need to adapt it for non-embedded *nix or non-Linux embedded systems.)

      2. 1

        People are usually not fans of software collecting information about itself and sending it off to some centralized collection system (aka telemetry). Distributed metrics are a complex problem, but mostly in a social sense rather than a technical one.

    15. 8

      but since I still wanted to maintain “clean code”, I decided to make a macro instead of the function

      What compiler are you using that doesn’t have an always-inline attribute that you can stick on a function to get the same overhead as a macro, without the ugliness? For that matter, what compiler doesn’t inline a function that’s likely to be lowered to six instructions?
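
      For GCC/Clang-style compilers, the spelling would be something like the following (a sketch; the struct and names are stand-ins for the article’s globals, and vendor compilers often provide an equivalent directive instead):

      #include <stdint.h>

      /* Illustrative stand-ins for the article's status globals. */
      struct status { volatile uint16_t faults; };
      extern struct status shortStatus, longStatus;

      /* Forces inlining, so this costs the same as the macro but keeps
         type checking and debuggability. */
      static inline __attribute__((always_inline))
      void set_fault(uint16_t bit)
      {
          shortStatus.faults |= bit;
          longStatus.faults = shortStatus.faults;
      }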

      shortStatus.faults |= FAULT_BIT; longStatus.faults = shortStatus.faults; …but from looking at the PIC24 assembly code, that’s much larger

      I presume that the issue here is that these globals (MMIO objects?) are volatile and so you can’t keep an intermediate in a temporary. The following would likely be smaller:

      int tmp = shortStatus.faults | FAULT_BIT;
      shortStatus.faults = tmp;
      longStatus.faults = tmp;
      

      But, I was quickly reminded that the PIC architecture doesn’t support passing constant string data due to “reasons”. (Harvard architecture, for those who know.)

      That is not a sufficient reason. According to the article, the following works:

      const char *msg = "This is my message";
      LCDWriteDataString (0, 0, msg);
      

      But this doesn’t:

      LCDWriteDataString (0, 0, "This is my message.");
      

      This sounds very odd because the assignment to the temporary shouldn’t affect code generation at all. I suspect that this is a compiler bug, not an artefact of the target. My guess, from the following line, is that the code memory is ROM, the data memory is RAM, and they are in separate address spaces. In addition, the compiler is not fully C compliant and so uses a char* that is not large enough to store a pointer to any object (as required by the spec). Generally, systems like this provide an address-space attribute. You would be able to pass a string as a function parameter if you annotated the parameter to specify the address space as the ROM segment.

      The above routine maintains a static character buffer of 3 bytes. Two for the HEX digits, and the third for a NIL terminator (0). I chose to do it this way rather than having the user pass in a buffer pointer since the more parameters you pass

      Again, an always-inline function would solve this problem.

      It sounds as if the compiler that the author is using is absolutely terrible. I’ve written a generic C++ format-string implementation that takes the format string as a template parameter and does the formatting, and clang happily lowers it to something equivalent to this custom implementation. There was a PIC back end for LLVM a while ago; I’ve no idea what its status is, but in general if you have to fight your compiler this much then you should consider getting a better compiler. The kind of optimisations required here to generate code this good are the kind that you learn in an undergraduate compiler course. Just inlining and common subexpression elimination would have been sufficient.

      TL;DR: If you can make your code better by applying a trivial transform that doesn’t rely on any extrinsic knowledge, then your compiler should be able to make everyone’s code better by either applying the same transform everywhere or by applying the transform where instructed by annotations.

      1. 5

        What compiler are you using…

        Buried in the article:

        …using the CCS compiler tools

        The CCS compiler is almost C89/C90 compliant and if you check out the manual (p. 147) there is an #inline preprocessor directive to mark a function as inline. Whether it works in this case, I don’t know. It sounds like it should, but these old, embedded target compilers can be odd beasts.

        It’s also possible the author doesn’t know about #inline (or won’t use it) because it’s not in the standard. But it does seem like it should work in this case.
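
        Going by the manual’s description, it would presumably look something like this (the helper name is invented, the globals are the article’s, and I haven’t tried this against the CCS tools):

        #inline
        void set_fault_bits(void)   /* hypothetical helper */
        {
            shortStatus.faults |= FAULT_BIT;
            longStatus.faults = shortStatus.faults;
        }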

        And having to pass non-constant strings doesn’t make sense to me, unless the compiler is doing something very strange about memory placement and pointer types, as you mention.

        1. 3

          I’ve never used a PIC24, but there are definitely architectures where you have separate “load from ROM” and “load from RAM” instructions, which both take a word-sized pointer… and this is sometimes mapped into a language with C-like syntax by having (const char *)0x24 and (char *)0x24 point to different memory.

          (The alternative being to keep a tag byte for each pointer, telling you to which memory space it points. Which lets you use actual C, but which does not map to the underlying hardware very efficiently.)

        2. 1

          And having to pass non-constant strings doesn’t make sense to me, unless the compiler is doing something very strange about memory placement and pointer types, as you mention.

          I don’t know about the PIC architecture specifically, but on a lot of Harvard architecture systems you have different pointer sizes for the different address spaces. My guess is that PIC has 24-bit ROM pointers and 16-bit RAM pointers and places code and constants in ROM. Taking the address of a constant string gives a 24-bit pointer but, for some reason, they’ve decided that char* is a 16-bit pointer.

          The manual (page 44) talks about a rom qualifier that forces data to be placed in ROM and talks about rom pointers, so I believe the correct fix here would simply be to add the rom qualifier to the const char* parameter and then you’d be able to pass constant strings (but only constant strings) as arguments to the function.
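
          If so, a sketch of the fix might look like this (the parameter types are placeholders; only the rom qualifier is the point):

          void LCDWriteDataString (char row, char col, rom const char *msg);

          LCDWriteDataString (0, 0, "This is my message.");   /* string can stay in ROM */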

          I was a bit surprised that printf in their examples can take a constant string given this blog post, but it appears (page 52) as if they support overloaded functions and so there’s a version of printf that takes a const char* and one that takes a rom const char *. Interestingly, only the ROM version is allowed to actually be a format string, which makes me suspect that the compiler is already generating a specialised version of the code for printf, but not for sprintf, motivating this blog post to start with.

          The fact that the language provides overloading is very interesting because it may mean that you can write a C++-style formatting implementation, though the lack of a tuple type or C11’s _Generic may mean that it isn’t possible.

    16. 2

      am i the only one thinking “or, you could just get more reliable networking gear” ?

      1. 2

        That would be half the fun and double the price and totally not worth it if you have just one uplink anyways.

        1. 2

          Yeah, the most likely scenarios for a residential house are:

          • power loss to the building (you might have a UPS, but you are unlikely to have an autostarting generator with an automatic transfer switch)

          • upstream ISP loss (fiber is pretty reliable, but a truck or a backhoe can happen to anyone)

          • power supply failure on a machine with only one power supply (buy more expensive hardware and probably lose some efficiency)

          In the last 20 years, I have experienced all of these – mostly while I was at home to fix the things that were in my power.

      2. 2

        I’m fascinated by these Stapelberg posts, but yes, not doing any of that tends to be the easier path.

        Note that this is all in support of

        For the guest WiFi at an event that eventually fell through, we wanted to tunnel all the traffic through my internet connection via my home router. Because the event is located in another country, many hours of travel away, (…)

        … where one might also consider, say, not tunneling all guest WiFi traffic through a home router “hours of travel away”. Or having a fail-over scenario to some gateway at a suitable hosting location.

        1. 4

          Oh, I also had a fail-over scenario prepared with another gateway on a dedicated server in Germany.

          But, tunneling through a residential connection is preferable for residential use-cases like this one :)

    17. 0

      I enjoy building and using git from source. It’s super easy to set up (make install) and I get to use the latest features, sometimes during development. Strongly recommend it.

      1. 5

        The author can’t use SSH signing because OpenSSH is too old.

        1. 1

          Ah, I guess I was trying to recommend that folks ‘live at HEAD’ but failed in writing that out. 🥲

    18. 12

      This looks like a sensible set of features:

      The constexpr thing resolves an annoyance: C actually has a very well-specified notion of a constant expression. This requires, for example, clang to have a constant-expression evaluation engine in the front end that does a subset of things that could otherwise be done later in the optimisation pipeline, so that it can report errors (C++ also requires it for constants in templates because the front end does need to handle these, but it’s necessary for C even without templates). Until now, the only way of reusing a constant expression was to define a macro with it, at which point the compiler ends up evaluating it at every instantiation point. Now you can assign it to an identifier. This has been possible in C++ via template hackery since C++98 and cleanly since C++11.
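
      For example (C23):

      // One definition, evaluated once by the front end; no macro needed.
      constexpr int max_entries = 64 * 1024;
      static int entries[max_entries];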

      The compound literals thing is a nice cleanup, I hope the C++ version is also adopted. Compound literals in C++ could do with a bit more work.

      Every use I’ve had for something like #embed has ended up being better served by something else, but I’ve seen other places where it’s valuable, so it’s good to see it in the standard.
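
      For the curious, the canonical C23 usage looks like this ("logo.png" being a placeholder path):

      // The file's bytes are spliced in as initialiser elements.
      static const unsigned char logo[] = {
      #embed "logo.png"
      };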

      __VA_OPT__ is fine, I guess, but GCC has had a work-around for this limitation for over 20 years and I don’t really see myself changing from , ## to __VA_OPT__(,) any time soon.
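
      Side by side, for anyone who hasn’t seen the two spellings (the dbg_* names are made up):

      #include <stdio.h>

      // GCC extension: ## swallows the comma when no arguments are given.
      #define dbg_gnu(fmt, ...) fprintf(stderr, fmt "\n", ##__VA_ARGS__)
      // C23: the standardised spelling of the same escape hatch.
      #define dbg_std(fmt, ...) fprintf(stderr, fmt "\n" __VA_OPT__(,) __VA_ARGS__)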

      Allowing (...) is interesting. C originally passed all arguments on the stack and didn’t have prototypes, so every call was effectively variadic (in your implementation, you provided the set of types for your formal parameters, but you could add variadic ones by taking the address of the last one and subtracting its size to get at the next one). This was originally wrapped up in some horrible macros. With C89 and function prototypes, the standard version of variadics made it very hard to implement in macros and so most compilers added builtins for handling the variadics. More complex calling conventions meant that the compiler had to get involved. Once you drop the need to implement variadics purely in macros, you don’t need the first argument (which exists because otherwise you can’t get the address of the start of the argument frame).
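
      A minimal C23 sketch of the new form (the 0 sentinel is this example’s convention, not the language’s):

      #include <stdarg.h>

      int sum(...)                 // no named parameter required any more
      {
          va_list ap;
          va_start(ap);            // C23: no second argument needed
          int total = 0;
          for (int v = va_arg(ap, int); v != 0; v = va_arg(ap, int))
              total += v;
          va_end(ap);
          return total;            // e.g. sum(1, 2, 3, 0) == 6
      }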

      All of that said, in C++ variadic functions are regarded as legacy crap. If you want a variadic interface then you define a variadic template as an inline function. You can then parcel the arguments up into some sensible type such as an array or a tuple and forward it to a function that does something with the arguments. This lets you capture type information and allows you to write type-safe variadics. C-style variadics are basically on the list of things that you should never use, because they are so easy to get wrong and introduce stack corruption. Anything that makes people more likely to write variadic functions in C makes me a bit sad.

      The every-enum-is-an-int thing has annoyed me since I first discovered it over 20 years ago. Nice to see a fix here. C++ requires that you explicitly declare the enum as a wider type. The auto-widening thing makes me a bit nervous because it makes it harder to guarantee stable ABIs that take enums. If I expose a function that takes an enum and the values all fit in an int, the function will take an int. If I add a new 33-bit value to that enum then the argument type for that function will change in the ABI and I’ve now broken any code that calls that function. In C++, at least, I get a warning and have to explicitly change the type to enum class Thing : uint64_t or similar and since I’ve explicitly changed the width of the type then I know that I’ve broken it. Oh, and because I have operator overloading and using I can have both the old and new definitions coexist in my library and ship the narrower version as a wrapper around the version that uses the wider one.

      Fortunately, from the next section, C has now also got support for explicit underlying types for enums, and so I can probably get a compiler warning if I do the dangerous thing. That probably should be enabled by default for any enum declared in a header. It’s a shame when a language adds a feature that immediately makes me want a compiler warning in case I use it, though.
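
      Sketching the safe C23 declaration:

      #include <stdint.h>

      // Pinning the underlying type means adding a wide enumerator later
      // can't silently change the ABI of functions taking this enum.
      enum flags : uint64_t {
          FLAG_OLD = 1,
          FLAG_NEW = UINT64_C(1) << 40,
      };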

      The qualifier-preserving standard functions make me unreasonably happy. One of the first things that I tried to do in CHERI C was enforce const. Unfortunately, this was monotonic: CHERI doesn’t give you a mechanism for adding permissions (by design), so once the compiler has implicitly cast something to const you can’t rip it back off. Things like memchr in the standard library required exactly this: they took a const pointer and returned a non-const pointer derived from it. This broke. Hard. I ended up adding __input and __output qualifiers to mean ‘read- / write-only and I really mean it’. I should be able to add these in _Generic macros, so the __input qualifier is preserved nicely.
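
      Concretely, in C23 the const now survives the round trip (a tiny sketch; find_two is my name):

      #include <string.h>

      void find_two(const char *secret, size_t n)
      {
          // C23's generic memchr returns const void * for a const argument,
          // so the qualifier can no longer be silently laundered away.
          const char *hit = memchr(secret, '2', n);
          (void)hit;
      }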

      The explanation of nullptr is exactly the thing that’s bitten me and one of the reasons that I hate variadics. Good to see, shame it wasn’t in C11.
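
      The classic bite, for reference: if NULL expands to plain 0, an int goes through the variadic call on ABIs where int and pointers differ in size. C23’s nullptr always has pointer representation (execl is just the traditional example):

      #include <unistd.h>

      void run_ls(void)
      {
          // Pre-C23 you must write (char *)NULL here, because plain NULL
          // may expand to an int 0; C23's nullptr needs no cast.
          execl("/bin/ls", "ls", nullptr);
      }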

      stdbit.h has been a long time coming, but it’s great that it’s there now. I have a non-portable header that implements wrappers around compiler intrinsics for these for clang / gcc / MSVC; I’m looking forward to throwing it away (see the sketch after this list). On the stdc_ prefix, I have two comments:

      • I wish they had a consistent prefix on all of the standard functions.
      • I hope the C++ version drops the prefix and puts them all in the std namespace.
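
      A quick sketch of what the new interface looks like:

      #include <stdbit.h>
      #include <stdio.h>

      int main(void)
      {
          unsigned x = 0x00f0u;
          // Type-generic macros, all behind the stdc_ prefix.
          printf("%u leading zeros, %u set bits, width %u\n",
                 (unsigned)stdc_leading_zeros(x),
                 (unsigned)stdc_count_ones(x),
                 (unsigned)stdc_bit_width(x));
      }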

      auto in C++ is useful and I used GCC’s __auto_type a lot back before I gave up on C.
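
      The two spellings, for comparison:

      void example(void)
      {
          __auto_type mask = 0xffu;   // GCC's pre-C23 extension
          auto shifted = mask << 4;   // C23; type deduced from the initialiser
          (void)shifted;
      }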

      I think memset_explicit came from OpenBSD. It’s surprisingly hard to implement without compiler support or inline assembly, so nice to see it in the standard.

      [u]intmax_t is one of those things that was obviously never a good idea, from the time that it was introduced. Given that 32-bit platforms implemented 64-bit integers via software paths, and some 16-bit architectures grew backwards-compatible extensions for 32- and 64-bit arithmetic, it was obvious that future versions of a platform would gain wider types than the current ones supported. I wish C23 had done the right thing here and deprecated these types. I have never seen any code that uses them and is correct, though I have seen them used quite a bit.

      All in all, it makes C a marginally less bad language, but there’s nothing in there that makes me want to start writing C again. It still feels like a crippled version of the worst bits of C++.

      1. 2

        I think memset_explicit came from OpenBSD. It’s surprisingly hard to implement without compiler support or inline assembly, so nice to see it in the standard.

        Why would it be hard to implement? It needs an optimisation barrier; translation unit boundaries usually act as such, unless you LTO (and who statically links libc?). Alternately, I’ve commonly seen it implemented as follows: void *(*volatile mmemset)(...) = memset; mmemset(...).
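
        Spelled out (scrub is a made-up name):

        #include <string.h>

        // Because the pointer is volatile, the compiler may not assume it
        // still points to memset at the call site, so the call survives.
        static void *(*volatile mmemset)(void *, int, size_t) = memset;

        static void scrub(void *p, size_t n)
        {
            mmemset(p, 0, n);
        }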

        1. 6

          It needs an optimisation barrier; translation unit boundaries usually act as such, unless you lto—and who statically links libc?

          That is one of the problems, yes. Lots of folks enable LTO with an LTO-built libc, and suddenly the compiler can see what’s happening. As far as I know, glibc and macOS libc are the only mainstream libcs that don’t support static linking (and Apple does, I believe, use thinLTO of their libc for app store builds, so the compiler can look inside libc for analysis, even if it doesn’t inline much).

          But that isn’t the biggest problem. It’s commonly used before free and the compiler will helpfully realise that it’s safe to elide any stores to an object just before free because any subsequent load is UB and the stores are therefore dead.
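
          Concretely, something like this (dispose is a made-up name) can have its memset deleted outright:

          #include <stdlib.h>
          #include <string.h>

          void dispose(void *buf, size_t len)
          {
              // Dead stores: any read of buf after free() is UB, so the
              // compiler may elide the whole memset.
              memset(buf, 0, len);
              free(buf);
          }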

          Alternately, I’ve commonly seen it implemented as follows: void *(*volatile mmemset)(…) = memset; mmemset(…).

          Did you look at the generated code? Testing that in clang, the IR that it generates is identical to a normal memset. It expands to a memset intrinsic without the volatile flag. If this appears before a free, the compiler is at liberty to elide the memset, because it knows the semantics of memset as defined by the C standard and it knows that it is UB for any reads afterwards. There is no happens-before edge established via an atomic operation and so the stores are also not visible to another thread.

          I’ve seen a lot of folks try to implement functions like this and find that at least one compiler that they’re using will completely elide their security properties. I wrote a paper about this a few years back.

          1. 1

            Did you look at the generated code? Testing that in clang, the IR that it generates is identical to a normal memset. It expands to a memset intrinsic without the volatile flag. If this appears before a free, the compiler is at liberty to elide the memset, because it knows the semantics of memset as defined by the C standard and it knows that it is UB for any reads afterwards. There is no happens-before edge established via an atomic operation and so the stores are also not visible to another thread.

            How did you test? I do not get those results. The volatile qualifier means that the compiler does not get to constant-propagate from the definition of mmemset to its use, so at the latter point, not knowing which function is pointed to, it must assume that that function might have arbitrary side effects.

            Here is a godbolt link demonstrating this.

            1. 1

              I tested it locally with a different version of clang, but I had the assignment in the global scope. Clang happily lowered it to a non-volatile LLVM memset intrinsic. It didn’t, in this case, elide the memset even without the volatile qualifier, but there’s no guarantee that it won’t. LLVM has changed its interpretation of what volatile means a few times. There’s still ongoing discussion of whether it’s safe for the compiler to elide volatile stores if it can prove that the memory region is non-side-effecting and that the stores are not observable within the LLVM (or C) abstract machines. In general, using volatile for anything other than MMIO is relying on the compiler’s interpretation of underspecified bits of C and is very dangerous if you’re doing it for security.

              1. 1

                I don’t quite follow. There was a situation in which mmemset was not volatile, but the call to it was not optimised away? I do not observe such behaviour. Was this with your ‘different version of clang’?

                There’s still ongoing discussion of whether it’s safe for the compiler to elide volatile stores if it can prove that the memory region is non-side-effecting and that the stores are not observable within the LLVM (or C) abstract machines

                Interesting. How is it determined whether a memory region is ‘side-effecting’? As far as I know, the C and LLVM abstract models make no explicit allowance for, for instance, char *text = (char*)0xb8000; but if I can’t write that and have it work ‘correctly’, I’ll sue.

                1. 2

                  I don’t quite follow. There was a situation in which mmemset was not volatile, but the call to it was not optimised away? I do not observe such behaviour. Was this with your ‘different version of clang’?

                  I ran this locally with Apple clang something or other (whatever was on the Mac I had in front of me last week) and looked at the generated IR. With and without the volatile cast, it generated an LLVM memset intrinsic call, with the volatile parameter for that call set to 0 (false).

                  Interesting. How is it determined whether a memory region is ‘side-effecting’? As far as I know, the C and LLVM abstract models make no explicit allowance for, for instance, char *text = (char*)0xb8000; but if I can’t write that and have it work ‘correctly’, I’ll sue.

                  Automatic storage locations are created by the compiler. In theory, you could put your stack in an MMIO region, but if an automatic storage variable’s address does not escape a function then the C standard does not provide any mechanism by which a store to that location is visible elsewhere, within the C abstract machine. This means that even a volatile store to a variable with automatic storage may (the standard is not explicit either way) trigger an as-if rule that lets you elide it: you may not elide stores to volatile variables, but if you can prove that the store is not visible then the version of the program that elides it behaves as if it were the version that did not.

                  This is really the root problem for trying to implement most of the things in this space. They’re intending to prevent vulnerabilities that exist outside of the C abstract machine, from within the C abstract machine itself. In the C abstract machine, once an automatic-storage variable goes out of scope, or once a heap allocation is passed to free, it is gone; any access to it is undefined behaviour. A store to it immediately prior to its deallocation is not observable within the abstract machine unless it is accompanied by something that establishes a happens-before edge guaranteeing visibility in another thread, plus a happens-before edge backwards guaranteeing that the load in the other thread completes before the deallocation. In contrast, such a store often is observable in a concrete lowering of the abstract machine to mainstream hardware, and so can leak secrets. You need the compiler to be aware of things beyond the abstract machine during mid-level optimisations to be able to guarantee the correct semantics.

                  My favourite thing from our paper was an idiom in OpenSSL that tried to do constant-time conditionals via xor. It turns out that multiple versions of GCC were clever enough to recognise this and convert it into a conditional in their mid-level IR and then, depending on the target and the optimisation level, either turn it into a conditional move (fine) or a branch (not fine). C doesn’t have any notion of time and so can’t express a notion of constant time and so the compiler didn’t feel any need to preserve a property that couldn’t be expressed in the source-language abstract machine at all.
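
                  For readers following along, the idiom in question looks roughly like this (ct_select is my name for it):

                  #include <stdint.h>

                  // mask must be all-ones or all-zeros; the intent is branch-free
                  // selection, but a compiler may recognise it and emit a branch.
                  static uint32_t ct_select(uint32_t mask, uint32_t a, uint32_t b)
                  {
                      return b ^ (mask & (a ^ b));   // a if mask == ~0u, else b
                  }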

                  1. 1

                    With and without the volatile cast, it generated an LLVM memset intrinsic call, with the volatile parameter for that call set to 0 (false)

                    I think you misread what I wrote. I did not cast anything. I created a volatile pointer to function, set it to point to memset, and then called the function. But the type of the function pointed to is exactly the type of memset. The goal was not to have a memset intrinsic call with a volatile argument; the goal was to generate a call to an unknown—and potentially side-effecting—function which just happens to be memset.

                    Automatic storage locations

                    Huh. I guess the idea is to optimise cases when variables are qualified ‘volatile’ only for the purpose of setjmp/longjmp? Seems somewhat marginal, but. (And presumably it only applies when the location is not aliased—so e.g. signals work fine?)

                    My favourite thing from our paper was an idiom in OpenSSL that tried to do constant-time conditionals via xor. It turns out that multiple versions of GCC were clever enough to recognise this and convert it into a conditional in their mid-level IR and then, depending on the target and the optimisation level, either turn it into a conditional move (fine) or a branch (not fine)

                    Fun. I once caught gcc/clang generating branches in a tight loop where there should have been conditional moves; rewrote it in assembly and got a nice speedup.

                    1. 2

                      Huh. I guess the idea is to optimise cases when variables are qualified ‘volatile’ only for the purpose of setjmp/longjmp? Seems somewhat marginal, but. (And presumably it only applies when the location is not aliased—so e.g. signals work fine?)

                      It’s also to handle cases where you have a C++ template that takes a volatile pointer so that it can be used with MMIO, but where you can also instantiate it with normal memory. I forget the codebase that this came up in (it was someone’s in-house thing), but apparently they got a big end-to-end win from the inlined versions of the template that were operating on stack memory being able to do non-volatile stores, allowing the compiler to elide a load of them on a hot path. I’m not sure why they couldn’t pick up the volatile qualifier from the template parameter, but possibly it was in C++ generated from some other language and difficult to change. It was interesting to me because it’s an under-specified area of the language. The general consensus on volatile from WG14 in recent years has been ‘if you’re using it for anything other than MMIO, you’re probably using it wrong’ and so I’m deeply sceptical that any approach that involves using volatile on normal memory and expecting the compiler to interpret the standard in the same way as you will work.

      2. 1

        Responding to one specific point: memset_explicit() indeed appears to be inspired by OpenBSD’s explicit_bzero().

        The described changes are pretty nice, I think!

    19. 1

      This is great! My one suggestion would be to host the Python reference code on a site where people can read it in their browser, instead of sending a tarball… especially because it’s so short, concise, and pretty.

      1. 2

        That’s actually a neat suggestion. I’ll see about distributing it online in addition to the tarball. I’ll also need to find a way to remove that pesky licence code from the HTML version… oh, and syntax highlighting…

        Do you know of a command line program that can turn Python code into pretty HTML?

        1. 4

          vim can export syntax highlighted code to styled HTML, built right in:

          see :h TOhtml or this useful SO post

        2. 3

          Pygments is the standard solution, I think.
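
          Something like this should do it (file names are placeholders); -O full emits a standalone page with the styles inlined:

          pygmentize -O full -o reference.html reference.py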