Threads for olliej

    1. 9

      Apple’s CoreFoundation library makes extensive use of low-bit pointer tagging. This allows it to store small integers (and maybe floats?), booleans and short strings directly in a pointer without allocating memory.

      The encoding gets pretty complex, especially for strings; IIRC there are multiple string encodings, one of which can crunch characters down to 5 bits so it can store a dozen(?) characters in a 64-bit pointer. This sounds expensive, but I assume there are SIMD tricks to speed it up, and it’s still going to be way faster than allocating and dereferencing a pointer.

      1. 3

        Any reference on the 5 bit encoding? This is in APIs that get called from Objective C or Swift?

        The low bit tagging seems like an obvious and portable win these days, although I know Lua avoided it because it’s not strictly ANSI C. Rust also got rid of its small string optimization early in its life, apparently due to code size

        https://old.reddit.com/r/rust/comments/2slcs8/small_string_optimization_remove_as_mut_vec/

        Though honestly I would have expected fewer heap allocations and cache misses to be a win for most string workloads

        1. 5

          You got it — the reference I remember is from Mike Ash’s blog, which has been dormant for a few years, but the archives are a treasure trove of low-level info about Apple’s runtime.

          The CoreFoundation types are exposed in Obj-C as NSString, NSNumber, NSDate, NSValue. They also show up in Swift for bridging purposes, but the native Swift string and number types are implemented differently, in Swift itself.

        2. 1

          The various tagging schemes that objc (and by proxy Swift’s interop) uses are internal implementation details that can change (and have changed), so they’re not API. Instead, objc_msgSend and family handle it directly - similar to the myriad refcount stores and what not.

        3. 1

          I was actually looking at Mike Ash’s blog this week for info on tagged pointers:

          How about 5 bits? This isn’t totally ludicrous. There are probably a lot of strings which are just lowercase, for example. 5 bits gives 32 possible values. If you include the whole lowercase alphabet, there are 6 extra values, which you could allot to the more common uppercase letters, or some symbols, or digits, or some mix. If you find that some of these other possibilities are more common, you could even remove some of the less common lowercase letters, like q. 5 bits per character gives eleven characters if we save room for length, or twelve if we borrow a symbol and use a terminator.

          https://mikeash.com/pyblog/friday-qa-2015-07-31-tagged-pointer-strings.html


          I was actually looking at the blog because I was wondering if ref counting in the unused 16 bits of a pointer might be feasible. It would give you up to 65k references, which is more than enough for many (most?) use cases. That would slim down the size of ref counted values and might make them as cache friendly as a GCed value. Might not be thread safe though.

          1. 3

            Wow, skimming the rest of the post, this is a lot more subtle than I would have expected, and also relatively recent – OS X 10.10 as of 2014.

            1. If the length is between 0 and 7, store the string as raw eight-bit characters.
            2. If the length is 8 or 9, store the string in a six-bit encoding, using the alphabet “eilotrm.apdnsIc ufkMShjTRxgC4013bDNvwyUL2O856P-B79AFKEWV_zGJ/HYX”.
            3. If the length is 10 or 11, store the string in a five-bit encoding, using the alphabet “eilotrm.apdnsIc ufkMShjTRxgC4013”

            The five-bit alphabet is extremely limited, and doesn’t include the letter b! That letter must not be common enough to warrant a place in the 32 hallowed characters of the five-bit alphabet.

            Pretty crazy!

            I think if you control the entire OS and the same NSString is used everywhere, this makes more sense.

            For what I’m doing, we have to call into libc more, and pass it C strings, so we don’t control that part. The allocation to decode and make it look like a C string is problematic. Not just slow, but creates an ownership problem.

            1. 2

              I think if you control the entire OS and the same NSString is used everywhere, this makes more sense.

              It makes me wonder if that is a headwind for adoption of Swift on other platforms. Is the language so tuned to performance on a single platform that it makes the code difficult to port?

              It seems like they are also doing some pretty intricate things with C++ interop. As an outside observer (and a relatively ignorant one at that), it seems like it would be very difficult to generalize some of this work.

              1. 3

                This stuff is part of CoreFoundation, not Swift. Swift on Apple platforms has some pretty sophisticated interop / FFI with it, for compatibility with Objective-C code, but that isn’t present in Swift on other platforms.

              2. 3

                It shouldn’t be hard to port. In GNUstep, we adopted a compressed strings-in-pointers encoding some years before Apple (not the 5-bit one: doing that well requires analysis, from run-time profiling of a large set of real-world applications, of data I didn’t have access to). The interface for iterating over strings allows the caller to provide a buffer. These strings are, by definition, of small bounded length, so converting to a C string for interoperability is trivial, with the caller providing the buffer on its stack.

                It does have some very nice performance properties. A lot of dictionaries use small strings as keys. If you check the length on the way in and try to convert mutable strings used for lookup to small strings then you know that the result either is or isn’t a small string. This lets you skip a bunch of comparisons after you’ve found the hash bucket.

              3. 2

                Not really. In fact, Swift on Linux doesn’t have certain complexities that are present on Apple platforms due to the necessity of ObjC interop.

              4. 2

                I haven’t followed Swift closely, but I do have the feeling that the focus is entirely on Apple’s platform, and there are some fairly hard tradeoffs with respect to portability / open source.

                Just like I think of Google’s entire stack as a vertically integrated embedded system (hardware up to cloud and apps), Apple seems to be architected in a similar way. They get some pretty big benefits from the vertical integration and control

                Andreas Kling mentioned that Apple is a huge inspiration for SerenityOS – basically doing everything yourself and not taking dependencies, which is kind of the opposite of most open source, which is about reuse and portability

                It seems like if Swift were going to be popular on Linux or Windows, that would have already happened by now. Looks like it’s about 9 years since the first release now

          2. 2

            You can’t put the ref-count in a pointer, because a pointer is a reference. If you increment a count in a pointer, that doesn’t affect any other pointers to the object, so no one else knows you added a reference.

            CoreFoundation does (IIRC) store refcounts outside objects. My memory is hazy, but it might reserve a few bits for an internal refcount, and when that pins at its max value, the real refcount moves to a global hash table. The answer probably lies in Mike Ash’s blog. (Swift-native class objects don’t do this, though.)

            [Update: just saw your other post below. What CF does uses spare bits in a pointer field inside the object; I thought you meant putting the refcount in pointers to the object.]

            1. 2

              No, you were right in the first place. I was totally thinking about things wrong. I got wrapped around the tree a bit. Thank you for the correction.

            2. 1

              It was this one - I added a link to it this morning after this thread!

          3. 2

            The refcount for a normal objc object is kept in a few bits of the isa pointer, and then moved to a side table once the refcount exceeds the available bits. The result is that there’s no major memory overhead to the refcount in normal objects.

            1. 2

              I did see that in this article from Mike Ash’s site:

              https://www.mikeash.com/pyblog/friday-qa-2013-09-27-arm64-and-you.html

              It made me happy to see that it wasn’t a totally stupid idea on my part! :]

      2. 2

        Yup. Plus, the WebKit “Gigacage” and v8 (the JavaScript engine) use this mechanism for sandboxing. I hear another browser engine is considering something similar.

      3. 2

        On Intel, the 5 bit encoding can be optimized by using pdep to expand each 5 bits into a full byte, and pshufb to do a SIMD table lookup. I don’t think Arm has something like pdep though.

        1. 1

          Pdep is a bit overpowered. You can also do it with a multishift, though ARM doesn’t have those either…

          1. 1

            I don’t know much about ARM, but I guess you could:

            • Broadcast the lower and upper 32 bits of the NSString each into a SIMD register v1, v2.
            • Shift and mask each lane by a separate amount so that the relevant 5 bits are aligned with 16-bit lanes (and the relevant lanes don’t overlap between v1, v2).
            • Or v1, v2.
            • Use a tbl instruction to reconstruct the original order.
      4. 1

        Storing tags in a few low bits isn’t really what this is about, I think, as changes to total address space (adding high bits) don’t really impact it.

        1. 5

          The article did talk about it, though.

        2. 2

          It’s not what led me to start writing the article but it’s definitely relevant. It’s also probably used way more in practice than putting data in the upper bits. Thanks @snej for the additional note - I actually came across some of the Mike Ash blog posts last night too. I’ll add a reference to them in.

          1. 7

            For CHERI C, we found a lot of cases of storing things in the low bits (not least in LLVM’s compressed integer-pointer pair template). This is a big part of why we support (on big CHERI) out-of-bounds representation beyond the end of a pointer’s range: so you can store data in the low bits of a pointer to the end of an array (you can’t dereference it without bringing it back in bounds).

            Most (not quite all) of the places that stored data in the top bits were creating tagged unions of a pointer and some other data, and rarely storing anything in the top bits other than a ‘this is not a pointer’ pattern, which can be replaced by the tag bit on CHERI (giving a full 128 bits of space for other data, something I started to explore but didn’t pursue very far, and which I expect to be very useful for dynamic languages). A small number of things stored a pointee type in the high bits.

            Morello separates the value field of the capability into address and flags. With TBI enabled, you can store data in the top 8 bits. I believe this was mostly done to allow MTE composition later. It has some fun implications on arithmetic.

    2. 2

      This is standard practice on macOS (I believe it’s literally the default behaviour of the standard toolchains); I’m surprised it isn’t happening by default on Linux, as it is superior for most use cases.

      1. 3

        I think macOS does it a slightly different way. As I recall, their toolchains use a different format for the debug info that is linked by a custom tool. Symbols referenced by debug info are marked to not be discarded, but you can be very lazy about linking the debug info. Xcode, I think, starts running linked binaries immediately for testing and links the separate debug info files in the background.

        On ELF platforms, the same thing is built using existing relocation types and file formats. Apple was doing it for about 10 years before ELF toolchains got similar features (Windows has also done it for a long time).

    3. 11

      So I’ve read through this multiple times, and it sounds like Zig chooses, by some opaque criteria, to pass “large” values by what C++ would call a const reference. But it does this whether or not the type is mutable, and thus it is not semantically equivalent to copy-by-value despite being syntactically identical. This seems like a language bug.

      1. 4

        Call it a language bug, or call it implementation-defined behavior. I agree that it is insidious and bug-prone.

      2. 1

        Aren’t function arguments in Zig always const anyway?

        1. 2

          The problem as I understand it from the article is that the way Zig is implemented (using C++ syntax here)

          int foo(some_type p0, some_type *p1);
          

          Could actually be implemented as (implicitly, at the compilers discretion):

          int foo(const some_type& p0, some_type *p1);
          

          Which is only safe if some_type is itself immutable. But if some_type is not immutable you get different behavior for:

          int foo(some_type p0, some_type *p1) {
            // Initialize p1
            bzero(p1, sizeof(*p1));
            p1->field1 = p0.field1;
            p1->field2 = p0.field2;
            ...
          }
          
          some_type wat;
          foo(wat, &wat);
          

          if p0 is changed to some_type& you can hopefully see how the program behaviour would change, and possibly not in a good way. Because this seems to be a decision made by the compiler that does not occur 100% of the time, different versions of the compiler, or even different callsites (depending on inlining, etc.), could change behaviour.

          Now, I believe Swift does do automatic conversion of value types to reference parameters, but that’s because value types in Swift are semantically immutable (I think the moral equivalent of this particular code in Swift would require UnsafeMutablePointer, which is a pretty big red flag).

    4. 3

      This person is only actually complaining about the specific use case of embedded systems that already drop a bunch of C++.

      Claiming it is broken because you can’t use it in such an environment is just clickbait nonsense.

      1. 4

        I had to do some work to generate small code using source location because, without the right inlining instructions, it wasn’t working with SROA, and I ended up blowing out my (1 KiB) stack with a load of source locations for debug messages.

        The file name is constexpr, so you can use it in a template argument, you just need to wrap it in a class. I have something like this for a few places where I need a string as a template argument (I wish it lived in the stdlib): you make a template class that contains a char array and is templated over its length. You then memcpy a constexpr char* into it in the constructor. It’s now a structural type that can be used as a template parameter.

        I believe the problem here may be that it isn’t executing in a constexpr context, but that’s fixable if you want to do some constexpr string hashing and have the types and names all visible in the binary.

        1. 1

          maybe? I’ve spent a bunch of time recently futzing with constexpr misery and a lot of the semantics of C++ result in const char* rather than char[N], even in constexpr environments. That said it’s possible that my experience is due to the way clang internally handles constexpr and not integer values.

          1. 1

            Here is the code that allows you to wrap a string literal in a template parameter. It works reliably and should work with any string whose length is known at compile time, so should work with a correct source location implementation.

    5. 4

      This is a great write-up for making the case. But isn’t the bar for a new HTML element much higher?

      It seems to me, all of this can be done without new elements.

      1. 8

        Is it so high, though? Back in the day, HTML5 was pretty open to adding new elements and attributes. It’s got a bunch of niche, less-used elements, like <mark>, <hgroup>, <dfn>, and <meter>. HTML5 also added elements like <header>, <main>, and <search> for predefined ARIA roles.

        So I don’t think the current set is some holy perfection that shouldn’t be disturbed. If there’s a common use-case, which does something useful in user-agents and search indexes, then it could be easily added.

        1. 4

          I think <meter> would be more useful if it was possible to style it without unstandardized vendor prefixes that don’t work half of the time.

        2. 2

          A lot of programming languages and data format specs could use a little deprecation and cleanup. An HTML 6 that cleans house with respect to some of the tags you mentioned would be nice. Backwards compatibility to the early 80s can become a liability, not an asset.

          1. 1

            early 80s

            Wow, HTML is older than I thought!

            1. 3

              SGML is. The Web was created in 1990.

            2. 1

              Okay, you’re right… early 90s… https://www.w3.org/wiki/HTML/Specifications Still, a long time ago, long enough that maybe the original specs aren’t worth adhering to any longer, in every single detail.

              1. 3

                I’d hate for the stuff I wrote for the web in the mid-90s to no longer be readable in modern browsers, but maybe that’s just me being vain.

                1. 1

                  but maybe that’s just me being vain.

                  Proposed solution: s/ I/ people/; s/, .* vain//

        3. 2

          Around a month or two ago, we got <search>. But that wasn’t a big ask, since it filled a hole in HTML for an ARIA role with no corresponding HTML element.

      2. 4

        indeed, if anything this would be a new ARIA role

      3. 4

        I think it can for any one site but I think it can be much more powerful if this was native to the browser.

        1. Consistent interfaces, from simple like click-to-reveal to more complex like keyboard shortcuts.
        2. Ability for the user to select what content they want hidden by default and what they want shown by default (but hopefully not revealed to the website so it doesn’t add to their fingerprint).
        3. Possibly better interaction with accessibility tools, and almost certainly better by default.
        4. No need to re-implement on every different site.
        5. No need for JS.
    6. 1

      If you ever encounter a person taking the ‘the government/“police” should be able to read any communications to stop [crime/pedos/whatever]’ position, ask them if they would be ok with a government regulation mandating cameras in every room of every building (your home, your workplace, your religious building of choice, bathrooms, etc).

      It’s an obviously abhorrent concept. But unlike breaking the security of pretty much everything you do regularly (rolling back decades of improvements to the security that allows, say, online banking), the “cameras inside your home” law has the benefit of actually helping solve and prevent crimes, because you can’t trivially circumvent it.

      1. 2

        It’s well-known that spies and terrorists often communicate face to face. Therefore we should monitor every conversation! /s

        1. 1

          That’s my point though: they can have a face to face conversation, but there’s conveniently a camera there. There’s no potential for abuse in my perfect system :D

    7. 20

      This is an article that’s worth saving, to link back to the next time someone insists that game dev is the home of People Who Care About Performance or that The Market Will Not Allow Poor-Performing Games or whatever.

      Except of course it’s not just this instance with this game. Games with awful problems like these are released pretty regularly, even from (some might say especially from) the biggest and most financially successful studios and publishers. There’s nothing actually unique about game dev or about the people who do it that makes them better at or more aware of or more committed to performance than any other field of programming; they screw it up at least as often as everyone else.

      1. 15

        I think the interesting part about this report is that it seems to show that CO did care about performance, at least in certain areas. They went all-in on Unity’s DOTS system because they wanted the simulation itself to run as smoothly as possible - then they got bit by Unity’s tendency to release features before they’re fully ready and/or drop them half-baked.

        1. 12

          The delicious irony here is that, as I understand it, DOTS or at least the architecture of it was a Mike Acton project.

          You know, the guy who dunks on people for not caring about performance, who gives talks saying that people who don’t live up to his personal standards for performance should all be fired, is often cited approvingly as someone who really cares about and pushes others to care about performance, etc.

          1. 15

            As far as I understand it, the game-tick-simulation parts that heavily use DOTS are working really well here. The part that fell down was where Unity half-assed the connection to their render pipeline and shipped it anyway, so if you want to use DOTS for your game, you have to do a bunch of gymnastics to get it to play nicely with HDRP

        2. 1

          A city builder has hundreds of thousands of buildings and people, so it doesn’t seem surprising that a framework built for more typical games that have a much smaller number of entities might fall over.

          1. 13

            Per the article,

            They chose DOTS as the architecture to fix the CPU bottlenecks their previous game suffered from and to increase the scale & depth of the simulation, and largely succeeded on that front. CO started the game when DOTS was still experimental, and it probably came as a surprise how much they had to implement themselves even when DOTS was officially considered production ready. I wouldn’t be surprised if they started the game with Entities Graphics but then had to pivot to custom solutions for culling, skeletal animation, texture streaming and so on when they realized Unity’s official solution was not going to cut it.

            so the actual running of the simulation sounds like they made the right call. It was the dodgy connection between that and the rendering that caused the problem.

      2. 4

        We already know game devs don’t actually care about perf in the manner that’s implied: the number of games that are still 32-bit, despite that easily eating 15-20% of CPU performance, is mind-blowing - this would only be reasonable if your game never hits max CPU load (although even then, all you’re doing is needlessly wasting battery life on laptops).

        1. 5

          the number of games that are still 32bit despite that easily eating 15-20% cpu performance is mind blowing

          That’s highly data-structure dependent. On x86, the biggest win from 64-bit is being able to assume SSE, but most games are likely to compile with that anyway. Beyond that, you get much faster position-independent code (doesn’t matter for Windows, because even DLLs are not position independent in 32-bit mode) and you get more registers. On the flip side, you have bigger pointers.

          If your game’s main data structure is a large scene graph then the size of the pointers will have a big impact on cache hit rates and smaller pointers can easily be a bigger win than you lose from fewer registers. Worse, the performance variation across different systems is a lot bigger from cache misses than it is from fewer registers and so you’re likely to have performance cliffs in different places on different CPUs even within a single range from Intel or AMD.

      3. 3

        https://www.metacritic.com/game/cities-skylines-ii/

        User reviews are “3.3: generally unfavorable.” Lots of people mentioning the performance. So the market seems to be in the process of not allowing poor performance.

        1. 1

          Now do the reviews for Minecraft :)

          1. 2

            User score 7.8.

            Googling tells me that some people experience performance problems with Minecraft, but it’s not the default.

            1. 1

              The joke here is that if you hang out on Minecraft forums the performance is a fairly common complaint and has been for basically the entire time the game has existed. A lot of guides to mods and add-ons for Minecraft recommend to all users that they install OptiFine, a mod whose sole purpose is to try to make the game’s performance more acceptable.

              And yet the game continues to be popular and loved. Which is a strong counterexample to the “markets will punish poor software performance” theory.

    8. 2

      requires a JSON-like format that provides blobs (binary large objects) for things like public keys and encrypted secrets

      I might be living in an alternate universe, but around here, public keys and encrypted secrets are generally much smaller than the average message my services pass across. So how exactly are they “large”, and why would I need a binary representation for them?

      Or are these encrypted secrets stuff like the entire payload, encrypted?

      To process a continuation byte, shift the kim accumulator left by seven bits, and then add the 7 data bits to the kim accumulator. If the continue bit is 0, we are done. Otherwise, repeat with the next byte.

      Soo… the Text type has 4 bits(!) for the length (and if you need more than that, use the continuation bit), and each character may also have a continuation bit. So you have exactly zero chance of knowing how long a particular value is ahead of time, until you consume it ‘till the end, and there’s little hope in being able to consume this format in a stream. Then you can layer arrays and records on top of it, which also make use of continuation bits.

      This is a hell to parse and consume. I’m steering clear, thanks.

      Also, I might be reading it wrong, but it looks like the Blob type lets us have 4 bits for the length, which describes how many bits are in the payload. So that’s like… 2 bytes? And that’s a binary large object? If I’m reading this right, encoding sizable data as a blob would be incredibly wasteful, with 1/3 of it being the preamble.

      Damn.

      1. 2

        If you’re making bad crypto choices, then RSA key exchanges are large. The solution is of course to not use RSA :D (there is a questionable advantage that if your message is small enough you can fit it inside an RSA key exchange, which saves space, but you have to be very careful, as you can easily get leakage: the security of RSA in practice comes from its use to exchange a random key rather than non-random data)

        The various PQC algorithms have larger key exchanges (the big difficulty in PQC is making the keys/exchange practical - some of the basic algorithms have Mbits of key and exchange material).

      2. 1

        Encrypted data can be arbitrarily large, yes. And it’s useful to have a separate type for binary data so a reader can tell (without access to a schema) which things are meant to be binary — looking at a JSON string there’s no reliable way to know whether or not it’s supposed to be base64-decoded.

        But yeah, this format sounds bonkers.

    9. 9

      This includes some really quirky points of view. The one that caught my attention first:

      The representation of numbers is completely independent of obsolete number formats like IEEE 754.

      I think this is fairly emblematic of much of what follows. Lest you think I’m cherry-picking, he follows this shortly with:

      Nota departs from JSON by using counts instead of brackets. The representation of counts comes from Kim. Kim is also the representation of characters, being simpler and more compact than UTF-8.

      Kim is another new thing he’s proposing here, to replace UTF-8 for transmitting Unicode. At least he acknowledges the obvious downside there:

      UTF-8 is one of the world’s great inventions. While Kim is simpler and more efficient, it is not clear that it is worth the expense of transition.

      There is… a lot more like this.

      1. 3

        Sure. I think a lot of Crockford’s stuff is like this. His base32 is pretty opinionated; he has an article about spelling reform on his blog; etc. He did discover JSON, so that’s cool, but I wouldn’t come in expecting his other ideas to catch on as well.

      2. 2

        Yeah I like this statement as a casual aside:

        Now that JavaScript is being phased out, an association with it is no longer desirable.

        I recall he simply declared that JavaScript was over in one of his recent talks too :)

        1. 4

          Fun fact: Crockford is part of the reason JS has for(of) instead of for(:), as he and Mark Miller (IIRC) didn’t want any syntax that might have conflicted with their type annotations for JS, and for(of) was my suggestion to get anything to happen. I still regret not just pushing through for(:) on the basis that you could syntactically disambiguate their type annotations (E4X was already a standard, so it had burned for each/foreach, even though it was clearly not a standard that was actually relevant for anything, and I believe it is now officially dead)

    10. 2

      I know it’s probably something wrong with me but where can I find this proposed legislation (containing this sinister article 45) and read it? Also, if it is really true, I don’t understand how any country in Europe would agree to this. Will eg. Hungary be able to create a new certificate for riksdagen.se and browsers in the whole Europe will just accept it? Or is there any more detail to this story?

      1. 15

        Also, if it is really true, I don’t understand how any country in Europe would agree to this. Will eg. Hungary be able to create a new certificate for riksdagen.se and browsers in the whole Europe will just accept it? Or is there any more detail to this story?

        It looks as if this is another case of well-intentioned legislation being written by people with no understanding of the subject at hand and without proper consultation. I believe (judging from the analysis in the letters) the intent was to require that EU countries are able to host CAs that are trusted in browsers, without the browser vendors (which are all based outside the EU) being able to say ‘no, sorry, we won’t include your certificate’. Ensuring that EU citizens can get certificates signed without having to trust a company that is not bound by the GDPR (for example) is a laudable goal.

        Unfortunately, the way that it’s written looks as if it has been drafted by various intelligence agencies and introduces fundamental weaknesses into the entire web infrastructure for EU citizens. It’s addressing a hypothetical problem in a way that introduces a real problem. I’m aware that any sufficiently advanced incompetence is indistinguishable from malice, but this looks like plain incompetence. Most politicians are able to understand the danger if US companies can act as gatekeepers for pieces of critical infrastructure. They’re not qualified to understand the security holes that their ‘solution’ introduces and view it as technical mumbo-jumbo.

        1. 4

          Ensuring that EU citizens can get certificates signed without having to trust a company that is not bound by the GDPR (for example) is a laudable goal.

          As far as I am aware, the real reason for this is to facilitate citizen access to public services, using certificates that actually certify to citizens that they are in fact communicating with the supposed public organization.

          EU would be served well by a PKI extension that would allow for CAs that can only vouch for a limited set of domains where the list of such domains is public and signed by a regular CA in a publicly auditable way.
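
          For illustration, the check such a constrained CA implies could look something like this simplified sketch, loosely modeled on X.509 dNSName name constraints (RFC 5280); the function name and the example domains are hypothetical:

          ```python
          def within_permitted(domain: str, permitted: list[str]) -> bool:
              """Return True if `domain` falls under one of the permitted names.

              Simplified dNSName-constraint semantics: a permitted name
              matches itself and any of its subdomains.
              """
              domain = domain.lower().rstrip(".")
              for name in permitted:
                  name = name.lower().rstrip(".")
                  if domain == name or domain.endswith("." + name):
                      return True
              return False

          # A hypothetical government CA limited to one public-sector domain:
          print(within_permitted("portal.gov.example", ["gov.example"]))  # True
          print(within_permitted("riksdagen.se", ["gov.example"]))        # False
          ```

          The publicly signed list of permitted domains would then let anyone audit which names such a CA is allowed to vouch for.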

          Or even simpler, they could just define a standard of their own with an extra certificate alongside the regular one. Then they could contribute to browsers some extra code that loads a secondary certificate and validates it against a different chain when a site sends an X-From-Government: /path/to/some.cert header, and displays some nice green padlock with the agency name or something.

        2. 2

          intelligence agencies

          Looking at how misguided this legislation is, I’m not sure these agencies are from any European country. This level of incompetence is utterly disappointing. Politicians should be obliged to consult experts in the domain that a given piece of legislation affects. I also don’t understand what’s so secret about it that it justifies keeping it behind closed doors. That sounds like the antithesis of what the EU should stand for, and perfect fuel for the people who like to call the EU a second USSR and similar nonsense.

          1. 5

            Looking at how misguided this legislation is, I’m not sure these agencies are from any European country

            It depends. Several EU countries have agencies that clearly separate the offensive and defensive parts and this is the kind of thing that an offensive agency might think is a good idea: it gives them a tool to weaken everyone.

            Politicians should be obliged to consult experts in the domain that a given piece of legislation affects

            This is tricky because it relies on politicians being able to identify domain experts and to differentiate between informed objective expert opinion and biases held by experts. A lot of lobbying evolved from this route. Once you have a mechanism by which politicians are encouraged to trust outside judgement, you have a mechanism that’s attractive for people trying to push an agenda. I think the only viable long-term option is electing more people who actually understand the issues that they’re legislating.

            I also don’t understand what’s so secret about it that justifies keeping it behind closed doors. Sounds like an antithesis of what EU should stand for

            The EU has a weird relationship with scrutiny. They didn’t make MEPs voting records public until fairly recently, so there was no way of telling if your representative actually voted for or against your interests. I had MEPs refuse to tell me how they voted on issues I cared about (and if they’d lied, I wouldn’t have been able to tell) before they finally fixed this. I don’t know how anyone ever thought it was a good idea to have secret ballots in a parliamentary system.

            1. 1

              I think the only viable long-term option is electing more people who actually understand the issues that they’re legislating.

              This is the sort of thing that an unelected second chamber is better at handling. Here’s an excerpt from an interview with Julia King, who chairs the Select Committee on Science and Technology in the UK’s House of Lords:

              You get a chance to comment on legislation because we are a revising chamber. We’re there to make legislation better, to ask the government to think again, not to disagree permanently with the government that the voters have voted in because we are an unelected House, but to try and make sure that legislation doesn’t have unintended consequences.

              You look at the House of Commons and there’s probably a handful now of people with science or engineering backgrounds in there. I did a quick tot up - and so it won’t be the right number - of my colleagues just on the cross-benches in the House of Lords and I think there must be around 20 of us who are scientists, engineers, or medics. So there’s a real concentration of science and engineering in the House of Lords that you just don’t get in the elected House. And that’s why I think there is something important about the House of Lords. It does mean we have the chance to make sure that scientists and engineers have a real look at legislation and a real think about the implications of it. I think that’s really important.

              That’s from an episode of The Life Scientific.

            2. 1

              I think the only viable long-term option is electing more people who actually understand the issues that they’re legislating.

              I don’t see how this could ever be viable given existing political structures. The range of issues that politicians have to vote on is vast, and there just aren’t people that exist that are simultaneous subject matter experts on all of them. If we voted in folks that had a deep understanding of technology, would they know how to vote on agriculture bills? Economics? Foreign policy?

              1. 1

                You don’t need every representative to be an expert in all subjects, but you need the legislature to contain experts (or, at least, people that can recognise and properly interrogate experts) in all relevant fields.

                I’m not sure if it’s still the case, but my previous MP, Julian Huppert, was the only MP in parliament with an advanced degree in a science subject, and one of a very small number of MPs with even a bachelor’s degree in any STEM field. There were more people with Oxford PPE degrees than the total with STEM degrees. Of the ones with STEM degrees, the number who had used their degree in employment was lower still. Chi Onwurah is one of a very small number of exceptions (the people of Newcastle are lucky to have her).

                We definitely need some economists in government (though I’ve yet to see any evidence that people coming out of an Oxford PPE actually learn any economics. Or philosophy, for that matter), but if we have no one with a computer science or engineering background, they don’t even have the common vocabulary to understand what experts say. This was painfully obvious during the pandemic, when the lack of any general scientific background, let alone one in medicine, caused huge problems in trying to convert scientific advice into policy decisions.

      2. 8

        You currently cannot as per the first paragraph. The working documents are not public.

        1. 1

          My reading comprehension clearly leaves a lot to be desired, my bad.

      3. 3

        Because, as with all legislation here, every person involved fails to understand the threat model. They believe that obviously this will only do good things, and don’t understand that different places have different ideas of what is “good”. They similarly don’t understand that the threat model includes people compromising the issuer, that given their power these government CAs will be extremely valuable targets, and that this part of every government is generally underfunded.

        Fundamentally they don’t understand how trust works, and why the CA/Browser Forum policies that exist, exist.

      4. 3

        Also, if it is really true, I don’t understand how any country in Europe would agree to this.

        Considering how the EU works, this was probably proposed by a member government, and with how it’s going, many member governments probably support it.

      5. 2

        I don’t know what’s in the proposed legislation, but the version of eIDAS that was published in 2014 already contains an Article 45 about certificates (link via digital-strategy.ec.europa.eu):

        Article 45 - Requirements for qualified certificates for website authentication
        1. Qualified certificates for website authentication shall meet the requirements laid down in Annex IV [link].
        2. The Commission may, by means of implementing acts, establish reference numbers of standards for qualified certificates for website authentication. Compliance with the requirements laid down in Annex IV shall be presumed where a qualified certificate for website authentication meets those standards. Those implementing acts shall be adopted in accordance with the examination procedure referred to in Article 48(2) [link].

        I suppose the proposed legislation makes this worse.

    11. 2

      The article has a concise summary of where ‘things are at today’ wrt MS’s UI toolkit strategy. It still amazes me how incomprehensible that strategy is, though (given that MS has had lots of experience in this area).

      For Windows, owning 3 separate UI frameworks wasn’t sustainable, and the OneCore effort refactored all Windows devices to be based on the same core operating system. This first came to fruition with Windows 10, and the application model was called the Universal Windows Platform (UWP).

      After a few years of UWP not being a popular platform for developers, the Windows team shifted strategy and shipped Windows App SDK. The UI Framework within the WinAppSDK is known as WinUI 3, which is where things are at today.

      1. 1

        this was when I stopped developing for and even using windows. abandoning win32 was an absolutely foolish decision.

        1. 2

          The problem wasn’t abandoning win32; it was saying win32 has been replaced by X, where X couldn’t do a bunch of fundamental things and wasn’t generally very compatible with win32, and then, a few years later, rather than fleshing out the functionality of X, releasing Y, which has a different set of things it can and cannot do. On Mac/iOS I’m still not sold on SwiftUI being a magic panacea, as people still seem to need AppKit or UIKit for anything non-trivial, so it’ll be interesting to see if it becomes Apple’s WinForms/UWP/WinUI - it mostly seems to be riding the wave of half-assed Chrome wrappers claiming “reactive” design is the solution to all problems.

          1. 3

            The difference is you can mix and match SwiftUI with AppKit/UIKit effectively (compare whatever the hell this is). SwiftUI and AppKit/UIKit are the only options, and their intent is clear (compare to Microsoft, again).

            1. 1

              I thought you could, but I haven’t done enough with it (or UI programming in swift in general) to just state that as a fact :D

    12. 5

      Where is Nix putting things such that the OS installer is clobbering them? There shouldn’t be a need to reinstall anything after an OS update unless they’re being put in places that they shouldn’t be.

      1. 8

        In /etc. Specifically things like updating zshrc to add nix to the environment. Apple wipes these global configuration files because of course they do.

        1. 10

          /etc on macOS gets blown away by updates because it controls system configuration in ways that can - and do - result in the OS being broken post-update. The solution is to ensure /etc is not broken, by resetting it.

          The problem here is that Nix is modifying a system directory that Apple’s documentation says it should not be modifying (I assume the only reason /etc isn’t on the system volume is that historical software breaks if it can’t write to it - lots of software believes it can modify /etc/hosts, etc.). There are non-global locations that Nix should be writing to instead; your zshrc example, for instance, is already handled by the user’s ~/.zshrc, which is not touched.

          1. 13

            Global shell files exist for a reason: so that system administrators can modify configuration for all users of a system. Hence why Nix wants to write to /etc/zshrc. This is a common pattern across almost all (or all?) Unix-based systems. They treat /etc configuration files as just that, configuration, and provide non-invasive upgrade procedures that maintain site-specific customization. Apple may claim that you shouldn’t write global configuration there, but unless they provide an alternative, people have no other choice.
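
            For context, the snippet at stake is tiny. Something like the following (the path is illustrative of a typical multi-user Nix install) is the kind of hook that ends up in /etc/zshrc and gets wiped by an OS update:

            ```shell
            # Illustrative global-rc hook: source the Nix environment if the
            # profile script exists, and silently do nothing otherwise.
            NIX_PROFILE_SCRIPT='/nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh'
            if [ -e "$NIX_PROFILE_SCRIPT" ]; then
              . "$NIX_PROFILE_SCRIPT"
            fi
            ```

            Because the hook guards on the file existing, it is harmless on machines without Nix; the whole dispute is only about where this handful of lines is allowed to live.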

            1. 1

              I think that the special file here is /etc/zshrc that gets reset. If I recall correctly, /etc/bashrc does not get reset on OS upgrades. It’s specific to zsh.

              1. 2

                And zsh is the default interactive shell on macOS, so it’s quite important to have Nix support it. :)

            2. 1

              The problem, I believe, is that Apple sees “the system” as their domain which really only Apple should modify, while Nix sees nix-darwin as a part of “the system”.

              1. 1

                They have an entire top level directory called /System for the system. The issue at hand here doesn’t have anything to do with nix-darwin, but with the configuration added by the Nix installer.

      2. 3

        Not sure if that’s still the case, but Homebrew also regularly broke for me across macOS updates in the past.

      3. 3

        My understanding is that all of the Nix stuff is tucked away neatly in its own volume on /nix, but macOS regularly wipes out all the symlinks and dotfile customizations within the OS host volume.

    13. 7

      This seems superfluous when we already have URI templates

      1. 3

        As I read them, the URL Patterns spec from the OP is for matching and extracting the variable parts of a fully formed address, whereas URI Templates are for expanding variable data in order to create a new address.
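
        The difference in direction can be sketched in a few lines. This is only an illustration (str.format standing in for RFC 6570 expansion, a Python regex standing in for URL Pattern matching), not either spec’s actual syntax:

        ```python
        import re

        # URI Templates go from variables to an address (expansion):
        template = "/users/{user}/posts/{post}"
        address = template.format(user="42", post="7")
        print(address)  # /users/42/posts/7

        # URL Patterns go the other way, from an address back to variables (matching):
        pattern = re.compile(r"^/users/(?P<user>[^/]+)/posts/(?P<post>[^/]+)$")
        match = pattern.match(address)
        print(match.groupdict())  # {'user': '42', 'post': '7'}
        ```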

      2. 2

        Definitely. It seems strange for the URL Pattern definition not to even mention URI Templates, let alone explain the differences for those of us already familiar with them.

        A quick web search found this comment in a WHATWG document about URL Patterns that in turn cites the Limitations section in the URI Templates RFC, but that brief dismissal is all that my search found.

        Given how much the WHATWG have written about URL Patterns, it’s a shame they seem to have written so little comparing them to prior art.

        1. 5

          It seems strange for the URL Pattern definition not to even mention URI Templates, let alone explain the differences

          It’s not at all surprising to me, considering the much more fundamental https://github.com/whatwg/url/issues/703 (archived in case they delete it).

          WHATWG seems to me like they think, and want people to think, that IETF’s definition of a URL has no use or relevance and ought to be forgotten, while they (WHATWG) refuse to acknowledge that their definition of a URL has tradeoffs rather than being strictly better. I have no idea what political dynamics lead them to act in this way.

          1. 3

            The WHATWG came out of “the standard specification for X does not match reality” (where X is HTML, the DOM, URIs, etc.). The problem with that is that no one can look to the specs to know what to implement, and you can’t blame publishers for producing invalid URIs if those URIs work everywhere - no browser is going to deliberately break something that works everywhere else purely because “this doesn’t match the spec” (in general, back when I worked on browsers, the first step in any bug report was “does this work in other browsers?”, eventually leading to fixing the spec). The overarching goal here is specifications where “if I implement this specification as written, I will deny the same URIs that other browsers deny, and it will accept, and interpret in the same way, the exact same URIs the other browsers accept”.

            The proliferation of oddities in older aspects of the web specs was the result of the specs not matching reality, so you literally could not just “implement the spec”: they were incomplete even at the time, and what was specified was often ambiguous, full of gaps, or simply wrong. For most of the decade or so I worked on browsers, a huge amount of time went into just working out what the specs actually needed to be. At least the anal-retentive nature of ECMA meant the JS spec was not too bad, but even that left an enormous amount of required information unspecified or ambiguous.

            This particular spec seems to just be providing an API specification for (I’m guessing internally) leveraging the browser’s spec compliant parser, rather than yet another ad-hoc parser/splitter.

      1. 18

        That’s a discussion about a borderline incoherent rant which isn’t really relevant here because Let’s Encrypt provide tools that can be used to prevent the kind of MITM discussed in the present post.

        1. 23

          Especially irrelevant because not using Let’s Encrypt would’ve done nothing to prevent this MITM.

      2. 6

        As others have said that post is just a rant, but it also ignores the two basic (and core) issues of any PKI:

        1. It doesn’t matter whether you use Let’s Encrypt or any other CA: if someone can use, or compel, any CA to issue a cert for your domain, then that cert gets issued.

        2. DV vs OV (or even the nonsense that is EV) doesn’t matter; see (1): if a CA is legally compelled, it may have no choice but to issue. Plenty of countries, including the US, have secret courts that issue secret rulings that you cannot publish, comment on, or appeal (because appealing them damages national security in a way that having secret courts mysteriously does not).

        The client-side solution here is for all clients to do the same bare minimum that browsers do (seriously: if a browser has a TLS policy and you do not also have that TLS policy, you are wrong): mandate that all certs have the appropriate CT flags. I know library devs are afraid of breaking things, but seriously, you can’t expect downstream projects to change their call configurations unless there is some signal that things have changed. I personally feel that every TLS library (OpenSSL, BoringSSL, etc.) should either default to requiring CT, or gate the API that does not check CT behind a compile-time flag: easy enough to set, but the dev has to do it, and acknowledge what they are doing.

        Then, if you are offering a service that allows people to communicate with each other with some expectation of privacy or anonymity, you have an ethical duty to monitor all the trusted CT logs for [mis-]issuance of certs for your domains.
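
        The core of that monitoring idea is small. A minimal sketch, where the entry format and the set of known fingerprints are assumptions (a real monitor would stream entries from the CT logs’ RFC 6962 APIs and parse the actual certificates):

        ```python
        MY_DOMAINS = {"example.org", "www.example.org"}

        # Fingerprints of certificates we actually requested (hypothetical values).
        KNOWN_FINGERPRINTS = {"sha256:aaaa", "sha256:bbbb"}

        def find_misissued(entries):
            """Yield CT log entries for our domains whose certs we did not request."""
            for entry in entries:
                if entry["domain"] in MY_DOMAINS and entry["fingerprint"] not in KNOWN_FINGERPRINTS:
                    yield entry

        entries = [
            {"domain": "example.org", "fingerprint": "sha256:aaaa"},  # ours
            {"domain": "example.org", "fingerprint": "sha256:evil"},  # not ours: alert!
            {"domain": "other.test", "fingerprint": "sha256:evil"},   # not our domain
        ]
        print(list(find_misissued(entries)))
        # [{'domain': 'example.org', 'fingerprint': 'sha256:evil'}]
        ```

        A MITM cert that carries valid SCTs must appear in the logs, which is exactly what makes this kind of watch useful.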

    14. 27

      This is an excellent analysis.

      As a Hetzner customer, I’m looking forward to their response. Whether this was a “lawful” MITM or not, their response will be of great interest.

      1. 10

        Another Hetzner customer here; can’t wait for their response. The irony is that I migrated from Linode to Hetzner after the Akamai acquisition, and it looks like I might have to jump ship again.

        I don’t do anything crazy, but now I’m slightly uncomfortable with my personal Jabber and Mastodon instances residing on Hetzner infra.

        1. 10

          Where would you go? Most western countries have the same laws that would require a provider to cooperate

          1. 1

            I don’t know, Switzerland comes to mind. Also, it’s slightly closer to my home country so the latency would be a few ms shorter which would be another win for me.

            I’m going to wait for an official response (if any) before I make a decision.

            1. 14

              If the German government is sending a national security letter over your Mastodon instance’s VPS, I think you might have bigger problems.

              1. 3

                True that, but that’s not my main concern here. A bigger concern is if Hetzner refuses to issue a public statement about this, and I simply don’t want to get caught up in a future potential MITM attack that might target a whole hypervisor my VM is on or a similar scenario.

                    1. 4

                      At this point I think it’s clear that withdrawing a canary would be considered to have announced it, so might itself be prosecuted, even though that is itself an exciting example of forced speech. But if an organisation has taken the view that intercepting and recording all communication with a channel used by thousands is acceptable, I assume that organisation isn’t super interested in human rights.

                1. 5

                  If this is a “national security” thing, Hetzner are likely under a gag order.

            2. 1

              Latency is zero if you host at home. It’s also free and much more secure.

              1. 7

                Having hosted lots of servers at home, it is not free and would not protect against this form of attack. The government just has to wiretap you via your ISP rather than your VM host.

                A crappy home server can be almost-free and little extra work, but doing it well involves a bit more dedication than using a VPS.

              2. 5

                Except that it ain’t. You have to pay for the hardware, UPS, electricity and a business-grade Internet connection (residential connection can be fine most of the time, but not always, for various reasons) and then you have to spend your free time on monitoring, upgrades and the overall upkeep, which can be a lot or a little depending on how skilled you are. I mean, I’ve considered it myself many times and I’ve done it in high school for self-education, but I got a life in the meantime, so it’s not feasible for me right now.

                1. 2

                  You have to pay for the hardware, UPS,

                  Given you’re on Lobsters, there’s a likelihood approaching 100% that you already have the spare hardware. UPS is relatively inexpensive.

                  electricity and a business-grade Internet connection

                  The incremental cost of the electricity on top of what you already pay is nearly 0%. You don’t need to use a business-grade connection.

                  then you have to spend your free time on monitoring, upgrades and the overall upkeep,

                  You would have to do this with a VPS anyway.

                  but I got a life in the meantime, so it’s not feasible for me right now.

                  You already host personal internet services on a VPS on Hetzner. With all due respect, using this argument to justify not hosting at home isn’t very convincing.

                  1. 2

                    Maintaining a couple of VMs on a hypervisor and a network managed by an enterprise-grade company is vastly easier than doing everything yourself, especially if you’re starting from scratch.

                    Also, self-hosting at home isn’t zero latency because I have to leave the house every now and then and I also travel to far away countries for business and pleasure. I’m not against self-hosting at home per se, but as with everything else in life - you gain some, you lose some.

                    1. 1

                      Maintaining a couple of VMs on a hypervisor and a [virtual] network managed

                      Admittedly this is extra work on top of maintaining a VPS itself but not very much more on a relative basis.

                      self-hosting at home isn’t zero latency because I have to leave the house every now and then and I also travel to far away countries for business and pleasure.

                      This would be the case even if you hosted it in a VPS in a data center near you. If you’re optimizing for latency to your home, the optimal case is on your home network.

                      1. 2

                        All valid points, but those address just the technical/operational side of the equation. I also happen to live in a country that’s not really “democratic” by today’s standards, so even if I self-host at home, set up a network and split-horizon DNS, that still doesn’t protect me from a good old search warrant.

                        Not that I’m doing anything that would get any government interested, but it’s about principles (we’re on Lobsters, after all). I don’t want to just surrender my private info to just about anyone, let alone a government. The prospect of keeping my stuff in Switzerland sounds like the best course of action the more I think about it. Some fun weekend projects are on the horizon for me, it looks like; a nice thing to happen in winter with less-than-optimal weather.

                        1. 2

                          I also happen to live in a country that’s not really “democratic” by today’s standards

                          The prospect of my keeping my stuff in Switzerland sounds the best course of action the more I think of it.

                          These sensitivities are more on topic. If you expect to be specifically and forcefully targeted by a government it may be wiser to host in a country with a high respect for privacy. Since you don’t, then the only security advantage of self-hosting is to avoid casual mass surveillance (e.g. like what happened in the article). Hosting at home seems to be more resilient against that.

                          (not arguing what you should do personally, please use your free time in a way that suits you best)

              3. 4

                You’re just as much at risk of lawful interception if you host at home as anywhere else, assuming you engage in activity that’s valid for such an order to be given.

                1. 2

                  Logistically it’s harder to do and easier to verify against. You can physically isolate your server and check for vulnerabilities. You can’t do the same with a VPS with the same degree of confidence.

                  1. 5

                    Lawful interception targeting a system you host at home is far more likely to begin with agents of your government knocking at your door with a court order.

                    Maybe you consider this good because you know it’s happening. Maybe they’re going to seize every electronic device you own for forensic imaging while continuing to host your server themselves to capture more data from your users. It’s certainly a different threat model.

                    1. 1

                      That wasn’t the scenario being discussed but even if it were, logistically that is much harder for a government to do (for many reasons) than for them to do it silently with the cooperation of your VPS host.

        2. 5

          We arrived at Hetzner by the same path!

          I reached out to Hetzner asking whether they’ll comment on the matter. Maybe you can do the same?

          https://www.hetzner.com/support-form

          It recently came to light (https://notes.valdikss.org.ru/jabber.ru-mitm/) that Hetzner may have been compelled to participate in a man-in-the-middle operation on one of its customers. 
          
          Should customers expect a response in this matter?
          
          Cheers
          

          With that said, and as others have mentioned, if state-supported coercion of providers is part of your threat model, I think you’ll be better off self-hosting. IMO complying with the laws of one’s jurisdiction is not a malicious act but a necessity to operate. You may have better luck in Switzerland, but you also may be foregoing some technical features and availability, which may be more important to you/your users than mitigating the off chance your communications are scooped up in a MITM operation.

          1. 1

            Yeah, I did ping them, but so far no response from them. Pretty much everything suggests this was a gag order by a government entity.

            1. 3

              This was their response:

              Dear Sir or Madam,

              thank you for your request.

              We can assure you that there has been no security incident at our company. There is no risk to data security, neither for you nor for our other customers. We take the fulfillment of our contractual and legal obligations very seriously.

              Mit freundlichen Grüßen / Kind regards

              Legal Team

              I interpret this to mean that they were legally compelled to MITM the customer.

    15. 20

      I think there’s one argument that’s missing: E2EE is easy. The algorithms for doing it were published decades ago. There is open-source code for doing it. If you ban it, that won’t stop bad people from using it. You might be able to use traffic analysis to find people that are using it, but it’s also fairly easy to combine with steganography. You’ll just see apps that take a phrase, encrypt it into another innocuous-looking phrase, and then paste it into WhatsApp.

      Trying to ban E2EE is like trying to ban sharp sticks. Even if you stop everyone selling them, in spite of the fact that they’re really useful for a bunch of things, it’s very easy for anyone to sharpen their own stick or set up a black-market supply of sharp sticks.

      1. 12

        A tension I have noticed in my thinking is that I do favor strong gun regulation, but I don’t favor any kind of E2EE regulation. The argument against me is why say “criminals will just work around regulations” in one case but not the other?

        I think one answer is that while guns are good for hunting and sport shooting, in all other situations, the world is net better without guns. If I’m at a bar and someone steps on my toe and I get mad at them, the world is better if neither of us has a gun. If a criminal robs a 7-11 with a knife and the owner guards the store with a machete, that is a better world than one where they shoot each other. E2EE isn’t like that. Average people need E2EE to keep their ordinary, non-criminal conversations safe. There are many things that you discuss that aren’t criminal, but need to be kept away from others, particularly others who want to steal from you or blackmail you.

        Another answer is that we have no choice but to trust the government to use guns responsibly. A state just is a monopoly on the legitimate use of violence. They have to be armed for the whole thing to work. Obviously, it can and does go wrong sometimes, but that’s why you have other checks against government overstepping.

        (Parenthetically, my bugbear is that the actual meaning of the second amendment is totally erased from contemporary discussion. “Keep and bear arms” is unambiguously old-fashioned jargon for “join and serve in an armed force.” This is not a debate. Any honest look at old legal texts reveals this to be the working definition of KABA at the time of the amendment. The right to KABA is the right to join and serve in your state militia, not the right to carry a six-shooter to Wal-Mart. Parenthetical rant over.)

        If you subvert E2EE, on the other hand, by giving the government a backdoor, there is just no chance it won’t be compromised. Worse, such is the nature of computers: you can tell if someone in government is misusing a gun, but there is often no way to know that someone in the government is spying on their girlfriend, selling corporate secrets, engaging in stock manipulation, etc. etc. Government can use their monopoly on violence to learn about secret communications when it needs to. It does not need a blanket ability to just poke its head into conversations on the off chance that maybe there’s something juicy in them.

        Here is a proposed compromise legislation for the UK: E2EE is fine, but E2EE apps have to retain records for X period of time, with no ability to erase records sooner than X. That way if you catch a suspected criminal, you can arrest them and get them to unlock their phone to look at their records for the X period, but you can’t eavesdrop without getting the person to unlock it. What do you think?

        1. 8

          The argument against me is why say “criminals will just work around regulations” in one case but not the other?

          I think, for me, it’s a matter of three things:

          First, creating a gun is hard. Anything that fires more than one shot with less than a 50% chance of taking your hand off requires some specialised equipment. Creating a system with end-to-end encryption requires a piece of paper and a pencil (one-time-pad encryption is easy to do by hand).

          Second, there are many ways that using a gun directly harms other people. Shoot someone and they will be unhappy. Encrypt a message and no one is harmed.

          Third, enforcement at point of use is easier. If you wave a gun around in public, it’s obvious. If you use strong encryption in public, it’s almost indistinguishable from using weaker encryption or, in some cases, hard to tell apart from unencrypted communication unless you’re doing invasive snooping.

        2. 4

          Something that worries me about the app that retains records is that it could be abused by criminals in the opposite direction.

          If a criminal threatens me into unlocking my device, they can now review my message history and use them for blackmail or to steal private credentials etc.

          Admittedly, the “a criminal threatens me into unlocking my device” attack already has far reaching negative consequences, so maybe this doesn’t really make anything particularly worse.

          1. 2

            Good objection.

        3. 2

          but E2EE apps have to retain records for X period of time

          So criminals will use an app with that retention feature compiled out. What are you going to do, sue them?

          1. 2

            That’s fine. Yes, they can pay a penalty for using an illegal app and look bad in front of a jury when they have to explain why they were using it.

            1. 1

              How do you tell the difference? You’re relying on computer forensics experts (who are already totally overwhelmed with their current workload, to the extent that cases are not being prosecuted because there are insufficient forensics experts) being able to tell the difference. If I were doing this, I’d have a separate application that monitored and deleted messages with a particular keyword, and which erased itself if I didn’t explicitly tell it to run every day. By the time the phone has made it to analysis, all traces are gone.

        4. 2

          A tension I have noticed in my thinking is that I do favor strong gun regulation, but I don’t favor any kind of E2EE regulation. The argument against me is why say “criminals will just work around regulations” in one case but not the other?

          And on the flipside, https://xkcd.com/504/ . In any case, gun rights vs encryption rights are separate political issues, even if some of the arguments in favor of one are applicable to the other, and there are constituencies of people who are interested in both.

          Worse, the nature of computers is you can tell if someone in government is misusing a gun, but there is often no way to know that someone in the government is spying on their girlfriend, selling corporate secrets, engaging in stock manipulation, etc. etc.

          To the extent that these things are bad (and e.g. I’m not convinced that a lot of what counts as stock manipulation should actually be illegal), I don’t really care if the people doing them are government employees or people who do some other sort of job. I’m fine if everyone has legal access to the same encryption technologies, even if they work in government.

          Here is a proposed compromise legislation for the UK: E2EE is fine, but E2EE apps have to retain records for X period of time, with no ability to erase records sooner than X. That way if you catch a suspected criminal, you can arrest them and get them to unlock their phone to look at their records for the X period, but you can’t eavesdrop without getting the person to unlock it. What do you think?

          I’d happily write E2EE apps that delete records immediately and publish them in the UK without regard for that proposed UK law, and support people in the UK who violate the law in order to use this software.

        5. 1

          No. That’s nonsense.

          The uses of a gun are extremely restricted, and not a requirement for basic society to function. Guns actively and directly harm others, as that is the design goal. The sole purpose of a handgun, for instance, is to kill people.

          Private communication is an absolutely fundamental right. Being able to have private communication does not harm someone: someone robbing a bank may discuss the plan to do so securely, but they can’t actually rob the bank with encryption - they’ll likely use a gun. See above, the purpose of guns is to kill people.

          Moreover, gun registration does not break guns, nor does it undermine your ability to have a gun; it just means some agency is aware you have a gun. Based on my experience, gun owners are perfectly happy putting pro-gun stickers all over their vehicles and talking about having guns in public. There is literally no harm to gun registration.

          On the other hand your only option for regulation of cryptography is to break it. There is no other option.

          1. 1

            Yes, I think the specifics make a difference in principles.

      2. 2

        E2EE is easy.

        Oh my gosh, I could not possibly disagree more!

        The myriad of APIs you have to use, in just the right order, with just the right parameters… and if you get just one thing wrong, your encryption is potentially completely broken!

        E2EE should be easy (at the programming level), but in my humble opinion – it’s not easy, not at all.

        1. 3

          Maybe I should clarify: building a scalable and usable end-to-end encrypted messaging system is hard. Building a thing to send end-to-end encrypted messages to a set of known peers is easy.

          If you just want to send text, you can do it by hand with a pen and paper if you print off a few pages of random characters (these must be cryptographically secure random, but most operating systems provide something like Fortuna fed with true entropy sources, so that’s easy). Once you have a page of random text, hand a copy to both parties. When you want to encrypt your message, take the letter number (0-25 for the alphabet; you probably want digits, spaces, and at least a couple of punctuation characters too) for each letter in your message, add it to the next one on the page, wrapping around at the end of the alphabet, and then type the result into an insecure messaging system. At the receiving end, do the same but subtract. This is not breakable. It leaks the message length, but if you’re using SMS as your transport then you can just pad each message to 160 characters by copying the next characters from your one-time pad (for other transports, pick a message length).
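          The add-and-wrap-around scheme described above is a few lines of code. This is a toy sketch, not production crypto: the alphabet choice is an assumption (both parties just need to agree on one), and pad material must never be reused.

          ```python
          import secrets

          # Illustrative alphabet: 26 letters, 10 digits, space, and a few
          # punctuation marks. The exact choice is an assumption; both
          # parties only need to agree on it in advance.
          ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,?"
          N = len(ALPHABET)

          def make_pad(length):
              """Generate one-time-pad material from a secure RNG."""
              return "".join(secrets.choice(ALPHABET) for _ in range(length))

          def encrypt(message, pad):
              # Add each character's index to the pad's index, modulo the
              # alphabet size (the "wrap around" step done by hand above).
              return "".join(
                  ALPHABET[(ALPHABET.index(m) + ALPHABET.index(p)) % N]
                  for m, p in zip(message, pad)
              )

          def decrypt(ciphertext, pad):
              # Subtract the pad indices to recover the message.
              return "".join(
                  ALPHABET[(ALPHABET.index(c) - ALPHABET.index(p)) % N]
                  for c, p in zip(ciphertext, pad)
              )

          pad = make_pad(160)  # hand a copy to both parties; never reuse it
          ct = encrypt("MEET AT NOON", pad)
          assert decrypt(ct, pad) == "MEET AT NOON"
          ```

          Each message consumes pad material equal to its own length, which is why both parties need a pile of pre-shared random data rather than a single short key.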

          If people are going to be in the same room periodically, you can just pre-share a pile of random data to use as one-time pads. That’s feasible for a lot of criminal activities, where you want to ensure that the recipient of your messages is a person that you’ve met.

          With those requirements, you wouldn’t actually use a one-time pad, you’d do offline key exchange. The sealed box API in libsodium that @Corbin pointed to encrypts each message with XSalsa20-Poly1305 (a symmetric cypher) with a random key and then encrypts the key with X25519 (an elliptic curve asymmetric cypher) so that it can be decrypted by whoever holds the private key corresponding to the public key that it used. Wrapping that in a tool that shares keys with QR codes or as English pass phrases and gives you encrypted text that can be sent via an insecure channel is under a hundred lines of code.
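          The sealed-box round trip described above is similarly small. A sketch using PyNaCl, libsodium’s Python binding (assuming it is installed; the message and key-distribution details are placeholders):

          ```python
          from nacl.public import PrivateKey, SealedBox

          # The recipient generates a keypair; the public key is what would
          # be shared out of band via a QR code or English pass phrase.
          recipient_key = PrivateKey.generate()

          # The sender seals a message to the recipient's public key.
          # Internally libsodium uses an ephemeral X25519 key exchange plus
          # XSalsa20-Poly1305, as described above.
          sealed = SealedBox(recipient_key.public_key).encrypt(b"meet at noon")

          # Only the holder of the matching private key can open the box;
          # the sender cannot even decrypt their own ciphertext.
          plaintext = SealedBox(recipient_key).decrypt(sealed)
          assert plaintext == b"meet at noon"
          ```

          The remaining work in the hypothetical hundred-line tool is key serialisation and transport of the ciphertext, not cryptography.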

          If you want people to be able to do E2EE messaging with people that they haven’t met, you need a bit more work to do key exchange. The simplest thing to do is skip this. Trust keys received over an insecure channel and then provide a mechanism for validating them via some out of band channel. If an attacker is not actively tampering with traffic, this is fine. If someone sends the same key over SMS and email, it’s probably safe unless the NSA or equivalent is actively targeting you. This is what Signal does. You can also build on network effects. If you have two mutual friends and they both pass on the same key, it’s probably safe.

          But if you just want a simple tool for bad people to use to exchange messages that law enforcement can’t read, it’s a trivial amount of code.

        2. 1

          Which libraries have you tried? My typical recommendation is libsodium, which has an “easy” sealed-box API that is suitable for end-to-end delivery of messages.

          1. 3

            Unfortunately that sealed box doesn’t have nice modern properties such as forward secrecy or key compromise impersonation resistance. To have that you need to reach for something like Noise, and when you implement it you quickly realise that key exchange protocols are no picnic (and Noise is one of the simplest!).

    16. 8

      I understand the issue for the specific lawsuit mentioned. But I didn’t follow why there was a preexisting preference for GPL to be a license and not a contract.

      Anyone know what the perceived benefit was for arguing that GPL is a license?

      1. 8

        My third-hand and long-after-the-fact impression is that the idea was that while a contract restricts the user’s rights, a license only grants more rights. So by arguing that it was a license, they were trying to argue “this is strictly beneficial (more liberal in user rights) compared to proprietary software”.

        1. 13

          I believe the key thing in the USA is that a contract must provide something to both parties. This is why a bunch of contracts have the weird nominal dollar thing.

          The GPL does not provide anything to the copyright owner. The recipient of the GPL’d work receives the right to use the work. Customers of the recipient receive a set of rights. The author receives nothing as a direct result of the GPL (they may subsequently get code back if someone downstream happens to share it, but the GPL does not require this).

          It’s quite surprising to see the defendant arguing that this should be treated as copyright infringement because the statutory penalties are much higher in that case, especially with the precedent that the RIAA set that each copy distributed counts as a separate incident and triggers the punitive damages again.

          1. 6

            This is why a bunch of contracts have the weird nominal dollar thing.

            In legal terms, this is called a peppercorn.

          2. 3

            It’s quite surprising to see the defendant arguing that this should be treated as copyright infringement because the statutory penalties are much higher in that case, especially with the precedent that the RIAA set that each copy distributed counts as a separate incident and triggers the punitive damages again.

            My suspicion (don’t quote me on this) is that a copyright claim would have to go through federal court, which lacks California’s rule allowing the SFC to sue as a third-party beneficiary.

            1. 3

              Yes, the OP mentions it…

      2. 4

        Anyone know what the perceived benefit was for arguing that GPL is a license?

        It has to do with standing: This is a contract-court.

        Vizio wants to argue it is a license (and so it deals with e.g. copyright infringement) in the legal sense of the word so that this court cannot hear the case.

        1. 1

          So why did SFC make this a state contract suit rather than a federal copyright suit in the first place? Federal judiciary too right wing?

          1. 10

            As mentioned, under license law the SFC has no standing, as they are a third party: a concept that is not accepted in a copyright case, since the SFC does not represent the copyright owner.

            1. 3

              Conservancy can represent several of the relevant copyright holders, but they cannot use a copyright case to ask for source code, nor would that get them a precedent that all other USA software users also have standing to sue.

            2. 1

              As mentioned where?

              1. 11

                In the OP….

                1. -2

                  You are mistaken.

                  1. 8

                    From the OP:

                    Vizio argued that the lawsuit is really a copyright infringement lawsuit, and therefore belongs in federal court, not state court. Painting Conservancy’s legal claim as really about copyright could also help them avoid the whole issue of third-party beneficiaries, a contracts-law concept. So naturally, Vizio’s lawyers went online and dug up a bunch of places where Free Software people, including FSF and Conservancy people, wrote that the GPLs are licenses, not contracts, and that only copyright holders can enforce them

                    There are actually multiple relevant paragraphs, but this seems the most explicit about the issue

                    1. 2

                      Yeah, and the closest thing is this:

                      Painting Conservancy’s legal claim as really about copyright could also help them avoid the whole issue of third-party beneficiaries, a contracts-law concept.

                      which requires a leap of inference to get to “SFC has no standing under license law.” It seems plausible, but it was certainly not mentioned in the article. Are we supposed to know that the contracts-law “third-party beneficiary” concept is the only legal device that could give the SFC standing to sue?

                      Would it really be that hard for the SFC to find a Linux/bash/glibc contributor to sign on to the suit, if that’s even necessary?

                      1. 4

                        which requires a leap of inference to get to “SFC has no standing under license law.” It seems plausible, but it was certainly not mentioned in the article. Are we supposed to know that the contracts-law “third-party beneficiary” concept is the only legal device that could give the SFC standing to sue?

                        Again, the article says that; it goes into great detail on the matter. It does not have a single sentence saying “by making this a copyright license case, Vizio is forcing this into federal courts, where there is no concept of a third-party beneficiary, and so the SFC would have no legal standing to bring the case and it would be dismissed.” It does have numerous paragraphs that together make this point: e.g. one paragraph details how contract vs. license suits are preempted by federal law, one details how third-party standing differs between contracts and licenses in CA vs. federal law, and one explains how everything combines to completely remove the SFC’s right to sue.

                        Would it really be that hard for the SFC to find a Linux/bash/glibc contributor to sign on to the suit, if that’s even necessary?

                        In principle anyone could have done that already. The whole point is that the SFC wants to do this unilaterally, without working with the actual authors, because working with them is more expensive. My not-at-all-a-lawyer take is that the issue is this:

                        If the SFC has no standing, all they can do is provide lawyers. But lawsuits take time and money for the non-lawyer as well: you have to attend depositions, for which you may have to travel, and which can take tens of hours. So you not only need someone whose copyright was violated, they also have to have the time and money to handle the case workload. These cases are generally about forcing source code to be released, not extracting monetary damages, so you are fundamentally out of pocket (you can get compensated for, say, hotel costs, but not for vacation time, etc.); the end result may still be the contributor losing money.

                        The other side is developers working for companies that contribute to open source (e.g. the ones that might be able to afford the time) are likely not the copyright owners - e.g. for the last 15 or 20 years all my open source code belongs to large corporations, so for that code I would not have standing either. So you actually need to convince the company to be part of the suit not the individual developers there.

                        1. 1

                          You don’t think the discussion is worth continuing?

                        2. -2

                          Again, the article says that, it goes into great detail on the matter.

                          I now realize you are not the original person I asked about this, so why are you saying “again”? Did you already say the article says the SFC has no standing under license law? I can address the rest of your comment but want to make sure I’m not having an aneurysm. What’s with the “again”?

      3. 3

        Wild guess: MIT and BSD licenses already existed, so it’s less of an adoption hurdle to think about GPL as a license.

        Otherwise users would be confused about putting contracts on their code.

        TBH I don’t think this article was very well written - it doesn’t give references or definitions

    17. 7

      I was originally going to be glib and go “oh, it’s an always-broadcast network” in my dismissal. I was even going to overlook the abysmal performance (the goal is 1Gbit but it currently maxes out at 20Mbit).

      But this is built on AES-128 in CBC mode, which is just not reasonable today. There is little to no reason to use CBC in any new protocol due to the myriad footguns of the scheme, and a new protocol using AES-128 is unconscionably weak.

    18. 13

      Git is hard because it’s a terrible version control system. We keep using it because people think it’s reasonable to write blog posts putting the burden of understanding entirely on the end-user. It’s unreasonable to have a bullet list of things you need to understand about it where the first thing is:

      A commit is its entire worldline

      Seriously just realize you have Stockholm syndrome and stop victim blaming other people who haven’t developed it yet.

      Worldline. Jesus christ.

      1. 10

        I believe that we use git precisely because it is terrible. There is a process (which someone told me a couple of years back had an actual name, but I’ve forgotten it) where systems reinforce the weakest link. If you build a tool that is useful, people will use it. If you build a tool that is almost useful, people will build things around it to strengthen it. Things like GitHub were able to thrive because using git without additional tooling was so painful. There are a lot of git GUIs (I am a particular fan of gitui, since it means I don’t need to leave my terminal) that wouldn’t exist if people didn’t have such a strong desire to avoid the git CLI. Subversion GUIs were always an afterthought and most people just went to the command line. If you said you used a GUI for svn, people would be surprised. If you say you use one for git, people will want to discuss it and see if it has any features that make it better than the one they use.

        Alan Kay said that evolution in computing is a process of building a stack of abstractions and then squashing the lower-level ones into something simpler. I believe that the lack of progress in computing is because we are far better at the first step than the second. Something that replaces git could do so by looking at the abstractions that people build to avoid dealing with git and building a tool for that, but it will be competing with dozens of things layered atop git.

        1. 4

          Around the time git was started I was still avoiding subversion because it had gone through a lot of churn and instability in its repository formats (including data loss). Svn was just about settling down and becoming something I might trust as a repo admin. I was using it casually for contributing to Apache SpamAssassin and FreeBSD, and the main impression I got was that svn was incredibly slow, much slower than CVS. I think this was because the svn protocol was (is?) ridiculously chatty, so if you are working over a long-latency link then it sucks. And it still lacked realistic support for merges. So I continued to use CVS when I had to host repositories.

          Then along comes git, and it’s much faster, it supports merges, it has better tools for wrangling my working copy. I was not an early adopter, but it made sense for me to continue using cvs a few more years and skip svn entirely.

          At the time I was also watching bzr and hg, and it wasn’t entirely clear which would be worth adopting. I remember the discussion about bzr’s rich patch representation, versus git’s approach of working out what was a rename (etc) after the fact - I was persuaded by Linus’s arguments. Also bzr’s lack of stable repo format was not great. Mercurial had a well-documented format, but it was file-based (like cvs) and it was unclear to me that it could handle renames well. And hg’s branching model seemed rigid and awkward compared to git.

          So from my point of view, git was terrible, but it was much better than the alternatives.

          1. 4

            I stayed with svn for a long time because one of my collaborators did some fantastic work in a local branch with svk and then lost his laptop. That put me off the idea of distributed revision control for a long time. I wanted people to push things to public branches as quickly as possible and not lose work because their laptop or VM broke.

            FreeBSD was very late to adopt git (though a fairly early adopter of svn). The faster download speed was a big selling point. I could do a git clone of the ports or source repo (with all history) with git in less time than updating from a week-old checkout with subversion. No idea how they managed to make the protocol so slow. I think it needed multiple round trips to request each revision, rather than just telling the server ‘give me everything from revision x’, which is very silly given that there’s basically no client state with subversion.

        2. 3

          I don’t think that can be the whole story. It’s much better than what came before (cvs, svn) for most people AND nothing has yet been created that’s clearly better for a great many people.

        3. 1

          If you said you used a GUI for svn, people would be surprised. If you say you use one for git, people will want to discuss it and see if it has any features that make it better than the one they use.

          I’m surprised to hear this and I wonder how much the state of Git GUIs has changed since circa 2014, when the messaging I would hear about Git GUIs was like “Please, please don’t use them! They’re (even more) confusing, they’ll hinder you in learning Git, and they’re especially bad because, when you need to ask for help with using Git, it will be far more difficult for us, the Internet, to help you than if you used the CLI.”

          1. 1

            I was using GitX from about 2008ish, maybe a bit earlier, and the advice then was ‘there are some complex things that you can’t do in the GUI, it shouldn’t be your only tool’, but doing commits of part of a file without a GUI is incredibly painful and that was one of the big selling points of git (if you have some small bug fixes done alongside a new feature you can commit them separately and merge them into the main branch without merging the whole feature).

      2. 5

        For the CS-inclined you can just say “path to the commit from the root of the tree” (or equivalently, “the parent matters”) instead which I think captures the idea without talking about alternate worlds or parallel universes or whatever. We’re professionals, it should take skill to use our tools.

        1. 6

          We’re professionals, it should take skill to use our tools.

          Things should only require skill to use if it isn’t possible to engineer a high skill requirement out of it. If the tooling has a high skill requirement for no reason beyond access gating then it is definitionally a poor tool that should be replaced. It isn’t acceptable for a tool to require significant training simply because the users are expected to be highly skilled in some other domain.

          This honestly feels like when people complain about tools being simplified because it means more people who aren’t “skilled” are using the tool. It also implies that people are only skilled if their skill is in specific fields.

          1. 1

            People say that, but then they can’t point to a version control system (of the many that exist over the decades they’ve existed) that solves all the problems git does with an elegance that is assumed to be possible. The proof should be easy!

            1. 2

              But proof is easy.

              Mercurial, for example, doesn’t require knowledge of worldlines and branch go-karts. It is an extremely usable tool that doesn’t defy most people’s mental models.

              Look at Sapling’s features like sl absorb. That would take 1 second to teach someone vs whatever it takes to teach interactive rebase, fixup commits, autosquash, etc.

              Better tools exist and have existed for a long time.

              Heck, just look at what people have layered on top of git itself to fix its warts, e.g. git-branchless. It exposes extremely important workflows missing in git (like a sensible default log and branch rebasing) in a sane and usable CLI.

              If you explore other vcs you’ll find a million examples of concrete ways of doing things better.

        2. 3

          We’re professionals, it should take skill to use our tools.

          The industrial revolution was successful precisely because it removed skill from the tooling - instead of having a general-purpose hammer that you whacked hot iron with in a carefully-aimed direction, you had a block of iron that you milled by twiddling knobs to the correct number and pulling the lever. Pulling the lever did not require years of training.

          “It should take skill to use our tools” is an unjustified assertion that we should put this iron collar around our necks and can be trivially dismissed. The save button should not require skill, that’s insane.

          1. 4

            The industrial revolution was successful precisely because it removed skill from the tooling - instead of having a general-purpose hammer that you whacked hot iron with in a carefully-aimed direction, you had a block of iron that you milled by twiddling knobs to the correct number and pulling the lever

            I agree with the intention of this statement and how it relates to Git but it also tells me you don’t have much first-hand experience either as a blacksmith or as a machinist.

          2. 3

            The save button indeed does not require skill to use. Git isn’t a save button though, it’s a system for handling conflicting saves. Otherwise just paste your code into a google doc and let their auto-merge functionality sort it out. A system following your Industrial Revolution example would require knowing what the synthesis of two conflicting programs should be. Maybe AI will be able to do that some day but good luck otherwise.

      3. 5

        Other than being hard to learn for many people, what makes it bad in your opinion?

        1. 2

          Is that not enough?

          1. 7

            No. First of all, if you take the trouble to understand it, it can be a lot easier to use than other systems, and it solves almost everyone’s problems.

            In my experience all dvcses are confusing to some people. Git gets a lot of complaints because it’s what people actually use.

      4. 3

        I think we keep using it because git, for all its faults, solves the problem well enough for many people, whether that’s through learning the idiosyncrasies, or mitigating its shortcomings with external interfaces like GitHub etc.

        For many people, life starts and stops at clone, checkout, push, and pull. If you’ve found a way to make that work for you safely and consistently, I think version control understandably becomes a problem that you might not be so invested in.

        I think the current situation speaks to how these external GUI tools and websites have smoothed over the cracks of the git experience.

    19. 3

      I thought the argument in favour of bun was that it gave you a non v8 node environment?

      Making Node conceptually a runtime with multiple implementations.

      Because let’s be clear: the speed and performance of Node.js or Bun is not because of the application layer. It’s the result of the JS engines; neither embedding environment is involved in any of the performance-sensitive work.