Threads for edk-

  1. 1

    This is pretty good, but I think it could be even better if it used a Gödel-like system to encode the sets.

    E({}) = 1
    # Assume x0, x1... are increasing
    E({x0, x1, x2, ...}) = 2^E(x0) * 3^(E(x1) - E(x0)) * 5^(E(x2) - E(x1)) * ...
    

    All natural numbers correspond to sets, but never fear; it makes up for this advantage by being incredibly inefficient. Most numbers are duplicates, and the encodings of interesting sets are generally much longer than their symbolic representations.

    1. 14

      Really surprised fstrings are faster than string.format. I thought it was just syntactic sugar for that.

      1. 18

        From disassembling both versions, it looks like Python added a new bytecode, FORMAT_VALUE, specifically for f-strings, while format is a plain function call. I’d guess most of the savings is skipping the function call (Python function calls are not fast).

        >>> dis.dis('f"{x} {y}"')
          1           0 LOAD_NAME                0 (x)
                      2 FORMAT_VALUE             0
                      4 LOAD_CONST               0 (' ')
                      6 LOAD_NAME                1 (y)
                      8 FORMAT_VALUE             0
                     10 BUILD_STRING             3
                     12 RETURN_VALUE
        
        >>> dis.dis('"{} {}".format(x, y)')
          1           0 LOAD_CONST               0 ('{} {}')
                      2 LOAD_METHOD              0 (format)
                      4 LOAD_NAME                1 (x)
                      6 LOAD_NAME                2 (y)
                      8 CALL_METHOD              2
                     10 RETURN_VALUE
        

        Edit: Digging a bit more, f-strings were initially implemented as syntactic sugar for a call to format, but that was changed due to semantic concerns rather than speed: If a user redefined the format function, the behavior of f-strings would change, which was considered strange/undesirable. So a new opcode was added to avoid f-string behavior depending on the current binding of format, with a side bonus of being faster. I guess format can’t compile to that new opcode precisely because user code is allowed to redefine it.

        1. 5

          I mucked about with this a lot for a silly optimization in a web framework and concluded that it’s not any specific opcode that makes f-strings faster, it’s fundamental to the design (the design it’s settled on now; no idea about the history). (Although, of course, loading format for each value might be bad enough to make the whole thing consistently slower than other options).

          So it’s not exactly hard to be faster than str.format: str.__mod__ (i.e. "%s%s" % (foo, bar)) is also often significantly faster. There are two reasons for this: special method lookups are faster than named method lookups, and %-formatting is inherently faster than {}-formatting due to its simpler parser and non-generic architecture. Getting back to f-strings, they can be faster than either because not only is there, as you’ve observed, no method call, there’s almost no (runtime) implementation to call either. Processing the f-string happens at compile time. FORMAT_VALUE takes care of formatting values, with a hardcoded shortcut for values that are already strings, while BUILD_STRING just concatenates a bunch of pieces.

          But although f-strings can be faster than the other two, they aren’t always. They are pretty much always faster than str.format, but %-formatting sometimes beats them; the simplest case that I know where this is true is formatting a bunch of ints:

          $ python3 -m timeit -s 'a, b, c, d = 1, 2, 3, 4' '"%d%d%d%d" % (a, b, c, d)'
          200000 loops, best of 5: 1.59 usec per loop
          $ python3 -m timeit -s 'a, b, c, d = 1, 2, 3, 4' 'f"{a}{b}{c}{d}"'
          200000 loops, best of 5: 1.97 usec per loop
          

          This happens because of that non-generic architecture thing. %-formatting has to parse at runtime, but its parser is very fast. On the other hand, since it just innately knows what ints are, it doesn’t have to ask its operands how to format themselves. This also means it can just print the formatted int out directly into what will become the resulting string, while the f-string version needs to put it in a temporary string of its own to be concatenated by BUILD_STRING later.
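          The same comparison is easy to reproduce from a script with the timeit module (a quick sketch; absolute numbers vary by machine and Python version, so only the relative ordering is interesting):

```python
# Compare %-formatting, str.format, and f-strings on the all-ints case.
import timeit

setup = "a, b, c, d = 1, 2, 3, 4"
candidates = {
    "%-format": '"%d%d%d%d" % (a, b, c, d)',
    "str.format": '"{}{}{}{}".format(a, b, c, d)',
    "f-string": 'f"{a}{b}{c}{d}"',
}
for name, stmt in candidates.items():
    t = timeit.timeit(stmt, setup=setup, number=100_000)
    print(f"{name:>10}: {t:.4f}s")
```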

          1. 1

            I find it a little surprising that % formatting sometimes wins, because it needs an extra memory allocation for the argument tuple.

            1. 3

              It does, but the f-string version needs 4 extra allocations (for “1”, “2”, “3”, “4”).

          2. 2

            Perl probably would have solved a similar problem by having the sugar compile to a call to CORE::GLOBAL::format rather than looking up format in the current scope.

            Then again, Perl would have solved this specific problem by having an op for string formatting. :)

        1. 7

          As long as you’re going to store your TOTP secrets directly on your machine, this feels way “crappier” than just using KeePassXC or similar. Am I missing something?

          1. 6

            It does seem to be missing the point somewhat. TOTP is a protocol for doing crypto with a human in the middle. Putting the human in the middle is there precisely so that the private key can be on a completely different machine to the one that you’re using to log in. The ‘crappy’ authenticator apps (I’ve used the one from F-Droid, which is a fork of the last open-source release of the Google one) manage this for you.

            If you’re willing to store the shared secret on the same machine that you use for logging in, then an attacker who compromises that machine can just exfiltrate it directly. If it’s stored in cyphertext on disk (protected by some key) then you’re not vulnerable to offline attacks if someone steals your computer but you are vulnerable to online attacks. If someone compromises your computer then they can read the secret out of memory and impersonate you.

            With TOTP and a separate device then they need to do online, live attacks: just exfiltrating the one-time password doesn’t help because it stops being valid a few seconds later and they need to log into the remote system in that time window.

            You get a small amount of extra security from OTP vs passwords by storing the secret on the same machine. If the attacker is able to compromise your browser then they can leak your password but they can’t leak the secret without a separate sandbox escape that allows them to compromise the credentials manager.

            Both are weaker than using WebAuthn. Now that pretty much all desktop and mobile browsers support WebAuthn with platform authenticators, there’s no excuse for using passwords or TOTP. With a TPM, Windows can protect your credentials with biometrics and require a full OS compromise to gain the ability to fake signatures (but not to extract the key). Android and iOS / macOS both provide mechanisms. Android will use a separate hardware store if available, a TrustZone-isolated component if not. iOS and macOS will use the Secure Element; on older Macs there’s an emulator that stores credentials in the Keychain, which is much weaker, but the best you can do on the older hardware. Not sure what the situation is on Linux / *BSD.
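            For a sense of scale, the whole TOTP algorithm is tiny, which is part of why authenticator apps are so easy to write. A minimal RFC 6238 sketch with the standard parameters (HMAC-SHA1, 30-second steps, 6 digits):

```python
# Minimal TOTP (RFC 6238): HMAC the 30-second time counter with the
# shared secret, dynamically truncate (RFC 4226), take 6 decimal digits.
import hmac
import struct
import time

def hotp(secret, counter, digits=6):
    mac = hmac.new(secret, struct.pack(">Q", counter), "sha1").digest()
    offset = mac[-1] & 0x0F  # dynamic truncation per RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret, at=None, step=30):
    now = time.time() if at is None else at
    return hotp(secret, int(now // step))

# RFC 6238 test vector: SHA-1 secret "12345678901234567890", T = 59s
print(totp(b"12345678901234567890", at=59))  # 287082
```

            The shared secret is the only long-term state, which is exactly the exfiltration concern above.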

            1. 2

              There is another angle to this which might be of more practical relevance to most people: on some services, enabling 2FA makes it materially harder for support personnel to reset your credentials. I suspect that the vast majority of people are more likely to have their accounts hacked via social engineering than any sort of credential theft, at least if they’re switched-on enough to avoid installing banking trojans. In that light, enabling 2FA is just a way to opt into something like the AAA service you’d otherwise have to be an eight-figure customer to get, and it doesn’t really matter how secure your TOTP secret is.

              1. 1

                makes it materially harder for support personnel to reset your credentials

                Yep, ask microsoft about that if they think you’re not using that account often enough. Fun support call times.

            2. 3

              I think that really comes down to what exactly you’re trying to secure with TOTP or rather 2FA.

              In the original sense: an attacker has to have one more thing than your password, and at best needs control over more than just the PC you’re logging in with. Except I don’t trust my phone farther than I can throw it, and I do use a password manager with long randomized passwords. And I do use a yubikey as 2FA for my password manager. (With this I have probably now leaked everything important about me.)

              So TOTP to me is mostly a pain that protects me against websites that counter brute force attacks with 2FA*. And it makes those websites shut up about setting up TOTP (let’s be honest: “please enter your phone number”).

              And then there is the issue with most people logging into websites also on mobile. So you’re down to one device and the hopes that the per-app isolation of iOS/Android is better than on your PC (which may be true). But that doesn’t help the case that an attacker only requires access to your phone now.

              *Except there are backup codes, which are essentially static passwords. Which also highlights the problem of using only your phone for 2FA: I wouldn’t bet on recovering a google account with 2FA after you lost your phone and backup codes - even if you did send them your passport to verify your youtube-over-18-age / bought a lot of stuff with some form of traceable credit card you can verify yourself with. In fact I had to go through multiple hoops because microsoft found my account usage weird and didn’t trust my TOTP to identify myself.

              1. 2

                I never understood putting 2FA into your primary password manager.

                This makes it a single point of failure. If your password manager is hacked, leaked, or your database is lost/deleted, then you’ve lost everything all at once.

                Maybe I’m wrong on this - but I keep my TOTP stuff in a separate app from my password manager (Aegis, in this case), and back up the encrypted keys/database when I add a new TOTP code (which doesn’t happen very often these days).

                1. 1

                  You’re completely correct here. Some prefer to take the risk in exchange for the convenience.

                  1. 1

                    There is not much convenience in needing to enter a second form, and no added security either, since if your password is already in a password manager, your TOTP code is too.

                    1. 1

                      Well, there is still security against weird unlikely scenarios like “you’ve entered the credentials into a public PC that was keylogged”. (Some would consider the “compromised personal device” scenarios just as unlikely…) And as mentioned in a comment above, recovery policy improvements that happen in some places due to enabling any 2FA. But yeah, TOTP is really unimpressive in terms of what security it can provide.

                2. 1

                  Sometimes you’re happy with a password but someone else has decided that you have to have 2FA.

                  1. 1

                    I meant to suggest that using oathtool was a “crappier” approach to using TOTP than using KeePassXC or similar. Not that using TOTP altogether was crappy.

                    1. 1

                      Right, and if you think that TOTP altogether is crappy, then it’s just fine to use oathtool :)

                1. 6

                  Huh, Linux actually documents supporting null argv[0] as a feature? Interesting.

                  1. 3

                    I assume it is a feature that they would like to remove, but has been kept around to ensure backwards compatibility:

                    On Linux, argv and envp can be specified as NULL.  In both cases,
                    this has the same effect as specifying the argument as a pointer
                    to a list containing a single null pointer.  Do not take advantage
                    of this nonstandard and nonportable misfeature!  On many other
                    UNIX systems, specifying argv as NULL will result in an error
                    (EFAULT).  Some other UNIX systems treat the envp==NULL case the
                    same as Linux.
                    
                    1. 7

                      I believe this is a red herring. It’s talking about e.g. execv("foo", NULL), which Linux will translate to execv("foo", (char *[]){ NULL }). The latter invocation seems to be correct by all the relevant standards, as far as I can tell, and is sufficient to exploit pkexec.

                      I’ve spent too much of today arguing about this already, but for my money it’s intuitively reasonable that a list of arbitrary length can be empty. If I’d been designing this API, I’d have taken quite a lot of convincing to impose a minimum size of parameter list. Sure, argv[0] is supposed to be our program name, but since we can’t assume our caller was honest enough to fill it correctly, why are we happy assuming it was filled at all?

                      1. 3

                        Sure, argv[0] is supposed to be our program name, but since we can’t assume our caller was honest enough to fill it correctly

                        Well, keep in mind a single binary can have multiple possible correct values for argv[0]. busybox is one example.

                        I personally always check the argc == 0 case, but I’m persuaded that it would be good for it to be impossible.

                        I think of the “argc must be > 0” requirement as equivalent to “we have a new argument to main/execve called char *progname, and now argv[0] is your first argument instead of argv[1]”.

                        1. 2

                          we can’t assume our caller was honest enough to fill it correctly

                          It’s not that obvious that the caller is actually another program. If you don’t think too hard, this argv[0] thing is “always there” right in the program entry point — so that sure feels like something coming from “the system”!

                          1. 1

                            The “not thinking too hard” about environmental assumptions is why we have had interesting bugs in the past like:

                            1. LD_PRELOAD across suid (SunOS and friends had no problem with this early on, but there might be something in glibc…).
                            2. close(0); close(1); close(2); exec(“suid_thing”); – what happens to open+printf calls.. (all sane envs now /dev/null this for good reason).
                            3. /proc/sys/kernel/core_pattern – several LPEs, ubuntu using crash handler in python for this was precious
                            4. wayland-xkb “what can you do with the descriptor?”
                            5. wayland-ffi +mremap(+x) ..
                            6. recent intel-gpu dmabuf scraping (in fairness, everyone has had them and there are more to come)
                      2. 2

                        I mean it’s certainly useful if you ever need to run something as root, but don’t have the necessary permissions :D

                      1. 4

                        I thought Blockchain stuff was off topic to lobsters?

                        1. 2

                          Oh, I had no idea. My bad!

                          1. 4

                            It’s a good article, I’m just surprised that it stayed up, in the past they remove any cryptocurrency related posts (I don’t care either way).

                            1. 18

                              It’s a decent writeup of implementing stuff and the tradeoffs of the large-scale design of the systems rather than being promotional or about businesses. I thought it was worth an experiment to leave it up. But the comments here are overwhelmingly not related to implementation; mostly we’re talking about marketing, scams, Signal’s cryptocurrency integration, etc. So this article is not producing normal, relevant discussions for the site, and future links at a similar distance from implementation will get removed.

                              1. 7

                                Sad to hear that. I wouldn’t have read the article without your comment about its quality, but now that I did I wouldn’t want to see this content gone from lobsters. Edit: It’s the first article for me that actually dives deep into what all this means and by doing so uncovers the bad stuff simply by showing how it works.

                                1. 5

                                  There’s a lot of really wonderful writing on the web that’s just a little too far off-topic to prompt the kind of creative, collaborative discussion about programming that I think is the hard-to-find value of the site.

                                  The site’s core topic is the design and implementation of computer programs. Then lots of things that are a small step away in different directions: how to use various programming tools like editors and debuggers, useful libraries, broad views of strategies, sharing the things we’ve made, retrospectives, long-term trends in software development. Also larger steps away like how to collaborate on a project, how software licensing works, vulnerability news, fun community-building posts like the weekly + weekend threads. Sometimes this steps too far, into things that are clearly off-topic like entrepreneurship, stories of getting and having jobs, rage at poor customer support or business practices (usually at one or more companies in the tech industry), electoral politics. In the middle there’s a bunch of fuzzy borders and guessing where discussion is likely to go.

                                  So while I also got a lot out of this article and thought it might work out, the discussion here constantly vectored away from the core of the site. I’m heartened that it didn’t turn into any fights even as it touched on contentious areas that we’re often strongly divided on, but if this is where broader pieces about cryptocurrency lead, I don’t think I want to keep rolling the dice on them when we’re not getting much discussion that’s unambiguously in the site’s central focus.

                                  1. 5

                                    little too far off-topic

                                    I think this article was on-topic for showing how you actually develop a program in this crypto-system and what hurdles you’ll have to overcome. Also giving you an impression of whether this “framework” may fit your requirements or not.

                                    discussion here constantly vectored away from the core of the site

                                    Yes and no? There was a lot of general “crypto” discussion, and I certainly don’t think we need many discussions about crypto. On the other hand: we already had many hot discussions that are ultimately a question of opinion and belief in certain things (FOSS maintainer, performance and feature obligations, for example). And I do think they’re on-topic as long as they’re civilized and not a repetition of the same arguments over and over. There were also some discussions about the technical possibility of overcoming the limitations shown in the article.

                                    And if there is a community where I’m actually interested to hear about their thoughts on how much of an actual win this whole crypto dance/industry is, then it’s this one. As I’m more certain to get actual technical details, rather than outsiders with varying money interests.

                                    Edit: But just to clarify, I don’t oppose removing the next one; I just wanted to throw in my 2 cents on why I think at least this post is worth keeping here, and maybe some others like it too.

                                    1. 4

                                      FWIW I saw this post/conversation as a sort of magnet or pressure release for this topic here. Lots of upvotes and comments, I think that says something. No biggie tho

                                      1. 3

                                        If engagement were a good fitness function we wouldn’t need moderation

                                  2. 6

                                    Thank you for taking on the ambiguous and onerous task of moderation. People are complicated and ever-changing, and it’s impossible to please everyone. It ain’t easy!

                                  3. 2

                                    This post should be removed.

                              1. 2

                                At least in the Maven world, you don’t need to wait for your dependencies to update their versions of log4j2 (or anything else)—you can add your own dependency on the latest log4j2, and it will always win because it’s earlier in the dependency tree than any of those transitively-depended-on older versions. (Of course this will only work if the new version of log4j2 doesn’t break those packages, but such breakage is quite unlikely given the Java ecosystem’s conservative approach to backward compatibility).

                                The OG Maven approach to dependency versioning was chosen because it makes builds deterministic, depending only on what’s explicitly mentioned in the POM (and transitively in explicitly depended-on POMs, of course). This does have disadvantages, but one of its corollaries is that if you explicitly ask for log4j 2.17.1 in your root POM, you will always get it.

                                It would be a shame if people were waiting for random dependencies to update instead of just fixing this themselves. (To be clear, I don’t think adding a dependency just to force it on your other dependencies is a good long-term solution, but it’s clearly better than the alternative here.)
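                                As a sketch, the pinning can look like this in the root POM (these are the usual log4j2 coordinates; 2.17.1 is the version mentioned above, so substitute whatever is current). dependencyManagement is the tidier variant of the same trick, since it forces the version on transitive dependencies without adding a direct one:

```xml
<!-- In the root POM: pin log4j2 for every transitive use. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
      <version>2.17.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-api</artifactId>
      <version>2.17.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```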

                                1. 6

                                  I guess the fine difference in meaning is lost to me as a non-native speaker. If it arrives (or not) but is discarded immediately, it has not propagated. I wouldn’t even think to ponder if it’s push or pull. Either it wasn’t pushed, wasn’t pulled, or was pushed and ignored or was pulled and ignored.

                                  1. 16

                                    Propagate is used all the time in English without any sort of ‘push’ semantics. I think the assertion that DNS doesn’t “propagate” will get caught up in quibbles over various people’s understandings of the word, whether they’re native speakers or not. A better title might avoid the confusion by talking about push vs pull explicitly.

                                    1. 5

                                      Propagate can be used for any kind of information transfer, with push, pull, or other semantics.

                                      1. 4

                                        I said above that to me it suggested some kind of push mechanism, but I guess what’s confusing about it is not so much push vs. pull, but the implication that anything necessarily gets transferred at all – propagate definitely implies motion, whereas what’s really happening is cache entries are expiring. “Propagating” suggests that it’s something that new (not already cached) entries have to do too, but they don’t, because that’s not what’s happening. You could probably argue that there’s something more abstract that is “propagating” here, but the point is that the term generates the wrong intuitions.

                                    2. 10

                                      As a native speaker, I find it a bit ambiguous, but “propagate” definitely feels more like it implies push, and I agree with the author that it’s misleading. In any case, this is definitely something I’ve seen trip people up a lot, and having a correct mental model is really useful.

                                      1. 3

                                        Not a native speaker and I see where the push/pull similarity comes in. However I think the main expectation for me for “propagate” is that the name would end up on other servers (whether they pull for updates, or it gets pushed), even without me asking for the resolution. If I have to trigger the process by asking -> no propagation.

                                        1. 2

                                          Yeah, this is a bit more precise; see my other comment:

                                          https://lobste.rs/s/p80qly/dns_doesn_t_propagate#c_bpwbbd

                                      2. 5

                                        There are lots of native speakers who would also disagree with this post. Propagation is not a “wrong” word, and while it’s true that a lot of people are confused about how DNS works, a quick terminology hack isn’t going to do much about that.

                                        1. 2

                                          I guess for me the term “propagate” doesn’t say anything about pull vs push, but does have connotations of filtering through multiple layers. DNS doesn’t really do this either—layered caches only really exist within individual networks, and anyway DNS cache expiration is absolute, occurring simultaneously at every level. So I’ve often considered the term to be misleading because it seems to imply some sort of gradual spread, when the reality is more point-to-point.
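                                          That point-to-point, expiry-based model can be sketched in a few lines (a toy illustration, not how any real resolver is implemented): each cache independently holds a record until an absolute deadline, and nothing moves until a client asks:

```python
# Toy model of DNS caching: a record expires at an absolute deadline in
# each cache; a "change" becomes visible to a resolver only when its own
# cached copy expires and it re-asks upstream. Nothing is pushed.
import time

class ToyResolverCache:
    def __init__(self, upstream):
        self.upstream = upstream          # callable: name -> (value, ttl)
        self.cache = {}                   # name -> (value, expires_at)

    def resolve(self, name, now=None):
        now = time.time() if now is None else now
        hit = self.cache.get(name)
        if hit and hit[1] > now:
            return hit[0]                 # still fresh: no upstream traffic
        value, ttl = self.upstream(name)  # expired/missing: pull on demand
        self.cache[name] = (value, now + ttl)
        return value

records = {"example.com": ("192.0.2.1", 300)}
cache = ToyResolverCache(lambda name: records[name])
print(cache.resolve("example.com", now=0))    # 192.0.2.1 (pulled, TTL 300)
records["example.com"] = ("192.0.2.99", 300)  # "update" at the authority
print(cache.resolve("example.com", now=100))  # 192.0.2.1 (still cached)
print(cache.resolve("example.com", now=400))  # 192.0.2.99 (TTL passed)
```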

                                        1. 3

                                          Silly or buggy? Setting aside the integer overflow thing, this doesn’t really seem unreasonable to me; it’s just that actual usage evolved in a different direction.

                                          Well, most actual usage. At a previous job (in 2016 or so, iirc) we had more than one customer ask us to implement parsing for the classful format, so someone somewhere must have a use for it, though I didn’t manage to find out what.
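                                          For what it’s worth, the classful shorthand forms are still accepted by inet_aton(3), which Python’s socket module wraps (glibc-style parsing assumed here; some platforms reject these forms):

```python
# inet_aton(3) still parses the old shorthand: "a.b" treats b as the
# final 24 bits, "a.b.c" treats c as the final 16 bits, and a single
# number is taken as the whole 32-bit address. Python's socket.inet_aton
# delegates to the platform's parser.
import socket

def canonical(addr):
    return socket.inet_ntoa(socket.inet_aton(addr))

print(canonical("127.1"))       # 127.0.0.1
print(canonical("192.168.1"))   # 192.168.0.1
print(canonical("2130706433"))  # 127.0.0.1 (plain 32-bit integer)
```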

                                          1. 8

                                            I’ve been looking for articles like this. It’s a good article but only covers the outages from one angle. I’d love to see a writeup on what products were affected and why. For instance a bunch of coffeeshops stopped being able to take credit cards because they were using old iPad-based POS terminals that couldn’t handle the certificate change. Things like that broke all over the world, would love to see an analysis.

                                            1. 7

                                              My smart TV (purchased in 2020!) no longer connects to my Plex server because of this.

                                              All it requires is a firmware update that includes the new root certificate from Let’s Encrypt, but we all know how companies are once they’ve got your money.

                                              1. 1

                                                It looks like Plex is serving the cross-signed chain. If they can fix that I’d guess that at least some of those devices will work again. I don’t have a device that’s affected, but I’ve tried modifying the chain in my Plex server, and it seems to work and not make anything worse.

                                                If you want to try that I’m happy to go into what I did. But the real solution would be for Plex to make that change themselves…

                                                1. 1

                                                  It’s a bit late, have already switched over to Jellyfin because Plex have made it clear they’re not going to do anything to improve the situation.

                                            1. 4

                                              Maybe I’m just missing it, but this writeup seems to gloss over the main event, at least as viewed through my filter bubble.

                                              The problem comes back to a similar concern around the client being outdated in some way and not having the new ISRG Root X1 installed, meaning it can no longer validate certificate chains as it has no Root CA to anchor on.

                                              The outdated clients (some not outdated by very much; GnuTLS was only patched in June 2020) in many cases did have ISRG Root X1 installed, but ignored it because they preferred the cross-signed version of ISRG Root X1 sent by the server. Removing the cross-signed root from one’s chain would have been enough to fix anything affected in this way. That’s what I did, figuring more people IRC from CentOS than ancient Androids, and it resolved pretty much everyone’s problems; so far I haven’t heard from anyone who needed the cross-signed cert.

                                              1. 1

                                                Do you have automation for that? AFAICT if I nuke part of my fullchain it’ll just come back in 90 days

                                                1. 1

                                                  We have some custom scripts around it anyway to request all the certs centrally and distribute them, so it was easy (if not very pretty) to hack in some awk to cut out the cross-signed root right before we send the certs out. But if you use dehydrated (which I’d recommend anyway) I think you can use --preferred-chain 'ISRG Root X1' to get the non-cross-signed chain straight from LE.

                                                2. 0

                                                  Yes, that was our (unpleasant) experience as well: updating ca-certificates was not enough, we also had to upgrade openssl and gnutls. In the end we solved the problem by switching to ZeroSSL (we have a ton of older VMs for testing and upgrading them all was not an option). The whole ordeal left quite a bad taste, I doubt we will touch Let’s Encrypt again if we can help it. And their attempt at spinning it as a good thing (“standing on our own feet”) just adds insult to injury.

                                                  1. 14

                                                    The whole ordeal left quite a bad taste, I doubt we will touch Let’s Encrypt again if we can help it.

                                                    I don’t really understand this at all. The bugs were in other software and could just as easily have been triggered by another cert, but you blame LE for it. And I don’t know what they were supposed to do differently. Were you expecting them to somehow have and get away with a perpetual non-expiring root cert?

                                                    1. 4

                                                      I broadly agree—the expiration was manifestly not LE’s fault—but I suspect that if they’d done more testing they might have chosen not to default to the chain with the expired cross-sign. It broke more things than anyone expected.

                                                      But… I didn’t test it either. Apparently hardly anyone did. And given it’s a public-benefit organization doing this for free, I don’t particularly feel that they owed me the testing I couldn’t be bothered to do.

                                                      1. 6

                                                        I guess I just see this as the rehearsal run for the expirations of older and longer-lived certs from “traditional” root CAs. We were all going to have to deal with it sooner or later, and some of the bugs and faulty assumptions turned up by this one have been kind of scary and I think it’s good to be exposing them.

                                                        1. 1

I mean, it seems easier to get vendors to push a new root CA than to figure out the exact mix of cross-signing rules that won’t trip up a diverse set of implementations, anyway.

I’m tempted to say “fuck it, just ship a 20-year root certificate and we’ll replace it along with all the other certs come 2038, and only sign 1-month certificates in case we need to revoke it”, but I suppose that isn’t secure, is it?

                                                      2. 1

                                                        The bugs were in other software and could just as easily have been triggered by another cert, but you blame LE for it.

                                                        Yes, but when a non-trivial portion of the web depends on your service, saying that it’s not our bug and therefore we are going to go ahead and break things is not a good strategy, IMO.

                                                        And I don’t know what they were supposed to do differently.

                                                        I am not an expert (especially when it comes to cross-signing, alternative paths, etc) and I could very well be wrong here (in which case please correct me) but from their initial announcement my take is that they could have looked for a new well-trusted root (probably by going to one of the older ones and perhaps paying them some non-trivial amount of money) but they decided not to.

And speaking of predictions, I did expect this to happen, I just didn’t expect it to be this bad: it’s one thing to update ca-certificates (which, at least on Debian, you can just copy from any newer version and it will install fine on any older one) and another to upgrade foundational libraries like libssl and libgnutls (for example, there are no fixed versions for older Debian releases).

                                                  1. 7

                                                    It doesn’t sound like they’ve fixed the complaints articulated here.

                                                    That’s enough to make me prefer other options when they’re available to me.

                                                    1. 12

                                                      No one likes breaking changes either. Damned if you do, damned if you don’t.

                                                      1. 4

                                                        Damned if you do, damned if you don’t.

                                                        I agree. And to strain the reference: So dammit, I will use something else.

                                                        1. 6

                                                          All JSON is valid YAML. Just use JSON.

                                                          1. 4

They don’t really target similar uses, though? I don’t love YAML, but JSON is not really meant for human authoring, especially when documents get large or repetitive.

                                                            1. 5

                                                              TOML is really the best of both worlds IMO. Easy to read/write, hard to screw up.

                                                              I’d also say that there’s really no human serialization language that handles repetition well. YAML has its anchors and stuff but that’s footgun city. HCL has for_each which is good but also has a steep learning curve. Writing real code and dumping to something else is my preferred method if HCL isn’t an option.

                                                              1. 3

                                                                I don’t mind YAML anchors so much, but if I really want this kind of feature I’m reaching for Dhall for sure

                                                              2. 2

All JSON is valid UCL as well. UCL actually supports all of the things I want from a human-friendly configuration language, such as well-defined semantics for loading multiple files with overrides (including deletion), so if you want to avoid complex parsing in your main application then you can have a stand-alone unprivileged process that parses UCL and generates JSON.

                                                                Anything where I might want YAML, I’ve found either JSON or UCL a better choice. JSON is much simpler to parse and has good (small!) high-performance parsers, UCL is more human-friendly. YAML is somewhere between the two.

                                                              3. 3

                                                                One small nit: the YAML spec disallows tabs, while JSON allows them. In practice, I don’t know of any YAML parser implementations that will actually complain, though.

                                                                1. 2

I haven’t used a YAML parser that allows tabs in YAML syntax, but they may be more lax about tabs appearing inside inline/JSON syntax.

                                                                2. 1

                                                                  This is the only good thing about YAML

                                                            2. 4

I agree with several of the issues pointed out on that page and its sibling pages, but some of the headache experienced is a direct result of libyaml (and thus pyyaml). It still doesn’t properly support YAML 1.2, which defines booleans as only true/false. That’s still a nuisance when you want the literal word true or false as a string, but at least it avoids n, no, y, yes, and the various case differences that 1.1 accepts.

                                                              libyaml also doesn’t generate errors on duplicate keys, which is incredibly frustrating as well.

The criticisms of implicit typing, tagging, flows, the yaml -> language datatype mapping, etc. are all spot on. They are prone to errors, and in the latter case make it really easy to introduce security issues.
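The boolean difference is easy to state as data. A rough sketch (the helper names are mine, and the `.lower()` normalization is only an approximation of 1.1’s actual capitalization rules):

```python
# Tokens YAML 1.1 resolves to booleans vs. the YAML 1.2 core schema.
YAML_1_1_BOOL_WORDS = {"y", "yes", "n", "no", "true", "false", "on", "off"}
YAML_1_2_CORE_BOOLS = {"true", "True", "TRUE", "false", "False", "FALSE"}

def is_bool_1_1(token: str) -> bool:
    # Approximation: 1.1 accepts specific capitalizations (no, No, NO, ...)
    return token.lower() in YAML_1_1_BOOL_WORDS

def is_bool_1_2(token: str) -> bool:
    return token in YAML_1_2_CORE_BOOLS

assert is_bool_1_1("no")        # the classic "Norway problem"
assert not is_bool_1_2("no")    # a plain string under the 1.2 core schema
```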

                                                              1. 2

                                                                That is still a nasty thing though when wanting to use the literal word true or false, but at least it avoids the n,no,y,yes, and the various case differences. 1.2 only supports true, false.

                                                                Are you sure? https://yaml.org/spec/1.2.2/#other-schemas seems to be saying that it’s fine to extend the rules for interpreting untagged nodes in arbitrary ways. That wouldn’t be part of the “core schema”, but then nothing in the spec says parsers have to implement the core schema.

                                                                1. 2

                                                                  Whoops. You are correct. The three described schemas are just recommendations. The lack of a required schema is frustrating.

The described recommended schemas define booleans as true, false, which is a change from versions before 1.2.

                                                                  I’ll say that I personally get frustrated quickly with yaml.

                                                              2. 3

Any language that supports bare strings will have issues when you also have alphanumeric keywords.

                                                                An issue? Sure, but more a tradeoff than anything else.

                                                                1. 4

The problem with the bare string interpretation, as I see it, is not so much the fact that it exists. It’s that it’s not part of the spec proper, but a set of recommended additional tags. What do you think 2001:40:0:0:0:0:0:1 is? Not the number 5603385600000001? The YAML spec, to the extent it has an opinion, actually agrees, but many YAML parsers will interpret it as 5603385600000001 by default, because they implement the optional base-60 (“sexagesimal”) integer type.

                                                                  YAML 1.2 doesn’t recommend https://yaml.org/type/ any more, but it doesn’t disallow it, either. The best part of all this is that there are no strict rules about which types parsers should implement. If you use a bare string anywhere in a YAML document, even one that already has a well-understood meaning, the spec doesn’t guarantee that it will keep its meaning tomorrow.
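For the curious, the surprising number above falls out of YAML 1.1’s base-60 integer rule (colon-separated digit groups, folded left to right). A quick sketch of the arithmetic:

```python
# YAML 1.1's optional int type accepts base-60 ("sexagesimal") scalars:
# colon-separated groups, each 0-59, folded left-to-right.
from functools import reduce

def sexagesimal(scalar: str) -> int:
    return reduce(lambda acc, group: acc * 60 + int(group),
                  scalar.split(":"), 0)

assert sexagesimal("2001:40:0:0:0:0:0:1") == 5603385600000001
```

The same rule is why an unquoted 22:22 (say, a port mapping) becomes the integer 1342 under a parser that implements this tag.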

                                                              1. 18

Pattern matching has been available in functional programming languages for decades now; it was introduced in the 70s. (Logic programming languages expose even more expressive forms, at higher runtime cost.) It obviously improves readability of code manipulating symbolic expressions/trees, and there is a lot of code like this. I find it surprising that in the 2020s there are still people wondering whether “the feature provides enough value to justify its complexity”.

                                                                (The fact that Python did without for so long was rather a sign of closed-mindedness of its designer subgroup. The same applies, in my opinion, to languages (including Python, Go, etc.) that still don’t have proper support for disjoint union types / variants / sums / sealed case classes.)

                                                                1. 45

                                                                  Pretty much every feature that has ever been added to every language ever is useful in some way. You can leave a comment like this on almost any feature that a language may not want to implement for one reason or the other.

                                                                  1. 14

                                                                    I think it makes more sense in statically typed languages, especially functional ones. That said, languages make different choices. For me, Python has always been about simplicity and readability, and as I’ve tried to show in the article, at least in Python, structural pattern matching is only useful in a relatively few cases. But it’s also a question of taste: I really value the simplicity of the Go language (and C before it), and don’t mind a little bit of verbosity if it makes things clearer and simpler. I did some Scala for a while, and I can see how people like the “power” of it, but the learning curve of its type system was very steep, and there were so many different ways to do things (not to mention the compiler was very slow, partly because of the very complex type system).

                                                                    1. 22

                                                                      For the record, pattern-matching was developed mostly in dynamically-typed languages before being adopted in statically-typed languages, and it works just as well in a dynamically-typed world. (In the ML-family world, sum types and pattern-matching were introduced by Hope, an experimental dynamically-typed language; in the logic world, they are basic constructs of Prolog, which is also dynamically-typed – although some more-typed dialects exist.)

                                                                      as I’ve tried to show in the article, at least in Python, structural pattern matching is only useful in a relatively few cases

                                                                      Out of the 4 cases you describe in the tutorial, I believe your description of two of them is overly advantageous to if..elif:

                                                                      • In the match event.get() case, the example you show is a variation of the original example (the longer of the three such examples in the tutorial), and the change you made makes it easier to write an equivalent if..elif version, because you integrated a case (from another version) that ignores all other Click() events. Without this case (as in the original tutorial example), rewriting with if..elif is harder, you need to duplicate the failure case.
                                                                      • In the eval_expr example, you consider the two versions as readable, but the pattern-version is much easier to maintain. Consider, for example, supporting operations with 4 or 5 parameters, or adding an extra parameter to an existing operator; it’s an easy change with the pattern-matching version, and requires boilerplate-y, non-local transformations with if..elif. These may be uncommon needs for standard mathematical operations, but they are very common when working with other domain-specific languages.
                                                                      1. 1

                                                                        the change you made makes it easier to write an equivalent if..elif version

                                                                        Sorry if it appeared that way – that was certainly not my intention. I’m not quite sure what you mean, though. The first/original event example in the tutorial handles all click events with no filtering using the same code path, so it’s even simpler to convert. I added the Button.LEFT filtering from a subsequent example to give it a bit more interest so it wasn’t quite so simple. I might be missing something, though.

                                                                        In the eval_expr example, you consider the two versions as readable, but the pattern-version is much easier to maintain. Consider, for example, supporting operations with 4 or 5 parameters, or adding an extra parameter to an existing operator;

                                                                        I think those examples are very hypothetical – as you indicate, binary and unary operators aren’t suddenly going to support 4 or 5 parameters. A new operation might, but that’s okay. The only line that’s slightly repetitive is the “attribute unpacking”: w, x, y, z = expr.w, expr.x, expr.y, expr.z.

                                                                        These may be uncommon needs for standard mathematical operations, but they are very common when working with other domain-specific languages.

                                                                        You’re right, and that’s part of my point. Python isn’t used for implementing compilers or interpreters all that often. That’s where I’m coming from when I ask, “does the feature provide enough value to justify the complexity?” If 90% of Python developers will only rarely use this complex feature, does it make sense to add it to the language?

                                                                        1. 3

                                                                          that was certainly not my intention.

                                                                          To be clear, I’m not suggesting that the change was intentional or sneaky, I’m just pointing out that the translation would be more subtle.

                                                                          The first/original event example does not ignore “all other Click events” (there is no Click() case), and therefore an accurate if..elif translation would have to do things differently if there is no position field or if it’s not a pair, namely it would have to fall back to the ValueError case.

                                                                          You’re right, and that’s part of my point. Python isn’t used for implementing compilers or interpreters all that often.

You don’t need to implement a compiler for C or Java, or anything people recognize as a programming language (or HTML or CSS, etc.), to be dealing with a domain-specific language. Many problem domains contain pieces of data that are effectively expressions in some DSL, and recognizing this can be very helpful for writing programs in those domains – if the language supports the right features to make this convenient. For example:

• to start with the obvious, many programs start by interpreting some configuration file to influence their behavior; many programs have simple needs well-served by linear formats, but many (e.g. cron jobs) require more elaborate configurations that are DSL-like. Even if the configuration is written in some standard format (INI, YAML, etc.) – so parsing can be delegated to a library – the programmer will still write code to interpret or analyze the configuration data.
• more generally, “structured data formats” are often DSL-shaped; ingesting structured data is something we do super-often in programs
                                                                          • programs that offer a “query” capability typically provide a small language to express those queries
                                                                          • events in an event loop typically form a small language
                                                                      2. 14

                                                                        I think it makes more sense in statically typed languages, especially functional ones.

In addition to the earlier ones gasche mentioned (it’s important to remember this history), it’s used pervasively in Erlang, and later Elixir. Clojure has core.match, Racket has match, as does Guile. It’s now in Ruby as well!

                                                                        1. 3

                                                                          Thanks! I didn’t know that. I have used pattern matching in statically typed language (mostly Scala), and had seen it in the likes of Haskell and OCaml, so I’d incorrectly assumed it was mainly a statically-typed language thing.

                                                                          1. 1

                                                                            It is an important feature of OCaml.

                                                                            1. 3

                                                                              I am aware - was focusing on dynamically typed languages.

                                                                          2. 7

                                                                            For me, it is the combination of algebraic data types + pattern matching + compile time exhaustiveness checking that is the real game changer. With just 1 out of 3, pattern matching in Python is much less compelling.

                                                                            1. 1

I agree. I wonder if they plan to add exhaustiveness checking to mypy. The way the PEP is so no-holds-barred makes it seem like the goal was featurefulness rather than supporting exhaustiveness checking.

                                                                              1. 2

                                                                                I wonder if they plan to add exhaustiveness checking to mypy.

                                                                                I don’t think that’s possible in the general case. If I understand the PEP correctly, __match_args__ may be a @property getter method, which could read the contents of a file, or perform a network request, etc.

                                                                          3. 11

                                                                            I find it surprising that in the 2020s there are still people wondering whether “the feature provides enough value to justify its complexity”.

                                                                            I find it surprising that people find this surprising.

Adding features like pattern matching isn’t trivial, and adding them too hastily can backfire in the long term, especially for an established language like Python. As such I would prefer a language take its time, rather than slapping things on because somebody on the internet said it was a good idea.

                                                                            1. 3

                                                                              That’s always been the Scheme philosophy:

                                                                              Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary.

                                                                              And indeed, this pays off: in the Scheme world, there’s been a match package floating around for a long time, implemented simply as a macro. No changes to the core language needed.

                                                                              1. 4

                                                                                No changes to the core language needed.

I’m sure you recognize that this situation does not translate to other languages, in this case Python, where implementing it as a macro is just not feasible. And even in Scheme the usage of match macros is rather low. That may be because it is not that useful, but it might also be because the hurdle of adding a dependency is not worth the payoff. Once a feature is integrated into a language, using it “costs” nothing, so the value proposition when writing code can be quite different.

                                                                                1. 7

This is rather unrelated to the overall discussion, but as a user of the match macros in Scheme, I must say that I find the lack of integration into the base forms slightly annoying. You cannot pattern-match on a let or lambda, you have to use match-let and match-lambda, define/match (the latter only in Racket I think), etc. This makes reaching for pattern-matching feel heavier, and it may be a partial cause of their comparatively lower usage. ML-family languages generalize all binding positions to accept patterns, which is very nice for decomposing records, for example (or other single-case data structures). I wish Scheme dialects would embrace this generalization, but they haven’t for now – at least not Racket or Clojure.

                                                                                  1. 2

In the case of Clojure, while it doesn’t have pattern matching built in, it does have quite comprehensive destructuring forms (like nested matching in maps, with rather elaborate mechanisms) that work in all binding positions.

                                                                                    1. 2

                                                                                      Nice! I suppose (from your post above) that pattern-matching is somehow “integrated” in the Clojure implementation, rather than just being part of the base macro layer that all users see.

                                                                                      1. 2

                                                                                        I think the case is that Clojure core special forms support it (I suppose the implementation itself is here and called “binding-forms”, which is then used by let, fn and loop which user defined macros often end up expanding to). Thus it is somewhat under the base layer that people use.

                                                                                        But bear in mind this is destructuring, in a more general manner than what Python 2.x already supported, not pattern matching. It also tends to get messy with deep destructuring, but the same can be said of deep pattern matches through multiple layers of constructors.

                                                                            2. 8

                                                                              I agree about pattern matching and Python in general. It’s depressing how many features have died in python-ideas because it takes more than a few seconds for an established programmer to grok them. Function composition comes to mind.

                                                                              But I think Python might be too complicated for pattern matching. The mechanism they’ve settled on is pretty gnarly. I wrote a thing for pattern matching regexps to see how it’d turn out (admittedly against an early version of the PEP; I haven’t checked it against the current state) and I think the results speak for themselves.

                                                                              1. 6

                                                                                But I think Python might be too complicated for pattern matching. The mechanism they’ve settled on is pretty gnarly.

                                                                                I mostly agree. I generally like pattern matching and have been excited about this feature, but am still feeling out exactly when I’ll use it and how it lines up with my intuition.

                                                                                The part that does feel very Pythonic is that destructuring/unpacking is already pretty pervasive in Python. Not only for basic assignments, but also integrated into control flow constructs. For example, it’s idiomatic to do something like:

                                                                                for key, val in some_dictionary.items():
                                                                                    # ...
                                                                                

                                                                                Rather than:

                                                                                for item in some_dictionary.items():
                                                                                    key, val = item
                                                                                    # ...
                                                                                

                                                                                Or something even worse, like explicit item[0] and item[1]. So the lack of a conditional-with-destructuring, the way we already have foreach-with-destructuring, did seem like a real gap to me, making you have to write the moral equivalent of code that looks more like the 2nd case than the 1st. That hole is now filled by pattern matching. But I agree there are pitfalls around how all these features interact.

                                                                              2. 2
                                                                                for i, (k, v) in enumerate(d.items(), 1): pass
                                                                                

                                                                                looks like pattern matching to me

                                                                                1. 2

Go aims for simplicity of maintenance and deployment. It isn’t that Go “still doesn’t have those features” – the Go authors avoided them on purpose. If you want endless abstractions in Go, embedding Lisp is a possibility: https://github.com/glycerine/zygomys

                                                                                  1. 5

Disjoint sums are a basic programming feature (they model data whose shape is “either this or that or that other thing”, which is ubiquitous in the wild just like pairs/records/structs). They are not an “endless abstraction”, and they are perfectly compatible with maintenance and deployment. Go is a nice language in some respects, the runtime is excellent, the tooling is impressive, etc etc. But this is no rational excuse for the lack of some basic language features.

                                                                                    We are in the 2020s, there is no excuse for lacking support for sum types and/or pattern matching. Those features have been available for 30 years, their implementation is well-understood, they require no specific runtime support, and they are useful in basically all problem domains.

I’m not trying to bash a language and attract defensive reactions, but rather to discuss (with concrete examples) the fact that language designers’ mindsets can be influenced by some design cultures more than others, and as a result sometimes the design is held back by a lack of interest in things they are unfamiliar with. Not everyone is fortunate enough to be working with a deeply knowledgeable and curious language designer, such as Graydon Hoare; we need more such people in our language design teams. The default is for people to keep working on what they know; this sort of closed-ecosystem evolution can lead to beautiful ideas (some bits of Perl 6 for example are very nice!), but it can also hold things back.

                                                                                    1. 3

                                                                                      But this is no rational excuse for the lack of some basic language features.

                                                                                      Yes there is. Everyone has a favorite feature, and if all of those are implemented, there would easily be feature bloat, long build times and projects with too many dependencies that depend on too many dependencies, like in C++.

                                                                                      In my opinion, the question is not if a language lacks a feature that someone wants or not, but if it’s usable for goals that people wish to achieve, and Go is clearly suitable for many goals.

                                                                                  2. 3

                                                                                    Ah yes, Python is famously closed-minded and hateful toward useful features. For example, they’d never adopt something like, say, list comprehensions. The language’s leaders are far too closed-minded, and dogmatically unwilling to ever consider superior ideas, to pick up something like that. Same for any sort of ability to work with lazy iterables, or do useful combinatoric work with them. That’s something that definitely will never be adopted into Python due to the closed-mindedness of its leaders. And don’t get me started on basic FP building blocks like map and folds. It’s well known that Guido hates them so much that they’re permanently forbidden from ever being in the language!

                                                                                    (the fact that Python is not Lisp was always unforgivable to many people; the fact that it is not Haskell has now apparently overtaken that on the list of irredeemable sins; yet somehow we Python programmers continue to get useful work done and shrug off the sneers and insults of our self-proclaimed betters much as we always have)

                                                                                    1. 25

                                                                                      It is well-documented that Guido van Rossum planned to remove lambda from Python 3. (For the record, I agree that map and filter on lists are much less useful in the presence of list comprehensions.) It is also well-documented that recursion is severely limited in Python, making many elegant definitions impractical.
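
To make the map/filter point concrete, here is a small comparison (a hypothetical sketch; the names are made up for illustration):

```python
# Square the even numbers of a list, written two equivalent ways.
nums = [1, 2, 3, 4, 5, 6]

# map/filter style: needs a lambda for each step.
squares_fp = list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, nums)))

# List-comprehension style: one expression, no lambdas.
squares_lc = [n * n for n in nums if n % 2 == 0]

assert squares_fp == squares_lc == [4, 16, 36]
```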

                                                                                      Sure, Python adopted (in 2000 I believe?) list comprehensions from ABC (due to Guido working with the language in the 1980s), and a couple of library-definable iterators. I don’t think this contradicts my claim. New ideas came to the language since (generators, decorators), but it remains notable that the language seems to have resisted incorporating strong ideas from other languages. (More so than, say, Ruby, C#, Kotlin, etc.)

                                                                                      Meta: One aspect of your post that I find unpleasant is the tone. You speak of “sneers and insults”, but it is your post that is highly sarcastic and full of stray exaggerations at this or that language community. I’m not interested in escalating in this direction.

                                                                                      1. 7

                                                                                        less useful in presence of list comprehension

                                                                                        I’m certainly biased, but I find Python’s list comprehensions an abomination for readability compared to higher-order pipelines or recursion. I haven’t personally coded Python in 8-9 years, but when I see examples, I feel like I need to turn my head upside down to understand them.

                                                                                        1. 6

                                                                                          It is also well-documented that recursion is severely limited in Python, making many elegant definitions impractical.

                                                                                          For a subjective definition of “elegant”. But this basically is just “Python is not Lisp” (or more specifically, “Python is not Scheme”). And that’s OK. Not every language has to have Scheme’s approach to programming, and Scheme’s history has shown that maybe it’s a good thing for other languages not to be Scheme, since Scheme has been badly held back by its community’s insistence that tail-recursive implementations of algorithms should be the only implementations of those algorithms.

                                                                                          You speak of “sneers and insults”, but it is your post that is highly sarcastic and full of stray exaggerations at this or that language community.

                                                                                          Your original comment started from a place of assuming – and there really is no other way to read it! – that the programming patterns you care about are objectively superior to other patterns, that languages which do not adopt those patterns are inherently inferior, and that the only reason why a language would not adopt them is due to “closed-mindedness”. Nowhere in your comment is there room for the (ironically) open-minded possibility that someone else might look at patterns you personally subjectively love, evaluate them rationally, and come to a different conclusion than you did – rather, you assume that people who disagree with your stance must be doing so because of personal faults on their part.

                                                                                          And, well, like I said we’ve got decades of experience of people looking down their noses at Python and/or its core team + community for not becoming a copy of their preferred languages. Your comment really is just another instance of that.

                                                                                          1. 8

                                                                                            I’m not specifically pointing out the lack of tail-call optimization (TCO) in Python (which I do think is unfortunate; the main counter-argument is that the call stack matters, but it’s technically entirely possible for TC-optimizing implementations to preserve call stacks on the side). Ignoring TCO for a minute, the main problem is that the CPython interpreter severely limits the call depth (IIRC it’s 1,000 calls by default; compare that to the 8 MB stack default on most Unix systems), making recursion mostly unusable in practice, except for logarithmic-space algorithms (balanced trees, etc.).
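
The limit is easy to observe (a minimal sketch; the `depth` helper is a made-up example). CPython caps the interpreter call stack at `sys.getrecursionlimit()` frames, 1000 by default, independent of the OS stack size:

```python
import sys

def depth(n=0):
    """Recurse until CPython's interpreter limit is hit, then report the depth reached."""
    try:
        return depth(n + 1)
    except RecursionError:
        return n

print(sys.getrecursionlimit())  # 1000 by default in CPython
print(depth())                  # a bit less than the limit: some frames are already in use
```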

                                                                                            Scheme has been badly held back by its community’s insistence that tail-recursive implementations of algorithms should be the only implementations of those algorithms.

                                                                                            I’m not sure what you mean – that does not make any sense to me.

                                                                                            [you assume] that the programming patterns you care about are objectively superior to other patterns [..]

                                                                                            Well, I claimed

                                                                                            [pattern matching] obviously improves readability of code manipulating symbolic expressions/trees

                                                                                            and I stand by this rather modest claim, which I believe is an objective statement. In fact it is supported quite well by the blog post that this comment thread is about. (Pattern-matching combines very well with static typing, and it will be interesting to see what Python typers make of it; but its benefits are already evident in a dynamically-typed context.)

                                                                                            1. 4

                                                                                              and I stand by this rather modest claim, which I believe is an objective statement.

                                                                                              Nit: I don’t think you can have an objective statement of value.

                                                                                              1. 4

                                                                                                Again: your original comment admits of no other interpretation than that you do not believe anyone could rationally look at the feature you like and come to a different conclusion about it. Thus you had to resort to trying to find personal fault in anyone who did.

                                                                                                This does not indicate “closed-mindedness” on the part of others. They may prioritize things differently than you do. They may take different views of complexity and tradeoffs (which are the core of any new language-feature proposal) than you do. Or perhaps they simply do not like the feature as much as you do. But you were unwilling to allow for this — if someone didn’t agree with your stance it must be due to personal fault. You allowed for no other explanation.

                                                                                                That is a problem. And from someone who’s used to seeing that sort of attitude it will get you a dismissive “here we go again”. Which is exactly what you got.

                                                                                            2. 4

                                                                                              This is perhaps more of a feeling, but saying that Python isn’t adopting features as quickly as Ruby seems a bit off. Static type adoption in the Python community has been quicker. async/await has been painful, but is being attempted. Stuff like generalized unpacking (and this!) is also shipping!

                                                                                              Maybe it could be faster, but honestly Python probably has one of the lowest ratios of funding to impact among the modern languages, which keeps the project from getting things done as quickly, IMO.

                                                                                              Python is truly in a funny place, where many people loudly complain that it doesn’t adopt enough features, and many others loudly complain that it adopts too many! It’s of course “different people have different opinions”, but it’s still funny to see it all on the same page.

                                                                                              1. 3

                                                                                                It is well-documented that Guido van Rossum planned to remove lambda from Python 3

                                                                                                Thank you for sharing that document. I think Guido was right: it’s not pythonic to map, nor to use lambdas in most cases.

                                                                                                Every feature is useful, but some ecosystems work better without certain features. I’m not sure where Go’s generics fall on this spectrum, but I’m sure most proposed features for Python would move it away from its core competency, rather than augmenting a strong core.

                                                                                                1. 1

                                                                                                  We have previously discussed their tone problem. It comes from their political position within the Python ecosystem and they’re relatively blind to it. Just try to stay cool, I suppose?

                                                                                                  1. 6

                                                                                                    I really do recommend clicking through to that link, and seeing just what an unbelievably awful thing I said that the user above called out as “emblematic” of the “contempt” I display to Python users. Or the horrific ulterior motive I was found to have further down.

                                                                                                    Please, though, before clicking through, shield the eyes of children and anyone else who might be affected by seeing such content.

                                                                                                2. 5

                                                                                                  To pick one of my favorite examples, I talked to the author of PEP 498 after a presentation that they gave on f-strings, and asked why they did not add destructuring for f-strings, as well as whether they knew about customizable template literals in ECMAScript, which trace their lineage through quasiliterals in E all the way back to quasiquotation in formal logic. The author knew of all of this history too, but told me that they were unable to convince CPython’s core developers to adopt any of the more advanced language features because they were not seen as useful.

                                                                                                  I think that this perspective is the one which might help you understand. Where you see one new feature in PEP 498, I see three missing subfeatures. Where you see itertools as a successful borrowing of many different ideas from many different languages, I see a failure to embrace the arrays and tacit programming of APL and K, and a lack of pattern-matching and custom operators compared to Haskell and SML.

                                                                                                3. 1

                                                                                                  I think the issue is more about pattern matching being a late addition to Python, which means there will be lots of code floating around that isn’t using match expressions. Since it’s not realistic to expect this code to be ported, the old style if … elif will continue to live on. All of this adds up to a larger language surface area, which makes tool support, learning and consistency more difficult.

                                                                                                  I’m not really a big fan of this “pile of features” style of language design: if you add something, I’d prefer that something be taken away as well. Otherwise you end up with something like Perl 5.

                                                                                                1. 9

                                                                                                  Is there some write-up which spells out the Alpine+Rust technical problem exactly? From this article, I infer the following:

                                                                                                  • Alpine has two year support cycle, and they need to stick to the same version of the compiler throughout the cycle.
                                                                                                  • Rust, however, releases every 6 weeks, and only officially supports the latest stable. E.g., a security issue found in the compiler will be backported to the current stable, but not to the stable from two years ago.

                                                                                                  Would this be a fair summary of the problem?

                                                                                                  1. 31

                                                                                                    It sounds like it. Note that Clang has the same problem: LLVM has a release every 6 months and upstream supports only the latest version. In FreeBSD, we maintain our own backports of critical fixes and move the base system’s toolchain forward to newer versions in minor releases (a major release series has about a five-year support lifecycle).

                                                                                                    This is not ideal from a stability perspective, because sometimes a newer clang can’t compile things an older clang could, but it needs to be balanced against people’s desire to actually compile stuff: if we hadn’t shipped a base system compiler that supported C++17 in FreeBSD 12 (EOL 2024), then by the end of its support lifecycle the base system compiler would be a waste of space and everyone would use one from ports (most of the stuff that I work on has been C++17 for a while and is moving to C++20 at the moment).

                                                                                                    Note that the reason that we have clang in the base system at all is that POSIX requires a cc binary. Without that, we’d probably move to supporting clang only in ports (where we carry multiple versions and anyone who wants to install an old one can, putting up with the bugs in the old one instead of the bugs in the new one).

                                                                                                    I don’t have a huge amount of sympathy with a lot of these complaints because they often come from policies that the packaging system is enforcing. You want to maintain a complete set of packages for two years with security backports? That means that you are committing to back-porting security fixes for two years. If you’re RedHat then you might have customers who are willing to do that (so that they can then run Alpine / Debian / Ubuntu in containers because nothing in their host OS is sufficiently recent to be useable) but if you don’t then you need to ask yourself why you want to make that commitment. Can you relax it and say that you’ll bump the version of the Rust compiler?

                                                                                                    Packaging policies should first be driven by what is possible, then by what users want. Promising to do the impossible or promising to stick to some arbitrary standards that users don’t actually care about doesn’t help anyone.

                                                                                                    1. 18

                                                                                                      I don’t have a huge amount of sympathy with a lot of these complaints because they often come from policies that the packaging system is enforcing. You want to maintain a complete set of packages for two years with security backports? That means that you are committing to back-porting security fixes for two years.

                                                                                                      Exactly! I say this as someone who’s been on the receiving end of, “Bbbbut but we promised client X we would provide these guarantees even though we have no way to enforce them and it’s not our lane! We (and by ‘we’ I mean you) have to find a way to provide said guarantees!”

                                                                                                      Note that in the OP, and unlike in my anecdote, it looks like Alpine is doing the right thing here by moving those packages they can’t support into community, so kudos on making the hard uncomfortable calls.

                                                                                                      1. 3

                                                                                                        I don’t have a huge amount of sympathy with a lot of these complaints because they often come from policies that the packaging system is enforcing. You want to maintain a complete set of packages for two years with security backports? That means that you are committing to back-porting security fixes for two years.

                                                                                                        Of course it’s a policy that someone made up. The point is that it’s a useful policy for users. I think software developers in general often have quite different interests from the people who run their software—they want to move fast and break things, while as a sysadmin I just don’t have the bandwidth to keep up with everything breaking all the time. Distros don’t backport just for the sake of it; they’re bridging a gap between what FOSS developers want to make and what users need.

                                                                                                        But also, in general, they do backport. I think it’s reasonable to complain when upstream projects make that intractably difficult, though, to the extent that it’s reasonable for anyone to complain about something that’s free. As a user who relies on stable distros, if they weren’t complaining for me, I’d be complaining myself.

                                                                                                        1. 5

                                                                                                          The point is that it’s a useful policy for users

                                                                                                          No, the point is that it may be useful in some cases for some users in some contexts and probably was useful for a large subset of visible users at the time that the policy was created.

                                                                                                          I think software developers in general often have quite different interests from the people who run their software—they want to move fast and break things, while as a sysadmin I just don’t have the bandwidth to keep up with everything breaking all the time

                                                                                                          As a developer, I want access to the latest versions of my tools and libraries. That’s easy.

                                                                                                          As a sysadmin, I don’t want security vulnerabilities in the things that I deploy. I also don’t want unexpected changes that break things for my users. In the LTS model, these two constraints very often come into conflict. Security vulnerabilities are easy to fix in the mainline branch but if you want back-ports then someone needs to do that work. If you want volunteers to do that, then you’re asking them to do work. If you’re RedHat (IBM) then you’ve got a load of customers who are paying you to pay engineers to do that. If you’re FreeBSD or Alpine? You have users demanding it but not being willing to do the work or pay for it, so you have to ask why you’re devoting effort to it (in the case of FreeBSD, most of the big companies that use it run -HEAD so don’t care about this at all).

                                                                                                          As a user who relies on stable distros, if they weren’t complaining for me, I’d be complaining myself.

                                                                                                          How much are you / your employer paying (and who are they paying) to ensure that you have support for a stable distro?

                                                                                                        2. 2

                                                                                                          the reason that we have clang in the base system at all is that POSIX requires a cc binary

                                                                                                          I thought it has more to do with the base/ports split and the “base builds base” tradition. It’s not impossible to just ship system images with a pkg-installed llvm :)

                                                                                                          (Also who cares about that aspect of POSIX. Not the Linux distros where you have to pacman -Sy gcc, haha)

                                                                                                      1. 26

                                                                                                        There are a lot of extensions that automatically select ‘reject all’ or walk the list and decline everything. Why push people towards one that makes them agree? The cookie pop-ups are part of a wilful misinterpretation of the GDPR: you don’t need consent for cookies, you need consent for tracking and data sharing. If your site doesn’t track users or share data with third parties, you don’t need a pop-up. See GitHub for an example of a complex web-app that manages this. Generally, a well-designed site shouldn’t need to keep PII about users unless they register an account, at which point you can ask permission for everything that you need to store and explain why you are storing it.

                                                                                                        Note also that the GDPR is very specific about requiring informed consent. It is not at all clear to me that most of these pop-ups actually meet this requirement. If a user of your site cannot explain exactly what PII handling they have agreed to then you are not in compliance.

                                                                                                        1. 4

                                                                                                          Can’t answer this for other people, but I want tracking cookies.

                                                                                                          When people try to articulate the harm, it seems to boil down to an intangible “creepy” feeling or a circular “Corporations tracking you is bad because it means corporations are tracking you” argument that begs the question.

                                                                                                          Tracking improves the quality of ad targeting; that’s the whole point of the exercise. Narrowly-targeted ads are more profitable, and more ad revenue means fewer sites have to support themselves with paywalls. Fewer paywalls mean more sites available to low-income users, especially ones in developing countries where even what seem like cheap microtransactions from a developed-world perspective would be prohibitively expensive.

                                                                                                          To me, the whole “I don’t care if it means I have to pay, just stop tracking me” argument is dripping with privilege. I think the ad-supported, free-for-all-comers web is possibly second only to universal literacy as the most egalitarian development in the history of information dissemination. Yes, Wikipedia exists and is wonderful and I donate to it annually, but anyone who has run a small online service that asks for donations knows that relying on the charity of random strangers to cover your costs is often not a reliable way to keep the bills paid. Ads are a more predictable revenue stream.

                                                                                                          Tracking cookies cost me nothing and benefit others. I always click “Agree” and I do it on purpose.

                                                                                                          1. 3

                                                                                                            ‘an intangible “creepy” feeling’ is a nice way of describing how it feels to find out that someone committed a serious crime using your identity. There are real serious consequences of unnecessary tracking, and it costs billions and destroys lives.

                                                                                            Also, I don’t want ads at all, and I have no interest in targeted ads. If I want to buy things, I know how to use a search bar, and if I don’t know I need something, do I really need it? If I am on a website where I frequently shop I might even enable tracking cookies, but I don’t want to blanket-enable them on all sites.

                                                                                                            1. 4

                                                                                                              How does it “costs billions and destroys lives”?

                                                                                                              1. 2

                                                                                                https://www.ftc.gov/system/files/documents/reports/consumer-sentinel-network-data-book-2020/csn_annual_data_book_2020.pdf see page 8. This is in the US alone and does not take the other 7.7 billion people in the world into account. I will admit it is not clear what percentage of fraud and identity theft is due to leaked or hacked data from tracking cookies, so this data is hardly accurate for the current discussion, but I think it covers the question of ‘how’. If you want more detail, just google the individual categories in the report under fraud and identity theft.

                                                                                                                Also see this and this

                                                                                                                But I covered criminal prosecution in the same sentence you just quoted from my reply above so clearly you meant ‘other than being put in prison’. Also, people sometimes die in prison, and they almost always lose their jobs.

                                                                                                                1. 4

                                                                                                  The first identity-theft story doesn’t really detail what exactly happened surrounding the ID theft, and the second one is about a childhood acquaintance stealing the man’s ID. It doesn’t say how exactly either, and neither does that FTC report as far as I can see: it just lists ID theft as a problem. Well, okay, but colour me skeptical that this is caused by run-of-the-mill adtech/engagement tracking, which is what we’re talking about here. Not that I think it’s not problematic, but it’s a different thing and I don’t see how they’re strongly connected.

                                                                                                                  The NSA will do what the NSA will do; if we had no Google then they would just do the same. I also don’t think it’s as problematic as often claimed as agencies such as the NSA also do necessary work. It really depends on the details on who/why/what was done exactly (but the article doesn’t mention that, and it’s probably not public anyway; I’d argue lack of oversight and trust is the biggest issue here, rather than the actions themselves, but this is veering very off-topic).

                                                                                                                  In short, I feel there’s a sore lack of nuance here and confusion between things that are (mostly) unconnected.

                                                                                                                  1. 2

                                                                                                                    Nevertheless all this personal data is being collected, and sometimes it gets out of the data silos. To pretend that it never causes any harm just because some stranger on the internet failed to come up with a completely airtight example case in 5 minutes of web searching is either dishonest or naive. If you really want to know, you can do the research yourself and find real cases. If you would rather just feel comfortable with your choice to allow all tracking cookies that is also totally fine. You asked how, I believe my answer was sufficient and roughly correct. If you feel the need to prove me wrong that is also fine, and I will consider any evidence you present.

                                                                                                                    1. 2

                                                                                                                      The type of “personal data” required for identity theft is stuff like social security numbers, passport numbers, and that kind of stuff. That’s quite a different sort of “personal data” than your internet history/behaviour.

                                                                                                                      To pretend that it never causes any harm just because some stranger on the internet failed to come up with a completely airtight example case in 5 minutes of web searching is either dishonest or naive. If you really want to know, you can do the research yourself and find real cases.

C’mon man, if you’re making claims as large as “it costs billions and destroys lives” then you should be prepared to back them up. I’m not an expert, but I’ve spent over ten years paying close attention to these kinds of things, and I don’t see how these claims bear out. Still, I’m always willing to learn something new, which is why I asked the question. Coming back with “do your own research” and “prove me wrong then!” is rather unimpressive.

                                                                                                                      If you would rather just feel comfortable with your choice to allow all tracking cookies that is also totally fine.

                                                                                                                      I don’t, and I never said anything which implied it.

                                                                                                                      If you feel the need to prove me wrong that is also fine, and I will consider any evidence you present.

                                                                                                                      I feel the need to understand reality to the best of my ability.

                                                                                                                      1. 1

                                                                                                                        I feel the need to understand reality to the best of my ability.

                                                                                                                        Sorry I was a bit rude in my wording. There is no call for that. I just felt like I was being asked to do a lot of online research for a discussion I have no real stake in.

GDPR Article 4 Paragraph 1 and GDPR Article 9 Paragraph 1 specify what kind of information they need to ask permission to collect. It is all pretty serious stuff. There is no mention of ‘shopping preferences’. Social security numbers and passport numbers are included, as well as health data and things that are often the cause of discrimination, like sexuality/religion/political affiliation. Also included is any data that can be used to uniquely identify you as an individual (without which aggregate data is much harder to abuse), which includes your IP and your real name.

                                                                                                                        A lot of sites just ask permission to cover their asses and don’t need to. This I agree is annoying. But if a site is giving you a list of cookies to say yes or no to they probably know what they are doing and are collecting the above information about you. If you are a white heterosexual English speaking male then a lot of that information probably seems tame enough too, but for a lot of people having that information collected online is very dangerous in quite real and tangible ways.

                                                                                                                2. 3

                                                                                                                  I am absolutely willing to have my view on this changed. Can you point me to some examples of serious identity theft crimes being committed using tracking cookies?

                                                                                                                  1. 2

See my reply to the other guy above. The FTC data does not specify where the hackers stole the identity information, so it is impossible for me to say what percentage are legitimately caused by tracking cookies. The law that mandates these banners refers to information that can be used to identify individuals. Even if it has never ever happened in history that hacked or leaked cookie data has been used for fraud or identity theft, it is a real danger. I would love to supply concrete examples, but I have a full time job and a life, and if your claim is “Sure all this personal data is out there on the web, and yes sometimes it gets out of the data silos, but I don’t believe anyone ever used it for a crime” then I feel like it’s not worth my time spending hours digging out case studies and court records to prove you wrong. Having said that, if you do some searching to satisfy your own curiosity and find anything definitive, I would love to hear about it.

                                                                                                                  2. 2

                                                                                                                    someone committed a serious crime using your identity

                                                                                                                    because of cookies? that doesn’t follow

                                                                                                                  3. 1

                                                                                                                    Well this is weird. I think it’s easy to read that and forget that the industry you’re waxing lyrical about is worth hundreds of billions; it’s not an egalitarian development, it’s an empire. Those small online services that don’t want to rely on asking for donations aren’t billion-dollar companies, get a deal entirely on someone else’s terms, and are almost certainly taken advantage of for the privilege.

                                                                                                                    It also has its own agenda. The ability to mechanically assess “ad-friendliness” already restricts ad-supported content producers to what corporations are happy to see their name next to. I don’t want to get too speculative on the site, but there’s such a thing as an ad-friendly viewer too, and I expect that concept to become increasingly relevant.

                                                                                                                    So, tracking cookies. They support an industry I think is a social ill, so I’d be opposed to them on that alone. But I also think it’s extremely… optimistic… to think being spied on will only ever be good for you. Advertisers already leave content providers in the cold when it’s financially indicated—what happens when your tracking profile tells them you’re not worth advertising to?

                                                                                                                    I claim the cost to the individual is unknowable. The benefit to society is Cambridge Analytica.

                                                                                                                  4. 2

                                                                                                                    The cookie law is much older than GDPR. In the EU you do need consent for cookies. It is a dumb law.

                                                                                                                    1. 11

                                                                                                                      In the EU you do need consent for cookies. It is a dumb law.

                                                                                                                      This is not true. In the EU you need consent for tracking, whether or not you do that with cookies. It has to be informed consent, which means that the user must understand what they are agreeing to. As such, a lot of the cookie consent UIs are not GDPR compliant. Max Schrems’ company is filing complaints about non-compliant cookie banners.

                                                                                                                      If you only use functional cookies, you don’t need to ask for consent.

                                                                                                                      1. 3

                                                                                                                        https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:31995L0046 concerns consent of user data processing.

                                                                                                                        https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32002L0058 from 2002 builds on the 1995 directive, bringing in “cookies” explicitly. Among other things it states “The methods for giving information, offering a right to refuse or requesting consent should be made as user-friendly as possible.”

In 2009 https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32009L0136 updated the 2002 directive, closing a few loopholes.

                                                                                                                        The Do-Not-Track header should have been enough signal to cut down on cookie banners (and a few websites are sensible enough to interpret it as universal rejection for unnecessary data storage), but apparently that was too easy on users? It went as quickly as it came after Microsoft defused it by enabling it by default and parts of adtech arguing that the header doesn’t signify an informed decision anymore and therefore can be ignored.

                                                                                                                        If banners are annoying it’s because they’re a deliberate dark pattern, see https://twitter.com/pixelscript/status/1436664488913215490 for a particularly egregious example: A direct breach of the 2002 directive that is typically brought up as “the cookie law” given how it mandates “as user-friendly as possible.”

                                                                                                                        1. 2

                                                                                                                          I don’t understand what you’re trying to say. Most cookie banners on EU sites are not at all what I’d call a dark pattern. They’re just trying to follow the law. It is a stupid law which only trained people to click agree on all website warnings, making GDPR less effective. Without the cookie law, dark patterns against GDPR would be less effective.

                                                                                                                          1. 3

                                                                                                                            The dark pattern pgeorgi refers to is that on many cookie banners, the “Refuse all” button requires more clicks and/or more careful looking than the “Accept all” button. People who have trained themselves to click “Accept” mostly chose “Accept” because it is easier — one click on a bright button, and done. If “Refuse all” were equally easy to choose, more people would train themselves to always click “Refuse”.

                                                                                                                            Let’s pretend for a moment the cookie law no longer exists. A website wants to set a tracking cookie. A tracking cookie, by definition, constitutes personally identifiable information (PII) – as long as the cookie is present, you can show an ad to specifically that user. The GDPR recognizes 6 different conditions under which processing PII is lawful.

                                                                                                                            The only legal ground to set a tracking cookie for advertising purposes is (a) If the data subject has given consent to the processing of his or her personal data. I won’t go over every GDPR ground, but suffice it to say that tracking-for-advertising-purposes is not covered by

                                                                                                                            • (b) To fulfil contractual obligations with a data subject;
                                                                                                                            • nor is it covered by (f) For the legitimate interests of a data controller or a third party, unless these interests are overridden by interests of the data subject.

                                                                                                                            So even if there were no cookie law, GDPR ensures that if you want to set a tracking cookie, you have to ask the user.

                                                                                                                            Conversely, if you want to show ads without setting tracking cookies, you don’t need to get consent for anything.

                                                                                                                            1. 2

                                                                                                                              I feel the mistake with the whole “cookie law” thing is that it focuses too much on the technology rather than what people/companies are actually doing. That is, there are many innocent non-tracking reasons to store information in a browser that’s not “strictly necessary”, and there are many ways to track people without storing information in the browser.

                                                                                                                            2. 1

                                                                                                                              I’m not saying that dark patterns are employed on the banners. The banners themselves are dark patterns.

                                                                                                                              1. 1

                                                                                                                                The banners often come from freely available compliance packages… It’s not dark, it’s just lazy and badly thought out, like the law itself.

                                                                                                                                1. 1

                                                                                                                                  What about the law do you think is badly thought out?

                                                                                                                                  1. 1

The cookie part of the ePrivacy Directive is too technological. You don’t need consent, but you do have to inform the user of cookie storage (or localstorage etc.) no matter what you use it for. It’s unnecessary information, and it doesn’t protect the user. These are the cookie banners that only let you choose “I understand”, because they only store strictly necessary cookies (or any kind of cookie before GDPR in 2016).

GDPR is the right way to do it. The cookie part of EPR should have been scrapped with GDPR. That would make banners that do ask for PII storage consent stand out more. You can’t make your GDPR banner look like an EPR information banner if EPR banners aren’t a thing.

                                                                                                                        2. 2

                                                                                                                          Usually when I see the cookie consent popup I haven’t shared any personal information yet. There is what the site has from my browser and network connection, but I trust my browser, uBlock origin and DDG privacy tools to block various things and I use a VPN to somewhere random when I don’t want a site to know everything it can about my network location.

If I really do want to share personal info with a site, I’ll go and be very careful what I provide and what I agree to, but also realistic in that I know there are no guarantees.

                                                                                                                          1. 8

                                                                                                                            If you’re using a VPN and uBlock origin, then your anonymity set probably doesn’t contain more than a handful of people. Combined with browser fingerprinting, it probably contains just you.

                                                                                                                            1. 2

                                                                                                                              Should I be concerned about that? I’m really not sure I have properly thought through any threats from the unique identification that comes from that. Do you have any pointers to how to figure out what that might lead to?

                                                                                                                              1. 9

                                                                                                                                The point of things like the GDPR and so on is to prevent people assembling large databases of correlated knowledge that violate individual privacy. For example, if someone tracks which news articles you read, they have a good first approximation of your voting preferences. If they correlate it with your address, they can tell if you’re in a constituency where their candidate may have a chance. If you are, they know the issues that are important to you and so can target adverts towards you (including targeted postal adverts if they’re able to get your address, which they can if they share data with any company that’s shipped anything physical to you) that may influence the election.

                                                                                                                                Personally, I consider automated propaganda engines backed by sophisticated psychological models to be an existential threat to a free society that can be addressed only by some quite aggressive regulation. Any unique identifier that allows you to be associated with the kind of profile that these things construct is a problem.

                                                                                                                              2. 2

                                                                                                                                Do you have a recommendation?

                                                                                                                            2. 2

                                                                                                                              The problem with rejecting all the tracking is that without it most ad networks will serve you the worst/cheapest untargeted adverts which have a high chance of being a vector for malware.

So if you reject the tracking you pretty much have to also run an ad-blocker to protect yourself. Of course, if you are running an ad blocker then the cookies aren’t going to make much difference either way.

                                                                                                                              1. 1

I don’t believe it makes any difference whether you agree or disagree? The goal is just to make the box go away.

                                                                                                                                1. 2

                                                                                                                                  Yes. If I agree and they track me, they are legally covered. If I disagree and they track me then the regulator can impose a fine of up to 5% of their annual turnover. As a second-order effect: if aggregate statistics say 95% of people click ‘agree’ then they have no incentive to reduce their tracking, whereas if aggregate statistics say ‘10% leave the page without clicking either, 50% click disagree’ then they have a strong case that tracking will lose them business and this will impact their financial planning.

                                                                                                                                1. 4

                                                                                                                                  I think a lot of people get caught in a trap when they start thinking about “open source” definitions, and this essay was no exception: it’s very hard to avoid falling into the moral/political/philosophical approach of the Free Software movement, yet rejecting the moral/political/philosophical approach in favor of what’s practical and useful is probably the simplest way to describe the difference between the Free Software and the Open Source camps.

                                                                                                                                  And so any argument that’s based on an idea of principles and values is one that’s going to struggle to work. If you want to start moralizing, you more or less need to adopt the Free Software side of things and be done with it. And then you don’t worry about things like “source available” licenses, because you have an easy answer: the manifesto says that’s not Free Software and is therefore to be shunned. If you want to talk about “source available” licenses as related to Open Source, though, you have to shift to an Open Source context, where your analysis is based on practical utility rather than on rigid adherence to a set of moral commandments.

                                                                                                                                  So consider the current trend of “open source” projects which really just wanted to be “open source” as a way to score free labor for their SaaS product, proceeded to get their butts kicked by cloud providers, and decided to go to whatever label eventually gets agreed on for these look-but-don’t-touch licenses that are designed to forbid you from competing with the original authors’ startup (yes, I know the exact terms don’t say that, but the clear intent and practical effect is to ensure that only the original authors could run a successful SaaS business around the software). From a practical perspective, these licenses are just a failure, and I’ve already made clear why: the goal isn’t to harness diverse and disparate contributors to build something more useful than any one of them would have achieved alone. The goal is to turn “contributors” into, effectively, unpaid interns.

                                                                                                                                  Actual Open Source projects let me treat my contribution as a transaction with benefits for both of us: the project gets something (most often, improved code), and I get something (most often, the ability to build/expand my business, or leverage my familiarity with and contributions to the project into future employment at businesses which use it). Everybody wins. It’s practical. “Source available” licenses are one-way streets: the company gets all the real benefit, and I get at best “exposure” that won’t pay my bills and in fact may decrease my employability (since companies may not want to risk a lawsuit if they hire me and then on the job I accidentally use forbidden knowledge gleaned from a “source available” project). Impractical!

These are the kinds of terms that discussions of Open Source (as distinct from Free Software) need to be framed in if they’re going to be productive at all.

                                                                                                                                  1. 1

                                                                                                                                    Actual Open Source projects let me treat my contribution as a transaction with benefits for both of us: the project gets something (most often, improved code), and I get something (most often, the ability to build/expand my business, or leverage my familiarity with and contributions to the project into future employment at businesses which use it). Everybody wins. It’s practical.

                                                                                                                                    You get software you can use as a starting point, but you don’t get anything back in return for your contribution. The whole thing is predicated on good will: the creators of the open-source thing assume (a subset of) users will contribute back, and contributors assume the thing they’re contributing to will stay open. Are those safe assumptions?

                                                                                                                                    I don’t know if there’s a better answer, but… well, if faith worked we wouldn’t need contracts.

                                                                                                                                    1. 1

                                                                                                                                      The whole thing is predicated on good will: the creators of the open-source thing assume (a subset of) users will contribute back, and contributors assume the thing they’re contributing to will stay open. Are those safe assumptions?

                                                                                                                                      It’s certainly a counterintuitive result – as you note, generally there’s an assumption that we have to force people to cooperate and share, and that’s also the approach that the Free Software side takes with copyleft licenses – but empirical evidence suggests enough people do so, for reasons that range from altruistic to cynically pragmatic, to produce impressive results.

                                                                                                                                  1. 15

                                                                                                                                    “It tests CS fundamentals.” “It tests reasoning through a new problem.” These answers are contradictory

                                                                                                                                    They aren’t really – they’re alternative answers to the same question, and they both describe a useful outcome. Put together, they give you an idea if the interviewee knows the CS fundamentals and/or can figure them out as they go along.

Interview at one of the clients I work with involves implementing a binary tree (just the basics, here’s the API, implement adding a new element to it – not even lookups). People we interview generally know what a binary tree is, and usually they have no trouble implementing it – it’s trivial enough that you shouldn’t have a problem with it even if you last saw it at university 10 years ago. We recently had a candidate who was a self-taught programmer with a mathematics degree. We started off showing her a linear search function to code review. She described it well, suggested improvements. The list in the example was sorted, and we asked if the search could be improved somewhat given the patterns in example data. “Well, you could split the list into two halves…” and she described binary search. “Cool, does that algorithm have a name you’re familiar with?” “No, I don’t think so”. Okay then.

So then there’s the binary tree. She looks at the drawing of it, describes the properties, then implements it in one go, correctly, calling the Node a Knot (translated to Polish they’re the same word). Again, she had no idea if this data structure had any specific name. Naturally she got the job.

Linked lists may be useless in all applications nowadays (given that CPU caches generally make them an inferior choice in almost every scenario) (bad generalization, see comments below), but they do have a value in interviews imho: they tell you if the candidate knows what data structures and algorithms lie beyond the APIs they use in their code every day, and if they don’t – even better! It’s easy enough to figure out and write on the spot – and even if what you write isn’t fully correct, seeing the process and discussing the bugs is still valuable.

                                                                                                                                    That’s actually a bit offtopic from this article – which is well worth reading in full indeed, and makes sense if you think about it. I think the Spolsky quote actually nails it, even without the historical context – people would still prefer (or think that they’d prefer) to hire Real Programmers [tm] who can deal with pointers rather than “script jocks” who just copy code around.

                                                                                                                                    1. 20

                                                                                                                                      Linked lists may be useless in all applications nowadays (given that CPU caches generally make it an inferior choice in almost every scenario)

                                                                                                                                      This is not the first time I heard this (I recall a specific rust tutorial expressing this in particular), but this is such a weird take to me.

I work on high-performance datapaths in software-defined networking. Those are highly optimized network stacks using the usual low-latency/high-IO architectures. In all of them, linked lists were fundamental as a data structure.

One example: a thoroughly optimized hash table using cuckoo hashing, with open addressing and concurrent lookup. It implements buckets as a linked list of arrays of elements, each cell taking exactly one cache line. The linked list is so fundamental a construct there that it is arguably eclipsed by the other elements.

Another example: a lockless list of elements to collect after the next RCU sync round. Either the RCU allocates arrays of callbacks + arguments to call once synchronized, or the garbage collector consists of a list of nodes embedded in other objects, and the reclaiming thread will jump from object to object to free them. The array-based approach has issues: each thread needs to reallocate a growing array of callbacks, requiring bigger and bigger spans of contiguous memory when attempting to free elements.
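Ignoring the lock-free details, the node-embedded variant can be sketched abstractly (Python here purely for illustration – real implementations of this pattern live in C; all names are made up):

```python
class DeferredNode:
    """Link embedded in the object awaiting reclamation (intrusive, singly linked)."""
    def __init__(self, callback, arg):
        self.callback = callback
        self.arg = arg
        self.next = None


class DeferredList:
    """Callbacks to run after the next sync round; no contiguous array to regrow."""
    def __init__(self):
        self.head = None

    def defer(self, node):
        # O(1) push; in a real lockless version this is a CAS loop.
        node.next = self.head
        self.head = node

    def reclaim(self):
        # The reclaiming thread jumps from node to node, invoking each callback.
        node, self.head = self.head, None
        while node is not None:
            nxt = node.next  # read the link before the callback frees the object
            node.callback(node.arg)
            node = nxt
```

The point is the shape, not the code: each deferred object carries its own link, so the list grows without ever needing a larger contiguous allocation.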

Another example: when doing TSO on TCP segments, those are linked together and passed to the NIC to offload the merging. To avoid copying segments of data around during forwarding, they must be linked together.

There are countless other examples where the high-performance requirement actually makes the linked list a good solution. It avoids allocating large swaths of memory in contiguous chunks, and allows being much more nimble in how memory is managed.

That being said, of course, at the source of those elements (when new packets are received), everything must be done to pack as much useful data as possible for the fastpath into contiguous memory (using hugepages). But as you go higher in the stack that becomes more and more untenable. Then simple structures allow building ancillary modules for specific cases.

I don’t see how linked lists will ever become completely useless. Maybe I’m just too far gone in my specific branch of software?

                                                                                                                                      1. 11

Software is full of rules of thumb, some of which are true. I’m pretty sure someone wrote an article years ago on mechanical sympathy where they pointed out that a CPU can chew through contiguous memory a bunch faster than it can chase unknown pointers, and since then “everyone knows” that linked lists are slow.

                                                                                                                                        The reality is always more complicated and your experience optimizing real world systems definitely trumps the more general recommendations.

                                                                                                                                        1. 1

                                                                                                                                          I’m pretty sure someone wrote an article years ago

Bjarne Stroustrup did! https://m.youtube.com/watch?v=YQs6IC-vgmo

                                                                                                                                        2. 6

                                                                                                                                          Linked lists make sense in reality because solutions to real problems are bigger than benchmarks. Complicated custom datastructures with linked lists threaded through them are everywhere, but nobody looks at them when they’re looking to make a blanket statement about how fast linked lists are. (Such statements seem to rest on shaky foundations anyway; linked lists aren’t a thing that is fast or slow, but a technique that is more or less applicable).

                                                                                                                                          I use linked lists mainly for bookkeeping where objects are only iterated occasionally, but created and deleted often, where they appeal mostly because they avoid headaches; their constant factors may not be great, but they scale to infinity in an uncomplicated way (and if I’m creating or deleting an object I’m allocating something in any event). I don’t often see linked list operations in flame graphs, and when I do it’s normally accidentally-quadratic behaviour, so I replace them with things that aren’t list-like at all.

                                                                                                                                          Final idle thought: not scientific at all, but I note that linked list nodes can live inside the element itself without problems, while array elements are often indirected through an extra pointer to deal with move issues. While it doesn’t absolutely have to, code using such arrays often looks like it suffers from the same data dependency issues linked lists tend to.
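That “node lives inside the element” pattern is easy to sketch. A hypothetical intrusive list in Python for illustration (in practice this pattern shines in C/C++, where it avoids a separate node allocation entirely):

```python
class Entry:
    """An element carrying its own list links -- no separate node allocation."""
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class IntrusiveList:
    def __init__(self):
        self.head = None

    def push(self, entry):
        entry.next = self.head
        if self.head is not None:
            self.head.prev = entry
        self.head = entry

    def unlink(self, entry):
        # O(1) deletion given the entry itself -- the appeal for bookkeeping
        # lists where objects are created and deleted often.
        if entry.prev is not None:
            entry.prev.next = entry.next
        else:
            self.head = entry.next
        if entry.next is not None:
            entry.next.prev = entry.prev
        entry.prev = entry.next = None

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.value
            node = node.next
```

No resizing, no moves, no per-element indirection beyond the object itself – exactly the “scales to infinity in an uncomplicated way” property.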

                                                                                                                                          1. 2

                                                                                                                                            People like to make performance claims without ever generating any numbers to backup their assertions.

                                                                                                                                            1. 2

True – the same goes for “linked lists are faster at insertions/deletions”, though. I brought it up because I’ve heard it mentioned time and time again in my university days, and now time and time again at interviews etc., always at a scale where it doesn’t matter or is straight-up incorrect. I’m sure they’ll always (or: for the foreseeable future) have their place, even if that place is not the common case. All the more reason to understand them even if you won’t often use them.

                                                                                                                                              1. 1

There seems to be a long tradition of this kind of garbage. Decades ago I used to hear things like “malloc is slow” from my senior colleagues, with nobody being able to explain to me (the young punk) what led to the assertion.

                                                                                                                                            2. 7

                                                                                                                                              people would still prefer (or think that they’d prefer) to hire Real Programmers [tm] who can deal with pointers rather than “script jocks” who just copy code around.

                                                                                                                                              Real Programmers™ is already a fallacy though that tends to amplify the issues on teams by hiring a group of people with very similar strengths, flaws, and viewpoints.

                                                                                                                                              Put together, they give you an idea if the interviewee knows the CS fundamentals and/or can figure them out as they go along.

This may be true for some questions, like your binary search example, but I don’t think it’s true of the majority of linked list and pointer type questions that are asked in an interview setting. Linked list cycle finding (i.e. tortoise and the hare) comes to mind—a lot of people in the early 2010s were defending this as something that either tested if you knew fundamentals or if you could piece them together, but it’s been pointed out half to death by now that the algorithm itself wasn’t developed until years after cycle finding was a known problem with people trying to solve it—almost everyone who passed a tortoise and the hare question either knew it in advance or was given hints by the interviewer that led them there without the interviewer believing they’d given it away (which is a pretty fraught and unnormalized thing).
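For the record, the algorithm in question is tiny once you know it – the trouble is that it’s essentially unguessable under interview pressure. A sketch (node class is just scaffolding for the example):

```python
class ListNode:
    """Minimal singly linked node for the sketch."""
    def __init__(self, next=None):
        self.next = next


def has_cycle(head):
    """Floyd's tortoise and hare: O(1)-space cycle detection in a singly linked list."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next           # one step
        fast = fast.next.next      # two steps; the pointers meet iff there is a cycle
        if slow is fast:
            return True
    return False
```

Five lines of logic – but deriving *why* the two-pointer trick is correct, from scratch, is exactly the part that took years historically.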

                                                                                                                                              In general, I think this is a high ideal that is really hard to build for and people convince themselves they have solved for it at far too high a rate. When I first started interviewing for software jobs (~2010), I learned quickly that if you knew the answer or had seen the question before, the right thing to do was to pretend you hadn’t and perform discovering the answer. This is a problem with nearly all knowledge-based interview questions; there will always be a degree to which you’re testing if the candidate is connected enough to know what the questions will look like ahead of time and what kinds of things will be tested.

                                                                                                                                            1. 3

                                                                                                                                              I know it’s beside the point, but the title of this post seems like a grammatical illusion I haven’t seen before. The first impression upon reading it is of the intended meaning, but then you read it again and… ???

                                                                                                                                              I don’t think there’s any math I know that would help computing but isn’t, but I tend to learn math to help me do computer stuff, so.

                                                                                                                                                1. 2

                                                                                                                                                  What rough math
                                                                                                                                                  its hour come at last
                                                                                                                                                  slouches to the computer to be born?

                                                                                                                                                1. 21

                                                                                                                                                  The article never mentions the, in my humble opinion, most important part of good logging practices and that is structured logging. Without it you end up with weird regexes or other hacks trying to parse your log messages.

                                                                                                                                                  1. 4

As a sibling post notes, if you use structured logging you’re mostly throwing away the idea that the entries must be easily parsable by a human. If that’s the case, and we’ll need a custom method of displaying the structured logs in a human-friendly way, I believe we should forgo plain text altogether and gain the benefits of logging directly to binary.

                                                                                                                                                    1. 5

                                                                                                                                                      You can do human readable structured logging if you use key="value" formats inside text messages. Some people still prefer json, but there is a middle ground.
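A hypothetical helper for that middle ground (the function name and fields here are made up for illustration):

```python
def format_event(message, **fields):
    """Render a log message plus key="value" pairs: structured, but still readable."""
    pairs = " ".join(f'{key}="{value}"' for key, value in fields.items())
    return f"{message} {pairs}" if pairs else message
```

A line like `format_event("Failed to open file", file="/tmp/battery charge")` stays greppable by eye while remaining trivially machine-parseable.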

                                                                                                                                                      1. 2

                                                                                                                                                        If you need just key=value, that’s not really structured in my opinion.

                                                                                                                                                        1. 4

                                                                                                                                                          Why not?

                                                                                                                                                          1. 2

Because the amount of information added by this format would be infinitesimal over a line-based logger with manual tokenization. The reason you’d want a structured logger is to give proper context to a message. Unless you’re working with simple cases, the structure that would offer such context is more than one level deep.

                                                                                                                                                            1. 3

                                                                                                                                                              Hmm, definitely not.

Structured logging is about decorating log events with just enough of a schema to make them machine-parseable, so that searching, aggregating, filtering, etc. can be more than a crapshoot. Deeply nested events significantly increase the complexity of that schema, and therefore the requirements on the consumer.

                                                                                                                                                              By default, structured logs should be flat key/value pairs. It gets you the benefits of richer parseability, without giving up the ability to grep.

                                                                                                                                                    2. 2

                                                                                                                                                      Excellent point. That’s become such second nature to me by now, that I forgot to even mention it!

                                                                                                                                                      1. 2

                                                                                                                                                        I’m surprised it wasn’t mentioned, but the larger advantage of passing a logger around to constructors is the ability to then have nested named loggers, such as

                                                                                                                                                        Battery.ChargingStatus.FileReader: Failed to open file { file: "/tmp/battery charge", error: ... }
                                                                                                                                                        Battery.ChargingStatus: Failed to access status logs, skipping report
                                                                                                                                                        
                                                                                                                                                        1. 1

On top of that, a structured logger, if implemented properly, can often be faster and can operate at granular levels (as the other comments pointed out, sometimes you do want to turn on some logs on the fly at some locations, not all logs at all locations).

                                                                                                                                                          1. 1

                                                                                                                                                            I love structured logging, with one caveat: the raw messages emitted (let’s assume JSON) are harder for me to scan when tailing directly (which I usually only do locally as we have better log querying tools in the cloud), in contrast to a semi-structured simple key-value format. Do you all use a different format than JSON? Or a tool that transforms structured logs to something more friendly to humans, eg. with different log levels displayed in different appropriate colors, eg. JSON syntax characters diminished, for local tailing?
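For what it’s worth, such a transformer is only a few lines. A hypothetical sketch (the `level` and `msg` field names are assumptions about the log schema, and the colors are purely cosmetic):

```python
import json

# ANSI colors per level -- purely cosmetic, for local tailing.
LEVEL_COLORS = {"error": "\033[31m", "warn": "\033[33m", "info": "\033[32m"}
RESET = "\033[0m"


def render(line):
    """Turn one JSON log line into a colored, human-friendly line."""
    event = json.loads(line)
    level = str(event.pop("level", "info"))
    msg = event.pop("msg", "")
    color = LEVEL_COLORS.get(level, "")
    rest = " ".join(f"{k}={v}" for k, v in sorted(event.items()))
    return f"{color}{level.upper()}{RESET} {msg} {rest}".rstrip()
```

Piping a local `tail -f` through a loop that calls `render` on each line gets you the friendly view without giving up the structured storage format.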

                                                                                                                                                            1. 5

                                                                                                                                                              At Joyent, we used the Bunyan format. Each line in the file was a separate JSON object with standard properties, some mandatory and some optional, and freeform additional properties. We shipped a tool, bunyan, that was capable of acting as a filter that would render different human readable views of the JSON. For example, you would often run something like:

                                                                                                                                                              tail -F $(svcs -L manatee) | bunyan -o short
                                                                                                                                                              

It also had some rudimentary filtering options, and a relatively novel mode that would, instead of reading from a file or standard input, use DTrace probes for the different log levels, allowing you to dynamically listen for DEBUG and TRACE events even when those were not ordinarily present in the log files. The DTrace mode could target a particular process, or even all processes on the system that emitted Bunyan logs.

                                                                                                                                                              1. 1

                                                                                                                                                                Hi, what were the required fields? Was it just a unique request ID? Thanks for sharing about bunyan. Even though it’s been out for a while I was unaware of it.

                                                                                                                                                              2. 5

                                                                                                                                                                Do you all use a different format than JSON? Or a tool that transforms structured logs to something more friendly to humans, eg. with different log levels displayed in different appropriate colors, eg. JSON syntax characters diminished, for local tailing?

                                                                                                                                                                We use JSON and the only tools I use are grep and jq. And although I am pretty much still a novice with these two, I found that with the power of shell piping I can do almost anything I want. Sometimes I reach for the Kibana web interface, get seriously confused and then go back to the command line to figure out how to do it there.

                                                                                                                                                                I wrote a simple tutorial for the process, just a couple of weeks ago.

                                                                                                                                                                1. 1

                                                                                                                                                                  If you rely on external tools to be able to make sense of your logs, why not go all the way, gain the speed and size benefits that binary logs would bring, and write your own log pager? I feel like the systemd folks had the right idea even when everyone was making fun of them.

                                                                                                                                                                  1. 3

I don’t think the average employer would be happy subsidizing an employee writing a log pager instead of implementing something that brings a tangible result to the business. The potential money saved by using binary logs probably doesn’t outweigh the new subs/increased profits of churning out more features.

                                                                                                                                                                    1. 1

                                                                                                                                                                      To me that sounds like an excuse. The world is not made up of only software that is beholden to the all mighty shareholder.

                                                                                                                                                                      1. 1

                                                                                                                                                                        I mean, yes, if you’re developing something in your personal time, go bananas on what you implement.

                                                                                                                                                                        But I also know my manager would look at me funny and ask why I’m not just shoving everything into CloudWatch/<cloud logging service>

                                                                                                                                                                    2. 2

I’m sure most problems with systemd journals are fixable, but they’ve left a very bad taste in my mouth for two main reasons: if stuff gets deleted from under them they apparently never recover (my services continue to say something like “journal was rotated” until I restart them), and inspecting journals is incredibly slow. I’m talking orders of magnitude slower than log files. This is at its worst (I often have time to make a cup of tea) when piping the output into grep or, as journalctl already does by default, less, which means every byte has to be formatted by journalctl and copied only to be skipped over by its recipient. But it’s still pretty bad (I have time to complain on IRC about the wait) when giving journalctl filters that reduce the final output down to a few thousand lines, which makes me suspect that there are other, less fundamental issues.

                                                                                                                                                                      I should note that I’m using spinning disks and the logs I’m talking about are tens to hundreds of GB over a few months. I feel like that situation’s not abnormal.

                                                                                                                                                                      1. 1

                                                                                                                                                                        If you rely on external tools to be able to make sense of your logs, why not go all the way, gain the speed and size benefits that binary logs would bring, and write your own log pager?

                                                                                                                                                                        It’s hard to imagine a case at work where I could justify writing my own log pager.
                                                                                                                                                                        Here are some of the reasons I would avoid doing so:

                                                                                                                                                                        • Logs are an incidental detail to the application.
                                                                                                                                                                        • Logs are well understood; I can apply a logging library without issues.
                                                                                                                                                                        • My application isn’t a beautiful and unique snowflake. I should use the same logging mechanisms and libraries as our other applications unless I can justify doing something different.
                                                                                                                                                                        • JSON is boring, has a specification, substantial library support, tooling, etc.
                                                                                                                                                                        • Specifying, documenting, and testing a custom format is a lot of work.
                                                                                                                                                                        • Engineering time is limited; I try to focus my efforts on tasks that only I can complete.
                                                                                                                                                                        1. 2

                                                                                                                                                                          Logs are an incidental detail to the application.

                                                                                                                                                                          I think this is trivially disproved by observing that if the logs stop working for your service, that is (hopefully!) a page-able event.

                                                                                                                                                                          Logs are a cross-cutting concern, but as essential as any other piece of operational telemetry.

                                                                                                                                                                          1. 1

                                                                                                                                                                            Logs are a cross-cutting concern, but as essential as any other piece of operational telemetry.

                                                                                                                                                                            I rely heavily on logging for the services I support but the applications I wrote for work have only error reporting. They are used by a small audience and problems are rare; I might get a crash report every 18 months or so.

                                                                                                                                                                            1. 1

                                                                                                                                                                              Ah, yeah, I presume the context here is services.

                                                                                                                                                                      2. 1

                                                                                                                                                                        Agreed. jq is a really nice tool. It made the decision to transition to using JSON for logging very easy.

                                                                                                                                                                      3. 3

                                                                                                                                                                        Don’t use JSON, use logfmt.

                                                                                                                                                                        1. 1

Yes! Logfmt is the good stuff. But it’s only semi-structured. Why not use JSON, plus a tool that transforms it to logfmt (probably with nested data elided) when a human needs to scan it?
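As a rough sketch of that idea (the field names, quoting rules, and elision behavior here are my own assumptions, not any particular tool’s):

```python
import json

def to_logfmt(line: str) -> str:
    """Render one JSON log line as logfmt, eliding nested values."""
    record = json.loads(line)
    parts = []
    for key, value in record.items():
        if isinstance(value, (dict, list)):
            continue  # elide nested data for human scanning
        text = str(value)
        if " " in text or "=" in text:
            # quote values that would break logfmt's key=value tokens
            text = '"' + text.replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)

print(to_logfmt('{"level": "info", "msg": "user logged in", "ctx": {"req_id": 7}}'))
# level=info msg="user logged in"
```

You get the best of both: machines consume the JSON, and a one-liner in a pipe gives you flat logfmt when you’re tailing logs by hand.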

                                                                                                                                                                          1. 1

                                                                                                                                                                            Logfmt is fully structured, it just doesn’t support nesting, which is an important feature! Structured logs should be flat.

                                                                                                                                                                    1. 5

                                                                                                                                                                      I wonder whether they got permission from all their open source contributors to re-license the code? Or maybe they use a CLA like Shopify and co. do, where you waive all your rights to the code you own once it’s merged to the main tree?

                                                                                                                                                                      1. 12

It sounds like it was previously MIT, and if I understand the law correctly you can make modifications to MIT software and release the modified version under the GPL without issue (so long as you preserve the original MIT license text).

                                                                                                                                                                        1. 3

                                                                                                                                                                          Hmm… but relicensing code requires the permission of the code’s author, no? For the company’s own code that’s probably fine, but what about any outside contributors that might not agree with the license change? They might have the right to rescind their code.

                                                                                                                                                                          1. 22

They gave that permission by releasing under the MIT License. It is when you go in the ‘other’ direction that you need to ask for everyone’s consent/permission. E.g. Racket had a huge multi-year thread asking everyone if they were OK with changing from LGPL to MIT.

Btw I remember in the 00’s some BSD developers complained that Linux developers would take their driver code, use it, and license it under the GPL, making it impossible to merge any improvements upstream.

                                                                                                                                                                            https://opensource.stackexchange.com/a/5833

                                                                                                                                                                            1. 8

Btw I remember in the 00’s some BSD developers complained that Linux developers would take their driver code, use it, and license it under the GPL, making it impossible to merge any improvements upstream.

                                                                                                                                                                              I mean, isn’t that exactly the purpose of MIT? “Here’s some code, do whatever you want with it, you don’t have to contribute improvements back”.

                                                                                                                                                                            2. 12

                                                                                                                                                                              Technically the old code would still be MIT and the new code would be AGPL. However, since AGPL has more strict requirements the whole project is effectively AGPL. They’d still need to preserve the original MIT license text though.

                                                                                                                                                                              1. 7

                                                                                                                                                                                The code’s authors licensed their code under the MIT license, which allows that code to be relicensed by anyone else under new terms (such as the AGPL).

                                                                                                                                                                                1. 1

No, re-licensing is not permitted. If I write file A of project X under MIT, and someone else writes file B under AGPL, then another user who gets A and B would get both under AGPL; however, they could still (in general) use A according to MIT.

Whether this makes a difference or not will depend a lot on the project as a whole, and on the content of A.

A could be a self-contained C allocator, or a clever implementation of a useful ADT. Or it could be a small part of what B provides, like an implementation of a print macro/trait for Canadian post codes.

                                                                                                                                                                                  1. 2

                                                                                                                                                                                    Sure. But say you write some file and license it publicly under the MIT license. I can then take that same file and, in accordance with the terms of the former license, license it to someone else under the terms of the AGPL license. They will then not be able to use it under the terms of the MIT license.

In practice, this is not such a big deal, since the original version is likely still available and indistinguishable from the version I provide. However, if I change something small (like, say, the wording), then my changed version is distinct from your original, and if I license it as AGPL it won’t be possible to use it under the terms of the MIT license.

                                                                                                                                                                                    1. 2

No, as far as I understand this is not correct: a BSD or MIT license is connected to copyright, and you need to make substantial changes in order to claim copyright. Without copyright you cannot re-license.

Remember, in most jurisdictions the default is copyright. If I write a poem here, you could quote me, but not publish my poem; you have no license to redistribute it. If I explicitly give you a license, you cannot change that license.

This does get a bit muddy with the various viral licenses you point out, but as far as I understand, mixing file A under MIT with file B under GPL (or AGPL) does not really allow you, the distributor of A and B, to re-license A; the recipient of A can still use A under MIT.

Your downstream users would/should still get A with its MIT copyright notice, and will be free to distribute/use A (and only A) under MIT.

Doing so would not make the GPL license for A and B invalid.

I.e.: you include an MIT-licensed malloc in your “ls” utility. A user who gets the source from you could go in and see that, OK, this malloc bit (assuming it’s not modified) can be used as MIT.

This is because you, as the distributor, do not have copyright to the upstream MIT bit.

People will claim differently, and I don’t think it’s been tested in court, but AFAIK this is how the legal bits land.

                                                                                                                                                                                      1. 8

                                                                                                                                                                                        You don’t need to claim copyright over something to relicense it. You can grant a license to a copyrighted work if your own license to that work permits it, which MIT explicitly does.

                                                                                                                                                                                        including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software

                                                                                                                                                                                        1. 3

Ah, thank you. I wasn’t aware of this difference between MIT and BSD. I suppose I’ll have to check Apache too.

                                                                                                                                                                                          Some more on mit vs bsd: https://opensource.stackexchange.com/questions/217/what-are-the-essential-differences-between-the-bsd-and-mit-licences

                                                                                                                                                                                      2. 1

Note that this is different from explicit grants of re-licensing, like the “or any later version” provision that GPLv2 (I think) has.

So if I get a GPLv2 file I can choose to distribute it as GPLv3.