1. 11

    Perhaps unfortunately, I think this article is more interesting for the assumptions the author makes, or glosses over, than for the actual content. A few I noticed were:

    • There’s no metadata header required per object in memory (there must be one somewhere, as you’ll need it both to know which fields in an object need to be traced by the GC and to run type casts).
    • That other languages that experience fragmentation do so simply because they don’t use a “modern” allocator, rather than it being a tradeoff between space utilisation, allocation/deallocation speed, and runtime overhead.

    The Hacker News comments have a lot more details on mistakes, too, if that’s your jam.

    1. 9

      Agreed. I mostly agree with the author (and like him I prefer Go to Java), but there’s a lot I think he overlooks or gets wrong.

      • He did actually talk about the object metadata in Java; that’s the Klass* in the OpenJDK header. But Go must need something similar, even if it’s statically typed, since the GC doesn’t know compile-time types.
      • He mentions Java objects having to store mutexes. It’s true that synchronized was one of Java’s biggest mistakes, but to my knowledge the HotSpot runtime stores mutexes in a side table, not in the object header.
      • He mentions cache coherency, but not that relocating objects is terrible for cache performance.
      • “Sooner or later you need to do compaction, which involves moving data around and fixing pointers. An Arena allocator does not have to do that.” Yes, but on the flip side, an arena allocator keeps the whole arena in memory as long as even one object in it is still in use — this can cause memory bloat, requiring you to track down how pointers are escaping.
      • “it’s likely to be more efficient to allocate memory using a set of per-thread caches, and at that point you’ve lost the advantages of a bump allocator.” My understanding is that you get around that with per-thread bump allocators (see the sketch after this list).
      • “Modern memory allocators such as Google’s TCMalloc or Intel’s Scalable Malloc do not fragment memory.” — There’s no such thing as a non-fragmenting allocator, unless all your allocations are the same size. Modern allocators just fragment less.
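
      To make the bump/arena idea concrete, here’s a minimal sketch in Go. It is purely illustrative (the type and names are invented, alignment is ignored, and this is not how the Go runtime or any production allocator works): allocation is just a pointer bump, and nothing is reclaimed until the whole arena is reset, which is exactly the retention tradeoff mentioned above.

      package main

      import "fmt"

      // Arena is a toy bump allocator: allocation is a pointer increment, and
      // nothing is reclaimed until the whole arena is reset. Giving each thread
      // its own Arena is one way to avoid lock contention.
      type Arena struct {
          buf []byte
          off int
      }

      func NewArena(size int) *Arena { return &Arena{buf: make([]byte, size)} }

      // Alloc hands out n bytes by bumping the offset; nil means the arena is
      // full (a real allocator would grab a fresh chunk here).
      func (a *Arena) Alloc(n int) []byte {
          if a.off+n > len(a.buf) {
              return nil
          }
          p := a.buf[a.off : a.off+n]
          a.off += n
          return p
      }

      // Reset frees everything at once; until then the whole arena stays
      // resident, even if only one allocation from it is still reachable.
      func (a *Arena) Reset() { a.off = 0 }

      func main() {
          a := NewArena(1 << 20)
          b := a.Alloc(64)
          fmt.Println(len(b), a.off) // 64 64
          a.Reset()
      }
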
      1. 1

        Yes, but on the flip side, an arena allocator keeps the whole arena in memory as long as even one object in it is still in use — this can cause memory bloat, requiring you to track down how pointers are escaping.

        You can still selectively free unused pages in a nearly-empty arena to save RSS. But that’s a tricky latency/throughput tradeoff since madvise() is so expensive.

      2. 4

        Go’s GC needs to know the start and size for all objects, and where pointers are in them. Start and size are easy for objects up to 32 KB, which are allocated in size classes in a slab-style allocator; given a memory address, you can determine what slab class it’s in, which gives you both start and size using no per-object metadata. For pointers, the simple version is that Go keeps a bitmap of where there are active pointers in a given area (page, etc) of memory. The pointer alignment requirements mean that this bitmap can be quite dense. As an optimization, there are special slab classes for ‘size class X and contains no interior pointers’, which need no bitmaps at all.

        The important thing about these is that while they both require metadata, it’s large-scale metadata (and it’s aggregated together in memory, which probably helps efficient access as compared to chasing pointers to per-type information for each object).

        One corollary of this is that Go’s GC doesn’t actually know the type of any object in memory. It might not even know the exact size, since some size classes are ranges of sizes.

        (I don’t know exactly how Go implements objects larger than 32 KB, but I can think of various plausible approaches.)
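
        Roughly, the start-and-size trick looks like this sketch in Go (constants and tables invented for illustration; the real runtime’s span machinery is more involved):

        package main

        import "fmt"

        const pageSize = 8192

        // one entry per page: the object size held by that page (its size class)
        var pageSizeClass = map[uintptr]uintptr{}

        // objectStartAndSize recovers an object's start and size from any address
        // inside it, using only per-page metadata rather than a per-object header.
        func objectStartAndSize(addr uintptr) (start, size uintptr) {
            page := addr &^ (pageSize - 1)       // round down to the page base
            size = pageSizeClass[page]           // every object in this page has this size
            start = page + (addr-page)/size*size // snap the address to an object boundary
            return start, size
        }

        func main() {
            pageSizeClass[0x10000] = 48 // pretend this page holds 48-byte objects
            fmt.Println(objectStartAndSize(0x10000 + 110)) // interior pointer -> start 65632, size 48
        }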

        1. 1

          A number of JVMs optimise layout for GC by sorting all pointer fields before non-pointer fields. This means that they only need to carry one integer value for GC metadata: everything from the start of the object to some per-type offset is a pointer. This also means that the scanning logic plays nicely with branch predictors and caches.

          This is, as far as I am aware, the only downside of having value types in the language. .NET, for example, cannot do this trick because it has value types and so a class may interleave pointer and non-pointer data in a way that is exposed to the abstract machine. I believe the same is true of Go.
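
          Conceptually the scanning loop becomes as simple as this sketch (not any real JVM’s data structures; names are invented for illustration):

          package main

          import "fmt"

          // With all pointer fields sorted first, the collector needs one number per
          // type: how many leading words are pointers. Scanning is then a short,
          // predictable loop that is friendly to branch predictors and caches.
          type object struct {
              words        []uintptr // field storage: pointer words first, then primitives
              pointerWords int       // per-type metadata: words[0:pointerWords] are pointers
          }

          func tracePointers(o object, visit func(uintptr)) {
              for i := 0; i < o.pointerWords; i++ {
                  visit(o.words[i])
              }
          }

          func main() {
              o := object{words: []uintptr{0xAAAA, 0xBBBB, 42, 7}, pointerWords: 2}
              tracePointers(o, func(p uintptr) { fmt.Printf("%#x\n", p) })
          }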

          1. 1

            Which JVMs do this? Also, how does that interact with inheritance? In Hotspot, I believe that the fields for the subclass are just tacked on to the end of the fields from the parent class. If you reorder the fields for the subclass, you’d then need to generate different instructions when you JIT compile methods, right?

            1. 1

              Which JVMs do this?

              I think Hotspot did it, not sure if it still does.

              In Hotspot, I believe that the fields for the subclass are just tacked on to the end of the fields from the parent class. If you reorder the fields for the subclass, you’d then need to generate different instructions when you JIT compile methods, right?

              Sorry, I missed half of the explanation. The class pointer points to the middle of the object. Primitives go in front of the class pointer, pointers go afterwards. For subclasses, you just append pointers and prepend primitives. This isn’t possible if you have multiple inheritance, but Java has only single inheritance.

        2. 3

          The lack of a metadata header is interesting in Go because there is one, it’s just not always there, and that has other tradeoffs. In Go, a pointer is either to a concrete type or to an interface. If it’s an interface, then it’s a fat pointer that has to carry around both a pointer to the metadata header and a pointer to the object. This isn’t necessarily better or worse than the Java[1] model; it optimises for different things. In the Go model, any time you’re passing an object around via an interface, you need to pass two pointers. A Go array, slice, or map of interface{} is twice as big as a Java array of Object. In exchange for this, you get smaller objects, and if you don’t do dynamic dispatch, Java has more overhead. If you do, Go has more overhead.
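
          A quick way to see the two-word interface value is a sketch like this (sizes are platform-dependent; on 64-bit it typically prints 8 and 16):

          package main

          import (
              "fmt"
              "unsafe"
          )

          // A Go interface value carries two words (an itab/type pointer plus a data
          // pointer), while a concrete pointer is a single word.
          type Stringer interface{ String() string }

          type T struct{ n int }

          func (t *T) String() string { return fmt.Sprint(t.n) }

          func main() {
              p := &T{n: 1}
              var i Stringer = p
              fmt.Println(unsafe.Sizeof(p), unsafe.Sizeof(i)) // typically 8 16
          }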

          I generally prefer the Go approach to Java or .NET because it means that you pay for dynamic dispatch only when you want to use it and it moves the decision about whether to pay the cost to the caller.

          The other interesting thing that Go does is to separate the code-reuse mechanism from the subtyping mechanism. Pony also does this and we’re going to do the same for Verona. This conflation is part of the reason that Smalltalk-family languages are difficult to compile to efficient code. Anamorphic Smalltalk (StrongTalk) differentiated in the VM between subclassing (used for composition at the implementation level) and subtyping (used at the call site to determine the set of allowed callees), even though the source language conflated them. Go doesn’t have a notion of subtyping but does have some syntactic sugar to make composition easier. Concrete types in Go are never subtypes of other concrete types, but interfaces can be subtypes of other interfaces and concrete types can be subtypes of interfaces. This is a very nice property, from an implementation perspective.

          [1] Note: a lot of things that this article ascribes to Java were inherited directly from Smalltalk and pre-date the GC research that the author is talking about by at least a year.

          1. 1

            This is a dilemma for any system that uses vtbls for dynamic dispatch. You can either put the vtbl pointer in the object (like Java or C++), or in a fat pointer next to the object pointer (like Go interfaces or Rust trait objects). I’m curious which other languages have taken the latter approach.

        1. 4

          This is a pretty good overview! Using the .init_array section to hide code the OS runs for you on load is pretty ingenious. Also as a die-hard member of the “crates.io without namespacing is just fine the way it is” camp (and let’s not start that debate again please), the “misleading name” part is a pretty good argument for privileged namespaces.

          Some of these exploits can be mitigated by efforts like cargo-crev and rustsec, which I have high hopes for since they have tools for checking dependencies and those can be built into automated workflows. But they still require work both for people to find the vulnerabilities and heed them when they are found.

          1. 1

            Not asking you to restart the namespaces debate, but do you know what I should look for to find what people have said about it?

            1. 2

              users.rust-lang.org has several megathreads about it with lots of arguments for and against. There are lots of reasons, but ultimately it’s the crates.io team’s decision. They know the trade-offs, and have deliberately chosen a single namespace.

          1. 8

            How does this compare to other tools? Coincidentally, I just used PlantUML for the first time this past week (had to admit that my policy of “don’t do UML” isn’t infinitely sustainable).

            1. 2

              +1 for PlantUML - very nice tool with a VS Code plugin.

              1. 2

                I used to have that policy; but I eventually realised that what I was railing against was (in my opinion) misuse of UML; UML as a tool for communication is perfectly helpful. Especially sequence diagrams; I love those.

                Also another happy PlantUML user here :) It works really well with orgmode and babel to autogenerate diagrams in HTML documents.

              1. 3

                I think this is a valuable switch of perspective, but taken together the bulk of these techniques sound like “write your own specific mocks instead of using a mocking library” to me.

                1. 1

                  I think there is a difference still, perhaps just in that the scope of the “mocks” is smaller, but a more thorough explanation would be good.

                1. 1

                  There’s a lot of interesting material here (more than I’ve fully digested), but a lot of the specifics seem debatable. To take one small instance, colocating test factory methods with the production code seems odd. The advantage is a minor increase in ease of maintenance, at the cost of polluting your API. I’m not sure it seems like a worthwhile tradeoff to me.

                  1. 5

                    Never used in anger, but I really enjoyed this: https://apenwarr.ca/log/20171213.

                    1. 6

                      Here’s a relevant research paper on this subject: https://www.cs.cornell.edu/~rahul/papers/pwtypos.pdf.

                      My non-expert impression is that correcting a common subset of typos can be done securely, and is a win.

                      1. 2

                        Some further sources can be found at the Stanford Encyclopedia of Philosophy. https://plato.stanford.edu/entries/goedel-incompleteness/#GdeArgAgaMec

                        1. 1

                          Thanks. The SEP analysis of why the argument fails is much better than the one I was going to put forth.

                        1. 5

                          No, I don’t.

                          I use new tools, but I pick them up in a relatively disorganized way. The closest thing to practice I do is that I put some common commands and concepts I wanted to remember into Anki. I’m 50-50 on whether that was useful (potentially because of poor choices around the entries I added). I also keep a text file of common issues I’ve run into and how to solve them. I view that habit as definitely positive, but it’s not really practice per se.

                          If you’re interested in this topic, I appreciated a recent thread by Dan Luu: https://twitter.com/danluu/status/1442945072144678914

                          Michael Malis has written about recording himself solving issues, and reviewing it later: https://malisper.me/recording-footage-of-myself-investigating-a-performance-issue/.

                          1. 1

                            I can’t quite make this precise, but I have a worry about this study.

                            I think programmers are typically targeting coverage when they produce test cases. I suspect most test cases are written for one of three reasons:

                            1. Writing tests to cover new functionality or help refactor existing functionality (usually as prelude to introducing new functionality)
                            2. Writing tests to ensure that overall coverage becomes/remains high (in orgs that mandate some level of coverage).
                            3. Writing tests in response to specific bugs (fix the issue, write a test so it can’t happen again). This may or may not increase coverage.

                            2 of the 3 reasons explicitly mention the idea that tests should increase coverage.[1]

                            If your test cases are written in a way that targets coverage, then wouldn’t you expect that the fraction of the test suite you throw away would be highly correlated with coverage?

                            If that’s right, then is it the case that you can replace coverage with size only because programmers mostly write tests that increase coverage? They’re less often adding tests that keep coverage the same but improve overall reliability. In that case, it’s still reasonable to look at a test suite and focus on adding coverage to the places that are not already covered.

                            [1] This is true whether or not you’re actually using a code coverage tool. At $DAYJOB$ CI doesn’t flag code coverage, but development still emphasizes it. The way this will work is that you’ll open a PR, and if it doesn’t have tests, the first question is likely to be “why doesn’t this have tests?” Actual measurement of coverage would be better, but this is the primitive approach using human brains to do a machine’s job.

                            1. 11

                              Ah.

                              My pet hobby horse.

                              Let me ride it.

                              It’s a source of great frustration to me that formal methods academia, compiler writers and programmers are missing the great opportunity of our life time.

                              Design by Contract.

                              (Small Digression: The industry is hopelessly confused by what is meant by an assert. And subtle disagreements about what is meant or implied by different programmers are an unending source of programmers talking past each other).

                              You’re welcome to your own opinion, but for the following to make any sense at all, put aside whatever you mean by “assert” for the duration, and accept what I mean. You can go back to your meaning after we have finished discussing this comment.

                              By an assert in the following I mean a programmer-written boolean expression that, if it ever evaluates to false, tells the programmer that the preceding code has an unknown bug that can only be fixed or handled by a new version of the code.

                              If it evaluates to true, the programmer fully expects the subsequent code to work and that code will fully rely on the assert expression being true.

                              In fact, if the assert expression is false, the programmer is certain that the subsequent code will fail to work, so much so, there is no point in executing it.

                              So going back to DbC and formal methods.

                              Seriously. Writing postconditions is harder than just writing the code. Formal methods are way harder than just programming.

                              But we can get 90% of the benefit by specializing the postconditions to a few interesting cases…. aka. Unit testing.

                              So where can Formal Methods really help?

                              Assuming we’re choosing languages that aren’t packed with horrid corner cases… (eg. Signed integer overflow in C)…

                              Given a Design by Contract style of programming, where every function has a bunch of precondition asserts and a bunch of specializations of the postconditions……

                              My dream is a future where formal methods academics team up with compiler writers and give us…

                              • Assuming that every upstream assert expression is true, if it can be determined that any downstream assert will fail, the compile will fail with a useful warning.
                              • Where for every type, the programmer can associate an invariant expression, and the compiler will attempt to verify that it is true at the end of the constructor, and the start and end of every public method and at the start of the destructor. If it can’t, it will fail the compile with a warning.
                              • Wherever a type is used, the invariant expression can be used in this reasoning described above.
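
                              To be concrete about the convention (runtime checks only; the compiler/optimizer integration is the part that doesn’t exist yet), here is a minimal sketch in Go with invented names:

                              package main

                              import "fmt"

                              // assert: by the definition above, a false condition means the preceding
                              // code has a bug that only a new version of the code can fix.
                              func assert(cond bool, msg string) {
                                  if !cond {
                                      panic("contract violated: " + msg)
                                  }
                              }

                              type Account struct{ balance int }

                              // invariant: the balance never goes negative.
                              func (a *Account) invariant() { assert(a.balance >= 0, "balance >= 0") }

                              func (a *Account) Withdraw(amount int) {
                                  // preconditions
                                  assert(amount > 0, "amount > 0")
                                  assert(amount <= a.balance, "amount <= balance")
                                  a.invariant()

                                  old := a.balance
                                  a.balance -= amount

                                  // postcondition: a specialization of the full spec, unit-test style
                                  assert(a.balance == old-amount, "balance decreased by amount")
                                  a.invariant()
                              }

                              func main() {
                                  a := &Account{balance: 100}
                                  a.Withdraw(40)
                                  fmt.Println(a.balance) // 60
                              }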

                              So far, you might say, why involve the compiler? Why not a standalone linter?

                              Answer is simple… allow the optimizer to rely on these expressions being true, and make any downstream optimizations and simplifications based on the validity of these expressions.

                              A lot of optimizations are based on dataflow analysis; if the analysis can be informed by asserts, can check the asserts, and can be made more powerful and insightful by relying on them, then we will get a massive step forward in performance.

                              My experience of using a standalone linter like splint… is that it forces you to write in a language that is almost, but not quite, like C. I’d much rather that whatever is parsed as a valid (although perhaps buggy) program in the language by the compiler is parsed and accepted as a valid program by the linter (although hopefully it will warn if it is buggy), and vice versa.

                              I can hear certain well known lobste.rs users starting to scream about C optimizers relying on no signed integer overflow, since that would be, according to the standard, undefined, resulting in generated assembler that produces surprised-Pikachu-faced programmers.

                              I’m not talking about C. C has too much confused history.

                              I’m talking about a new language that out of the gate takes asserts to have the meaning I describe and explains carefully to all users that asserts have power, lots and lots of power, to both fail your compile AND optimize your program.

                              1. 5

                                As someone who has been using Frama-C quite a lot lately, I can’t but agree with this. There’s potential for a “faster than C” language that is also safer than C because you have to be explicit with things like overflow and proving that some code can’t crash. Never assume. Instead, prove.

                                1. 3

                                  for the following to make any sense at all, put aside whatever you mean by “assert” for the duration, and accept what I mean. You can go back to your meaning after we have finished discussing this comment

                                  Didactic/pedagogical critique: in such a case, it may be more appropriate to introduce a new term rather than using one which has a common lay meaning.

                                  My dream is a future where formal methods academics team up with compiler writers and give us […]

                                  Sounds a lot like symbolic execution.

                                  1. 3

                                    using one which has a common lay meaning.

                                    Do assertions have a different meaning than the one given here?

                                    1. 4

                                      I have colleagues for whom it means, “Gee, I didn’t think that input from outside the system was possible, so I want to know about it if I see it in unit test, and log it in production, but I must still handle it as a possibility”.

                                      When I personally put in an assert at such a point, I mean, “a higher layer has validated the inputs already, and such a value is by design not possible, and this assert documents AND checks that is true, so in my subsequent code I clearly don’t and won’t handle that case”.

                                      I have also seen debates online where people clearly use it to check for stuff during debugging, and then assume it is compiled out in production and hence has no further influence or value in production.

                                      1. 1

                                        GTest (Google Test), which I’ve recently had to use for C++ school assignments, refers to this in their macro names as EXPECT. Conditions whose failure is fatal are labelled with ASSERT. This makes intuitive sense to me: if you expect something to be true you accept its potential falsehood, whereas when you assert something to be true you reject its potential falsehood.

                                        1. 1

                                          TIL! Thanks! I only knew about assertions from the contract perspective.

                                        2. 1

                                          A common design choice is that assertions are evaluated in test environments, but not production. In that case, plus a test environment that you’re not confident fully covers production use cases, you might use assertions for hypotheses about the system that you’re not confident you can turn into an error yet.

                                          I’m not sure that’s a good idea, but it’s basically how we’ve used assertions at my current company.

                                        3. 2

                                          Alas, my aim there is to point out people think there is a common lay meaning…. but I have been involved in enough long raging arguments online and in person to realize… everybody means whatever they damn want to mean when they write assert. And most get confused and angry when you corner them and ask them exactly what they meant.

                                          However, the DbC meaning is pretty clear and for decades explicitly uses the term “assert”… except a lot of people get stuck on their own meaning of “assert” and conclude DbC is useless.

                                          Sounds a lot like symbolic execution.

                                          Ahh, there is that black or white thinking again that drives me nuts.

                                          Symbolic execution and program proving is a false aim. The halting problem and horrible things like busy beavers and horrible fuzzy requirements at the UX end of things make it certain that automated end to end program proving simply will never happen.

                                        That said, it can be incredibly useful. It’s limited for sure, but within its limits it can be extraordinarily valuable.

                                        Odds are symbolic execution is going to fail on a production-scale system; there’s not a chance it works end to end.

                                        However, it will be able to reason from assert A to assert B that, given A, B will fail in these odd-ball corner cases… i.e. you have a bug. Hey, that’s your grandfather’s lint on steroids!

                                        4. 2

                                          You might find the Lean theorem proving language meets some of your requirements. As an example:

                                          structure Substring :=
                                          ( original : string )
                                          ( offset length : ℕ )
                                          ( invariant : offset + length ≤ original.length )
                                          

                                        In order to construct an instance of this Substring type, my code has to provide a proof of that invariant proposition. Any function that consumes this type can rely on that invariant being enforced by the compiler, and can also make use of it to prove properties of the function’s postcondition.

                                        1. 32

                                          This seems like an oversimplification, and an unreasonable one.

                                          This model is true if a dependency has a fixed benefit and a cost that scales linearly with the size of your app. There probably are some dependencies that work that way, but I doubt it’s all, and I’m not even sure if it’s most. The author mentions just using jQuery, but comparing jQuery and React, both the costs and benefits of React scale with the size of the app.

                                          In other cases, building your own solution might be great when you’re small–you build a solution that works perfectly for your use case, and avoid the complexity of the dependency. But as you grow, you may find yourself wandering into increased complexity. If that happens, you can end up reimplementing the complexity of the dependency, but without the benefit of a community that has already solved some of those problems.

                                          A final issue with NIH is that it can impose a real cost when dealing with turnover and on-boarding new people. The tacit knowledge that current employees have is lost, and new employees will expect things to work like “standard” tools they’ve used elsewhere.

                                          …that was three somewhat pro-dependency paragraphs, but that’s not the whole picture either. The same dynamic that I’ve cited above can go in reverse. There have been times when my judgment was that dependencies don’t pull their weight, but for the opposite reason the author thought–I didn’t think we’d be using enough functionality to justify it. Our version might not be as robust, but it would be 10-100x less code, perfectly match our use case, and we could adjust the functionality in a single commit.

                                          The real point isn’t to be pro or anti dependency, but to argue that you need to understand how the costs or benefits of any particular dependency will play out for your project.

                                          1. 5

                                            The tacit knowledge that current employees have is lost, and new employees will expect things to work like “standard” tools they’ve used elsewhere.

                                            You meant this part as pro-dependency, but I think it can work equally well for anti-dependency.

                                            “Standard” carries a lot of weight here, and assumes that the bespoke tool is harder to learn than the standard one. It also assumes that all current devs, and new ones, are proficient in the standard tool. But if that “standard” tool is React, or Angular, say, this often won’t be the case.

                                            I have personally seen all of React brought in to add a little bit of dynamic functionality to one part of one page, on a team where most people had never used React and where about 50 lines of vanilla JS could have done the same job as the 50 lines of React did (and we ultimately did rewrite it to remove React).

                                            It’s not just about the tradeoff between “re-implementation cost” and “learning cost.” It’s about accurately measuring both. But ime the full costs of dependencies are rarely measured correctly. This failure is magnified by the programmer’s love of new toys, and the rationalizations they’ll make to satisfy it.

                                            1. 3

                                              The tacit knowledge that current employees have is lost, and new employees will expect things to work like “standard” tools they’ve used elsewhere.

                                              An organisation can try to do an industry scale inverse Conway’s maneuver by pushing its internal solution as a “standard.” When it works, the organisation reduces their cost of onboarding!

                                              1. 1

                                                After your first sentence, I was expecting a strong, non-nuanced opinion.

                                                But your comment is nicely nuanced and balanced and drills right to the root issue. Thanks!

                                                1. 1

                                                As an org gets big, it has more money to consider in-house solutions.

                                                As for the inverse graph, would it apply to something like git or gcc? Once you get big enough, does the benefit of using gcc go negative (or to zero)?

                                                  1. 3

                                                    That may be true in isolation, but you need to factor in two other things: competition and opportunity cost. A company the size of Apple or Google could easily afford to create an in-house GCC competitor, but the money / engineer time spent doing that is money and time spent not doing something else. Apple was very explicit when they started investing in LLVM that they didn’t consider a compiler to be a competitive advantage. Their IDE tooling was, but a toolchain is just table stakes. By investing in an open solution that other companies are also putting money into, they benefit in two ways:

                                                    • They aren’t paying all of the costs (for a while, Apple was paying for over 50% of LLVM development, that’s dropped off now as others contribute more), so their opportunity cost is lower.
                                                    • They have the same baseline as their competitors in something that they don’t consider a differentiating feature.
                                                  2. 1

                                                    Addendum, since my previous post has a negative tone: I appreciate that the author asked how the cost/benefit analysis changes as a project scales. While I obviously disagree with the generic answer, it’s a good frame and I’ll keep it in mind whenever I try and talk about the subject.

                                                  1. 20

                                                    It’d be nice to have some actual background on hashing in here instead of just broad generalizations and links to various hash functions. Examples:

                                                    • There’s no mention of cyclic redundancy checks and why they are not valid as crypto functions (a mistake some programmers have made).
                                                    • There’s no mention of avalanche effects, which is a good way of seeing how “random” a digest scheme is (with some implications for how well the output can be predicted/controlled by an attacker).
                                                    • The mentioned attack on JSON hash tables in PHP (if you dig into it) would’ve been a great place to talk about trivial hashes (e.g., f(x) = 0 or f(x) = x) and why they cause problems even in non-hostile environments (see the toy sketch after this list), but that would’ve required more of an introduction to how hashing works…
                                                    • Lots of usage of jargon like “non-invertible”, “collision-resistance”, “preimage attack resistance”, etc. which is probably inaccessible if your audience is programmers who “don’t understand hash functions”.
                                                    • There’s not really an explanation about the differences/similarities of crypto-strong hash functions, password hash functions, and key derivation functions, other than a mention that there is some relation but which isn’t elaborated on at all.
                                                    • There’s not really any useful information at all about perceptual hashing vs other forms of multimedia digest approaches–there’s just some Apple hate.
                                                    • etc.
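
                                                    For instance, a toy sketch like this (illustrative Go, not from the article) would have made the trivial-hash problem concrete: every key lands in the same bucket, so the “hash table” degenerates into a linear scan.

                                                    package main

                                                    import (
                                                        "fmt"
                                                        "hash/fnv"
                                                    )

                                                    const buckets = 8

                                                    func trivialHash(s string) uint32 { return 0 } // every key collides

                                                    func fnvHash(s string) uint32 { // standard-library FNV-1a for contrast
                                                        h := fnv.New32a()
                                                        h.Write([]byte(s))
                                                        return h.Sum32()
                                                    }

                                                    // bucketCounts shows how a hash function distributes keys across buckets.
                                                    func bucketCounts(hash func(string) uint32, keys []string) [buckets]int {
                                                        var counts [buckets]int
                                                        for _, k := range keys {
                                                            counts[hash(k)%buckets]++
                                                        }
                                                        return counts
                                                    }

                                                    func main() {
                                                        keys := []string{"a", "b", "c", "d", "e", "f", "g", "h"}
                                                        fmt.Println(bucketCounts(trivialHash, keys)) // [8 0 0 0 0 0 0 0]
                                                        fmt.Println(bucketCounts(fnvHash, keys))     // keys spread out across buckets
                                                    }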

                                                    Programmers might not understand hash functions, but infosec furries may also not understand pedagogy.

                                                    (also, can you please cool it with the inflammatory article headlines?)

                                                    1. 24

                                                      Programmers might not understand hash functions, but infosec furries may also not understand pedagogy.

                                                      Please don’t pick a fight. It seems more angry than friendly.

                                                      1. 22

                                                        Honestly I think it’s a valid concern. One of the biggest problems with the computer security world, as stated repeatedly by leading experts in the field, is communication and teaching.

                                                        1. 23

                                                          A valid concern would be “infosec experts may not understand pedagogy” but why call out “infosec furries” specifically? Unless we should be concerned about infosec furries in particular vs other infosec experts?

                                                          Are these acceptable?

                                                          • but infosec gays may also not understand pedagogy
                                                          • but infosec women may also not understand pedagogy
                                                          • but infosec people of color may also not understand pedagogy

                                                          No. So why furries? People need to get over it and quit furry bashing. This isn’t acceptable behavior on Lobste.rs, and I’m tired of it.

                                                          1. 3

                                                            See elsewhere for the explanation; furry bashing doesn’t enter into it, though I see why you might have read it that way. Furries are internet denizens like the rest of us, with all that entails.

                                                            1. 12

                                                              I agree with you that it’s a bad title.

                                                              I also think that you wouldn’t have reacted nearly this strongly to the title if it wasn’t a furry blog.

                                                              1. 11

                                                                I read your other comments. But you said what you said, and that undermines all your pontificating about the harm of “insulting/demeaning a group” and “the sort of microaggression/toxicity that everybody talks so much about.” Take your own advice.

                                                              2. 2

                                                                “Furry” is a kink, not an identity or protected class. And normally you have to get people’s consent before you bring them into your kink.

                                                                1. 7

                                                                  I don’t see any sexual imagery in this blog post.

                                                                  1. 2

                                                                    The OP’s site has some pretty well reasoned and presented articles on precisely why “furry” cannot reasonably be summarized as “a kink”.

                                                                    And, no, you do not “normally” have to get someone’s consent to introduce them to the idea of your kink, unless said introduction involves you engaging them in the practice of your kink.

                                                                  2. 1

                                                                    Sorry, I didn’t realize the “furry” part was what you were opposed to. It sounded like you were upset with the implication that the infosec world is bad at teaching.

                                                              3. 6

                                                                Programmers might not understand hash functions, but infosec furries may also not understand pedagogy.

                                                                (also, can you please cool it with the inflammatory article headlines?)

                                                                https://www.youtube.com/watch?v=S2xHZPH5Sng

                                                                1. 10

                                                                  One of the things he talks about there is testing the hypothesis and seeing which title actually worked. I only clicked this link because I recognized your domain name and knew you had written interesting articles in the past and might legitimately explain something I didn’t know. If not for that, I probably would have bypassed it since the title alone was not interesting at all.

                                                                  1. 9

                                                                    Even so, it is still possible to write clickbait titles that aren’t predicated on insulting/demeaning a group.

                                                                    • “Hash functions: hard or just misunderstood?”
                                                                    • “Things I wish more programmers knew about hashes”
                                                                    • “Programmer hashes are not infosec hashes”
                                                                    • “Are you hashing wrong? It’s more common than you might think”
                                                                    • “uwu whats this notices ur hash function”

                                                                    How would you feel if I wrote “Gay furries don’t understand blog posting”? Even if I raise good points, and even if more people would click on it (out of outrage, presumably), it would still probably annoy a gay furry who wrote blogs and they’d go in with their hackles raised.

                                                                    1. 8

                                                                      The important difference between what I wrote and your hypothetical is the difference between punching up and punching down.

                                                                      My original title was along the same lines as “Falsehoods Programmers Believe About _____” but I’ve grown a distaste for the cliche.

                                                                      1. 7

                                                                        The difference between “Programmers don’t understand hash functions” and “Gay furries don’t understand blog posting” is quite obvious to me and I definitely don’t want to engage in whatever Internet flame is going on here. Especially since, uh, I have a preeetty good idea about what the problem here is, and I tend to think it’s about gay furries, not article titles, which is definitely not a problem that I have. (This should probably be obvious but since I’m posting in this particular thread, I wanted to make sure :P).

                                                                        But I also think this title really is needlessly nasty, independent of how it might be titled if it were about other audiences. It’s a bad generalisation – there are, in fact, plenty of programmers who understand hash functions – and it’s not exactly encouraging to those programmers who want to get into security, or who think their understanding of these matters is insufficient.

                                                                        I am (or was?) one of them – this was an interest of mine many, many years ago, at a time when I was way too young to understand the advanced math. My career took me elsewhere, and not always where I wanted to go, and I tried to keep an eye on these things in the hope that maybe one day it’ll take me there. Needless to say, there’s only so much you can learn about these topics by spending a couple of evenings once in a blue moon studying them, so I never really got to be any good at it. So I think the explanation is amazing, but it would definitely benefit from not reminding me of my inadequacy.

                                                                        And I’m in a happy boat, actually, this is only an interest of mine – but there are plenty of people who have to do it as part of their jobs, are not provided with adequate training of any kind, have no time to figure it out on their own, and regularly get yelled at when they get it wrong.

                                                                        Now, I realise the title is tongue-in-cheek to some degree, the playful furries and the clever humour scattered throughout the post sort of gives it away. If you think about it for a moment it’s pretty clear that this is meant to grab attention, not remind people how much they suck. But it’s worth remembering that, in an age where web syndication is taken for granted to the point where it sounds like a Middle English term, this context isn’t carried everywhere. Case in point, this lobste.rs page includes only the title. Some people might react to it by clicking because you grabbed their attention, but others might just say yeah, thanks for reminding me, I’ll go cry in a corner.

                                                                        Even if I didn’t realise it was tongue-in-cheek, it probably wouldn’t bother me, partly because I understand how writing “competitively” works (ironically, from around the same time), partly because I’ve developed a thick skin, and partly because, honestly, I’ve kindda given up on it, so I don’t care about it as much as I once did. But I can see why others would not feel the same way at all. You shouldn’t count on your audience having a thick skin or being old enough to have given up on most of their dreams anyway.

                                                                        I know this is a real struggle because that’s just how blogs and blogging work today. You have to compete for attention to some degree, and this is particularly important when a large part of the technical audience is “confined” to places like HN and lobste.rs, where you have to grab attention through the title because there’s nothing else to grab attention through. But maybe you can find a kinder way to grab it, I dunno, maybe a clever pun? That never hurt anyone. These radical, blunt (supposedly “bluntly honest” but that’s just wishful thinking) headlines are all the rage in “big” Internet media because, just like Internet trolls, they thrive on controversy, us vs. them and a feeling of smugness, but is that really the kind of thing you want to borrow?

                                                                        (Edit: just to make sure I get the other part of my message across, because I think it’s even more important: title aside, which could be nicer, the article was super bloody amazing: the explanation’s great, and I like the additional pointers, and the humour, and yes, the drawings! Please don’t take any of all that stuff above as a criticism of some sort: I wanted to present a different viewpoint from which the title might read differently than you intended, not that the article is bad. It’s not!)

                                                                        1. 15

                                                                          How do you know that you’re punching up?

                                                                          What if the person encountering your blog is a programmer from an underrepresented background, just barely overcoming imposter syndrome, and now here’s this scary suggestion that they don’t understand hash functions? What if they actually made one of the mistakes in the article, and feel like they’re a complete fraud, and should leave the industry? This is the sort of microaggression/toxicity that everybody talks so much about, if I’m not mistaken.

                                                                          The point is: you don’t know. You can’t know.

                                                                          So, err on the side of not adding more negative shit to the world accidentally in the name of pageviews–especially when there are many, many other more positive options in easy reach.

                                                                          EDIT:

                                                                          I wouldn’t care if it weren’t for the fact that you’re a smart dude and clearly passionate about your work and that you have good knowledge to share, and that it pains me to see somebody making mistakes I’ve made in the past.

                                                                          1. 8

                                                                            I wouldn’t care if it weren’t for the fact that you’re a smart dude and clearly passionate about your work

                                                                            I’m neither of those things :P

                                                                            and that you have good knowledge to share, and that it pains me to see somebody making mistakes I’ve made in the past.

                                                                            I appreciate your compassion on this subject. It’s definitely new territory for me (since forever I’ve been in the “boring headline out of clickbait adversion” territory).

                                                                            1. 9

                                                                              Do you actually not see a difference between saying a slightly negative thing about people of a certain profession and how they engage in that profession, and an ad-hominem using sexual orientation? What a weird and bad analogy?

                                                                              I’m trying to assume good intent here but all your comments make it sound like you’re annoyed at the furry pics and awkwardly trying to use cancel culture to lash out the author.

                                                                              1. 7

                                                                                Neither the label of programmers (with which I identify) nor of gay furries (with which the author identifies, according to their writing) is being misapplied. I’m sorry you feel that a plain statement of fact is somehow derogatory–there is nothing wrong with being a proud programmer or a proud gay furry.

                                                                                My point in giving that example was to critique the used construction of “<group> is <bad at thing>”. I picked that label because the author identified with it, and I picked the “bad at blogging” because it’s pretty obviously incorrect in its bluntness. If I had picked “lobsters” or “internet randos” the conjured association for the person I was in discussion with may not have had the same impact that “programmers” had on me, so I went with what seemed reasonable.

                                                                                1. 4

                                                                                  What do you gain by emphasizing soatok’s sexual identity, other than this morass of objections?

                                                                                2. 5

                                                                                  I’m trying to assume good intent here

                                                                                  that’s exactly what friendlysock is hoping for

                                                                                  1. 5

                                                                                    you’re right but it’s best not to feed them

                                                                                  2. 8

                                                                                    What if the person encountering your blog is a programmer from an underrepresented background, just barely overcoming imposter syndrome, and now here’s this scary suggestion that they don’t understand hash functions?

                                                                                    Or they may read this and think ‘I’m glad it’s not just me!’. As a programmer who probably has a better than average understanding of hash functions, I don’t feel demeaned by this generalisation, if I were worried about my level of understanding I’d feel comforted by the idea that I wasn’t in a minority in my lack of understanding.

                                                                                    What if they actually made one of the mistakes in the article, and feel like they’re a complete fraud, and should leave the industry?

                                                                                    Or they may feel better that this mistake is so common that someone writes about it on a list of mistakes programmers make.

                                                                                    1. 1

                                                                                      What if the person encountering your blog is a programmer from an underrepresented background….

                                                                                      While I said you’re picking a fight (and would add: “look at the thread, it’s a fight”), I see what you’re saying in this paragraph. I also value non-judgmental explanations.

                                                                                  3. 6

                                                                                    My problem with the title isn’t that it’s insulting, but that it’s inaccurate. Clearly some programmers do understand hash functions, even if other programmers do not. If nothing else, @soatok, a programmer, presumably understands hash functions, or why else would he write a blog post purporting to explain the right way to use them?

                                                                                    Programmers don’t understand hash functions, and I can demonstrate this to most of the people that will read this with a single observation:

                                                                                    When you saw the words “hash function” in the title, you might have assumed this was going to be a blog post about password storage.

                                                                                    That claim, specifically, is wrong, at least about me, and almost certainly about other programmers as well. I don’t claim to have deep knowledge about cryptography, and I do expect that there’s probably something I could learn from this blog post, which I will read more carefully when I have a chance. But I am aware that the computer science concept of hash functions is useful for a variety of programming problems, and not just storing password-related data.

                                                                              1. 6

                                                                                One more wrinkle: “are late bugs more expensive to fix” isn’t even the question we need to answer. It’s an intellectually interesting question, but the question we actually need to answer is “how should we develop software?”

                                                                                If bugs we discover late are more expensive, that suggests that we could save money by finding them earlier. But that’s not a guarantee–perhaps they’re more expensive to fix because they’re just harder to find, not because they’re found later? I’m happy to assume it’s some of the second, even without research, but they’re two different explanations of the hypothesized phenomenon.

                                                                                1. 2

                                                                                    Agreed. Oftentimes the better question wasn’t “When should we fix these bugs?” but rather “How might this new feature impact the bug profile of our codebase?” or even “Should we spend code on this feature in the first place?”

                                                                                  1. 2

                                                                                    That’s a great point, and to add to it, I think looking at whether a specific bug would have been cheaper to fix earlier discounts the possibility that the cost of finding the bug earlier would have exceeded the cost of fixing it later. My intuition is that that must be true at least some of the time.

                                                                                  1. 2

                                                                                    This was very helpful for me, as someone who’s taken a look at osh/oil from time to time, but never gone that deep. A few observations/requests to say certain things explicitly.

                                                                                    In the list section, you mention arrays and lists. Are they different? Are dict keys always strings? Can they be quoted strings with underscores or meta chars? Are expressions always in parens?

                                                                                    Structurally, I wonder if it might make sense to move some examples to the top to give people more of a feel, before talking about words, and other things that might matter less to someone who lands on the page without familiarity with the project.

                                                                                    1. 2

                                                                                      Thanks, this is great feedback, I will update the doc. (Although some of it will have to go on linked docs to keep the length down.)

                                                                                      • Yes good point array vs. list is confusing. Oil is like Python – it only has lists. I sometimes use “array” to mean “list of strings” but I think I should settle on one term. I want to keep consistency with Python’s terminology, but “list” is sort of a bad name, since it reminds people of linked list? JavaScript uses “array” so maybe I should adopt that terminology.
                                                                                        • JS: array and “object” (object isn’t good)
                                                                                        • Python: list and dict
                                                                                        • Oil: array and dict? Or maybe array and map? Not sure if that’s a good idea.
                                                                                      • The dict keys have to be strings, unlike Python. The rule is the same as in JavaScript, except with the -> operator instead of . (dot)
                                                                                        • d->key is the same as d['key']
                                                                                        • If you have special chars, do d['foo+bar'], since d->foo+bar is parsed like addition.
                                                                                      • Expressions aren’t always in parens
                                                                                      • Yes, good point about examples. I got other feedback that the “word/command/expression” stuff was too abstract, so I will try to reduce / de-emphasize it. I think it makes sense for the END of the doc rather than the beginning.

                                                                                      I will update the doc, but let me know if you have more feedback!

                                                                                    1. 1

                                                                                      Could someone help me understand the “write boilerplate” section? I don’t really understand what they’re arguing for there; would someone fancy explaining it to me in different words and with code snippets? Or is it just a preemptive strike against religious adherence to the next section’s premise (i.e. “don’t write boilerplate”)?

                                                                                      1. 7

                                                                                        The case study from https://matklad.github.io/2020/08/15/concrete-abstraction.html is a great example of this.

                                                                                        TL;DR: in rust-analyzer, we need to convert our internal data structures to the LSP wire format, and that’s a lot of conversion. Originally, I tried to DRY that code up by introducing a generic “convertible” abstraction. That was a mistake: replacing it with plain, somewhat repetitive code that does the conversions manually in the simplest possible way reduced complexity a lot (and actually made the code shorter).
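
                                                                                        To make the tradeoff concrete, here’s a minimal sketch of the two styles (illustrative only: the type and function names are made up, not the real rust-analyzer code):

                                                                                        // Stand-in types; not the actual rust-analyzer or LSP definitions.
                                                                                        struct Position { line: u32, column: u32 }
                                                                                        mod lsp { pub struct Position { pub line: u32, pub character: u32 } }

                                                                                        // Abstracted version: a generic conversion trait every type has to implement.
                                                                                        trait ToProto { type Output; fn to_proto(&self) -> Self::Output; }
                                                                                        impl ToProto for Position {
                                                                                            type Output = lsp::Position;
                                                                                            fn to_proto(&self) -> lsp::Position {
                                                                                                lsp::Position { line: self.line, character: self.column }
                                                                                            }
                                                                                        }

                                                                                        // “Boilerplate” version: one plain function per conversion, nothing extra to learn.
                                                                                        fn position_to_lsp(p: &Position) -> lsp::Position {
                                                                                            lsp::Position { line: p.line, character: p.column }
                                                                                        }

                                                                                        The second version repeats a simple pattern, but every conversion can be read on its own, and there’s no generic machinery to understand first.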

                                                                                        1. 1

                                                                                          I like your article and would mostly agree.

                                                                                          Small point: I think Collection ends up being pretty useful for code objects that somehow “accept” a series of things. Perhaps they maintain a set of objects or they act on each object coming in. Then by defining a method as (java syntax):

                                                                                          // Delegates to a per-element accept(T) overload defined elsewhere in the class.
                                                                                          void accept(Collection<T> collection) {
                                                                                              for (T t : collection) {
                                                                                                  accept(t);
                                                                                              }
                                                                                          }
                                                                                          

                                                                                          I don’t have to decide ahead of time what my caller is going to use. It’s true that more often than not you only have a concrete type coming into the method, but I don’t always know what that will be prior to noodling on things for a while. I might use a list or a set (or less frequently a queue). Having the habit of declaring the parameter as Collection up front saves some time.

                                                                                          1. 1

                                                                                            This doesn’t need the Collection abstraction; only Iterable/Iterator is required here. In Rust, that would be fn accept(items: impl IntoIterator<Item = T>).
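
                                                                                            A quick sketch of that signature (the Debug bound and the printing are just placeholders for illustration):

                                                                                            // Accepts anything that can be turned into an iterator over T:
                                                                                            // Vec<T>, HashSet<T>, arrays, other iterators, and so on.
                                                                                            fn accept<T: std::fmt::Debug>(items: impl IntoIterator<Item = T>) {
                                                                                                for item in items {
                                                                                                    // Stand-in for whatever per-item handling the real code would do.
                                                                                                    println!("{:?}", item);
                                                                                                }
                                                                                            }

                                                                                            fn main() {
                                                                                                accept(vec![1, 2, 3]);                        // a Vec works...
                                                                                                accept(std::collections::HashSet::from([4])); // ...and so does a HashSet
                                                                                            }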

                                                                                            1. 1

                                                                                              Ah, of course you’re right.

                                                                                              I forgot, because I rarely see Iterable used explicitly in Java, and it’s been a year since I wrote Rust.

                                                                                          2. 1

                                                                                            Not going for an abstraction often allows for a more specific interface. A monad in Haskell is a thing with >>=, which doesn’t tell you much. Languages like Rust and OCaml can’t express a general monad, but they still have concrete monads. The >>= is called and_then for futures and flat_map for lists. These names are more specific than >>= and are easier to understand. The >>= is only required if you want to write code that is generic over the type of monad itself, which happens rarely.
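
                                                                                            As a tiny illustration (using Option and iterators as stand-ins for futures and lists):

                                                                                            fn main() {
                                                                                                // Option's "bind" is called and_then: chain steps that may produce nothing.
                                                                                                let port: Option<u16> = Some("8080").and_then(|s| s.parse::<u16>().ok());
                                                                                                // Iterator's "bind" is called flat_map: map each item to a sequence, then flatten.
                                                                                                let chars: Vec<char> = ["ab", "cd"].iter().flat_map(|s| s.chars()).collect();
                                                                                                println!("{:?} {:?}", port, chars);
                                                                                            }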

                                                                                            This is actually a good example, because in Haskell you do this all the time because monads are so pervasive. I think that’s true because of a few differences:

                                                                                            • Haskell has higher-kinded types, so you can actually parameterize over the type of a monad (which has to be a type constructor). It’s kinda hacky in Rust.
                                                                                            • Monads are very important in Haskell because they’re one of the core abstractions: you use them for state, IO, error handling, and a bunch of other things.
                                                                                            • The do-syntax lets you write monadic code in a vaguely imperative notation, which is useful when writing longer blocks.

                                                                                            Rust, on the other hand, has no HKTs, is an impure language so doesn’t need those abstractions (aside from error-handling, which it uses ? for), and doesn’t have do-syntax.
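
                                                                                            For completeness, a minimal example of that ? sugar:

                                                                                            use std::num::ParseIntError;

                                                                                            // ? unwraps the Ok value or returns the Err early, which is what
                                                                                            // you would otherwise write as a chain of and_then calls.
                                                                                            fn double(input: &str) -> Result<i32, ParseIntError> {
                                                                                                let n: i32 = input.trim().parse()?;
                                                                                                Ok(n * 2)
                                                                                            }

                                                                                            fn main() {
                                                                                                println!("{:?}", double("21")); // Ok(42)
                                                                                                println!("{:?}", double("x"));  // Err(...)
                                                                                            }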

                                                                                            This isn’t to say that Rust would be better off with >>= or a Monad trait; I don’t think it would. But it goes to show that an abstraction that’s incredibly useful in one language can be not worth the bother in another.

                                                                                        1. 3

                                                                                          Unbeknownst to you, that person actually works at Sony. They didn’t realize this, but their contributions in their spare time are not owned by them and Sony’s lawyers now want to sue you for using their proprietary, patented technology.

                                                                                          If you didn’t get a written statement from that person saying they owned those contributions, and actually intended to release them under your open source license terms - then you could indeed be found in the wrong!

                                                                                          Legally speaking, what does having them sign a CLA do? If Sony has that line in their authors’ employment contract, then it seems like they would own the code regardless. Is the idea something like: the CLA provides some sort of protection because you put up a step asking for authorization so you can say you’re not intentionally using code that’s under others’ copyright?

                                                                                          1. 2

                                                                                            Is the idea something like: the CLA provides some sort of protection because you put up a step asking for authorization so you can say you’re not intentionally using code that’s under others’ copyright?

                                                                                            Yes, exactly that. In one scenario you may be liable for damages (‘profits lost due to using their unlicensed IP’), while in another you could argue to the court that you did due diligence and it’s the contributor’s fault, not yours.

                                                                                            1. 3

                                                                                              Lack of intent and due diligence are not defenses to copyright infringement. That’s a common misconception you’ll see on YouTube, in disclaimers people try to use when posting copyrighted content, but will not see in the Copyright Act.

                                                                                              On whether people actually have the right to license contributions: please read a few CLAs. See also Apache’s licenses page, which has CLA forms both for companies and for individuals.

                                                                                              1. 3

                                                                                                Lack of intent and due diligence are not defenses to copyright infringement. That’s a common misconception you’ll see on YouTube, in disclaimers people try to use when posting copyrighted content, but will not see in the Copyright Act.

                                                                                                I do think these disclaimers have value though: namely, I find them extremely funny because this is not how it works at all.

                                                                                                In general, loads of people seem confused about this. Back when I edited on RationalWiki (many years ago) I tried to clean up the copyright mess of their images a bit, but this was met with significant resistance from some people. They would take some webcomic, slap it on a page, and claim “fair use” as “parody” or “criticism”. But … you’re not actually criticising or parodying the copyrighted work; it’s just being used in the course of parodying something else. That’s not how fair use works at all. This mostly fell on deaf ears though 🤷

                                                                                                That entire site is a copyright lawsuit waiting to happen. I’m surprised it hasn’t already happened given that lots of people would just love to see the site shut down.

                                                                                                1. 5

                                                                                                  Yeah, I used to kind of “collect” badly misconceived copyright disclaimers I’d run into online, like some folks stockpile memes. And it was fun, once or twice, to trade best-ofs with fellow copyright lawyers. But I wouldn’t have any fun hyuck-hyucking about it these days, especially in public. In the end, it’s at least partly a failure of the copyright bar that there’s so much bad information out there. It’s got to the point where we’re flooded with bunk second-order information, like those disclaimers, based on more fundamental misconceptions.

                                                                                                  Basically: We’ve really flubbed public education. We didn’t scale up good copyright information with application and enforcement of copyright online.

                                                                                                  On the flip side, there’s what the law allows and there’s what copyright holders allow. Plenty of copying, sharing, and even “remixing” we see online probably wouldn’t hold up in court, or would cost so much to defend that nobody reasonable would try. But it also falls beneath the cost-benefit threshold, or simply the concern, of the copyright holders. Some of them even welcome it, so long as they don’t have to formally license it. I know some Star Wars fans who’ve spelled out the rules of the road for fan media and the like, basically a stricter noncommercial license with a few nuances. Those rules are nowhere written, but broadly understood.

                                                                                                  That’s a very normal situation in law. Theory and reality combine where they conflict. But the mismatch also creates opportunities to prey on expectations developed from experience, rather than formal study. So we have photographers running around suing people for reusing their stuff on social media, in all honesty just seeking fair comp and creative control, often against people convinced they’ve done nothing wrong. But we also have folks enforcing copyrights—or even acquiring copyrights to enforce—where the only real financial value in the IP is the power to shake others down for small dollars, even against folks who might win fair use if they could afford to fight. And every shade of grey between.

                                                                                                  I often suspect it’s the really frustrating experiences of folks on the receiving end of claims under rules that don’t match lived experience, more so than the rules themselves, that make people so cynical about the legal system. I don’t say that to absolve lawyers. There’s lawyers on every side of this.

                                                                                            2. 1

                                                                                              It shifts the burden to the developer that contributes the code. My understanding is that in the case of a copyright lawsuit, the contributor is now the only person liable to pay for damages.

                                                                                              That’s why, as a developer, you should never sign a CLA. Unless you are ready to pay a lawyer that can advise you on exactly what it means.

                                                                                            1. 8

                                                                                              I’ve heard variants of point #2 before, and I cannot understand it at all. To some people, referring to someone as a user sounds vaguely demeaning, but to me, it’s the obvious term. One reason is that it has the appropriate generality–it lets you abstract over what’s common to the user of emacs, spotify, firefox, and lobste.rs. But also, for some pieces of software, no more specific term works. In a podcast app, you’re a listener, but what are you in emacs, or your terminal emulator? No term more specific than user applies, because the tool is open-ended and powerful.

                                                                                              That said, I’d still recommend this post because point #3 is incredibly important.

                                                                                              1. 4

                                                                                                Relevant XKCD

                                                                                                I’m torn on this, it’s very much a case-by-case thing. Relevant questions to ask include:

                                                                                                • Is there a reference which states that you are not allowed to rely on the bug?
                                                                                                • How many people depend on the bug?
                                                                                                • Is it a security issue?
                                                                                                • How important is backward compatibility in this case?
                                                                                                1. 1

                                                                                                  I think the existence of a reference saying you can’t rely on a behavior is a very weak signal. It appeals to the part of our minds that wants to assign blame for a situation, but it doesn’t really change the consequences of changing the behavior.

                                                                                                  “I told you not to do it, you did it anyway, now I’m stuck supporting it” is a perfectly pragmatic viewpoint. So is “I never talked about it, you relied on it, but I’m going to change it because it’s terrible”.

                                                                                                1. 2

                                                                                                  I’m not opposed to the author’s viewpoint that systemd did the wrong thing, but I disagree with the statement that it’s wrong to call a behavior users rely on a bug. I think the right perspective is that sometimes there are bugs you cannot fix.

                                                                                                  Maybe that’s nitpicking over the meaning of words, but I think it helps focus attention on the ways that our software falls short of what it could be. Acknowledging that these bad behaviors exist makes us understand that we are choosing not to fix them. Our reasons for not fixing them may be valid, but not fixing them is still a decision, and one which should be made with a focus on both its short-term and its long-term consequences.