Threads for cgenschwap

  1. 7

    Malleable identifiers make signed patches the default, while allowing users to later change their personal details (email address, name, login…). The full story for synchronising these identifiers from multiple different source repos is not yet completely written. Testing and suggestions are welcome!

    This is an interesting design decision. As far as I can tell, there are two options, both of which are bad. The git option is to make these part of the hash and immutable. This means that:

    • If there’s a GDPR right-to-be-forgotten request on a public repo and you have to erase someone’s email address, you may need to rewrite history and end up with broken downstreams.
    • If you need to fix a commit message after pushing, then you can’t without breaking downstreams.

    The Pijul option to make them mutable is also problematic:

    • It’s easy to change the commit message to a different one after review, which makes it easy to hide changes.
    • You can’t rely on the authorship information for copyright ownership auditing.
    • You can’t detect tampering with commit messages / authorship if the repo is compromised.

    I think my ideal solution would be to make these things separately signed such that you have an audit trail of updates but you can also retain a hash of old versions if you need to delete them from the public tree. This would let you delete someone’s email address but still preserve a public log of the fact that the author was updated after the original commit. It would let you rewrite the commit message but (unless explicitly deleted) preserve the original in the audit trail.
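    Roughly the kind of record I have in mind, as a hypothetical Rust-flavoured sketch (this is not how git or Pijul actually store metadata, and the hashing/signing here is a placeholder):

    // Purely hypothetical sketch: the commit content hash stays immutable,
    // while author/message updates are appended as new, individually signed
    // entries. A redacted entry keeps only a hash of the old value, so the
    // fact that something changed stays public even after a GDPR-style
    // deletion.

    #[derive(Debug, Clone)]
    enum Field {
        Present(String),
        Redacted { old_value_hash: String }, // hash of the deleted value
    }

    #[derive(Debug, Clone)]
    struct MetadataEntry {
        commit_hash: String,            // immutable, identifies the patch content
        previous_entry: Option<String>, // hash of the prior entry (the audit chain)
        author: Field,
        message: Field,
        signature: Vec<u8>,             // signed by whoever made this update
    }

    fn redact_author(entry: &MetadataEntry, hash: impl Fn(&str) -> String) -> MetadataEntry {
        let author = match &entry.author {
            Field::Present(value) => Field::Redacted { old_value_hash: hash(value.as_str()) },
            already_redacted => already_redacted.clone(),
        };
        // In a real system this new entry would be re-signed and appended.
        MetadataEntry { author, signature: Vec::new(), ..entry.clone() }
    }

    fn main() {
        let entry = MetadataEntry {
            commit_hash: "abc123".into(),
            previous_entry: None,
            author: Field::Present("Alice <alice@example.org>".into()),
            message: Field::Present("Fix the frobnicator".into()),
            signature: Vec::new(),
        };
        // Placeholder "hash" function, just to exercise the sketch.
        let redacted = redact_author(&entry, |s| format!("len:{}", s.len()));
        println!("{:?}", redacted);
    }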

    1. 22

      It’s possible I’m misunderstanding the Pijul docs on this, but I believe Pijul handles the situation quite nicely. The commits themselves are immutable, but instead of signing them with a string of your username/email, you sign them with a public key. There is then a mutable mapping file from public key to author details.

      I think this is a good overall solution, because if an author wants to delete their details they just delete the entry in the map file. But the commits stay immutable like Git.
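      As a purely illustrative sketch (not Pijul’s actual on-disk format), the mapping file is conceptually just this:

      use std::collections::HashMap;

      // Illustrative only. Patches are signed by a stable public key; a
      // separate, mutable table maps that key to human-readable details.
      #[derive(Debug, Clone)]
      struct AuthorDetails {
          name: String,
          email: String,
      }

      fn main() {
          let mut identities: HashMap<String, AuthorDetails> = HashMap::new();
          identities.insert(
              "ed25519:3f9a".to_string(), // hypothetical key fingerprint
              AuthorDetails { name: "Alice".into(), email: "alice@example.org".into() },
          );

          // Right-to-be-forgotten: drop the mapping, leave the signed patches alone.
          identities.remove("ed25519:3f9a");
          assert!(identities.is_empty());
      }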

      1. 4

        I believe what David is saying is that it’s not necessarily enough for the commits to stay immutable. You need an audit trail of changes to authorship as well. If that isn’t a problem for your use cases then Pijul will be fine. If you do need an audit trail of changes to authorship though then Pijul’s mechanism will be problematic.

        1. 16

          If you do need an audit trail of changes to authorship though then Pijul’s mechanism will be problematic.

          Quite the opposite: patches that don’t have a fixed author name must be signed, which is actually stronger than plaintext in the commit, not weaker. Sure, you need to trust the key owner on their identity, but you have the same problem with plaintext author names.

          1. 7

            Ah, I see. Thanks for clarifying.

            In that case, I think the public/private key is the audit trail, no? I don’t think a different author can “claim” the commits, but the original author can change their email address/name. I see it more like scuttlebutt’s approach to naming[1]

            [1] everything is public/private key, and names/nicknames or whatever are mappings individuals can apply independently

            1. 4

              Exactly.

          2. 4

            just delete the entry in the map file

            Do you mean remove just the username/email mapped to the public key or do you remove the public key as well?

            In the former case, what happens when someone sends a “right-to-be-forgotten request” to remove the public key? Surely the author could have used the same public key at other public places from which they don’t want to remove it, and thus the key itself could be construed as PII.

            In the latter case, how do you actually verify the entire trail without the public keys?

          3. 13

            It’s easy to change the commit message to a different one after review, which makes it easy to hide changes.

            You can’t do that in Pijul; you can only change the mapping between author and signing key.

            You can’t rely on the authorship information for copyright ownership auditing.

            Quite the opposite, the patches are signed! This is stronger than plaintext author names.

            You can’t detect tampering with commit messages / authorship if the repo is compromised.

            You can, if you trust the key or the reviewer’s key.

            1. 6

              Nice, thanks! Since you’re depending on keys, what is your revocation story? If my secret key is compromised, what do I do next?

              1. 2

                This isn’t completely done yet. We decided not to store keys inside the repo, mostly to keep it fully mutable, but the design isn’t fully fleshed out.

                1. 4

                  Thanks. With git, email addresses are immutable and so you can use that in auditing if you have some other mechanism for validating the addresses. For public-key crypto, I worry that this is a harder problem. With git, anyone can fake my email address but my repo can require pushes to have out-of-band authentication (for example, the FreeBSD git repo doesn’t accept pushes with the author set to anyone other than the account associated with the credentials used for the push). I know that any email address in the public FreeBSD git repo identifies the person to point to if there are problems with the code. An email address is an identifier. You can’t compromise an identifier, you can compromise only the authorisation that’s coupled to that identifier. In the case of FreeBSD, that’s an ssh private key, but that key can be revoked and then the attacker can’t impersonate that user anymore.

                  With a public key, I’m not sure what the infrastructure would look like. If a private key that is used directly to sign commits is compromised then I have no way of temporally bounding the scope of the compromise. Any patch signed with that key is suspect - it may be valid or it may come from the attacker. You might be able to manage with a trusted service that does the signing and includes a trusted time stamp, coupled with a revocation list, so that you can identify suspect patches and have them signed with the new key if they are valid.

                  In general, anything that depends on public-key crypto and doesn’t have a revocation mechanism is suspect. Things that depend on long-lived persistent signed artefacts and don’t have revocation and freshness mechanisms are suspect. Designing this correctly is incredibly hard.
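                  To make the temporal-bounding problem concrete, here is the policy I think you’re forced into (a sketch of my own, not anything Pijul implements):

                  // Sketch of why a bare signing key can't bound a compromise in time:
                  // without an independent, trusted timestamp, every signature made with
                  // a revoked key is suspect, because the attacker controls the claimed
                  // date inside the patch.

                  struct Patch {
                      author_key: String,
                      claimed_date: u64,              // set by whoever created the patch
                      trusted_timestamp: Option<u64>, // set by an external timestamping service
                  }

                  struct Revocation {
                      key: String,
                      revoked_at: u64,
                  }

                  fn is_suspect(patch: &Patch, revocations: &[Revocation]) -> bool {
                      let Some(rev) = revocations.iter().find(|r| r.key == patch.author_key) else {
                          return false; // key never revoked: fall back to normal trust policy
                      };
                      match patch.trusted_timestamp {
                          // An independent timestamp before the revocation bounds the damage.
                          Some(ts) => ts >= rev.revoked_at,
                          // Only the claimed date exists; the attacker could have forged it,
                          // so the patch is suspect regardless of what that date says.
                          None => true,
                      }
                  }

                  fn main() {
                      let revs = vec![Revocation { key: "key:me".into(), revoked_at: 1_700_000_000 }];
                      let patch = Patch {
                          author_key: "key:me".into(),
                          claimed_date: 1_600_000_000,
                          trusted_timestamp: None,
                      };
                      assert!(is_suspect(&patch, &revs)); // the claimed date alone proves nothing
                  }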

                  1. 2

                    First, I want to stress that there are indeed two separate issues:

                    • Public key signatures when authoring a patch.
                    • Public keys/Authentication used to run the command that applies it.

                    In general, anything that depends on public-key crypto and doesn’t have a revocation mechanism is suspect.

                    Absolutely. That story in Pijul is not complete, and adding revocation certificates shouldn’t be hard. Btw, the current keys have a non-optional expiration date.

                    Things that depend on long-lived persistent signed artefacts and don’t have revocation and freshness mechanisms are suspect. Designing this correctly is incredibly hard.

                    I fully agree, which is also one of the reasons for this beta: feedback and design discussions on issues like that need to happen before the full 1.0 version.

                    1. 5

                      First, I want to stress that there are indeed two separate issues:

                      I agree. The one that I’m interested in is the first:

                      Public key signatures when authoring a patch.

                      This becomes a long-term attestation of authorship.

                      Absolutely. That story in Pijul is not complete, and adding revocation certificates shouldn’t be hard. Btw, the current keys have a non-optional expiration date.

                      But what does expiration mean? If I get a patch from a repo and it’s signed with a key that expired, does that mean I shouldn’t trust it? But the repo metadata says that it was committed a year ago, so does that mean I should trust it?

                      The root question is: what is the trust relationship that you’re trying to establish with these signatures? A public-key signature is a verifiable attestation of something. For code signatures, it’s an attestation of a particular publisher. This is backed up by two things:

                      • The public key is signed by something (either a CA or by the distributor of the software). This is another layer of public-key-based attestation where the signing party attests that the key is owned by a specific entity and the entity then attests that they created the software. This depends on some form of trust root (typically a set of root certs distributed with the client).
                      • A revocation mechanism that allows you to stop installing the software after you learn that the key was compromised. At this point, the publisher can create a new key pair, sign a new version of the package with the new private key and ask the distributor or CA to sign the new public key.

                      For TLS, something similar works but the question of what to do with revoked certs is easier because TLS is for interactive sessions and you don’t want to use a TLS cert to verify the authenticity of a connection log from a year ago.

                      There are a few ways that I can see how you’d apply the first part of this in the context of a revision-control system. For example, the repo could form part of a PKI system and sign the public keys of authorised committers to attest that they have gone through some (repo-specific) confirmation of identity. This, in turn, could be signed by a hosting service to attest that the repo used some specific authentication policy. I’m not really sure if any of that would work though.

                      The really tricky part is the second one. A patch in a repo is a long-lived artefact. It may exist for decades. If a key is compromised then it can be used to sign commits in a repo that has its date set in the past. This means that even if I know the date of compromise then the only thing that a signed patch gives me is an attestation that it was authored either by the entity I think created it or by an attacker. This is not a valuable thing to attest. Having a signature here actually makes the situation worse than just having an identity because you have something that looks like it is a trust anchor but isn’t.

                      I honestly have no idea what a useful solution looks like here. Perhaps you can establish a chain of trust over the flow of patches, rather than the patches themselves, so that it’s not the signature of the patch that matters but the signature of whoever gave you the patch (which could provide a chain of custody for the patch set)? That way, if a private key is compromised then it doesn’t matter because it’s only one of the things in the chain of custody and the next one attests that they are happy that this wasn’t one of the malicious uses of the compromised key. The chain of custody could also include the root of a Merkle tree of a CRL so that you can establish some partial ordering between patches being received by a repo and the signing key being revoked?
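                      Very roughly, the shape of the chain-of-custody link I’m imagining (a sketch only, with placeholder hashing, and nothing any existing VCS implements):

                      // Hypothetical chain-of-custody link: what gets signed is the *flow*
                      // of patches, not just the patches. Each hop signs the patch hashes
                      // it is forwarding, the previous link, and the Merkle root of the CRL
                      // it knew about at the time, giving a partial order between "patch
                      // received" and "key revoked".
                      struct CustodyLink {
                          patch_hashes: Vec<String>,          // patches handed over at this hop
                          previous_link_hash: Option<String>, // None for the first link
                          crl_merkle_root: String,            // revocation list known to this hop
                          forwarder_key: String,              // whoever vouches for this hand-off
                          signature: Vec<u8>,                 // signature over all of the above
                      }

                      // A chain is well formed if every link points at the hash of the one before it.
                      fn chain_is_linked(links: &[CustodyLink], hash_of: impl Fn(&CustodyLink) -> String) -> bool {
                          links
                              .windows(2)
                              .all(|w| w[1].previous_link_hash.as_deref() == Some(hash_of(&w[0]).as_str()))
                      }

                      fn main() {
                          let genesis = CustodyLink {
                              patch_hashes: vec!["patch-aaa".into()],
                              previous_link_hash: None,
                              crl_merkle_root: "crl-root-1".into(),
                              forwarder_key: "key:alice".into(),
                              signature: Vec::new(),
                          };
                          let next = CustodyLink {
                              patch_hashes: vec!["patch-bbb".into()],
                              previous_link_hash: Some("hash(genesis)".into()),
                              crl_merkle_root: "crl-root-2".into(),
                              forwarder_key: "key:bob".into(),
                              signature: Vec::new(),
                          };
                          // Placeholder "hash" just to exercise the chain check.
                          let fake_hash = |_l: &CustodyLink| "hash(genesis)".to_string();
                          assert!(chain_is_linked(&[genesis, next], fake_hash));
                      }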

                      I fully agree, which is also one of the reasons for this beta: feedback and design discussions on issues like that need to happen before the full 1.0 version.

                      I’m looking forward to seeing what you end up with, the rest of the project looks fantastic.

                      1. 1

                        Having a trusted 3rd party attest that it was presented with a particular hash at or before a particular timestamp seems like a good idea. There’s no reason why you would need to have only one TTP, either.

                        you have something that looks like it is a trust anchor but isn’t

                        Honestly I think that’s already a problem. It’s psychologically very easy to assume the authorship on unsigned patches is honest.

                        the next one attests that they are happy that this wasn’t one of the malicious uses of the compromised key

                        This sounds obviously good.

                  2. 1

                    It’s not clear to me that key revocation should be handled at the VCS protocol level.

                    The important thing is to get universal commit signing and allow layering of arbitrary trust/audit systems on top of that. Even if we could get a significant fraction of developers to invest in a web-of-trust setup … private keys are lost all the time, repo servers get hacked, and individual authors sometimes publish malicious changes.

                    What’s important (and what I am hoping Pijul has implemented) is to make it easy to track changes in trust at every level and enable clients/servers to support arbitrary security policies. Can a repo (like NPM) express that a password reset has occurred, a new 2FA token, N-of-M signatures, or some other new protocol we haven’t invented yet?
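                    For example (hypothetical types, not something Pijul or npm expose today), this is the sort of trust-event stream I’d like a repo to be able to publish, with policy left to the client:

                    // Hypothetical, extensible stream of trust events a registry/repo could
                    // publish; clients and servers decide for themselves which events matter.
                    enum TrustEvent {
                        KeyAdded { account: String, key_fingerprint: String },
                        KeyRevoked { key_fingerprint: String },
                        PasswordReset { account: String },
                        SecondFactorEnrolled { account: String },
                        ReleaseSigned { package: String, version: String, signatures: Vec<String> },
                        // room for policies nobody has invented yet
                        Other { kind: String, payload: Vec<u8> },
                    }

                    // Example client-side policy: a release is trusted only with N-of-M
                    // maintainer signatures.
                    fn n_of_m_ok(signatures: &[String], maintainers: &[String], n: usize) -> bool {
                        signatures.iter().filter(|s| maintainers.contains(*s)).count() >= n
                    }

                    fn main() {
                        let _event = TrustEvent::PasswordReset { account: "someone".into() };
                        let maintainers = vec!["key:alice".to_string(), "key:bob".to_string(), "key:carol".to_string()];
                        let sigs = vec!["key:alice".to_string(), "key:carol".to_string()];
                        assert!(n_of_m_ok(&sigs, &maintainers, 2));
                    }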

                    Even if Pijul just auto-generates a key pair and stores it in plaintext on the developer’s system, a repo server compromise would show a new developer key being used. That would be a huge step forward over Git’s status quo. For that functionality alone, I think Pijul should disable unsigned commits by default.

              2. 1

                Having immutable data but mutable/evolvable metadata would be great. Mercurial has that (sort of) by allowing you to mark some commits as deprecated/hidden. The key thing is to preserve history, but to allow reworking how it is presented (the main tree of changes). The GDPR use case is an interesting and tricky one.

              1. 1

                Does anyone know a good way to prevent things like this from happening? I see it a lot, especially, as in the article, in creating gaps in software security. Yet it’s unreasonable to have a core group review all of the code, and it’s also unreasonable to expect everyone to know everything.

                With “training” being what it is today, many junior and maybe even higher engineers are unaware of, or uninterested in, this kind of software reasoning. At my last company, the security team desperately tried to get everyone on board with the idea that security is a company-wide concern, but the teams I was on very rarely thought through security implications and would just say “oh, we’ll just have security review.” Which I think is an important step, but security was not nearly a big enough team to review everything and only had the time to look at the big picture, leaving all of the actual code unreviewed (but theoretically “signed off on”).

                It almost seems like these gaps are inevitable with current software practices. I wonder if maybe Go had some of the right idea in reducing the power the developer has[1], but perhaps what we need is something like RBAC but for languages: different roles have access to different software primitives. No idea how that could work in practice, and there are a lot of issues with the idea (and it couldn’t cover everything), but I wonder if some big, crazy idea is what is needed to get software security to where it should be.

                [1] Not trying to be inflammatory here, the original goal of Go was stated to be this, though the language has evolved and it may no longer be the case

                1. 3

                  I think this has to come top-down. As I’ve gotten farther up the leadership chain I’ve started requiring security analysis from the teams who roll up under me: lists of CVEs, developer-driven pen testing, library patch status. I have a templated report whose real purpose is just to get the team to start looking at and considering this stuff.

                  It only works because I, as a person in leadership, am requiring them to provide a report on it to me. And I then provide them with budget to tackle any issues the report identifies. I don’t accept a “There are no issues” report either. That just tells me they didn’t do the work.

                  1. 2

                    Would be cool if you published your templates, your process and such results as you can.

                  2. 2

                    I think the issue is that our tools are lagging behind our practices; unrestricted I/O access in business systems is an anti-pattern, but most programming languages lack the semantics necessary to deal with it. Even languages like Haskell, where side-effects must be explicitly dealt with, lack the ability to reason about what kind of side-effects a piece of code can perform.

                    Relying on processes and training is also unsustainable IMO - if it’s possible to make a mistake someone will make it eventually, not to mention the cost and mental overhead required to maintain a strict security regimen.

                    I believe the right way forward is something like capabilities - this was a very interesting post on the subject: https://justinpombrio.net/2021/12/26/preventing-log4j-with-capabilities.html (which is also trending right now, see https://lobste.rs/s/lumsvs/preventing_log4j_with_capabilities )

                    Of course you could also restrict I/O access at the container or runtime level (see eg Deno https://deno.land/manual@v1.17.1/getting_started/permissions) but IMO this is a bit too coarse-grained. For example, what if you want to give file system access to your own application code but not to vendor modules? Containers only provide one isolation level per application, but with capabilities you could control exactly what parts of the code can access a certain resource.
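                    As a sketch of the difference (a hypothetical API of my own, not Deno’s or any real library’s): with capabilities, file-system access becomes a value you pass to the code you trust and simply never hand to vendor code:

                    use std::fs;
                    use std::io;
                    use std::path::{Path, PathBuf};

                    // Hypothetical capability type: the only way to read files is to hold
                    // one, and it is scoped to a directory, so whoever constructs it
                    // decides which code gets which access.
                    struct FsReadCap {
                        root: PathBuf,
                    }

                    impl FsReadCap {
                        fn new(root: impl Into<PathBuf>) -> Self {
                            Self { root: root.into() }
                        }
                        fn read_to_string(&self, relative: &Path) -> io::Result<String> {
                            fs::read_to_string(self.root.join(relative))
                        }
                    }

                    // Application code receives the capability explicitly...
                    fn load_config(cap: &FsReadCap) -> io::Result<String> {
                        cap.read_to_string(Path::new("config.toml"))
                    }

                    // ...while vendor code that never receives one has no way to do I/O
                    // (assuming the language/runtime doesn't also offer ambient `std::fs`
                    // access, which is exactly the part today's mainstream languages lack).
                    fn vendor_formatter(input: &str) -> String {
                        input.trim().to_string()
                    }

                    fn main() {
                        let cap = FsReadCap::new("./app-data"); // hypothetical directory
                        match load_config(&cap) {
                            Ok(config) => println!("{}", vendor_formatter(&config)),
                            Err(e) => eprintln!("no config available: {e}"),
                        }
                    }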

                    1. 2

                      #define system system_function_forbidden

                      There are some projects with more complete lists of security restrictions like that.

                      1. 1

                        Those work to an extent, along with various linters. But I’ve found those are often the first to be disabled when someone runs into an issue, rather than understanding why they are there in the first place. Thinking back on my comment, I guess it really boils down to “I wish more software engineers cared about security, and treated it as something to preemptively think about rather than reactively think about.”

                    1. 13

                      This talks about the tooling investment required for a monorepo, but doesn’t address the tooling investment required for multi-repos, which I would argue is far more. Versioning dependencies between services is a hard problem, and a place like Amazon which uses multi-repos internally still hasn’t figured it out.

                      1. 4

                        Having worked on tooling for a monorepo, I would say it’s fair to say that there are pain points and trade-offs at both ends of the spectrum.

                        Today, the best is a middle ground where you have a set of ‘macro repos’ separated by tooling/platform/infra needs.

                        1. 3

                          There are certainly pain-points for a monorepo, but they have known, good solutions (though they may require a high technical investment). I haven’t seen a good solution to the multi-repo versioning dependency problem yet. I agree that both sides have pain-points but I would argue multi-repos has a huge pain-point with no good solution, which in my opinion makes it a non-starter.

                          I would be really interested to see further developments in the multi-repo world and be proven wrong, but as far as I can tell most solutions revolve around building a “meta” VCS around the various repositories to try to replicate the atomic commit behavior. At a certain point, is the only downside to a monorepo the performance? (And this is semi-limited to Git, as from my understanding Mercurial has better monorepo characteristics which Meta is investing in)

                        2. 1

                          Author here. Absolutely. I was working on Kubernetes during the early phases where the API client and types were being transitioned out of the monorepo to their own projects. It was a convoluted staging and sync process that made some of the dependency-hell problems untenable.

                          There’s probably some sort of relation to Conway’s law – where you are shipping your org chart, but the underlying services are severely codependent.

                        1. 8

                          Seeing this RCE has strengthened my belief that the best way to secure our production infrastructure is to eliminate dynamism at all levels – no runtime class loading, no shell, ideally no JIT or interpreter (unless, of course, the product is exposing an interpreter to the customer). It’s too bad that, except perhaps for Go, a fully static environment like an AOT-compiled language running in a distroless container fails the “boring technology” test, at least for a typical web application.

                          1. 8

                            You’re halfway down the path to the best way to write secure and reliable applications!

                            1. 1

                              I’ve always been a big proponent of keeping things static and eliminating dynamism, usually arguing for reliability and test-ability, but the security angle is an interesting addition to the argument. I wonder if one day we’ll consider dynamism everywhere to be as bad as memory unsafety.

                            1. 1

                              The only part of Hungarian Notation I agree with is prefixes to clarify variables’ scope or lifetime. I always prefix instance/member variables with an underscore, and in those situations where I need a static (aka class) variable I’ll prefix it with “s”. It’s very useful to be able to tell clearly where the code accesses or changes the receiver’s state, or state shared by all instances.

                              My only other disagreement with this article is the Ref-of-Rc stuff. When it says:

                              A much better approach is to change your API to not hold long-lived references to other objects. Depending on the situation, it might make sense to take a callback argument in the Monster::take_damage() method

                              … I basically want to throw my cards in the air and leave the table. Yes, let’s get rid of the observer pattern in favor of something nearly useless. I imagine some Clippy-like avatar popping up after I call “monster.take_damage()” and saying “gosh, it looks like the monster just took damage!”

                              To me this seems like degrading one’s design to fit the rigid constraints of the language. The Observer pattern is perfectly cromulent— it’s pub-sub at the object level — and if the language is too paranoid to support it, that seems like a language design problem.

                              1. 1

                                The Observer pattern is still very possible, but Rust makes it a special case because it is special and requires additional thought to use properly (as the article notes the downsides). I think it’s nice for languages to make advanced/special cases look advanced/special in the code because it makes it easy to spot something to pay extra attention to.

                                It is also a matter of previous experience and style of development, and Rust favors the imperative and functional styles more than the OO styles. Certain OO design patterns are trickier to bring over, but usually for good reason.
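                                For anyone curious, a minimal sketch of what that “special case” tends to look like (made-up Monster/HealthBar types, using Weak references so observers don’t keep each other alive):

                                use std::rc::{Rc, Weak};

                                // Minimal observer sketch: the monster holds Weak references to
                                // observers, so dropping the UI doesn't leak, and the extra
                                // ceremony is exactly the part Rust forces you to spell out.
                                trait DamageObserver {
                                    fn on_damage(&self, amount: u32);
                                }

                                struct Monster {
                                    hp: u32,
                                    observers: Vec<Weak<dyn DamageObserver>>,
                                }

                                impl Monster {
                                    fn take_damage(&mut self, amount: u32) {
                                        self.hp = self.hp.saturating_sub(amount);
                                        // Notify observers that are still alive; drop the rest.
                                        self.observers.retain(|o| match o.upgrade() {
                                            Some(obs) => {
                                                obs.on_damage(amount);
                                                true
                                            }
                                            None => false,
                                        });
                                    }
                                }

                                struct HealthBar;
                                impl DamageObserver for HealthBar {
                                    fn on_damage(&self, amount: u32) {
                                        println!("monster took {amount} damage");
                                    }
                                }

                                fn main() {
                                    let ui: Rc<dyn DamageObserver> = Rc::new(HealthBar);
                                    let mut monster = Monster { hp: 100, observers: vec![Rc::downgrade(&ui)] };
                                    monster.take_damage(30);
                                }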

                              1. 1

                                Hm… I have mixed feelings here. There are libraries I have used that have names for the lifetimes, and in certain situations I agree it makes things more useful, but I think it’s a pretty niche use case.

                                Since lifetimes are akin to generic types (where we generally use single letters like T and U) I think it makes sense in most cases to keep lifetimes as single letters. Otherwise it risks just adding noise for no real reason.
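                                A tiny made-up example of the trade-off:

                                // With two borrows in play, 'src and 'cfg carry more information
                                // than 'a and 'b: the signature says the returned slice borrows
                                // from the source text, not from the configuration.
                                struct Splitter<'src, 'cfg> {
                                    input: &'src str,
                                    delimiter: &'cfg str,
                                }

                                impl<'src, 'cfg> Splitter<'src, 'cfg> {
                                    fn first_field(&self) -> &'src str {
                                        self.input.split(self.delimiter).next().unwrap_or("")
                                    }
                                }

                                fn main() {
                                    let config = String::from(",");
                                    let line = String::from("a,b,c");
                                    let s = Splitter { input: &line, delimiter: &config };
                                    assert_eq!(s.first_field(), "a");
                                }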

                                Of course, nothing in software is black and white, and there are situations for single-character and for full names. It is certainly good to be aware of the options.

                                1. 2

                                  Yeah, I try not to prescriptively state that one thing is better or worse when I’m writing for Possible Rust. There are lots of reasons to make one choice or the other, and they include considerations outside the software like the knowledge and experience of the team. Sticking to describing what’s “possible” (hence the name of the site) is a safer strategy for sharing useful knowledge.

                                  1. 1

                                    Right on! I think the post is good at showing the options available. I was just adding my personal commentary on the topic to see what others might think.

                                    Sorry if it seems like I’m disagreeing with the post! Although to your point, there isn’t really anything to disagree with because you’ve just stated the available options.

                                    1. 2

                                      No worry at all! I was trying to provide a bit of context for my own thinking around the purpose of the site and the posts. You’re right that the framing makes disagreeing with the stances the site takes difficult (because I try not to take stances usually), but I find the conversations around the posts often end up in the lovely world of exploring trade-offs, edge cases, and situational case studies. That’s what I’m looking for!

                                1. 13

                                   What the colleague was proposing sounds like premature abstraction. I see this quite often, mostly with junior and mid-level programmers but sometimes in seniors too (hell, I still catch myself doing it) - they start inventing all kinds of abstractions and patterns “in case this code needs to change”. But the thing is, you never know how the code will change. The things you make configurable might actually never change, and the things you baked in as assumptions might change several times.

                                  I find it’s best to write the code to be as straightforward as possible, so it’s easy to understand. Easy to understand code is easier to change too, because you’ll have an easier time figuring out the repercussions of your changes. Code that is factored in a way that makes it easy to understand and modify in various dimensions while still being efficient is the mark of a true craftsman.

                                  As you get more experienced, you’ll notice the kinds of things that tend to change a lot (data models, database layouts etc) and you’ll develop a style that favors “premature abstraction” for these kinds of things. If you’re lucky, your past experience affects the future work too and you’ll be right on the money with your abstractions. They were still a bit premature in a way, but because you’re working in a similar context you see the same patterns over and over and you and your colleague will thank your prescience that allowed the easy changes.

                                  However, be wary of carrying over these abstractions to other kinds of work, as your hard-won experience will actually betray you. For example, people working on frameworks and libraries tend to favor decoupling database code so that it can work on multiple databases. This is often the right thing to do for libraries, but in end-products it just makes code needlessly complex and slow - there it makes more sense to maximally leverage the features of the database you chose in the first place.

                                  1. 6

                                    I agree with all sjamaan has written. Also, I want to add:

                                    What the colleague was proposing sounds like premature abstraction.

                                    I call that YAGNI.

                                     I was going to joke by starting my comment as “I stopped reading at ‘My initial proposal was to add a class…’”. I was a heavy Python class user (a habit inherited from university Java classes and the Python community), but now I have seen how much pain it is, and how much easier it is to work and debug with simple data types and functions (possibly passing functions as arguments to do inversion of control).

                                     I did not write “you do not need classes or disjoint types, and anything beyond dict and list is overkill”. My current approach is to start new code as simple as possible, but not simpler. To be able to grasp that, I learned standard Scheme, which does not have OOP built in.

                                     I quote the great last sentence of the previous comment, which is also my approach to building “abstractions” on top of SQL databases:

                                    in end-products it just makes code needlessly complex and slow - [in end products] it makes more sense to maximally leverage the features of the database you chose in the first place.

                                    1. 1

                                       The story is slightly editorialized to (hopefully) be applicable to a larger audience. The actual language being used doesn’t even have classes. In general I agree that OOP is a tool to use sparingly.

                                      1. 1

                                        (opinion) OOP is a great tool for API discovery

                                        I’ve not found it useful for much else tbh.

                                    2. 4

                                      I agree that what my colleague proposed was premature abstraction, at least from my perspective. But as you note, oftentimes experience will favor certain premature abstractions. This colleague was also at a high senior level (at least in title), so I like to give benefit of the doubt that they know what they are doing.

                                       What is interesting is that “code as straightforward as possible” also suffers from ambiguity. From your comments, I believe you and I agree on what that looks like, but someone from a different background may completely disagree. My colleague might argue that their proposal was very straightforward! The absolutely fascinating bit is that “decoupled code” and “code that is simple” are goals we all agree on, but what these terms actually mean is never really pinned down.

                                      I thought it was just so utterly strange that two entirely different approaches can both be justified with the same reasoning – and is there any way to “prove” that one approach is more correct than the other? Or are we just basing it all on our personal background experience?

                                      1. 4

                                        I don’t think you can “prove” that one approach is better/more correct than the other, because that’s highly subjective. Everybody will be bringing in different baggage. My gut says that “straightforwardness” as I call it should be measurable - the less indirections, classes, methods and conditionals you have, the more straightforward the code is.

                                        But even this relatively objective measure can be perverted, I’m sure, because programmers are just so damn good at that. Just think “code golf” to reduce the LOC count - that doesn’t necessarily make it more readable/straightforward.

                                        1. 3

                                          I lean towards agreeing that “proving” one approach over the other is impossible. Then I guess the question is, if everyone has different, subjective, ideas of what “straightforward code” and “decoupled code” is, does it even make sense to have “straightforward and decoupled code” as the north star for software engineering? If none of us agree on where that north star is in the sky, we’re all going in different directions with entirely different maps of the world.

                                           This is mostly just a philosophical question, one which I find myself considering. The engineer/scientist in me truly wants to define what good code looks like, but if it’s fundamentally a problem about people then that is clearly impossible.

                                          1. 3

                                            It’s a very good question indeed. I think maybe it’s much less important as to where your particular north star is than to have a consistent vision of where that star should be in a given team/project. That way, you get a consistent codebase that perhaps some onlookers find horrible and badly engineered, but everyone on the team agrees is a joy to maintain.

                                            At least in my own experience, in a team where people are aligned on a certain style of engineering, you get a codebase that has an internal coherency. If different parts are done by people with different viewpoints, you get a Frankenstein monster that has no sense of identity. I’ve seen this happen, and it can be very difficult to reconcile this without somehow “enforcing” one view on those that disagree. And even talking about it can be difficult, because it’s something that’s very difficult to adequately put into words.

                                            I think this is something that “style guides” try to do but fail at miserably because they’re only focused on the superficial syntactical details. But choosing (for example) whether to use interfaces or callbacks in a codebase goes much more deeply than that. And to make matters worse, there may be cases where there’s a good technical reason to deviate from the common convention.

                                            1. 2

                                              That is a good point, perhaps the only thing that does matter is what the team agrees on. Being aligned on the north star is certainly important, but where the north star is might not be. Definitely something I’m going to need to spend some time mulling over.

                                              Then there is the question of whether a particular north star will “dominate” so to speak. For instance, complex abstractions are oftentimes difficult to replace while more straightforward approaches are oftentimes easier. The paradox is that straightforward code is usually refactored until it is complicated and difficult to change, while complicated code remains complicated. Does a team or project’s north star inevitably drift towards complicated over time? My instinct says yes, which I feel has some interesting consequences.

                                              1. 1

                                                Does a team or project’s north star inevitably drift towards complicated over time? My instinct says yes, which I feel has some interesting consequences.

                                                hm, I’d have to agree that, in general, products tend to become more complicated due to the pressures from changing requirements, and the code follows suit. With one caveat: I’ve seen some refactorings that made needlessly complicated code much clearer (but possibly more abstract and complex to understand from scratch). Sometimes all it takes is some time away from the code to realise how it can be made simpler by finding the right abstraction for that particular part of the code. And sometimes it takes multiple iterations of refactorings in which one realises that eventually entire parts can simply be dropped.

                                                1. 2

                                                  As an aside, I think the fact that products will grow more complex over time is a great reason not to start out making things complex right away. That complexity will probably be in different places than you expect, just like performance bottlenecks will be in different places than you expect. So premature abstraction is like premature optimization in that sense, and also in the sense that it is silly to design code in a way that you know will be slow/difficult to adapt to changing requirements - it just takes a lot of experience to see where to apply optimization and abstraction ahead of time.

                                            2. 1

                                               In the long term this can be measured in money: how much the product earns and how much it costs to support it. Earnings incorporate values like customer satisfaction. Spending can tell you about team velocity and maybe even employee churn.

                                      1. 7

                                        I can relate to this topic. Background: I have started with Java and later transitioned to Go. At this point, I have more Go experience than Java. I will concentrate on these two languages. But I believe that concerning the topic we can roughly compare two groups of languages: (Go, C, Rust, etc.) and (Java, .NET, Ruby, etc.).

                                        I have noticed that traditions (or cargo-cult) in Java are solid. You always apply some layered architecture, you always use interfaces, and you always test against your mocks. Often you do not ask a lot of questions with well-established legacy corporate systems. And very often, your framework will dictate how to structure your code.

                                        In Go, and I believe in C as well, things are more straightforward. Due to the niche (infrastructure in case of Go) that language occupies, in most cases, you have a minimal system that is fine-tuned to solve one particular problem, interacting with well-known external systems. In Go, you declare an interface only when you can not avoid having one. Overall approaches are much more practical, and this allows you to write simpler code. If there is no framework, you have more freedom. In Go, the standard library is your framework.

                                        Sure, nowadays, we have more extensive Go applications that adopted some enterprise patterns. Still, an excellent project will strike you as a minimal sufficient implementation to solve concrete tasks.

                                        Comparing my experiences with both approaches: you get a more reliable solution when you test against the real deal (DB, queue, fs, etc.) and avoid extensive use of mocks. You get a more straightforward and more readable solution when you use fewer abstractions. Simple code makes handling (rare) incidents at 3 AM way easier.

                                        The enterprise approach gives you the flexibility to switch vendors rapidly, a requirement that can not be ignored in the real world.

                                        There is a middle ground when abstraction and complexity are extracted into a separate library and maintained independently. Then you can have both: simplicity and flexibility. From my experience, “gocloud.dev” is an excellent example of such an approach. This set of libraries abstracts away different cloud implementations for you. You can keep your applications small and simple and still have an opportunity to switch vendors and perform tests.

                                        To conclude, I want to say that collaboration between people with radically different views often makes the product better. It is hard, and staying respectful is the key (reminder to myself).

                                        1. 4

                                          Thanks for your thoughts! I think you capture the trade-offs between the two approaches well, and I agree, there do seem to be two groups of languages in this regard.

                                          I have noticed that traditions (or cargo-cult) in Java are solid.

                                           I have found this as well. Every colleague I have worked with who had an extensive Java background knew the name of every design pattern, and would describe all code and approaches as combinations of these. Personally, I’ve just internalized the concepts behind the various design patterns and apply them where they fit naturally (only remembering the names of a tricky few, which are handy for explaining to junior engineers so they can research them further). Completely different approaches!

                                          To conclude, I want to say that collaboration between people with radically different views often makes the product better. It is hard, and staying respectful is the key (reminder to myself).

                                          I couldn’t agree more. The combination of different views and backgrounds is incredibly important for a robust product. Staying respectful, identifying trade-offs, and justifying decisions (without cargo-culting in either direction) is extremely difficult and what I think is the true mark of an expert.

                                        1. 4

                                          The 2021 Edition makes me especially happy because of how utterly boring it is. The changes are simply to take out jagged edges in the language as-is, but otherwise nothing new or big is being added (unlike the previous 2018 Edition with async/await).

                                          It seems like the future of Rust is really just smoothing out the edges, and the editions are a brilliant way of doing that, since we aren’t stuck with them indefinitely (or forced through a python 2/3 scenario).

                                          1. 3

                                            Note: the official position of Rust project is that Rust is stuck with all Rust editions released indefinitely.

                                          1. 3

                                            I find it quite interesting most people are suggesting surface-level changes to syntax. I wonder why that is?

                                            1. 9

                                              Note: this observation is called Wadler’s law: https://wiki.haskell.org/Wadler's_Law

                                              1. 9

                                                 It’s called bikeshedding, and it always happens with complicated problems/questions. It’s easiest to argue about surface-level stuff rather than the really difficult stuff.

                                                 I think it’s an utterly fascinating aspect of human nature!

                                                1. 4

                                                  A few reasons I think

                                                  • It’s like a bike shed, it’s easy to have an opinion on how to paint it/change syntax, hard to have an opinion on how to improve the internal structure.
                                                   • Change the internal structure too much and it’s no longer “rust”. Obviously many people prefer python to rust, but saying “clone python” isn’t an interesting answer.
                                                   • Most of rust’s warts are, at least in my opinion, in half-finished features. But like the above it’s uninteresting to say “well just hurry up and fix const generics, GATs, custom allocators, self-referential structs, generators, making everything object safe, etc”. Those aren’t really “redesigns”, they’re just “well keep at it and we’ll see what we end up with”.
                                                1. 39

                                                  This article is full of misinformation. I posted details on HN: https://news.ycombinator.com/item?id=26834128.

                                                  1. 10

                                                    This really shouldn’t be needed, and even someone without any exposure to Go can see this is just bunk with the minimal application of critical thinking. It’s sad to see this so highly upvoted on HN.

                                                     When I was in high school one of my classmates ended up with a 17A doorbell in some calculations. I think he used the wrong formula or swapped some numbers; a simple mistake we all made. The teacher, quite rightfully, berated him for not actually looking at the result of his calculation and judging whether it was roughly in the right ballpark. 17A is a ludicrous amount of current for a doorbell and anyone can see that’s just spectacularly wrong. The highest-rated domestic fuses we have are 16A.

                                                     If this story had ended up with 0.7%, sure, I can believe that. 7%? Very unlikely and I’d be skeptical, but still possible I suppose. 70%? Yeah nah, that’s just as silly as a 17A doorbell. The author should have seen this, and so should anyone reading this, with or without exposure to Go. This is just basic critical thinking 101.

                                                     Besides, does the author think the Go authors are stupid blubbering idiots who somehow missed this huge, elephant-sized piece of low-hanging fruit? Binary sizes have been a point of attention for years, and somehow missing 70% wasted space of “dark bytes” would be staggeringly incompetent. If Go was written by a single author then I suppose it would have been possible (though still unlikely), but an entire team missing this for years?

                                                    Everything about this story is just stupid. I actually read it twice because surely someone can’t make such a ludicrous claim with such confidence, on the cockroachdb website no less? I must be misunderstanding it? But yup, it’s really right there. In bold even.

                                                    1. 6

                                                       I think this is really interesting from a project management and public perception point of view. This is slightly different from your high school classmate, because they might not have been aware of the ridiculousness of their claims. Of course, this situation could be the same, but I think it is more interesting if we assume the author did see this number and thought it was ridiculous and still wrote the article anyway.

                                                       Someone doesn’t write a post like this without feeling some sort of distrust of the tool they are using. For some reason, once you’ve lost the trust, people will start making outlandish claims without giving any benefit of the doubt. I feel like this is similar to the Python drama which ousted the BDFL and to Rust’s actix-web drama which ousted the founding developer. Once the trust is lost in whoever is making the decisions, logic and reason seem to just go out the window. Unfortunately this can lead to snowballing and people acting very nasty for no real reason.

                                                       I don’t have much knowledge of the Go community or drama, and in some sense this is at least much more nicely put than some of Rust’s actix-web drama (which really threw good intent out the window), but I’d be curious to know what happened that lost the trust here. It might be as simple as being upfront about the steps being taken to reduce binary size, even if they are not impactful; that might gain back trust in this area.

                                                      1. 3

                                                        It’s my impression that the Python and actix-web conflicts were quite different; with Python Guido just quit as he got tired of all the bickering, and actix-web was more or less similar (AFAIK neither were “ousted” though, but quit on their own?) I only followed those things at a distance, but that’s the impression I had anyway.

                                                        But I think you may be correct with lack of trust – especially when taking the author’s comments on the HN story in to account – but it’s hard to say for sure though as I don’t know the author.

                                                        1. 2

                                                           Perhaps I am over-generalizing, but I think they are all the same thing. With Rust’s actix-web it essentially boiled down to some people having a mental model of Rust which involves no unsafe code (which differed from the primary developer’s mental model). At some point, this went from “let’s minimize unsafe” to “any unsafe is horrible and makes the project and developer a failure”, regardless of the validity of the unsafe statements. Unfortunately it devolved to the point where the main developer left.

                                                          In the Go situation it seems very similar. Some people have a mental model that any binary bloat is unacceptable, while the core devs see the situation differently (obviously balancing many different requirements). It seems like this article is that disagreement boiling over to the point where any unaccounted-for bits in a binary are completely unacceptable, leading to outlandish claims like 70% of the binary is wasted space. Hopefully no Go core developers take this personally enough to leave, but it seems like a very similar situation where different mental models and lack of trust lead to logic and benefit of the doubt getting thrown out the window.

                                                           It is hard to say what is going on for sure, and in many ways I’m just being an armchair psychologist with no degree, but I think it is interesting how this is a “common” trend. At some point projects that are doing a balancing act get lashed out at for perceived imbalances being misconstrued as malicious intent.

                                                          1. 1

                                                            I don’t think you’re correctly characterizing the actix situation. I think the mental model was “no unnecessary unsafe”. There were some spots where the use of unsafe was correct but unnecessary, and others where it was incorrect and dangerous. I think there was poor behavior on both sides of that situation. The maintainer consistently minimized the dangerous uses and closed issues, meanwhile a bunch of people on the periphery acted like a mob of children and just kept piling on the issues. I personally think someone should have forked it and moved on with their lives instead of trying to convince the maintainer to the point of harassment.

                                                      2. 2

                                                        on the cockroachdb website no less

                                                        Cockroachdb is on my list to play with on a rainy afternoon, but this article did knock it down the list quite a few notches.

                                                        1. 2

                                                          We use it as our main database at work and it’s pretty solid. The docs for it are pretty good as well. But I definitely agree, this is a pretty disappointing article.

                                                    1. 12

                                                      I am writing up notes for an overview of Ada and comparing its constructs against C, C++, and Rust since I am familiar with them. I’m trying to avoid it, but I think I have to do it as multiple articles to make it consumable.

                                                      I am aiming to be as objective as possible and show the tradeoffs and similarities in engineering choices available. The goal isn’t language bashing or spreading the gospel of Ada, but to demonstrate similarities of Ada with other more well-known languages, especially because Ada can seem dense since it’s not in the C family.

                                                      1. 3

                                                        This sounds super interesting. Please share here when you are done, I’d love to read it!

                                                      1. 10

                                                        I enjoy these write-ups and seeing the rapid development of Bevy. There have been a few other game engine projects in Rust (Piston, Ggez, and Amethyst are a few), but none of them seem to have caught on just as much as Bevy has. I always find it fascinating which projects get popular and which don’t, and I think a large part of Bevy’s success has been the cohesiveness of the design and the aim at simplicity while still providing the power of complicated features. It obviously has had the benefit of being able to learn from the other projects!

                                                        I also wonder if it is these updates which are a huge part of the popularity, as they put everything front and center and explain complicated topics without boring you with details (and plenty of nice pictures and videos to boot!) The first release of Bevy was just a single page explaining all of the features and the power behind it all, and it made everything seem so… achievable. It was easy to picture how you could put together a game with all of the cool features, and it seemed as if it wouldn’t require diving into pages and pages of documentation. I’m not sure how true that actually is, but I think it shows how important release notes can be.

                                                        1. 2

                                                           Yeah, I’m a huge fan of this. I’m writing my own game and writing my own game engine for it (NIH syndrome in full effect!) but I’m totally going to go through this and steal, er, copy ideas, like the stack-based state machine.

                                                        1. 5

                                                          I feel like the important detail missing here about how Java does stuff differently is that Java is effectively “everything is a reference”(ignoring primitives), so you don’t need the type information for much of anything. Whereas rust is “everything is a value by default”, which is super unconventional in most popular languages (like, what, C/++ and Pascal? Definitely showing my ignorance here)

                                                           “I need a box for 3 Ts” is what you do in Rust. So you need to know how big T is! Like you can’t get around that issue at runtime. You need to know the size of T somehow. So there’s stuff

                                                           “I need a box for 3 pointers to T” is what Java does. Well pointers are all the same size so you’re all good, one structure will work for everything. You basically just want it for like… type checking, at a first approximation. And yeah, maybe if you want a default instantiator for it you can add that to the structure. Not erasing would be weird!

                                                          I feel like passing by value is the biggest source of “oh I gotta think differently here” for non-systems programmers coming to Rust
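                                                           A quick way to see the difference (illustrative only; the Box size is for a 64-bit target):

                                                           use std::mem::size_of;

                                                           // Rust's Vec<T> stores the T values inline, so the element size is
                                                           // baked in at compile time via monomorphization; a Java ArrayList<T>
                                                           // only ever stores references, which are all the same size, so
                                                           // erasing T costs it nothing.
                                                           fn main() {
                                                               println!("Vec<u8> element size:      {}", size_of::<u8>());            // 1
                                                               println!("Vec<[u8; 64]> element size: {}", size_of::<[u8; 64]>());      // 64
                                                               println!("Vec<Box<[u8; 64]>> element: {}", size_of::<Box<[u8; 64]>>()); // pointer-sized
                                                           }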

                                                          1. 5

                                                            Whereas rust is “everything is a value by default”, which is super unconventional in most popular languages (like, what, C/++ and Pascal? Definitely showing my ignorance here)

                                                             My initial thought here was “What? It’s weird to treat everything as a reference!” but I actually think you make a good point and I agree with you! C/++ and Rust do end up being the “weird” ones given the state of programming languages today, which is funny because passing by value used to be the conventional way.

                                                            1. 2

To bring this full circle, my first language was actually C++! I remember suffering a lot through arrays and references, and yeah, arguments are copied.

To be honest I think that pass-by-value by default is super messy in a world of structured data. Like, I have a vector, the data is managed by it, so passing the vector around should let me operate on the same data? But it doesn’t, while arrays (pointers) do… it’s really hard cuz you need to dig into implementations or docs to figure out what your data structure does on a copy.

                                                              I get it but it’s definitely tougher than like… the Python object model

                                                          1. 4

One thing that Erlang gets right that other people miss is hot reloading. A distributed system that is self-healing has to be able to hot reload new fixes.

                                                            That’s my biggest frustration with the new BEAM compilers in Rust and so on: they choose to not implement hot reloading - it’s often in the list of non-goals.

                                                            In a different video, Joe says to do the hard things first. If you can’t do the hard things, then the project will fail, just at a later point. The hard thing is isolated process hot reloading: getting BEAM compiled in a single binary is not.

                                                            1. 2

Hot reloading is one of those features that I have never actually worked with (at least, not like how Erlang does it!), so possibly for that reason alone I don’t see the absence of the feature as a major downside of the new BEAM compiler. I wonder if the lack of development in that area is just because it is a rare feature to have, and while it seems like a nice-to-have, it isn’t a paradigm shift in most people’s minds (mine included!).

The benefits of it do seem quite nice though, and there was another lobste.rs member who wrote a comment about their Erlang system, which could deploy updates in under 5 minutes thanks to hot reloading, as if nothing changed at all (no systems needed to restart). This certainly seems incredible, but it is hard to fully understand the impact without having worked in a situation like this.

                                                            1. 5

                                                              This is a fantastic talk! The idea that robust systems are inherently distributed systems is such a simple and obvious idea in hindsight. Distributed systems are difficult, and I have had upper managers claim that we need “more robust” software and less downtime, yet refuse to invest in projects which involve distributed algorithms or systems (have to keep that MVP!). I think Armstrong was right that in order to really build a robust system we need to design for millions of users, even if we only expect thousands (to start), otherwise the design is going to be wrong. Of course this is counter-intuitive to modern Scrum and MVPs.

Additionally, there is so much about Erlang/OTP/BEAM that seems so cutting-edge, yet the technology has been around for a while. It will always be a wonder to me that Kubernetes has caught on (along with the absolutely crazy technology stack surrounding it) while Erlang has withered (despite having more features), although Elixir has definitely been gaining steam recently. Having used Kubernetes at the past two companies I’ve worked at, I’ve found it to be nothing but complicated and error-prone, but I guess that is just much of modern development.

I have also been learning TLA+ on the side (partially just to have a leg to stand on when arguing that a quick and sloppy design is going to have faults when we scale up, and that we can’t just patch them out), and I think many of the ideas Lamport puts forward in the TLA+ book mirror Armstrong’s thoughts. It is really unfortunate that our field has already figured out so many of these things, yet hardly anyone is using that knowledge. It is rare to find systems that are actually designed rather than just thrown together, and that will never lead to robust systems.

                                                              Finally, I think this is where one of Rust’s main features is an under-appreciated super-power. Distributed systems are hard, because consistency is hard. Rust being able to have compile-time checks for data-races is huge in this respect because it allows us to develop small-scale distributed systems with ease. I think some of the projects bringing OTP ideas to Rust (Bastion and Ludicrous are two that come to mind) have the potential to build completely bullet-proof solutions, with the error-robustness of Erlang and the individual-component robustness of Rust.

                                                              1. 4

No. Rust prevents data races, not race conditions. It is very important to note that Rust will not protect you from the general race condition case. In distributed systems, you’ll be battling race conditions, which are incredibly hard to identify and debug. It is an open question whether the complexity of Rust will get in the way of debugging a race condition (Erlang and Elixir are fantastic for debugging race conditions because they are simple, and there is very little to get in your way of understanding and debugging them).

                                                                1. 2

The parent post says Rust has compile-time checks for data races and makes no claim about race conditions. Did I miss something?

                                                                  1. 2

                                                                    When you are working with distributed systems, it’s race conditions you worry about, not data races. Misunderstanding the distinction is common.

                                                                    Distributed systems are hard, because consistency is hard. Rust being able to have compile-time checks for data-races is huge in this respect because it allows us to develop small-scale distributed systems with ease.

                                                                  2. 1

Yes, Rust prevents data races, which is (as mentioned by another poster) what I wrote. However, Rust’s type system and ownership model do make race conditions rarer in my experience, since they require data passed between threads to be explicitly wrapped in an Arc and potentially a Mutex. It is also generally easier to use a library such as Rayon or Crossbeam to handle simple multithreaded cases, or to just use message passing.
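
As a toy sketch of what that explicitness looks like (nothing from a real codebase, just an illustration):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared mutable state has to be spelled out: Arc for shared
    // ownership across threads, Mutex for synchronized mutation.
    // Handing the threads a plain `&mut Vec<i32>` would be rejected
    // at compile time.
    let results = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..4)
        .map(|i| {
            let results = Arc::clone(&results);
            thread::spawn(move || {
                results.lock().unwrap().push(i * i);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    println!("{:?}", results.lock().unwrap());
}
```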

Additionally, most race conditions are caused by data races, so… yes, Rust does prevent a certain subset of race conditions but not all of them. It is no less a superpower.

It is an open question whether the complexity of Rust will get in the way of debugging a race condition (Erlang and Elixir are fantastic for debugging race conditions because they are simple, and there is very little to get in your way of understanding and debugging them).

I don’t understand this point. Rust can behave just like Erlang and Elixir (in a single-server use case, which is what I was talking about) via message-passing primitives. Do you have any sources for Rust’s complexity being an open question in this case? I am unaware of any argument for why Rust’s affine type system is cause for concern in this situation – in fact it is usually the opposite.

                                                                    1. 2

                                                                      “most race conditions are caused by data races”

                                                                      What definition of “most” are you using here?

Many people writing distributed systems are using copy or copy-on-write systems and will never encounter a data race.

Do I have any sources? Yes. I debug distributed systems, I know what tools I use, and ninjaing them into and out of Rust is not going to be ergonomic.

                                                                      1. 5

Just some quick feedback/level-setting: I feel like this conversation is far more hostile and debate-like than I am interested in/was hoping for. You seem to have very strong opinions, and specifically anti-Rust opinions, so let’s just say I said Ada + Spark (or whatever language with an affine type system you don’t have a grudge against).

                                                                        The point I was making is that an affine type system can prevent data-races at compile-time, which are common in multi-threaded code. OTP avoids data-races by using message-passing, but this is not a proper fit for all problems. So I think an extremely powerful solution would be an affine-type powered system for code on the server (no data-races) with an OTP layer for server-to-server communication (distributed system). This potentially gets the best of both worlds – flexibility to have shared memory on the server, while OTP robustness in the large-scale system.

I think this is a cool idea and concept, and you may disagree. That is fine, but let’s keep things civil and avoid attacking random things (especially points that I am not making!).

                                                                        1. 2

                                                                          Not the parent:

                                                                          In the context of a message-passing system, I do not think affine|linear types hurt you very much, but a tracing GC does help you, since you can share immutable references without worrying about who has to free them. Linear languages can do this with reference-counted objects—maintaining ref. transparency because the objects have to be immutable, so no semantics issues—but reference counting is slow.

                                                                          Since the context is distributed systems, the network is already going to be unreliable, so the latency hit from the GC is not a liability.

                                                                          1. 1

Interesting point, although I don’t know if I necessarily agree. I think affine/linear types and GC are actually orthogonal to each other; I imagine it’s possible for a language to have both (although I am unaware of any that exist!). I don’t fully understand the idea that affine/linear types would hurt you in a multi-threaded context, as I have found them to be just the opposite.

I think you are right that reference-counted immutable objects will be slightly slower than tracing GC, but I imagine the overhead will quickly be made up for. And you’re right – since it’s a distributed system, the actual performance of each individual component is less important, and I think a language like Rust is mainly useful in this context in terms of correctness.

                                                                          2. 1

                                                                            Can you give an example of a problem where message passing is not well suited? My personal experience has been that systems either move toward a message passing architecture or become unwieldy to maintain, but I readily admit that I work in a peculiar domain (fintech).

                                                                            1. 2

I have one, although only halfway. I work on a system that does relatively high-bandwidth/low-latency live image processing on a semi-embedded system (nVidia Xavier). We’re talking, say, 500 MB/s throughput. An image comes in from the camera, gets distributed to multiple systems that process it in parallel, and the output from those either goes down the chain for further processing or persistence. What we settled on was message passing, but heap allocation for the actual image buffers. The metadata structs get copied into the mailbox queues for each processor, but each just has a std::shared_ptr to the actual buffer (ref-counted and auto-freed).

                                                                              In Erlang/Elixir, there’s no real shared heap. If we wanted to build a similar system there, the images would be getting copied into each process’s heap and our memory bandwidth usage would go way way up. I thought about it because I absolutely love Elixir, but ended up duplicating “bare minimum OTP” for C++ for the performance.
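
To give a feel for the shape of it, here is a rough sketch in Rust rather than our actual C++, with Arc standing in for std::shared_ptr and all names made up:

```rust
use std::sync::{mpsc, Arc};
use std::thread;

// Hypothetical metadata struct: small and cheap to clone. Cloning it
// copies the fields and bumps the Arc refcount; the image buffer
// itself is allocated once and shared.
#[derive(Clone)]
struct Frame {
    timestamp_us: u64,
    buffer: Arc<Vec<u8>>, // stand-in for the C++ std::shared_ptr
}

fn processor(name: &'static str, rx: mpsc::Receiver<Frame>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for frame in rx {
            // Real code would run detection/encoding/persistence here.
            println!("{name}: frame {} ({} bytes)", frame.timestamp_us, frame.buffer.len());
        }
    })
}

fn main() {
    let (tx_a, rx_a) = mpsc::channel();
    let (tx_b, rx_b) = mpsc::channel();
    let workers = vec![processor("detect", rx_a), processor("persist", rx_b)];

    // Camera loop: allocate each frame's pixels once, then fan the
    // metadata out to every processor's mailbox.
    for ts in 0..3u64 {
        let frame = Frame { timestamp_us: ts, buffer: Arc::new(vec![0u8; 1024]) };
        tx_a.send(frame.clone()).unwrap(); // copies metadata, not pixels
        tx_b.send(frame).unwrap();
    }

    drop(tx_a); // closing the senders lets the receiver loops finish
    drop(tx_b);
    for w in workers {
        w.join().unwrap();
    }
}
```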

                                                                              1. 2

                                                                                Binaries over 64 bytes in size are allocated to the VM heap and instead have a reference copied around: https://medium.com/@mentels/a-short-guide-to-refc-binaries-f13f9029f6e2

                                                                                1. 2

                                                                                  Hey, that’s really cool! I had no idea those were a thing! Thanks!

                                                                                2. 1

                                                                                  You could have created a reference and stashed the binary once in an ets table, and passed the reference around.

                                                                                3. 1

It is a little tricky because message passing and shared memory can simulate each other, so there isn’t a situation where only one can be used. However, from my understanding shared memory is in general faster and has lower overhead, and in certain situations this is desirable (although there was a recent article about shared memory actually being slower due to cache misses, since on every update each CPU has to refresh its L1 cache).

                                                                                  One instance that I have had recently was a parallel computation context where shared memory was used for caching the output. Since the individual jobs were long-lived, there was low chance of contention, and the shared cache was used for memoization. This could have been done using message-passing, but shared memory was much simpler to implement.
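
A rough sketch of that shape (all names hypothetical, with a Mutex-guarded map standing in for the real cache):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical "expensive" computation the jobs want to memoize.
fn expensive(n: u64) -> u64 {
    (0..n).map(|i| i.wrapping_mul(i)).sum()
}

fn main() {
    let cache: Arc<Mutex<HashMap<u64, u64>>> = Arc::new(Mutex::new(HashMap::new()));

    let handles: Vec<_> = (0..4)
        .map(|worker| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                for n in [10_000, 20_000, 10_000] {
                    // Check the shared cache first; the lock is only held briefly.
                    if let Some(&hit) = cache.lock().unwrap().get(&n) {
                        println!("worker {worker}: cache hit for {n} -> {hit}");
                        continue;
                    }
                    let result = expensive(n); // the long-lived part, done unlocked
                    cache.lock().unwrap().insert(n, result);
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
}
```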

                                                                                  I agree in general that message passing should be preferred (especially in languages without affine types). Shared memory is more of a niche solution (although unfortunately more widely used in my experience, since not everyone is on the message passing boat).

                                                                        2. 4

I think a good explanation is that K8s allows you to take concepts and languages you’re already familiar with and build a distributed system out of that, while Erlang is distributed programming built from first principles. While I would argue that the latter is superior in many ways (although I’m heavily biased, I really like Erlang), I also see that “forget Python and have your engineering staff learn this Swedish programming language from the ’80s” is a hard sell.

                                                                          1. 2

                                                                            You’re right, and the ideas behind K8s I think make sense. I mainly take issue with the sheer complexity of it all. Erlang/OTP has done it right by making building distributed systems extremely accessible (barring learning Erlang or Elixir), while K8s has so much complexity and bloat it makes the problems seem much more complicated than I think they are.

I always think of the WhatsApp situation, where it was something like 35 (?) engineers serving millions of users. K8s is nowhere close to replicating that per-engineer efficiency; you basically need 10 engineers just to run and configure K8s!

                                                                        1. 14

I’m a big fan of storing developer documentation in a doc directory in the git repo, as plain text files in $your_favourite_format. Many hosting tools will render Markdown (or RST, AsciiDoc, etc.), but you can also add a static site generator in front of it, which is pretty easy to set up.

It’s simple and easy, it can be reviewed in code reviews, the documentation will always match the code version, it can be searched easily, you get a good history with decent tooling (I missed “blame” on Wikipedia many times), anyone can use their $favourite editor/IDE to write documentation, it has basically zero operational overhead/barrier to entry, etc.

                                                                          I have yet to see a system that improves on this for developer documentation. Sometimes things don’t need to be complex.

                                                                          1. 3

There was even this cool concept named artifact (now abandoned) that expanded on this to provide a link between documentation and the implementation of features described in the specification. That sounds like an almost perfect solution for documenting projects.

                                                                            1. 3

I have found this also keeps the docs and code in one place, so it is easier for people to update them. For example, you submit a PR and someone points out that you have to update the doc.

If we move the docs out to, say, Google Docs or Dropbox Paper, then it’s harder to review them.

                                                                              1. 2

                                                                                I missed “blame” on Wikipedia many times

There’s a tool for git-blame-like usage on Wikipedia: http://wikipedia.ramselehof.de/wikiblame.php?lang=en&article=Main_Page (there is an alternative one as well), linked in the History tab for each article.

                                                                                Also, putting documentation in the repo makes it easily discoverable with grep for example.

                                                                                1. 2

                                                                                  Ah good to know; thanks. It’s been quite a few years since I did serious editing and I tried to find something like this, but wasn’t able to find it at the time.

                                                                                2. 1

Dropbox Paper worked well at one place, re: operations. Searchable, without having to work out which git repo to look at. Easier to make small edits, without coming up with an edit reason. Editing a wiki page, I could picture getting interrupted and totally losing the WIP.

                                                                                  1. 1

I am also generally a fan of this approach, but unfortunately it still suffers from the same issues as other tree-based solutions. It’s fine if it is tied 1-to-1 with the code, because then the hierarchy is determined. But for information that doesn’t fit this mold, it has the same issues as any tree-based store will.

Additionally, I have found this approach works until it gets unmaintainable (because tree-based documentation always does), and then someone higher up decides the issue is that the documentation is stored in Git rather than the structure of the documentation itself. I’ve seen this happen twice, and while switching to a system like Confluence makes things worse, it is unfortunately how things have gone in my experience.

Gittit is a cool project which provides a graph/wiki interface on top of a Git repo, which I think gets the best of both worlds. As far as I know it doesn’t store images in Git, which is a little unfortunate though. I would love to see some more solutions in this space, and I’ve been somewhat working on my own on the side.

                                                                                    1. 3

I don’t think it needs to be a tree-based approach. I rarely use subdirectories for these things (preferring filenames to order things, like email-development.markdown, email-sendgrid.markdown, etc.), and you can link to other documents with [email-sendgrid] or [See the sendgrid docs](email-sendgrid) (Markdown syntax), which should show up in the rendered version.

                                                                                      With things like Gittit you lose some of the advantages of storing the docs in the repo, like reviews. We used GitHub wiki at a previous job (which is also stored in git) and I found the only real advantage is the ability to edit files with Vim, which is nice (for me, anyway), but other than that I didn’t really see any clear advantages over Confluence or whatnot.

The biggest downside of editing Markdown files like this, by the way, is the lack of a direct preview. If you make a typo in the above then you won’t know until after you push, although this can be solved to some degree with a doc generator/lint tool locally (there are other mistakes you can make too, like forgetting to close a *, which are not so easily caught by a linter).

                                                                                      1. 1

                                                                                        This is super interesting, thanks for explaining! Using a flat-directory approach with markdown files does seem like a good solution, and I was unaware that Markdown could link between itself that easily. It is certainly something that I will have to try out myself. Have you used this system with many developers at once? I’m curious how easy it is to get everyone on-board.

                                                                                        I agree with your points on Gittit; it is not a perfect solution. The underlying behavior is very similar to your flat-directory approach, but the Git integration is a little lacking (for instance, symmetric syncing doesn’t seem completely figured out, and PRs are not a thing).

                                                                                  1. 1

                                                                                    Trees were never the right solution, but that doesn’t mean they’re not useful. I think that, even for file systems, a better solution would be a tag-based system from which trees can be constructed on-demand based on some set of tag groups. This doesn’t preclude a graph interface, as the value of a tag can be another node.

                                                                                    1. 1

                                                                                      I agree trees can be useful. The benefit of organizing knowledge in a graph is that you can then make arbitrary spanning trees representing hierarchies. This is flexible because you can make different trees for different situations (tags and categories are like this), and remove them if they are no longer relevant.

A tree-based store doesn’t have the same flexibility.

                                                                                    1. 6

“Ad Hoc Documentation” is also different from “Real Documentation” because it’s situational. I don’t have to write for whoever is doing whatever. I’m writing for cgenschwap, and they’re trying to make the authored-by flag work. That makes the problem you’re solving fundamentally different, and I think that has as much to do with it as graphs-vs-trees does.

I hate writing documentation because I never know how much to write down. Too much, and it becomes an incomprehensible deluge. Not enough, and it’s useless.

                                                                                      1. 3

You’re right, ad-hoc documentation can be situational. However, I’ve found many cases to be of the form “Oh, X happens all the time with Y, just do A, B and C to resolve it.” While this is ad-hoc, it can and should exist as a how-to or runbook in the documentation – it’s a known situation with a known resolution. There are definitely situations where it is highly specific, but I think those are fairly rare (for instance, that information can always go into a “Troubleshooting” section).

                                                                                      1. 8

                                                                                        I feel like the graph vs. tree thing is something of a false dichotomy. At my workplace, we use Confluence and use its tools to make the information as accessible as possible:

• Yes, there’s a tree. That’s just how Confluence works
• We cross-link a lot; Confluence makes this easy. In fact, easier than ever, because the linking dialog you get with cmd-K (on Mac) quickly searches all of the pages for title matches
• We use tags/labels on pages, providing another way to search
• A top-level page for our engineering docs provides quick searches for those docs, plus the collection of labels

                                                                                        Our documentation isn’t perfect and, sure, it’s not always obvious where in the hierarchy people should put stuff, but there really are a lot of ways to find information in Confluence.

                                                                                        1. 3

                                                                                          I don’t think it is a false dichotomy. Adding links between pages in a tree-based structure might try to approximate the graph structure, but it doesn’t fix the underlying issue. Whatever hierarchy is created will be wrong, and certain documents or information will not fit within it. This means this information is either lost, placed in a poor location or the hierarchy has to go through a restructure. The graph model doesn’t have this issue because there is no inherent structure. There is no hierarchy to get wrong.

                                                                                          Of course, you can just treat a tree-based documentation system as if it was a graph based system, and use tags/labels, cross-links and a flat hierarchy, but at that point why use the system over something that is designed to support that use-case?

                                                                                          1. 3

                                                                                            Q: What’s the difference between a tree-based structure and an index of pages/sections where the index is organised by category and the category has subcategories (usually numbered, e.g. 3.2.1)?

                                                                                            Perhaps the difference is if you mandate creating the index first, rather than it being a future summary of existing material? Perhaps the correct term is graph-first vs tree-first?

                                                                                            1. 1

                                                                                              Q: What’s the difference between a tree-based structure and an index of pages/sections where the index is organised by category and the category has subcategories (usually numbered, e.g. 3.2.1)?

                                                                                              If I’m understanding you correctly they are effectively the same thing. Textbooks, for instance, are tree-based structures.

                                                                                              Perhaps the difference is if you mandate creating the index first, rather than it being a future summary of existing material? Perhaps the correct term is graph-first vs tree-first?

                                                                                              The issue is that an index is only going to be correct at a specific snapshot in time. Unlike textbooks, company documentation is a living entity that changes over time (although textbooks do have revisions). Once a tree-like index is added to the documentation it will eventually become out-of-date and require redoing, or cause the documentation to rot.

It doesn’t really matter when the index is added; I think we should attempt to avoid it completely. The alternative in a graph-based documentation system is to have categories and tags, with the main difference being that a single page can be in multiple categories or have multiple tags.