1. 110
  1. 7

    Malleable identifiers make signed patches the default, while allowing users to later change their personal details (email address, name, login…). The full story for synchronising these identifiers from multiple different source repos is not yet completely written. Testing and suggestions are welcome!

    This is an interesting design decision. As far as I can tell, there are two options, both of which are bad. The git option is to make these part of the hash and immutable. This means that:

    • If there’s a GDPR right-to-be-forgotten request on a public repo and you have to erase someone’s email address, you may need to rewrite history and end up with broken downstreams.
    • If you need to fix a commit message after pushing, then you can’t without breaking downstreams.

    The Pijul option to make them mutable is also problematic:

    • It’s easy to change the commit message to a different one after review, which makes it easy to hide changes.
    • You can’t rely on the authorship information for copyright ownership auditing.
    • You can’t detect tampering with commit messages / authorship if the repo is compromised.

    I think my ideal solution would be to make these things separately signed such that you have an audit trail of updates but you can also retain a hash of old versions if you need to delete them from the public tree. This would let you delete someone’s email address but still preserve a public log of the fact that the author was updated after the original commit. It would let you rewrite the commit message but (unless explicitly deleted) preserve the original in the audit trail.
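
    The separately signed audit trail described above could look roughly like the following Python sketch. It is an illustration only, not something Pijul or git implements: the `MetadataEntry` and `AuditTrail` names are hypothetical, and the signature over each entry is omitted to keep it short. Each entry keeps a hash of its value and a link to the previous entry, so the plaintext can be erased later while the record that an update happened stays verifiable.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class MetadataEntry:
    # Hypothetical record of one authorship/message update.
    value: Optional[str]  # None once the plaintext has been erased
    value_hash: str       # always retained, even after erasure
    prev_hash: str        # hash of the previous entry ("" for the first)

    def entry_hash(self) -> str:
        # Depends only on hashes, so erasing `value` does not change it.
        return digest(self.prev_hash + self.value_hash)


class AuditTrail:
    def __init__(self) -> None:
        self.entries = []

    def update(self, value: str) -> None:
        prev = self.entries[-1].entry_hash() if self.entries else ""
        self.entries.append(MetadataEntry(value, digest(value), prev))

    def erase(self, index: int) -> None:
        # Drop the plaintext but keep its hash in the public log.
        self.entries[index].value = None

    def verify(self) -> bool:
        prev = ""
        for entry in self.entries:
            if entry.prev_hash != prev:
                return False
            if entry.value is not None and digest(entry.value) != entry.value_hash:
                return False
            prev = entry.entry_hash()
        return True


trail = AuditTrail()
trail.update("Alice <alice@example.org>")
trail.update("Alice <alice@example.net>")
trail.erase(0)          # e.g. after a right-to-be-forgotten request
still_valid = trail.verify()
```

    After the erasure the log still shows two updates and still verifies, but the first address is no longer readable from it.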

    1. 22

      It’s possible I’m misunderstanding the Pijul docs on this, but I believe Pijul handles the situation quite nicely. The commits themselves are immutable, but instead of signing them with a string of your username/email, you sign them with a public key. There is then a mutable mapping file from public key to author details.

      I think this is a good overall solution, because if an author wants to delete their details they just delete the entry in the map file. But the commits stay immutable like Git.
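
      A minimal Python sketch of that idea, as I understand it from the description above. This is not Pijul's actual data format: `patch_hash`, `identities`, and the key string are made up for illustration. The point is only that the hash covers the public key, not the name or email, so editing the mapping leaves the patch untouched.

```python
import hashlib
import json


def patch_hash(contents: str, author_key: str) -> str:
    # The hashed payload contains the author's public key, not their name/email.
    payload = json.dumps(
        {"contents": contents, "author_key": author_key}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical mutable mapping, stored outside the hashed data.
identities = {"key-abc123": {"name": "Alice", "email": "alice@example.org"}}


def display_author(author_key: str) -> str:
    entry = identities.get(author_key)
    if entry is None:
        return f"unknown author ({author_key})"
    return f"{entry['name']} <{entry['email']}>"


hash_before = patch_hash("fix typo", "key-abc123")
shown_before = display_author("key-abc123")

del identities["key-abc123"]  # the author removes their details

hash_after = patch_hash("fix typo", "key-abc123")
shown_after = display_author("key-abc123")
```

      Here `hash_before == hash_after`, while the displayed author changes from the full name and address to a bare key reference.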

      1. 4

        I believe what David is saying is that it’s not necessarily enough for the commits to stay immutable. You need an audit trail of changes to authorship as well. If that isn’t a problem for your use cases then Pijul will be fine. If you do need an audit trail of changes to authorship though then Pijul’s mechanism will be problematic.

        1. 16

          If you do need an audit trail of changes to authorship though then Pijul’s mechanism will be problematic.

          Quite the opposite: patches that don’t have a fixed author name must be signed, which is actually stronger than plaintext in the commit, not weaker. Sure, you need to trust the key owner on their identity, but you have the same problem with plaintext author names.

          1. 7

            Ah, I see. Thanks for clarifying.

            In that case, I think the public/private key is the audit trail, no? I don’t think a different author can “claim” the commits, but the original author can change their email address/name. I see it more like scuttlebutt’s approach to naming[1]

            [1] everything is public/private key, and names/nicknames or whatever are mappings individuals can apply independently

            1. 4

              Exactly.

          2. 4

            just delete the entry in the map file

            Do you mean remove just the username/email mapped to the public key or do you remove the public key as well?

            In the former case, what happens when someone sends a “right-to-be-forgotten request” to remove the public key? Surely the author could have used the same public key in other public places from which they don’t want to remove it, and thus the key itself could be construed as PII.

            In the latter case, how do you actually verify the entire trail without the public keys?

          3. 13

            It’s easy to change the commit message to a different one after review, which makes it easy to hide changes.

            You can’t do that in Pijul, only the mapping between author and signing key.

            You can’t rely on the authorship information for copyright ownership auditing.

            Quite the opposite, the patches are signed! This is stronger than plaintext author names.

            You can’t detect tampering with commit messages / authorship if the repo is compromised.

            You can, if you trust the key or the reviewer’s key.

            1. 6

              Nice, thanks! Since you’re depending on keys, what is your revocation story? If my secret key is compromised, what do I do next?

              1. 2

                This isn’t completely done yet. We decided not to store keys inside the repo, mostly to make it fully mutable, but it isn’t fully fleshed out yet.

                1. 4

                  Thanks. With git, email addresses are immutable and so you can use that in auditing if you have some other mechanism for validating the addresses. For public-key crypto, I worry that this is a harder problem. With git, anyone can fake my email address but my repo can require pushes to have out-of-band authentication (for example, the FreeBSD git repo doesn’t accept pushes with the author set to anyone other than the account associated with the credentials used for the push). I know that any email address in the public FreeBSD git repo is the person to point to if there are problems with the code. An email address is an identifier. You can’t compromise an identifier, you can compromise only the authorisation that’s coupled to that identifier. In the case of FreeBSD, that’s an ssh private key, but that key can be revoked and then the attacker can’t impersonate that user anymore.

                  With a public key, I’m not sure what the infrastructure would look like. If a private key that is used directly to sign commits is compromised then I have no way of temporally bounding the scope of the compromise. Any patch signed with that key is suspect - it may be valid or it may come from the attacker. You might be able to manage with a trusted service that does the signing and includes a trusted time stamp, coupled with a revocation list, so that you can identify suspect patches and have them signed with the new key if they are valid.
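
                  To make the timestamp-plus-revocation idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the `SignedPatch` type, the `attested_at` field (standing in for a time vouched for by a trusted signing service), and the revocation list format. Signature checking itself is left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass(frozen=True)
class SignedPatch:
    patch_id: str
    key_id: str
    # Time attested by a (hypothetical) trusted timestamping service.
    # None means the patch only carries a self-reported date.
    attested_at: Optional[datetime]


def classify(patch: SignedPatch, revoked_at: Dict[str, datetime]) -> str:
    """revoked_at maps a key id to the time from which it counts as compromised."""
    compromise_time = revoked_at.get(patch.key_id)
    if compromise_time is None:
        return "ok"  # key was never revoked
    if patch.attested_at is not None and patch.attested_at < compromise_time:
        return "ok"  # provably signed before the compromise
    return "suspect"  # cannot be bounded in time, needs re-signing


revoked = {"key-1": datetime(2021, 6, 1)}
patches = [
    SignedPatch("p1", "key-1", datetime(2021, 1, 15)),
    SignedPatch("p2", "key-1", datetime(2021, 7, 1)),
    SignedPatch("p3", "key-1", None),
    SignedPatch("p4", "key-2", None),
]
results = [classify(p, revoked) for p in patches]
# results == ["ok", "suspect", "suspect", "ok"]
```

                  Note that `p3` is suspect even though nothing says it was signed late: without a trusted timestamp, a patch signed with a revoked key cannot be placed before the compromise.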

                  In general, anything that depends on public-key crypto and doesn’t have a revocation mechanism is suspect. Things that depend on long-lived persistent signed artefacts and don’t have revocation and freshness mechanisms are suspect. Designing this correctly is incredibly hard.

                  1. 2

                    First, I want to stress that there are indeed two separate issues:

                    • Public key signatures when authoring a patch.
                    • Public keys/Authentication used to run the command that applies it.

                    In general, anything that depends on public-key crypto and doesn’t have a revocation mechanism is suspect.

                    Absolutely. That story in Pijul is not complete, and adding revocation certificates shouldn’t be hard. Btw, the current keys have a non-optional expiration date.

                    Things that depend on long-lived persistent signed artefacts and don’t have revocation and freshness mechanisms are suspect. Designing this correctly is incredibly hard.

                    I fully agree, which is also one of the reasons for this beta: feedback and design discussions on issues like that need to happen before the full 1.0 version.

                    1. 5

                      First, I want to stress that there are indeed two separate issues:

                      I agree. The one that I’m interested in is the first:

                      Public key signatures when authoring a patch.

                      This becomes a long-term attestation of authorship.

                      Absolutely. That story in Pijul is not complete, and adding revocation certificates shouldn’t be hard. Btw, the current keys have a non-optional expiration date.

                      But what does expiration mean? If I get a patch from a repo and it’s signed with a key that expired, does that mean I shouldn’t trust it? But the repo metadata says that it was committed a year ago, so does that mean I trust it?

                      The root question is: what is the trust relationship that you’re trying to establish with these signatures? A public-key signature is a verifiable attestation of something. For code signatures, it’s an attestation of a particular publisher. This is backed up by two things:

                      • The public key is signed by something (either a CA or by the distributor of the software). This is another layer of public-key-based attestation where the signing party attests that the key is owned by a specific entity and the entity then attests that they created the software. This depends on some form of trust root (typically a set of root certs distributed with the client).
                      • A revocation mechanism that allows you to stop installing the software after you learn that the key was compromised. At this point, the publisher can create a new key pair, sign a new version of the package with the new private key and ask the distributor or CA to sign the new public key.

                      For TLS, something similar works but the question of what to do with revoked certs is easier because TLS is for interactive sessions and you don’t want to use a TLS cert to verify the authenticity of a connection log from a year ago.

                      There are a few ways that I can see how you’d apply the first part of this in the context of a revision-control system. For example, the repo could form part of a PKI system and sign the public keys of authorised committers to attest that they have gone through some (repo-specific) confirmation of identity. This, in turn, could be signed by a hosting service to attest that the repo used some specific authentication policy. I’m not really sure if any of that would work though.

                      The really tricky part is the second one. A patch in a repo is a long-lived artefact. It may exist for decades. If a key is compromised then it can be used to sign commits in a repo that has its date set in the past. This means that even if I know the date of compromise then the only thing that a signed patch gives me is an attestation that it was authored either by the entity I think created it or by an attacker. This is not a valuable thing to attest. Having a signature here actually makes the situation worse than just having an identity because you have something that looks like it is a trust anchor but isn’t.

                      I honestly have no idea what a useful solution looks like here. Perhaps you can establish a chain of trust over the flow of patches, rather than the patches themselves, so that it’s not the signature of the patch that matters but the signature of whoever gave you the patch (which could provide a chain of custody for the patch set)? That way, if a private key is compromised then it doesn’t matter because it’s only one of the things in the chain of custody and the next one attests that they are happy that this wasn’t one of the malicious uses of the compromised key. The chain of custody could also include the root of a Merkle tree of a CRL so that you can establish some partial ordering between patches being received by a repo and the signing key being revoked?

                      I fully agree, which is also one of the reasons for this beta: feedback and design discussions on issues like that need to happen before the full 1.0 version.

                      I’m looking forward to seeing what you end up with, the rest of the project looks fantastic.

                      1. 1

                        Having a trusted 3rd party attest that it was presented with a particular hash at or before a particular timestamp seems like a good idea. There’s no reason why you would need to have only one TTP, too.

                        you have something that looks like it is a trust anchor but isn’t

                        Honestly I think that’s already a problem. It’s psychologically very easy to assume the authorship on unsigned patches is honest.

                        the next one attests that they are happy that this wasn’t one of the malicious uses of the compromised key

                        This sounds obviously good.

                  2. 1

                    It’s not clear to me that key revocation should be handled at the VCS protocol level.

                    The important thing is to get universal commit signing and allow layering of arbitrary trust/audit systems on top of that. Even if we could get a significant fraction of developers to invest in a web-of-trust setup … private keys are lost all the time, repo servers get hacked, and individual authors sometimes publish malicious changes.

                    What’s important (and what I am hoping Pijul has implemented) is to make it easy to track changes in trust at every level and enable clients/servers to support arbitrary security policies. Can a repo (like NPM) express that a password reset has occurred, a new 2FA token, N-of-M signatures, or some other new protocol we haven’t invented yet?

                    Even if Pijul just auto-generates a key pair and stores it in plaintext on the developer’s system, a repo server compromise would show a new developer key being used. That would be a huge step forward over Git’s status quo. For that functionality alone, I think Pijul should disable unsigned commits by default.

              2. 1

                Having immutable data but mutable/evolvable metadata would be great. Mercurial has that (sort of) by allowing to mark some commits as deprecated/hidden. The key thing is to preserve history, but to allow to rework how it is presented (the main tree of changes). The GDPR use case is an interesting and tricky one.

              3. 4

                Pijul is darcs, but better? (I am very out of touch with the state of darcs, having started out using it but jumped ship to git when it became clear that git had all the mind share.)

                1. 15

                  Darcs but scalable, fast in 100% of cases, with reliable conflict handling, and without some of the “cool but not super useful” features such as darcs replace.

                  1. 7

                    I had the same question. From their docs:

                    Pijul for Darcs users

                    Pijul is mostly a formally correct version of Darcs’ theory of changes, as well as a new algorithm for merging changes. Its main innovation compared to Darcs is to use a better data structure for its pristine, allowing for:

                    • A sane representation of conflicts: Pijul’s pristine is stored in a “conflict-tolerant” data structure. Many changes can be applied to it, and the presence or absence of conflicts are only computed afterwards, by looking at the pristine.

                    • Conflicting changes always commute in Pijul, and never commute in Darcs.

                    • Fast algorithms: Pijul’s pristine can be seen as a “cache” of applied changes, to which new changes can be applied directly, without having to compute anything on the repository’s history.

                    However, Pijul’s pristine format was designed to comply with axioms on a specific set of operations only. As a result, some of darcs’ features, such as darcs replace, are not (yet) available.

                    1. 3

                      At first approximation, yes. It uses a different underlying theory, to my understanding, though.

                    2. 4

                      @pmeunier, I have questions. I’m interested in patch theory and patch-based version control.

                      However, I just can’t understand many things about it.

                      Here’s what I understand:

                      • The idea of using patches instead of snapshots.
                      • Patch commutation and how that helps with better merging.
                      • Most of being able to figure out if a patch depends on another patch.
                      • Pijul’s pristine can store conflicts.

                      What I don’t understand:

                      • Darcs merge algorithm.
                      • Why it goes exponential and when.
                      • Why storing conflicts in the pristine avoids the problem.
                      • Is Pijul’s pristine just a cache that can also store conflicts?

                      If you can explain those things to someone who struggles to read academic material, that would be great, although I do know that your work has been stolen before, so if you don’t want to explain, I understand.

                      1. 3

                        I don’t know how Darcs worked, but it doesn’t have a datastructure independent from the patches: the patches are applied to a plain text version of the repository. In the absence of conflicts, every patch applied needs to be checked with all the patches since the last tag for commutation. Applying n patches cannot be faster than n^2.
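
                        As a back-of-the-envelope illustration of that quadratic bound, here is a small Python sketch. It only counts checks under the assumption stated above (each new patch is checked against every patch applied since the last tag); it does not model Darcs’ real algorithm, and `commutation_checks` is a made-up helper.

```python
def commutation_checks(n_patches: int) -> int:
    """Count pairwise checks when each new patch is compared with
    every patch already applied since the last tag."""
    checks = 0
    applied = 0
    for _ in range(n_patches):
        checks += applied  # one check per previously applied patch
        applied += 1
    return checks


small = commutation_checks(4)      # 0 + 1 + 2 + 3
large = commutation_checks(1000)   # n * (n - 1) / 2
```

                        For 4 patches this gives 6 checks and for 1000 patches 499500, i.e. n(n-1)/2, which grows as n^2.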

                        When there are conflicts, I’m not sure anyone knew why it went exponential time, but it’s apparently fixed now (still quadratic, whereas Pijul is log).

                        Pijul’s pristine is not “just a cache”, it’s a CRDT. You can think of it as a cache if you want.

                      2. 3

                        In general I hope Pijul really takes off. In the small, I find it hard to work with due to the lack of tools which already exist for both Git and Fossil. I wonder what the solution is in this space.

                        1. 3

                          lovely!

                          I find it weird that unrecord isn’t named unapply, since apply works on changes, whereas record works only on working copy files.

                          In my mind, it would have been clearer to keep working-copy-related stuff as “record” terminology, and change-related stuff to the “apply” terminology.

                          But maybe I’m missing something?

                          1. 2

                            I’m excited for this. I started looking into pijul recently, to see if I could use it as my main VCS. I think there’s some differences in how the Pijul devs think about version control – at least, different from how I do.

                            They seem to not be too big on branches. I haven’t quite figured this one out yet; it seems pretty widely accepted in the programming world.

                            It seems to be very much written for people who understand the pijul internals. Doing a pijul diff shows metadata needed if…you are making a commit out of the diff?

                            I would think a “what’s changed in this repository” is a pretty base-level query. They seem to not think it’s especially important; the suggested replacement of pijul diff --short works but is not documented for this. For example, it shows information that is not in pijul diff – namely, commits not added to the repository yet.

                            I also want to see if I can replicate git’s staging area, or have a similarly safe, friendly workflow for interactive committing. It seems like most VCSs other than git don’t understand the use cases for the staging area.

                            1. 3

                              They seem to not be too big on branches. I haven’t quite figured this one out yet; it seems pretty widely accepted in the programming world.

                              Curious about where you got that from, I even wrote the most painful thing ever, called Sanakirja, just so we could fork databases and have branches in Pijul.

                              Now, branches in Git are the only way to work somewhat asynchronously. Branches have multiple uses, but one of them is to keep your work separate and delay your merges. Pijul has a different mechanism for that, called patches. It is much simpler and more powerful, since you can cherry-pick and rebase patches even if you didn’t fork in the first place. In other words, you can “branch after the fact”, to speak in Git terms.
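
                              A toy Python model of “branching after the fact”, to make the idea concrete. It is a deliberate simplification and not how Pijul is implemented: channels are modelled as plain sets of patch names, and the `deps` table and both function names are invented for the example. Pulling a patch brings along only the patches it depends on, whether or not a fork was made beforehand.

```python
def dependency_closure(wanted, deps):
    """Return the wanted patches plus everything they transitively depend on."""
    result = set()
    stack = list(wanted)
    while stack:
        patch = stack.pop()
        if patch in result:
            continue
        result.add(patch)
        stack.extend(deps.get(patch, ()))
    return result


def pull(target, source, wanted, deps):
    """Add the wanted patches (and their dependencies) from source to target."""
    needed = dependency_closure(wanted, deps)
    missing = needed - source
    if missing:
        raise ValueError(f"source lacks patches: {sorted(missing)}")
    return target | needed


# All work happened in one channel, with independent patches mixed together.
work = {"init", "feature-x", "bugfix-y", "draft-z"}
deps = {
    "feature-x": {"init"},
    "bugfix-y": {"init"},
    "draft-z": {"feature-x"},
}

main = {"init"}
main_after = pull(main, work, {"bugfix-y"}, deps)
# main_after == {"init", "bugfix-y"}: the bugfix moves alone, drafts stay behind.
```

                              Pulling `draft-z` instead would also bring `feature-x`, because the draft depends on it.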

                              I would think a “what’s changed in this repository” is a pretty base-level query

                              So do the authors, they just think slightly differently from Git’s authors. pijul diff shows a draft of the patch you would get if you recorded. There is no real equivalent of that in Git, because a draft of a commit doesn’t make sense.

                              I also want to see if I can replicate git’s staging area

                              One thing you can do (which I find easier than the index) is record and edit your records in the text editor before saving.

                              1. 7

                                (Thanks pmeunier for the interesting work!)

                                I found the discussion of branches in your post rather confusing. (I use git daily, and I used darcs heavily years ago and forgot large parts of it.) And in fact I’m also confused by the “About channels” mention in the README, and the Channels documentation in the manual. I’m trying to explain this here in case precise feedback can be useful to improve the documentation.

                                Your explanation, here and in the manual, focuses on differences in use-cases between Git branches and channels. This is confusing because (1) the question is rather “how can we do branches in Pijul?”, not “what are fine-grained differences between what you do and git branches?”, and because (2) the answer goes into technical subtleties or advanced ideas rather quickly. At the end I’m not sure I have understood the answer (I guess I would if I was very familiar with Pijul already), and it’s not an answer to the question I had.

                                My main use of branches in git is to give names to separate repository states that correspond to separate development activities that should occur independently of each other. In one branch I’m trying to fix bug X, in another branch I’m working on implementing feature Y. Most branches end up with commits/changes that are badly written / buggy / etc., that I’m refining over time, and I don’t want to have them in the index when working on something else.

                                So this is my question: “how do you work on separate stuff in Pijul?”. I think this should be the main focus of your documentation.

                                There are other use-cases for branches in git. Typically “I’m about to start a difficult rebase/merge/whatever, let me create a new branch foo-old to have a name for what I had before in case something blows up.”, and sometimes “I want to hand-pick only commit X, Y and Z of my current work, and be able to show them separately easily”. I agree that most of those uses are not necessary in patch-based systems, but I think you shouldn’t spend too much answer surface to point that out. (And I mostly forget about those uses of branches, because they are ugly so I don’t generally think about them. So having them vaguely mentioned in the documentation was more distracting than helpful.)

                                To summarize:

                                • There is a “good” use-case for branches, namely keeping track of separate development activities on the same repository that should remain independent, and some “bad” use-cases, namely all the rest.
                                • I think that when people ask “how do we do branches?”, they have the good use-case in mind, so please start by answering about this clearly
                                • It’s okay to mention that the bad use-cases are mostly not needed in Pijul anymore, but I think to most people they are an afterthought so I wouldn’t focus on that.

                                The Pijul documentation writes: “However, channels are different from Git branches, and do not serve the same purpose.”. I think that if Channels are useful for the “good use case” given above, then we should instead consider that they basically serve the same purpose as branches.

                                Note: the darcs documentation has a better explanation of “The darcs way of (non-)branching”, showing in an example-based way a situation where talking about patches is enough. I think it’s close to what you describe in your documentation, but it is much clearer because it is example-based. I still think that they spend too much focus on this less-common aspect of branches.

                                Finally a question: with darcs, the obvious answer to “how to do branches?” is to simply use several clones of the same repository in different directories of my system, and push/pull between them. I assume that the same approach would work fine with pijul. What are the benefits of introducing channels as an extra concept? (I guess the data representation is more compact, the DVCS state is not duplicated in each directory?) It would be nice if the documentation of channels would answer this question.

                                1. 2

                                  So this is my question: “how do you work on separate stuff in Pijul?”

                                  This all depends on what you want to do. The reason for your confusion could be because Pijul doesn’t enforce a strict workflow, you can do whatever you want.

                                  If you want to fork, then so be it! If you’re like me and don’t want to worry about channels/branches, you can as well: I do all my reviewing work on main, and often write drafts of patches together in the same channel, even on independent features. Then, I can still push and pull whatever I want, without having to push the drafts.

                                  However, if you prefer to use a more “traditional” Git-like way of working, you can too. The difference between these two ways isn’t as huge as a Git user would think.

                                  Edit: I do use channels sometimes, for example when I want to expose two different versions of the same project, for example if that project depends on a fast-moving library, and I want to have a version compatible with the different versions of that library.

                                  1. 2

                                    But if you work on different drafts of patches in the same channel, do they apply simultaneously in your working copy? I want to work on patches, but then leave them on the side and not have them in the working copy.

                                    Re. channels: why not just copy the repository to different directories?

                                    1. 1

                                      They do apply to the same working copy, and you may need multiple channels if you don’t want to do that.

                                      Re. channels: why not just copy the repository to different directories?

                                      Channel fork copies exactly 0 byte, copying a repository might copy gigabytes.

                                2. 1

                                  I use git and don’t typically branch that much. All a branch is is a sequence of patches, and since git lets me chop and slice patches in whatever way I want to, it seems like it’s usually overkill to create branches for things. Just make your changes and build the patch chains you want when you want to, how you want to.

                                  1. 1

                                    Then you might feel at home with Pijul. Pijul will give you the additional ability to push your patches independently from each other, potentially to different remote channels. Conversely, you’ll be able to cherry-pick for free (we simply call that “pulling” in Pijul).

                                3. 1

                                  They seem to not think it’s especially important; the suggested replacement of pijul diff --short works but is not documented for this.

                                  A bit lower in the conversation the author agrees that a git status command would be useful but they don’t have the time to work on it at the time of writing. My guess is that it is coming and the focus is on a working back-end at the moment.

                                4. 1

                                  This is very exciting. I hope I can find some time to try this out soon.

                                  A bit iffy when the import step said to load entire history into memory 👀

                                  1. 10

                                    A bit iffy when the import step said to load entire history into memory 👀

                                    It sounds scary, it is absolutely fine in practice, the abstract representation of the graph is actually tiny on all real-world repos. The contents of commits aren’t loaded at that step.

                                  2. 1

                                    First thing I see trying to build this version is this dependency build failure. Sent a merge request there. Then I just needed to actually download it without cargo doing it behind the scenes so that I’d have a chance to insert the patched dependency :) The URL for the raw crate tarball was https://crates.io/api/v1/crates/pijul/1.0.0-beta/download

                                    Converting Pijul repositories to Git, or at least do something to make it easy to switch back and forth

                                    Related: will there be support for continuously pulling new changes from git, and would it be possible to have a command that would just export particular changes in git format-patch (email diff) format? My FreeBSD “patchset style” (constantly rebased) fork is what I would like to try pijul on, and these are the two required features.

                                    working on large binary files

                                    There’s a fun special case of those: “binary files” that really are archives full of, say, XML. Git has a filters feature for this — hopefully pijul will have something similar?

                                    1. 2

                                      Related: will there be support for continuously pulling new changes from git

                                      This is already the case, if I’m not mistaken. The CLI isn’t great if you have more than one channel (i.e. if your Git repo merges rather than rebases).

                                      hopefully pijul will have something similar?

                                      Yes, that would be cool indeed. Some other patches need to operate on words rather than lines, and Pijul can already handle that well. The interface for that is missing, but shouldn’t be hard to write, if you’re interested in contributing (I can mentor!).

                                      1. 1

                                        This is already the case

                                        Nice. I kinda expected that but pijul git --help really makes it look like a one-time import is all there is.