1. 3

    Semi-related work

    Replacements (refs/replace) are superficially similar to obsolescences in that they describe that one commit should be replaced by another.

    git replace is new to me. When did it appear? Is anyone using it?

    I’ve also been meaning to try git absorb which looks thoughtfully designed.

    1. 4

      If I remember correctly, git replace was useful to me in a case where an old CVS repository had been imported into git without history and then worked on for a long, long time. Much later, the old CVS history was recovered using a decent cvs2git kind of utility, creating another git repository. git replace then allowed me to stitch these two repositories together, effectively prepending history (something which would be impossible to do with a normal git parent commit ref without completely cracking SHA-1).
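
      A minimal sketch of that kind of graft (the paths and commit ids here are placeholders):

          # fetch the converted CVS history into the current repository
          git fetch ../cvs-history master:cvs-history

          # pretend CVS_TIP (tip of the CVS conversion) is the parent of
          # OLD_ROOT (the first commit of the history-less import)
          git replace --graft OLD_ROOT CVS_TIP

          # log, blame, etc. now walk the prepended history; git filter-branch
          # can bake the graft in permanently if the replacement ref isn't enough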

      1. 3

        Absorb is really nice if you’re using a fixup-heavy workflow. Fast, too.
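
        For anyone curious, the workflow is roughly this (a sketch; assumes a master base branch):

            # stage the follow-up fixes
            git add -u

            # absorb guesses which earlier commit each hunk belongs to
            # and creates fixup! commits targeting those commits
            git absorb

            # then fold them in (or let git absorb --and-rebase do both steps)
            git rebase -i --autosquash master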

        1. 1

          Are you an actual user of git absorb? Or are you talking about the hg original? I’ve wondered how usable it is for git already.

          1. 1

            Yes I use git absorb. Haven’t run into any problems yet 🙂

        2. 2

          I’m hoping to eventually integrate git absorb into git revise, which is basically a faster in-memory version of git rebase for rebases that don’t change the final file contents, just reorder when changes happen (i.e. 90% of my use cases for git rebase)

          1. 3

            Since that’s in Python you could probably take advantage of the linelog implementation that’s underneath hg absorb to make the work easier. I recommend looking at the hg absorb code. Linelog makes it easy and slightly more robust than just moving patch hunks around.

        1. 3

          Oddly enough, hg is using a modified version of git’s xdiff code, with patience diffing ripped out: https://www.mercurial-scm.org/repo/hg/log?rev=34e2ff1f9cd8%2B9e7b14caf67f

          There are other modifications since then, but I don’t think anything that’s caused the algorithm to diverge a ton.

          1. 2

            Oh, interesting. The linked issue also has a good discussion. https://bz.mercurial-scm.org/show_bug.cgi?id=4074

          1. 20

            I do agree with the theme of this post: at scale software is complex. Whether you use a monorepo or polyrepos, you’ll need a lot of complicated tooling to make things manageable for your developers.

            But I want to draw attention to sfink’s excellent rebuttal (full disclosure, he is a colleague of mine).

            Additionally, I’d like to address the VCS Scalability downside. The author’s monorepo experience seems to be with Git. Companies like Google and Facebook (who have two of the largest monorepos in the world), and to a lesser extent Mozilla, all use Mercurial for a reason: it scales much better. While I’m not suggesting the path to get there was easy, the work is largely finished and contributed back upstream. So when the author points to Twitter’s perf issues or Microsoft’s need for a VFS, I think it is more a problem of using the wrong tool for the job than it is something inherently wrong with monorepos.

            1. 5

              I was under the impression (possibly mistaken) that Google still used perforce predominantly (or some piper wrapper thing), with a few teams using mercurial or git for various externally visible codebases (android, chrome, etc).

              1. 10

                Perforce has been gone for quite a while. Internal devs predominantly use Piper, though an increasing group is using Mercurial to interact with Piper instead of the native Piper tooling. The Mercurial install is a few minor internal things (e.g. custom auth), evolve, and core Mercurial. We’ve been very wary of using things outside of that set, and are working hard to keep our workflow in line with the OSS Mercurial workflow. An example of something we’ve worked to send upstream is hg fix, which helps you use a source code formatter (gofmt or clang-format) as you go, and another is the narrow extension, which lets you clone only part of a repo instead of the whole thing.
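
                For a flavour of narrow, a clone looks roughly like this (a sketch; the feature is still maturing, so exact flags and server support vary):

                    # enable the extension in ~/.hgrc
                    [extensions]
                    narrow =

                    # then clone only the part of the repo you care about
                    hg clone --narrow --include some/subdir https://hg.example.com/bigrepo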

                Non-internal devs (Chrome, Android, Kubernetes, etc. etc.) that work outside of Piper are almost exclusively on Git, but in a variety of workflows. Chrome, AIUI, is one giant git repo of doom (it’s huge), Android is some number of hundreds (over 700 last I knew?) of git repos, and most other tools are doing more orthodox polyrepo setups, some with Gerrit for review, some with GH Pull Requests, etc.

                1. 3

                  Thanks for the clarification, sounds like Piper is (and will continue to be) the source of truth while the “rollout” Greg mentioned is in reference to client side tooling. To my original point, Google still seems to have ended up with the right tool for the job in Piper (given the timeline and alternatives when they needed it).

                  1. 2

                    But how does Mercurial interact with Piper? Is Mercurial a “layer” above Piper? Do you have a Mercurial extension that integrates with Piper?

                    1. 3

                      We have a custom server that speaks hg’s wire protocol. Pushing to piper exports to the code review system (to an approximation), pulling brings down the new changes that are relevant to your client.

                      (Handwaving because I’m assuming you don’t want gory narrow-hg details.)

                      1. 2

                        It’s a layer, yeah. My understanding is that when you send out a change, it makes Piper clients for you. It’s just a UX thing on top of Piper, not a technical thing built into it.

                    2. 2

                      I’m fuzzy on the details, but my understanding is that they’re in the middle of some sort of phased Mercurial rollout. So it’s possible only a sample population of their developers are using the Mercurial backend. What I do know is that they are still actively contributing to Mercurial and seem to be moving in that direction for the future.

                      1. 1

                        I wonder if they are using some custom Mercurial backend to their internal thing (basically a VFS layer as the author outlined)? It would be interesting to get some first- or second-hand information on what is actually being used, as people tend to specifically call out Google and Facebook as paragons of monorepos.

                        My feeling is that google/facebook are both huge organizations with lots of custom tooling and systems. /Most/ companies are not google/facebook nor have google/facebook problems.

                        1. 6

                          This is largely my source (in addition to offline conversations): https://groups.google.com/forum/#!topic/mozilla.dev.version-control/hh8-l0I2b-0

                          The relevant part is:

                          Speaking of Google, their Mercurial rollout on the massive Google monorepo continues. Apparently their users are very pleased with Mercurial - so much so that they initially thought their user sentiment numbers were wrong because they were so high! Google’s contribution approach with Mercurial is to upstream as many of their modifications and custom extensions as possible: they seem to want to run a vanilla Mercurial out-of-the-box as possible. Their feature contributions so far have been very well received upstream and they’ve been contributing a number of performance improvements as well. Their contributions should translate to a better Mercurial experience for all.

                          So at the very least it seems they endeavour to avoid as much custom tooling on top of Mercurial as possible. But like you said, they have Google problems so I imagine they will have at least some.

                          1. 6

                            Whoa. This could be the point where Mercurial comes back after falling behind git for years.

                            Monorepos sound sexy because Facebook and Google use them. If both use Mercurial and open source their modifications, then Mercurial suddenly becomes very attractive.

                            In git, neither submodules nor LFS are well integrated, and both generate pain for lots of developers. If Mercurial promises to fix that, many will consider switching.

                            Sprinkling some Rust into the code base probably helps to seduce some developers as well.

                            1. 10

                              Narrow cloning (authored by Google) has been OSS from the very start, and now ships in the hg tarball. If you’ve got need of it, it’s still maturing (and formats can change etc) but it’s already in use by at least 3 companies. I’d be happy to talk to anyone that might want to deploy hg at their company, and can offer at least some help on narrow functionality if that’s needed.

                            2. 1

                              Thanks for digging!
                              Pretty interesting for sure.

                        2. 0

                          I’m getting verification from someone at Google, but the quick version as I understood it:

                          Google hasn’t actually used Perforce for a long time. What they had was a Perforce workalike that was largely their own thing. They are now using normal Mercurial.

                          1. 12

                            This isn’t true, Google uses Piper (their perforce clone) internally. Devs have the option of using mercurial or git for their personal coding environments, but commits get converted to piper before they land in the canonical monorepo.

                            1. 2

                              I’ll ping @durin42; I don’t think I’m misremembering the discussion, but I may have misunderstood either the current state or implementation details.

                        3. 3

                          What is it about git that makes it a poor choice for very large repos?

                          What do Mercurial and Perforce do differently?

                          1. 2

                            In addition to the article @arp242 linked, this post goes into a bit more technical detail. Tl;dr, it’s largely due to how data is stored in each. Ease of contribution is another reason (scaling Git shouldn’t be impossible, but for one reason or another no one has attempted it yet).

                            1. 1

                              Microsoft has a 300GB git repo. They built a virtual file system to make it work.

                              1. 1

                                True, but in the scalability section of the article the author argues that the need for a VFS is proof that monorepos don’t scale. So I think most of this thread is centered around proving that monorepos can scale without the need for a VFS.

                                I agree that a VFS is a perfectly valid solution if at the end of the day the developers using the system can’t tell the difference.

                            2. 2

                              Facebook wrote about Scaling Mercurial at Facebook back in 2014:

                              After much deliberation, we concluded that Git’s internals would be difficult to work with for an ambitious scaling project. [..] Importantly, it [mercurial] is written mostly in clean, modular Python (with some native code for hot paths), making it deeply extensible.

                              It’s a great example of how applications in a slower language can be made to perform better than applications in a faster language, just because it’s so much easier to understand and optimize.

                          1. 1

                            0 refers to the first changeset in the repo, which is handy. There’s no easy way to refer to this in git

                            Can there be multiple first changesets in a Mercurial repo? There can be in git, and I find it helpful. It happens if you merge two repos.

                            1. 2

                              That’s possible in hg with hg pull --force; you’d end up with two changesets that are both children of the null changeset. The second descendant of null would not have changeset number 0, though.

                              1. 2

                                Yep. You could use hg log -r root() to see all of the root revisions though.
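
                                For comparison (a sketch), the closest git equivalent is filtering for parentless commits:

                                    # hg: list every root changeset
                                    hg log -r 'root()'

                                    # git: list every parentless (root) commit reachable from HEAD
                                    git rev-list --max-parents=0 HEAD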

                              1. 9

                                A historical anecdote: I remember the sprint when mpm[0] first demoed revsets. This was back probably 8 or so years ago. At the time, everyone in the room felt like it was a good way to avoid some gross option proliferation and hopefully have saner syntax than some of the equivalent git features. It demoed well, but we didn’t figure it’d be as valuable as it was. About two weeks later the consensus on IRC was that such functionality was clearly necessary, and we’ve been telling anybody that’ll listen about them ever since. I really wish the Git community would come around on revsets, but so far they seem stuck in the “neat demo” phase of understanding revsets.

                                These days, I probably use revsets at least once a week to find a patch: often I’ll remember reviewing a change from a specific person that touched a given file, but that’s all I know, or I’ll remember a couple of words in a commit message and need to dig up the change that way.
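
                                Sketches of those two searches (the names and words here are made up):

                                    # a change by a specific person that touched a given file
                                    hg log -r 'author(alice) and file("src/parser.py")'

                                    # a change remembered only by a couple of words in its message
                                    hg log -r 'desc("boundary check")'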

                                0: mpm is the original author of Mercurial.

                                1. 4

                                  I’m sure this has come up but I can’t see any discussion about it - any chance at all of adding mercurial support? If not, would you be open to someone else adding it? The key is to not encode too much git-specific stuff into the UI or backend, which would make it impossible to support other VCS systems.

                                  1. 1

                                    At the moment this is unlikely.

                                    1. 4

                                      Unlikely that you’ll do it, or also unlikely that you’d accept patches from someone that ended up enthusiastic?

                                      (Your roadmap for a review tool is extremely similar to what the hg community has discussed building for itself.)

                                      1. 2

                                        Well, it’d be a lot of work, both to implement and to review the patches. It’s not too far gone yet… why don’t you start a conversation on sr.ht-discuss?

                                  1. 4

                                    I’m going to be very curious if this actually goes anywhere. During the Bazaar retrospective, I remember Jelmer commenting that one specific feature of Bazaar—its ability to work transparently inside Git repositories—was a misfeature he regretted. I was a bit surprised by that at the time; Mercurial generally feels that it’d be great to have better interop with Git, and there have even been projects such as hgit (directly use the Mercurial UI on .git repos) and hg-git (use Mercurial to clone and work with Mercurial copies of remote Git repositories; this is also the track I took in Kiln Harmony) to try to achieve that.

                                    (BTW, neither hgit nor hg-git are official Mercurial projects, but both were started by core Mercurial contributors, and the latter remains very actively maintained.)

                                    I’m not personally convinced there is enough interest in Bazaar, or enough legacy Bazaar repositories in active use, to really justify maintaining Bazaar at this point, but I’m really unsure that there’s enough room in this space to launch a third island of DVCS into the existing landscape. The ability to use Bazaar to work with Git seemed like one of its few bright stars; I’m not sure how Breezy will get any initial traction at this point.

                                    1. 4

                                      Nit: hg-git was actually started by a GitHub employee (he got to it a few weeks before our GSoC student was to work on the very same thing). In order to help the GSoC student, I made the code more pythonic and added some tests, and then ended up holding the bag for several years.

                                      I’ve since given up maintainership of hg-git, because I never use it. I still want to try hgit again some day, but there’s many miles of core hg refactoring to go before it’s worth attempting.

                                    1. 3

                                      That’s an impressively thorough proposal. I’m quite happy that Facebook is using Mercurial; it helps drive innovation and helps having an alternative to Git that works at scale. Now, I’m not sure how much performance they’ll get: many extensions are in Python, which is a good thing for extensibility, so unless they port extensions to Rust (and lose the flexibility of Python), they won’t get that much of an improvement. Or did I miss something?

                                      1. 7

                                        Extensions would still benefit from their primitives being faster. They appreciate that issues around FFI might arise and passing from Rust to Python and back in quick succession is definitely one of them.

                                        1. 5

                                          Yeah, FFI speed is a concern, and ideally it’d be easier to implement an entire class in Rust and expose some methods to Python, because then it’d be easier to move some low-level parsers into Rust. I did a naive implementation of one of our parsers using nom and it was 100x (not a typo, one hundred times) faster than the C version, but the FFI overhead ended up making it not a win.

                                          1. 1

                                            Out of curiosity, why is the Rust-Python FFI slower than C-Python FFI? I thought that Rust could generate C-callable symbols and call C directly. On that topic, I wrote a PEG parsing library in C with Python bindings, and in production workloads 90% of the time is spent in FFI and object creation.

                                            1. 1

                                              Well, with C, there’s always the possibility of writing a Python extension; that’s not a generic FFI, then.

                                              Often, the issue there - and, reading python.h etc. at a glance - is that many interpreters allow direct manipulation of their memory structures (including creating and destroying objects). For that, they ship a lot of macros and definitions. You cannot use those in Rust directly. There are two approaches to that: write a library that does exactly what python.h (on every version of the interpreter!) does and use that. Alternative: write a small C shim over your Rust code that does just the parts you need.

                                              1. 1

                                                The big issue seemed to be in creating all the Python objects - I was returning a fairly large list of tuples, and the cpython extension could somewhat intelligently preallocate some things, whereas the Rust I think was having to be a bit dumber due to the wrappers in play.

                                          2. 1

                                            As an addition to the point about primitives: there are cheap operations that are fast even in Python, there are expensive operations you run several times a day and there are rare operations where you need flexibility and mental-model-fit but can accept poor performance. Having better performance for frequent operations while keeping the flexibility for the long tail could be a win (depends on the effort required and usage patterns, of course).

                                          1. 9

                                            Leiningen for Clojure once again defaults to the latest version.

                                            Leiningen doesn’t default to any latest version as far as I know. Leiningen does

                                            1. reproducible dependency resolution: For the same dependency list, you always get the same dependencies ¹
                                            2. in no way “default” to anything, as inserting dependencies and their versions is the user’s responsibility

                                            Versioning/pinning is not only about having an API-compliant library though, it’s also about being sure that you can build the exact same version of your program later on. Hyrum’s Law states that any code change may effectively be a breaking one for your consumers. For example:

                                            • Fixed a bug in your library? Someone will depend on the buggy behaviour, or attempt to fix the bug downstream while it’s still an issue. If you forget to quote apostrophes, for example, fixing that in a bugfix release may cause some tools to double escape them.
                                            • Fixed an edge case/security issue? You’ve most likely introduced some checks which will have some overhead. If your library is used in a hot spot for a consumer, then it may lead to performance degradation of the program they’ve made.
                                            • User complains that an old version of your software is buggy/breaks, but you cannot reproduce on HEAD and you want to know what fixed it? That’s hard if you cannot use old dependencies. If git-bisect doesn’t test a commit with the dependencies you used at the commit time, you’re not testing your software as it was at that point in time. And if the bug is upstream, it’s hard to figure out what dependency caused it and how it was fixed.

                                            Of course, pinning is not a panacea: We usually want to apply security issues and bugfixes immediately. But for the most part, there’s no way we can know a priori that new releases will be backwards compatible for our software or not. Pinning gives you the option to vet dependency updates and defer them if they require changes to your system.

                                            1: Unless you use version ranges or dependencies that use them. But that happens so infrequently and is so strongly advised against that I don’t think I’ve ever experienced it in the wild.

                                            1. 3

                                              Hyrum’s Law

                                              FYI, Hyrum finally made http://www.hyrumslaw.com/ with the full observation. Useful for linking. :)

                                              1. 2

                                                Hmm, perhaps I misunderstood the doc I read. I’m having trouble finding it at the moment. I’m not a Clojure user. Could you point me at a good link? Do library users always have to provide some sort of version predicate for each dependency?

                                                Your point about reproducing builds is a good one, but it can coexist with my proposal. Imagine a parallel universe where Bundler works just like it does here and maintains a Gemfile.lock recording precise versions in use for all dependencies, but we’ve just all been consistently including major version in gem names and not foisting incompatibilities on our users. Push security fixes and bugfixes, pull API changes.

                                                Edit: based on other comments I think I’ve failed to articulate that I am concerned with the upgrade process rather than the deployment process. Version numbers in Gemfile.lock are totally fine. Version numbers in Gemfile are a smell.

                                                1. 3

                                                  Oh, yes, sorry for not being clear: I strongly agree that version “numbers” might as well be serial numbers, checksums or the timestamp it was deployed. And I think major versions should be in the library name itself, instead of in the version “number”.


                                                  In Leiningen, library users always have to provide some sort of version predicate for each dependency, see https://github.com/technomancy/leiningen/blob/master/doc/TUTORIAL.md#dependencies. There is some specific stuff related to snapshot versions and checkout dependencies, but if you try to build + deploy a project with those, you’ll get an error unless you set up some environment variable. This also applies to boot afaik; the functionality is equivalent to how Java’s Maven works.

                                                  1. 2

                                                    Thanks! I’ve added a correction to OP.

                                                    1. 1

                                                      Hmm, I’ve been digging more into Leiningen, and growing increasingly confused. What’s the right way to say, “give me the latest 2.0 version of this library”? It seems horrible that the standard tutorial recommends using exact versions.

                                                      1. 3

                                                        There’s no way to do that. The Maven/JVM dependency land always uses exact versions. This ensures stability.

                                                1. 1

                                                  Your two submissions make me think David A. Wheeler’s summary of SCM security is still timely, since the [D]VCSs on average aren’t built with strong security in architecture or implementation. The only two I know that tried, in architecture/design at least, were Aegis and especially Shapiro et al’s OpenCM:

                                                  http://aegis.sourceforge.net/propaganda/security.html

                                                  https://web.archive.org/web/20070623124209/http://www.opencm.org/docs.html

                                                  Both are defunct since they didn’t get popular. I think it would be beneficial for someone to apply the thinking in Wheeler’s summary and linked papers (esp on high-assurance) to modern DVCS to see what they have and don’t have. Plus the feasibility of strong implementation. I think my design in the past was just the standard mediating and logging proxy in front of a popular VCS with append-only logs of the code itself. A default for when you have nothing better.

                                                  1. 5

                                                    I think that’s rather orthogonal. The problem is everybody implemented a “run more commands” feature which runs more commands. It’s not really about the integrity of the code in the repo.

                                                    In a sense, yes, if the repo was a read-only artifact everything would be safer. But somehow we decided that repos need to be read/execute artifacts with embedded commands in them. Behold, the “smart” repo. Crypto signing doesn’t make that safer.

                                                    1. 3

                                                      I’ve seen the “dumb” source control tool - speed is a feature, and without a “smart” transport layer of some kind your push/pull or checkin/checkout times become pretty awful. Just compare CVS-via-pserver to Subversion, or tla to bzr.

                                                      The thing that’s surprising to me is that it took well over a decade for anyone to notice this problem, since it’s been present in Subversion all these years…

                                                      1. 5

                                                        My takeaway is that argv parsing is too fragile to serve as an API contract. And I doubt very much this is the first and only bug of its kind.

                                                        If SSH transport had been implemented with calls to some SSH library instead of a fork+exec to an external ‘ssh’ program, this bug would not have happened as it did.

                                                        1. 2

                                                          Oh, absolutely argv is too fragile. I’m surprised even considering that this bug survived so long.

                                                    1. 17

                                                      This fucks bisect, defeating one of the biggest reasons version control provides value.

                                                      Furthermore, there are tools to easily take both approaches simultaneously. Just git merge --squash before you push, and all your work in progress diffs get smushed together into one final diff. And, for example, Phabricator even pulls down the revision (pull request equivalent) description, list of reviewers, tasks, etc., and uses that to create a squash commit of your current branch when you run arc land.
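
                                                      For reference, that flow is just this (a sketch, assuming a feature branch on top of master):

                                                          git checkout master
                                                          git merge --squash feature
                                                          git commit   # one commit with all of feature's changes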

                                                      1. 7

                                                        I’m surprised to hear so many people mention bisect. I’ve tried on a number of occasions to use git bisect and svn bisect before that, and I don’t think it actually helped me even once. Usually I run into the following problems:

                                                        • there is state that is essential to exercising the test case I’m interested in which isn’t in source control (e.g. configuration files, databases, external services) and the shape of the data in these places needs to change to exercise different versions of the code
                                                        • the test case passes/fails at different points in the git history for reasons unrelated to the problem that I’m investigating

                                                        I love the idea of git bisect but in practice it’s never been worth it for me.

                                                        1. 14

                                                          Your second bullet point suggests to me bisect isn’t useful to you in part because you’re not taking good enough care of your history and have broken points in it.

                                                          I bisect things several times a month, and it routinely saves me hours when I do. By not keeping history clean as others have talked about, you ensure bisect is useless even for those developers who do find it useful. :(

                                                          1. 6

                                                            Right: meaningful commit messages are important but a passing build for each commit is essential. A VCS has pretty limited value without that practice.

                                                            1. 1

                                                              It does help for your commits to be at clean points, but it isn’t strictly necessary - you don’t need to run your entire test suite. I usually will either bisect with a single spec or isolate the issue to a script that I can run with bisect. And as mentioned in other places you can just bisect manually.
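
                                                              e.g. (a sketch, with a hypothetical reproduce script that exits 0 when things are good):

                                                                  git bisect start HEAD v1.2.0   # bad revision, then a known-good one
                                                                  git bisect run ./reproduce.sh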

                                                          2. 6

                                                            You can run bisect in an entirely manual mode where git checks out the revision for you to tinker with before marking the commit as good or bad.
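
                                                            The manual loop is just this (a sketch):

                                                                git bisect start
                                                                git bisect bad           # the current checkout is broken
                                                                git bisect good v1.2.0   # some known-good point (placeholder tag)
                                                                # git checks out a midpoint; poke at it by hand, then repeat
                                                                # "git bisect good" / "git bisect bad" until it names the culprit
                                                                git bisect reset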

                                                            1. 3

                                                              There are places where it’s not so great, and there are places where it’s a life-saving tool. I work (okay, peripherally… mostly I watch people work) on the Perl 5 core. Language runtime, right? And compatibility is taken pretty seriously. We try not to break anyone’s running code unless we have a compelling reason for it and preferably they’ve been given two years' warning. Even if that code was written in 1994. And broken stuff is supposed to stay on branches, not go into master (which is actually named “blead”, but that’s another story. I think we might have been the ones who convinced github to allow a different default branch because having it fail to find “master” was kind of embarrassing).

                                                              So we have a pretty ideal situation, and it’s not surprising that there’s a good amount of tooling built up around it. If you see that some third-party module has started failing its test suite with the latest release, there’s a script that will build perl, install a given module and all of its dependencies, run all of their tests along the way, find a stable release where all of that did work, then bisect between there and HEAD to determine exactly what merge made it started failing. If you have a snippet of code and you want to see where it changed behavior, use bisect.pl -e. If you have a testcase that causes weird memory corruption, use bisect.pl --valgrind and it will tell you the first commit where perl, run with your sample code, causes valgrind to complain bitterly. I won’t say it works every time, but… maybe ¾ of the time? Enough to be very worth it.

                                                            2. 0

                                                              This fucks bisect, defeating one of the biggest reasons version control provides value.

                                                              No it doesn’t. Bisect doesn’t care what the commit message is. It does care that your commit works, but I don’t think the article is actually advocating checking in broken code (despite the title) - rather it’s advocating committing without regard to commit messages.

                                                            Just git merge --squash before you push, and all your work in progress diffs get smushed together into one final diff.

                                                              This, on the other hand, fucks bisect.

                                                              1. 3

                                                                Do you know how bisect works? You are binary searching through your commit history, usually to find the exact commit that introduced a bug. The article advocates using a bunch of work in progress commits—very few of which will actually work because they’re work in progress—and then landing them all on the master branch. How exactly are you supposed to binary search through a ton of broken WIP commits to find a bug? 90% of your commits “have bugs” because they never worked to begin with, otherwise they wouldn’t be work in progress!

                                                                Squashing WIP commits when you land makes sure every commit on master is an atomic operation changing the code from one working state to another. Then when you bisect, you can actually find a test failure or other issue. Without squashing you’ll end up with a compilation failure or something from some jack off’s WIP commit. At least if you follow the author’s advice, that commit will say “fuck” or something equally useless, and whoever is bisecting can know to fire you and hire someone who knows what version control does.

                                                                1. 1

                                                                  Do you know how bisect works?

                                                                  Does condescension help you feel better about yourself?

                                                                  The article advocates using a bunch of work in progress commits—very few of which will actually work because they’re work in progress—and then landing them all on the master branch. How exactly are you supposed to binary search through a ton of broken WIP commits to find a bug? 90% of your commits “have bugs” because they never worked to begin with, otherwise they wouldn’t be work in progress!

                                                                  I don’t read it that way. The article mainly advocates not worrying about commit messages, and also being willing to commit “experiments” that don’t pan out, particularly in the context of frontend design changes. That’s not the same as “not working” in the sense of e.g. not compiling.

                                                                  It’s important that most commits be “working enough” that they won’t interfere with tracking down an orthogonal issue (which is what bisect is mostly for). In a compiled language that probably means they need to compile to a certain extent (perhaps with some workflow adjustments e.g. building with -fdefer-type-errors in your bisect script), but it doesn’t mean every test has to pass (you’ll presumably have a specific test in your bisect script, there’s no value in running all the tests every time).

                                                                  Squashing WIP commits when you land makes sure every commit on master is an atomic operation changing the code from one working state to another.

                                                                  Sure, but it also makes those changes much bigger. If your bisect ends up pointing to a 100-line diff then that’s not very helpful because you’ve still got to manually hunt through those changes to find the one that made the actual difference - at that point you’re not getting much benefit from having version control at all.

                                                            1. 2

                                                              The page mentions git specifically as being vulnerable. While I’m sure that’s true, it seems highly impractical to attempt to move git away from SHA1. Am I wrong? Could you migrate away from SHA1?

                                                              1. 17

                                                                [Edit: I forgot to add, Google generated two different files with the same SHA-1, but that’s dramatically easier than a preimage attack, which is what you’d need to actually attack either Git or Mercurial. Everything I said below still applies, but you’ve got time.]

                                                                So, first: in the case of both Mercurial and Git, you can GPG-sign commits, and that will definitely not be vulnerable to this attack. That said, since I think we can all agree that GPG signing every commit will drive us all insane, there’s another route that could work tolerably in practice.

                                                                Git commits are effectively stored as short text files. The first few lines of these are fixed, and that’s where the SHA-1 shows up. So no, the SHA-1 isn’t going anywhere. But it’s quite easy to add extra data to the commit, and Git clients that don’t know what to do will preserve it (after all, it’s part of the SHA-1 hash), but simply ignore it. (This is how Kiln Harmony managed to have round-trippable Mercurial/Git conversions under-the-hood.) So one possibility would be to shove SHA-256 signatures into the commits as a new field. Perfect, right?
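
                                                                You can see that layout directly with git cat-file (the hashes and names below are made up); a hypothetical sha256 field could be appended after the committer line, and older clients would hash it but otherwise ignore it:

                                                                    $ git cat-file commit HEAD
                                                                    tree 9bedf67800b2923982bdf60c89c57ce6fd2d9a1c
                                                                    parent d2f8a2438f7d1b1cf97a99ab6bf43e721b462372
                                                                    author Jane Doe <jane@example.com> 1487516400 -0800
                                                                    committer Jane Doe <jane@example.com> 1487516400 -0800

                                                                    Commit message goes here.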

                                                                Well, there are some issues here, but I believe they’re solvable. First, we’ve got a downgrade vector: intercept the push, strip out the SHA-256, replace it with your nefarious content that has a matching SHA-1, and it won’t even be obvious to older tools anything happened. Oops.

                                                                On top of that, many Git repos I’ve seen in practice do force pushes to repos often enough that most users are desensitized to them, and will happily simply rebase their code on top of the new head. So even if someone does push a SHA-256-signed commit, you can always force-push something that’ll have the exact same SHA-1, but omit the problematic SHA-256.

                                                                The good news is that while the Git file format is “standardized,” the wire format still remains a bastion of insanity and general madness, so I don’t see any reason it couldn’t be extended to require that all commits include the new SHA-256 field. I’m sure this approach also has its share of excitement, but it seems like it’d get you most of the way there.

                                                                (The Mercurial fix is superficially identical and practically a lot easier to pull off, if for no other reason than because Git file format changes effectively require libgit2/JGit/Git/etc. to all make the same change, whereas Mercurial just has to change Mercurial and chg clients will just pick stuff up.)

                                                                1. 18

                                                                  It’s also worth pointing out that in general, if your threat model includes a malicious engineer pushing a collision to your repo, you’re already hosed because they could have backdoored any other step between source and the binary you’re delivering to end-users. This is not a significant degradation of the git/hg storage layer.

                                                                  (That said, I’ve spent a decent chunk of time today exploring blake2 as an option to move hg to, and it’s looking compelling.)

                                                                  Edit: mpm just posted https://www.mercurial-scm.org/wiki/mpm/SHA1, which has more detail on this reasoning.

                                                                  1. 1

                                                                    Plenty of people download OSS code over HTTPS, compile it and run the result. Those connections are typically made using command line tools that allow ancient versions of TLS and don’t have key pinning. Being able to transparently replace one of the files they get as a result is reasonably significant.

                                                                    1. 1

                                                                      Right, but if your adversary is in a position that they could perform the object replacement as you’ve just described, you were already screwed. There were so many other (simpler!) ways they could own you it’s honestly not worth talking about a collision attack. That’s the entire point of both the linked wiki page and my comment.

                                                                  2. 2

                                                                    That said, since I think we can all agree that GPG signing every commit will drive us all insane, there’s another route that could work tolerably in practice.

                                                                    It is definitely a big pain to get gpg signing of commits configured perfectly, but now that I have it setup I always use it and so all my commits are signed. The only thing I have to do now is enter my passphrase the first time in a coding session that I commit.

                                                                    1. 4

                                                                      Big pain? Add this to $HOME/.gitconfig and it works?

                                                                      [commit]
                                                                          gpgsign = true
                                                                      
                                                                      1. 2

                                                                        Getting gpg and gpg-agent configured properly and getting git to choose the right key in all cases even when sub keys are around were the hard parts.
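
                                                                        One piece of that puzzle, sketched (the key id is a placeholder; the trailing ! tells gpg to use exactly that subkey):

                                                                            [user]
                                                                                signingkey = ABCD1234DEADBEEF!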

                                                                        1. 1

                                                                          That’s exactly what I did.

                                                                        2. 3

                                                                          Sorry, to rephrase: mechanically signing commits isn’t a big deal (if we skip past all the excitement that comes with trying to get your GPG keys on any computer you need to make a commit on), but you now throw yourself into the web-of-trust issues that inevitably plague GPG. This is in turn the situation that Monotone, an effectively defunct DVCS that predates (and helped inspire) Git, tried to tackle, but it didn’t really succeed, in my opinion. It might be interesting to revisit this in the age of Keybase, though.

                                                                        3. 2

                                                                          I thought GPG signing would alleviate security concerns around SHA1 collisions but after taking a look, it seems that Git only signs a commit object. This means that if you could make a collision of a tree object, then you could make it look like I signed that tree.

                                                                          Is there a form of GPG signing in Git which verifies more than just the commit headers and tree hash?

                                                                          1. 4

                                                                            You are now looking for a preimage collision, and the preimage collision has to be a fairly rigidly defined format, and has to somehow be sane enough that you don’t realize half the files all got altered. (Git trees, unlike commits, do not allow extra random data, so you can’t just jam a bunch of crap at the end of the tree to make the hash work out.) I’m not saying you can’t do this, but we’re now looking at SHA-1 attacks that are probably not happening for a very long time. I wouldn’t honestly worry too much about that right now.

                                                                            That said, you can technically sign literally whatever in Git, so sure, you could sign individual trees (though I don’t know any Git client that would do anything meaningful with that information at the moment). Honestly, Git’s largely a free-for-all graph database at the end of the day; in the official Git repo, for example, there is a tag that points at a blob that is a GPG key, which gave me one hell of a headache when trying to figure out how to round-trip that through Mercurial.

                                                                          2. 1

                                                                          Without gpg signing, you can get really bad repos in general. The old git horror story article highlights these issues with really specific examples that are more tractable.

                                                                            Though, I don’t want to start a discussion on how much it sucks to maintain private keys, so sorry for the sidetrack.

                                                                            1. 1

                                                                              I don’t see why GPG-signed commits aren’t vulnerable. You can’t modify the commit body, but if you can get a collision on a file in the repo you can replace that file in-transit and nothing will notice.

                                                                              Transparently replacing a single source code file definitely counts as ‘compromised’ in my book (although for this attack the file to be replaced would have to have a special prelude - a big but not impossible ask).

                                                                            2. 4

                                                                              Here’s an early mailing list thread where this was brought up (in 2006). Linus’s opinion seemed to be:

                                                                              Yeah, I don’t think this is at all critical, especially since git really on a security level doesn’t depend on the hashes being cryptographically secure. As I explained early on (ie over a year ago, back when the whole design of git was being discussed), the security of git actually depends on not cryptographic hashes, but simply on everybody being able to secure their own private repository.

                                                                              1. 3

                                                                                the security of git actually depends on not cryptographic hashes, but simply on everybody being able to secure their own private repository.

                                                                                This is a major point that people keep ignoring. If you do one of the following:

                                                                                1. Clone a git repo from a thumb drive you found in a parking lot (or equivalent)
                                                                                2. Don’t review first what you merge into your repo.
                                                                                3. Don’t follow general security best practices

                                                                                then the argument that SHA3, or SHA256 should be used over SHA1 simply doesn’t matter.

                                                                                1. 2

                                                                                  And here’s the new thread after today’s announcement

                                                                                  (the first link in Joey Hess’s e-mail is broken, should be https://joeyh.name/blog/entry/sha-1/ )

                                                                              1. 5

                                                                              sidenote: i think they look a bit like the classic plan9 fonts :)

                                                                                1. 6

                                                                                Plan 9 fonts were also designed by B&H. Plan 9 uses Lucida Sans Unicode and Lucida Typewriter as default fonts. Lucida Sans Unicode, with some minor alterations, was renamed as Lucida Grande, the original system font on OS X, replaced only recently by Helvetica Neue. It’s funny that several people say this reminds them of Plan 9, but not OS X :-).

                                                                                  However, these fonts are more similar to the Luxi family of fonts (also from B&H) than the Lucida family.

                                                                                  Personally, I am going to continue programming (in acme, of course) using Lucida Grande (yes, I use a proportional font for programming).

                                                                                  1. 4

                                                                                    What do you like in acme, compared to other editors (vim, Emacs, Atom, Visual Studio Code, Sublime Text…)?

                                                                                    1. 7

                                                                                      Executable text, mutable text (including in win terminal windows), mouse chording, and mouse support in general, structural regexp, integrates well with arbitrary Unix tools, tiled window management, no distracting fluff; no options, no settings, no configuration files, no syntax highlighting, no colors.

                                                                                      Acme is by far the most important tool I use. If it were to disappear from the face of the earth, the first thing I would do is reimplement acme. Luckily, it would not take me very long as acme has very few features to implement, it relies on abstractions, not features.

                                                                                      A good demo: http://research.swtch.com/acme

                                                                                    2. 3

                                                                                    To expand on that, I think macOS uses San Francisco as its UI font nowadays. Helvetica Neue didn’t last long.

                                                                                      1. 3

                                                                                        Indeed. AFAIK Helvetica Neue was only used by macOS 10.10 - it was replaced with (Apple-designed) San Francisco in 10.11.

                                                                                      2. 2

                                                                                        It’s funny that several people say this reminds them of Plan 9, but not OS X :-).

                                                                                        well, i’ve never really used os x ;)

                                                                                      3. 2

                                                                                        I loved the classic Plan 9 pelm font. The enormous and curvaceous curly brackets are still a wonder.

                                                                                      1. 3

                                                                                    How well does Mercurial work with git servers? Does the hg-git bridge work properly in general? Do you use it at work? Most of my client-related work is done in git and I don’t want to screw things up.

                                                                                        1. 3

                                                                                          I know some people use hg-git, but how well it works depends heavily on your code review workflow. It gets kind of thorny around the edges when you want to edit history.

                                                                                          I’ve been tinkering with other ideas to try and make it more pleasant, but nothing real has materialized.

                                                                                        1. 7

                                                                                          This is a bit of an hg FAQ. Here are some responses from a recent-ish time when this was asked on HN:

                                                                                          https://news.ycombinator.com/item?id=9467096

                                                                                          In short: easier to use, has some powerful features that git doesn’t have, such as revsets, templating, tortoisehg, giant semi-centralised repos, and changeset evolution
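
                                                                                          A quick taste of revsets plus templating (a sketch):

                                                                                              # one-line summaries of your not-yet-published changesets
                                                                                              hg log -r 'draft()' --template '{rev}:{node|short} {desc|firstline}\n'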

                                                                                          1. 3

                                                                                            Performance-wise, is it fast enough to deal with code bases of 100k or more lines? I have read some comments stating that it is not very fast.

                                                                                            1. 5

                                                                                              In general, yes, it’s extremely fast. 100k lines is fine. Mercurial itself is almost 100k lines (ignoring tests, which adds more), and I’d classify that as small for what hg is used for by FB and Mozilla.

                                                                                              1. 5

                                                                                                The repo I work in, mozilla-central, has around 19 million lines I believe and it is very fast. I’m sure Facebook has a similar number if not more.

                                                                                                1. 5

                                                                                                  I work for Facebook.

Facebook’s Mercurial is faster than Git would be on the same repository, but that’s largely because of tools like watchman hooking in to make many operations run in O(changes) instead of O(reposize). It’s still very slow for many things, especially updating, rebasing, and so on.
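
For the curious, the watchman integration being described shipped upstream as the fsmonitor extension (originally hgwatchman, as far as I know); enabling it is roughly this, assuming watchman itself is installed and on $PATH:

```sh
# fsmonitor asks watchman for the set of files changed since the last
# query, so status-style operations scale with the number of changes
# rather than the size of the repo.
cat >> ~/.hgrc <<'EOF'
[extensions]
fsmonitor =
EOF

hg status   # now served from watchman's change journal where possible
```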

                                                                                                  1. 3

                                                                                                    Your comment made me curious, so I ran cloc over my local copy of mozilla-central. By its count there are 18,613,213 lines of code in there at the moment; full breakdown here.
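
(Roughly what that looks like, for anyone who wants to reproduce a count like this; the exact flags I used may have differed:)

```sh
# cloc walks the tree and reports lines of code per language;
# run it from the root of the checkout. --quiet suppresses progress.
cloc --quiet .
```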

                                                                                                  2. 3

Yes. For example, Facebook’s internal repository, which is hundreds of gigabytes, runs on hg. For really huge repositories (much, much bigger than 100k lines) you can use some of the tricks they have for making things like “hg status” very fast.

                                                                                                1. 8

                                                                                                  IIRC, Linux was kept in bk for a while before Linus got tired of it and wrote up git.

                                                                                                  How has BitKeeper progressed over time?

                                                                                                  What advantages does it have over git, bzr, darcs, Mercurial, etc.?

                                                                                                  1. 15

Linux was in bk, under a no-cost closed-source license for the kernel devs. BitKeeper prohibited attempts to clone/reverse-engineer it. A dev reverse-engineered it by telnetting to the server port and typing ‘help’. BitKeeper revoked the license. Linus coded git in the next few weeks.

                                                                                                    1. 15

“Linus coded git in the next few weeks.”

                                                                                                      Let’s not forget that hg was also released within a couple of weeks to replace bk.

                                                                                                      Writing a vcs within a few weeks isn’t a task that only Linus can do. ;-)

                                                                                                      http://lwn.net/Articles/151624/

                                                                                                      1. 8

Just to add more details: Linus was happy using bk. He worked in the same office as Andrew Tridgell. Andrew didn’t use bk and hadn’t agreed to any EULA. Andrew began to reverse-engineer the bk protocol (by sniffing network traffic in his office, iirc). Linus asked him to stop doing it. He refused. Linus was forced to write git (and called Andrew an ass, iirc).

                                                                                                        1. 3

                                                                                                          Any source for this?

                                                                                                          1. 9

                                                                                                            This mostly lines up with stories I’ve heard from people that were present in the kernel community at the time, for what it’s worth. I’ve only ever gotten it as an oral history though, so I can’t really provide any concrete evidence beyond what JordiGH offers in terms of “search the LKML”.

                                                                                                            1. 3

                                                                                                              Most of the drama was public on mailing lists, but it’s kind of hard to find. Look at LKML around April 2005 and earlier.

                                                                                                                1. 1

It’s mostly from memory, from reading Slashdot and OSNews at the time. The parts I’m not 100% certain about have “iirc” next to them.

                                                                                                            2. 4

                                                                                                              The website has a “Why?” page that tries to answer some of those questions.

                                                                                                              1. 11

                                                                                                                BK/Nested allows large monolithic repositories to be easily broken up into any number of sub-repositories.

                                                                                                                “I see you have a poorly structured monolith. Would you like me to convert it into a poorly structured set of micro services?” - Twitter

                                                                                                            1. 4

                                                                                                              How can the code “just happen to be owned by Google”?

                                                                                                              1. 8

                                                                                                                Author works at Google and is using his work computer to work on this project?

                                                                                                                1. 2

                                                                                                                  He wouldn’t necessarily have to be using his work computer :(

                                                                                                                  1. 1

                                                                                                                    Google claims ownership of work done on personal time with personal resources?

                                                                                                                    That’s incredibly shitty of them, if so.

                                                                                                                    1. 10

                                                                                                                      It’s being done on 20% time, from what I understand.

                                                                                                                      1. 4

                                                                                                                        There’s a process to get the company to formally disclaim ownership of things, but then you’re pretty heavily restricted in terms of when you can work on it. If you don’t care about ownership, just getting an OSS license on something is the simpler path by a wide margin.

                                                                                                                        1. 1

                                                                                                                          If it’s useless enough then the process is easy :-)

                                                                                                                        2. 1

                                                                                                                          Shitty, perhaps, but also not uncommon.

                                                                                                                          1. 2

                                                                                                                            Not uncommon, but I normally associate the practice with companies that don’t “get” Open Source, or why devs might pursue side-projects and what their personal IP means for their careers in general.

I wouldn’t normally associate those attitudes with Google. And since a lot of developers refuse to sign agreements that assign personal IP to their employer, I’m surprised to hear Google requires it, given how popular they have been among developers as a “good” employer.

                                                                                                                  1. 4

                                                                                                                    Is anyone using Mercurial instead of Git? I thought about switching to Mercurial once, but now it seems the project is slowly dying. Are there benefits?

                                                                                                                    1. 13

                                                                                                                      Mercurial development is not dead at all:

                                                                                                                      https://selenic.com/hg

                                                                                                                      http://news.gmane.org/gmane.comp.version-control.mercurial.devel

                                                                                                                      http://news.gmane.org/gmane.comp.version-control.mercurial.general

                                                                                                                      The userbase is dwindling, but the development, if anything, is speeding up.

                                                                                                                      1. 9

                                                                                                                        I use it almost exclusively for my personal projects.

                                                                                                                        1. 7

I’ve found that Mercurial’s plugin system lets you build any workflow you want straight into source control. I also don’t think Mercurial is dying off, just that GitHub has given Git a huge boost and nobody has tried to do something similar for Mercurial.
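
As a small illustration of “workflow straight into source control” using nothing but bundled extensions and configuration (the alias here is invented for the example):

```sh
cat >> ~/.hgrc <<'EOF'
[extensions]
rebase =
histedit =
shelve =

[revsetalias]
# "mine" = my changesets that haven't been published yet
mine = author(alice) and not public()
EOF

hg log -r mine   # the alias now works anywhere a revset is accepted
```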

                                                                                                                          1. 5

There are a couple of people at Bitbucket who care about really pushing the envelope with what Mercurial can do. Sean Farley is rolling out Evolve for select Bitbucket beta-testers upon request.

                                                                                                                            1. 1

                                                                                                                              Any public information on this change?

                                                                                                                              1. 1

                                                                                                                                I don’t think so, no. Feel free to stop by the #bitbucket or #mercurial channels on freenode to ask questions.

                                                                                                                              2. 1

That’s good to hear. I use Mercurial on all my personal projects and strongly prefer it to Git, but reading the blog posts and announcements from Atlassian, it has really felt like the development velocity there has been much more on the Git side of Bitbucket.

                                                                                                                            2. 5

I started using Mercurial for work, and have since grown to prefer it over Git, in large part because of its extensibility, but also its ease of use. Mercurial makes more conceptual sense to me and is easy to figure out from the CLI/help alone. I rarely ever find myself Googling how to do something.

                                                                                                                              I still like Git though, and it’s likely better for people who don’t like tinkering with their workflows.

                                                                                                                              1. 3

                                                                                                                                Lots of people, including some big names (e.g., Facebook). I find git’s merging more reliable, but prefer hg’s CLI. They both get the job done.

                                                                                                                                1. 8

                                                                                                                                  I’d love to know about cases where you find git’s merging to be more reliable. Samples would be awesome, so we can figure out what’s tripping you up.

                                                                                                                                  1. 4

                                                                                                                                    It’s a known issue.

                                                                                                                                    1. 3

                                                                                                                                      Sort of. It’s not a known issue that BidMerge (note that we’ve shipped BidMerge, which is an improvement over ConsensusMerge as a concept) produces worse results than Git. I really meant it when I said I’d appreciate examples, rather than handwaving. :)
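
For reference, bid merge targets merges with multiple “best” common ancestors (criss-cross merges); at the time it could be opted into with a config knob, something like the following (the branch name is illustrative):

```sh
# When several common ancestors exist, hg computes a merge against
# each one and picks ("bids" on) the best resolution per file.
hg merge --config merge.preferancestor='*' other-branch
```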

                                                                                                                                      1. 2

                                                                                                                                        I was using hg pre-3.0 (via Kiln). The problem that BidMerge is intended to solve is the problem which gave us so much trouble. I can’t speak to how well BidMerge would have fixed that, as the company is no longer in business.

                                                                                                                                        1. 5

                                                                                                                                          Fair enough. It should be pretty well solved then. Thanks for responding!

                                                                                                                                2. 2

                                                                                                                                  It may well have technical advantages, but if you’re working on a project that other people will one day work on, I’d strongly urge you to use git. Being able to use a familiar tool will be far more valuable to other contributors. Look at e.g. Python, which chose mercurial years ago but has recently decided to migrate to git.

                                                                                                                                1. 6

                                                                                                                                  Given the size of the repository, it’s not clear that Git would be significantly better or different.

In all the really big repos I’ve used, a limit gets hit and some wacky customizations get applied; the alternative is just putting up with the sluggishness.

                                                                                                                                  1. 3

Facebook actually hit Git’s limits a while back and contributed patches etc. to Mercurial to cope with their scale. Really interesting stuff. But, stemming from that observation and other experiences, I am a superfan of breaking up repos in DVCS systems. I maintain a Mercurial extension to coordinate many repos in a friendlier fashion than hg subrepos (guestrepo!).
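
(For context, stock subrepos, which guestrepo aims to improve on, are wired up via an .hgsub file; the URL below is illustrative:)

```sh
# .hgsub maps a working-directory path to the repository it comes
# from; the pinned revision of each subrepo is recorded in
# .hgsubstate at commit time.
cat > .hgsub <<'EOF'
libs/corelib = https://hg.example.com/corelib
EOF

hg add .hgsub
hg commit -m 'track corelib as a subrepository'
```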

I’m kind of persuaded that DVCS is a smell at a stereotypical company, though; I think there’s room for an excellent centralized VCS out there.

                                                                                                                                    1. 2

                                                                                                                                      I think where we’re heading with Mercurial over the long term is a set of tools that makes doing centralized-model development painless with DVCS tools, while retaining most of the benefits (smaller patches, pushing several in a group, etc) of a DVCS workflow. I don’t think it’s a smell at all.

                                                                                                                                      As for splitting repositories, there are definitely cases where it makes sense, but there’s also a huge benefit to having everything be in one giant repository.

                                                                                                                                      (Disclaimer: I work on source control stuff for a big company, with a focus on Mercurial stuff whenever possible.)

                                                                                                                                    2. 1

                                                                                                                                      FWIW, I use git with mozilla-central and find it a much more pleasing experience than hg (which I still export to when pushing to shared remote repos). That said, it is also what I am more familiar with, although I did use hg exclusively for a year or so.

I really enjoy having everything in the same repo for many reasons, such as the lack of syncing overhead, but it does tend to push the performance limits of version control.