0 refers to the first changeset in the repo, which is handy. There’s no easy way to refer to this in git
Can there be multiple first changesets in a Mercurial repo? There can be in git and I find it helpful. It happens if you merge two repos.
That’s possible in hg with “pull --force”; you’d end up with two changesets that are children of the null changeset. The second descendant of null would not have changeset number 0, though.
Yep. You could use hg log -r root() to see all of the root revisions though.
hg log -r root()
A historical anecdote: I remember the sprint when mpm first demoed revsets. This was back probably 8 or so years ago. At the time, everyone in the room felt like it was a good way to avoid some gross option proliferation and hopefully have saner syntax than some of the equivalent git features. It demoed well, but we didn’t figure it’d be as valuable as it was. About two weeks later the consensus on IRC was that such functionality was clearly necessary, and we’ve been telling anybody that’ll listen about them ever since. I really wish the Git community would come around on revsets, but so far they seem stuck in the “neat demo” phase of understanding revsets.
These days, I probably use revsets at least once a week to find a patch: often I’ll remember reviewing a change from a specific person that touched a given file, but that’s all I know, or I’ll remember a couple of words in a commit message and need to dig up the change that way.
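For anyone who hasn't used revsets, here is a rough toy model in Python of why they compose so well: each predicate returns a set of revisions, and queries are ordinary set algebra. The history, the names in it, and the three helper functions are all made up for illustration; the real thing is a query language inside hg, not a Python API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    rev: int
    user: str
    files: tuple
    description: str


# Hypothetical history, invented purely for this example.
HISTORY = [
    Changeset(0, "alice", ("README",), "initial import"),
    Changeset(1, "bob", ("src/parser.py",), "parser: handle empty input"),
    Changeset(2, "alice", ("src/parser.py", "tests/test_parser.py"),
              "parser: fix off-by-one in tokenizer"),
    Changeset(3, "carol", ("docs/index.rst",), "docs: mention tokenizer quirks"),
]


def author(name):
    """Revisions whose user field contains `name`."""
    return {c.rev for c in HISTORY if name in c.user}


def file(path):
    """Revisions that touched exactly `path` (no glob matching in this toy)."""
    return {c.rev for c in HISTORY if path in c.files}


def desc(text):
    """Revisions whose description contains `text`, case-insensitively."""
    return {c.rev for c in HISTORY if text.lower() in c.description.lower()}


# "A change from a specific person that touched a given file":
# roughly what  hg log -r 'author(alice) and file("src/parser.py")'  asks for.
by_person_and_file = author("alice") & file("src/parser.py")

# "A couple of words in a commit message", excluding one author:
# roughly  hg log -r 'desc(tokenizer) and not author(carol)'.
by_words = desc("tokenizer") - author("carol")

print(sorted(by_person_and_file), sorted(by_words))
```

Because every predicate yields a plain set, combining "who", "which file" and "which words" needs no dedicated command-line flag for each combination, which is the option-proliferation point made above.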
0: mpm is the original author of Mercurial.
I’m sure this has come up but I can’t see any discussion about it - any chance at all of adding Mercurial support? If not, would you be open to someone else adding it? The key is to not encode too much git-specific stuff into the UI or backend, which would make it impossible to support other VCSs.
At the moment this is unlikely.
Unlikely that you’ll do it, or also unlikely that you’d accept patches from someone that ended up enthusiastic?
(Your roadmap for a review tool is extremely similar to what the hg community has discussed building for itself.)
Well, it’d be a lot of work, both to implement and to review the patches. It’s not too far gone yet… why don’t you start a conversation on sr.ht-discuss?
I’m going to be very curious if this actually goes anywhere. During the Bazaar retrospective, I remember Jelmer commenting that one specific feature of Bazaar—its ability to work transparently inside Git repositories—was a misfeature he regretted. I was a bit surprised by that at the time; Mercurial generally feels that it’d be great to have better interop with Git, and there have even been projects such as hgit (directly use the Mercurial UI on .git repos) and hg-git (use Mercurial to clone and work with Mercurial copies of remote Git repositories; this is also the track I took in Kiln Harmony) to try to achieve that.
(BTW, neither hgit nor hg-git are official Mercurial projects, but both were started by core Mercurial contributors, and the latter remains very actively maintained.)
I’m not personally convinced there is enough interest in Bazaar, or enough legacy Bazaar repositories in active use, to really justify maintaining Bazaar at this point, but I’m really unsure that there’s enough room in this space to launch a third island of DVCS into the existing landscape. The ability to use Bazaar to work with Git seemed like one of its few bright stars; I’m not sure how Breeze will get any initial traction at this point.
Nit: hg-git was actually started by a GitHub employee (he got to it a few weeks before our GSoC student was to work on the very same thing). In order to help the GSoC student, I made the code more pythonic and added some tests, and then ended up holding the bag for several years.
I’ve since given up maintainership of hg-git, because I never use it. I still want to try hgit again some day, but there’s many miles of core hg refactoring to go before it’s worth attempting.
That’s an impressively thorough proposal. I’m quite happy that Facebook is using Mercurial; it helps drive innovation and helps to have an alternative to Git that works at scale. Now, I’m not sure how much performance they’ll get: many extensions are in Python, which is a good thing for extensibility, so unless they port extensions to Rust (and lose the flexibility of Python), they won’t get that much of an improvement. Or did I miss something?
Extensions would still benefit from their primitives being faster. They appreciate that issues around FFI might arise and passing from Rust to Python and back in quick succession is definitely one of them.
Yeah, FFI speed is a concern, and ideally it’d be easier to implement an entire class in Rust and expose some methods to Python, because then it’d be easier to move some low-level parsers into Rust. I did a naive implementation of one of our parsers using nom and it was 100x (not a typo, one hundred times) faster than the C version, but the FFI overhead ended up making it not a win.
Out of curiosity, why is the Rust-Python FFI slower than C-Python FFI? I thought that Rust could generate C-callable symbols and call C directly. On that topic, I wrote a PEG parsing library in C with Python bindings and in production workloads 90% of the time is spent in FFI and object creation.
Well, with C, there’s always the possibility to write a python extension. This is not a generic FFI then.
Often, the issue there - and, reading python.h etc. at a glance - is that many interpreters allow direct manipulation of their memory structures (including creating and destroying objects). For that, they ship a lot of macros and definitions. You cannot use those in Rust directly. There’s two approaches to that: write a library that does exactly what python.h (on every version of the interpreter!) does and use that. Alternative: write a small C shim over your Rust code that does just the parts you need.
The big issue seemed to be in creating all the Python objects - I was returning a fairly large list of tuples, and the cpython extension could somewhat intelligently preallocate some things, whereas the Rust I think was having to be a bit dumber due to the wrappers in play.
As an addition to the point about primitives: there are cheap operations that are fast even in Python, there are expensive operations you run several times a day and there are rare operations where you need flexibility and mental-model-fit but can accept poor performance. Having better performance for frequent operations while keeping the flexibility for the long tail could be a win (depends on the effort required and usage patterns, of course).
Leiningen for Clojure once again defaults to the latest version.
Leiningen doesn’t default to any latest version as far as I know. Leiningen does
Versioning/pinning is not only about having an API-compliant library though, it’s also about being sure that you can build the exact same version of your program later on. Hyrum’s Law states that any code change may effectively be a breaking one for your consumers. For example:
Of course, pinning is not a panacea: We usually want to apply security fixes and bugfixes immediately. But for the most part, there’s no way we can know a priori whether new releases will be backwards compatible for our software or not. Pinning gives you the option to vet dependency updates and defer them if they require changes to your system.
1: Unless you use version ranges or dependencies that use them. But that happens so infrequently and is so strongly advised against that I don’t think I’ve ever experienced it in the wild.
FYI, Hyrum finally made http://www.hyrumslaw.com/ with the full observation. Useful for linking. :)
Hmm, perhaps I misunderstood the doc I read. I’m having trouble finding it at the moment. I’m not a Clojure user. Could you point me at a good link? Do library users always have to provide some sort of version predicate for each dependency?
Your point about reproducing builds is a good one, but it can coexist with my proposal. Imagine a parallel universe where Bundler works just like it does here and maintains a Gemfile.lock recording precise versions in use for all dependencies, but we’ve just all been consistently including major version in gem names and not foisting incompatibilities on our users. Push security fixes and bugfixes, pull API changes.
Edit: based on other comments I think I’ve failed to articulate that I am concerned with the upgrade process rather than the deployment process. Version numbers in Gemfile.lock are totally fine. Version numbers in Gemfile are a smell.
Oh, yes, sorry for not being clear: I strongly agree that version “numbers” might as well be serial numbers, checksums or the timestamp it was deployed. And I think major versions should be in the library name itself, instead of in the version “number”.
In Leiningen, library users always have to provide some sort of version predicate for each dependency, see https://github.com/technomancy/leiningen/blob/master/doc/TUTORIAL.md#dependencies. There is some specific stuff related to snapshot versions and checkout dependencies, but if you try to build + deploy a project with those, you’ll get an error unless you set up some environment variable. This also applies to boot afaik; the functionality is equivalent to how Java’s Maven works.
Thanks! I’ve added a correction to OP.
Hmm, I’ve been digging more into Leiningen, and growing increasingly confused. What’s the right way to say, “give me the latest 2.0 version of this library”? It seems horrible that the standard tutorial recommends using exact versions.
There’s no way to do that. The Maven/JVM dependency land always uses exact versions. This ensures stability.
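To make the trade-off in this subthread concrete, here is a small Python sketch of the two resolution styles being argued about. It is not how Leiningen, Maven or Bundler resolve anything; the registry contents and both function names are invented, and versions are assumed to be plain dotted integers.

```python
def parse(version):
    """Turn '2.10.0' into (2, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_latest_in_major(available, major):
    """Floating requirement: 'give me the newest release of this major'."""
    candidates = [v for v in available if parse(v)[0] == major]
    if not candidates:
        raise LookupError(f"no release with major version {major}")
    return max(candidates, key=parse)


def resolve_pinned(available, pinned):
    """Exact requirement: the same answer every time, or an error."""
    if pinned not in available:
        raise LookupError(f"{pinned} is not published")
    return pinned


# Hypothetical registry state on two different days.
registry_monday = ["1.9.0", "2.0.0", "2.0.1"]
registry_friday = registry_monday + ["2.1.0", "2.10.0"]

floating_monday = resolve_latest_in_major(registry_monday, 2)
floating_friday = resolve_latest_in_major(registry_friday, 2)
pinned_monday = resolve_pinned(registry_monday, "2.0.1")
pinned_friday = resolve_pinned(registry_friday, "2.0.1")

print(floating_monday, floating_friday)  # differs between the two days
print(pinned_monday, pinned_friday)      # identical on both days
```

The floating build picks up fixes without anyone touching the project file, but the same source tree builds differently on Monday and Friday. A lockfile in the Gemfile.lock style is essentially the pinned result of the floating query, recorded once and replayed.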
Your two submissions make me think David A Wheeler’s summary of SCM security is still timely since the [D]VCS’s on average aren’t built with strong security in architecture or implementation. The only two I know that tried in architecture/design at least were Aegis and especially Shapiro et al’s OpenCM:
Both are defunct since they didn’t get popular. I think it would be beneficial for someone to apply the thinking in Wheeler’s summary and linked papers (esp on high-assurance) to modern DVCS to see what they have and don’t have. Plus the feasibility of strong implementation. I think my design in the past was just the standard mediating and logging proxy in front of a popular VCS with append-only logs of the code itself. A default for when you have nothing better.
I think that’s rather orthogonal. The problem is everybody implemented a “run more commands” feature which runs more commands. It’s not really about the integrity of the code in the repo.
In a sense, yes, if the repo was a read only artifact everything would be safer. But somehow we decided that repos need to be read/execute artifacts with embedded commands in them. Behold, the “smart” repo. Crypto signing that doesn’t make it safer.
I’ve seen the “dumb” source control tool - speed is a feature, and without a “smart” transport layer of some kind your push/pull or checkin/checkout times become pretty awful. Just compare CVS-via-pserver to Subversion, or tla to bzr.
The thing that’s surprising to me is that it took well over a decade for anyone to notice this problem, since it’s been present in Subversion all these years…
My takeaway is that argv parsing is too fragile to serve as an API contract. And I doubt very much this is the first and only bug of its kind.
If SSH transport had been implemented with calls to some SSH library instead of a fork+exec to an external ‘ssh’ program, this bug would not have happened as it did.
Oh, absolutely argv is too fragile. I’m surprised even considering that this bug survived so long.
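A small Python sketch of the failure mode, for anyone who hasn't stared at it before. Nothing here runs ssh: getopt stands in for the child program's option parser and only models -o, the hostile host string and the "serve-repo" command are made up, and the check shown is just one possible guard, not what any particular VCS shipped.

```python
import getopt


def naive_ssh_argv(host, remote_command):
    """Build argv the fragile way: the host is dropped in unchecked."""
    return ["ssh", host, remote_command]


def checked_ssh_argv(host, remote_command):
    """Same argv, but refuse anything the child would parse as an option."""
    if host.startswith("-"):
        raise ValueError(f"refusing host that looks like an option: {host!r}")
    return ["ssh", host, remote_command]


def how_a_parser_sees(argv):
    """Stand-in for the child's option parsing; only -o is modelled here."""
    options, positionals = getopt.getopt(argv[1:], "o:")
    return options, positionals


# Hypothetical "hostname" taken from an attacker-controlled repository URL.
hostile_host = "-oProxyCommand=echo pwned"

options, positionals = how_a_parser_sees(
    naive_ssh_argv(hostile_host, "serve-repo")
)
print(options)      # the "host" was consumed as an option
print(positionals)  # and the remote command slid into the host's place
```

The caller believed it was passing (host, command); the child saw (option, host). With a library call that takes the host as a typed parameter there is no string for the two sides to disagree about, which is the point made above about fork+exec.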
This fucks bisect, defeating one of the biggest reasons version control provides value.
Furthermore, there are tools to easily take both approaches simultaneously. Just git merge --squash before you push, and all your work in progress diffs get smushed together into one final diff. And, for example, Phabricator even pulls down the revision (pull request equivalent) description, list of reviewers, tasks, etc, and uses that to create a squash commit of your current branch when you run arc land.
I’m surprised to hear so many people mention bisect. I’ve tried on a number of occasions to use git bisect and svn bisect before that, and I don’t think it actually helped me even once. Usually I run into the following problems:
I love the idea of git bisect but in practice it’s never been worth it for me.
Your second bullet point suggests to me bisect isn’t useful to you in part because you’re not taking good enough care of your history and have broken points in it.
I bisect things several times a month, and it routinely saves me hours when I do. By not keeping history clean as others have talked about, you ensure bisect is useless even for those developers who do find it useful. :(
Right: meaningful commit messages are important but a passing build for each commit is essential. A VCS has pretty limited value without that practice.
It does help that your commits be at clean points but isn’t really necessary - you don’t need to run your entire test suite. I usually will either bisect with a single spec or isolate the issue to a script that I can run against bisect. And as mentioned in other places you can just bisect manually.
You can run bisect in an entirely manual mode where git checks out the revision for you to tinker with before marking the commit as good or bad.
There are places where it’s not so great, and there are places where it’s a life-saving tool. I work (okay, peripherally… mostly I watch people work) on the Perl 5 core. Language runtime, right? And compatibility is taken pretty seriously. We try not to break anyone’s running code unless we have a compelling reason for it and preferably they’ve been given two years' warning. Even if that code was written in 1994. And broken stuff is supposed to stay on branches, not go into master (which is actually named “blead”, but that’s another story. I think we might have been the ones who convinced github to allow a different default branch because having it fail to find “master” was kind of embarrassing).
So we have a pretty ideal situation, and it’s not surprising that there’s a good amount of tooling built up around it. If you see that some third-party module has started failing its test suite with the latest release, there’s a script that will build perl, install a given module and all of its dependencies, run all of their tests along the way, find a stable release where all of that did work, then bisect between there and HEAD to determine exactly what merge made it start failing. If you have a snippet of code and you want to see where it changed behavior, use bisect.pl -e. If you have a testcase that causes weird memory corruption, use bisect.pl --valgrind and it will tell you the first commit where perl, run with your sample code, causes valgrind to complain bitterly. I won’t say it works every time, but… maybe ¾ of the time? Enough to be very worth it.
No it doesn’t. Bisect doesn’t care what the commit message is. It does care that your commit works, but I don’t think the article is actually advocating checking in broken code (despite the title) - rather it’s advocating committing without regard to commit messages.
Just git merge --squash before you push, and all your work in progress diffs get smushed together into one final diff.
This, on the other hand, fucks bisect.
Do you know how bisect works? You are binary searching through your commit history, usually to find the exact commit that introduced a bug. The article advocates using a bunch of work in progress commits—very few of which will actually work because they’re work in progress—and then landing them all on the master branch. How exactly are you supposed to binary search through a ton of broken WIP commits to find a bug? 90% of your commits “have bugs” because they never worked to begin with, otherwise they wouldn’t be work in progress!
Squashing WIP commits when you land makes sure every commit on master is an atomic operation changing the code from one working state to another. Then when you bisect, you can actually find a test failure or other issue. Without squashing you’ll end up with a compilation failure or something from some jack off’s WIP commit. At least if you follow the author’s advice, that commit will say “fuck” or something equally useless, and whoever is bisecting can know to fire you and hire someone who knows what version control does.
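For reference, the binary search being argued about looks roughly like this in Python. It is a sketch of the idea, not of git's or hg's implementation: the linear history, the commit names and the position of the first bad commit are all invented, and it assumes the first commit is good, the last is bad, and the bug stays present once introduced.

```python
def bisect(commits, is_bad):
    """Return (index of first bad commit, number of commits tested).

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if is_bad(commits[mid]):
            bad = mid    # the culprit is at mid or earlier
        else:
            good = mid   # the culprit is after mid
    return bad, tested


# Hypothetical linear history of 1000 commits; the bug lands in c617.
commits = [f"c{i}" for i in range(1000)]
first_bad = 617


def is_bad(commit):
    """Stand-in for 'check out this commit and run the failing test'."""
    return int(commit[1:]) >= first_bad


index, tested = bisect(commits, is_bad)
print(commits[index], tested)
```

The whole thing rests on is_bad giving a trustworthy answer at every midpoint. A commit that fails for an unrelated reason (say, it doesn't compile) cannot be classified, which is why real bisect tools have a way to skip commits, and why a history full of such commits costs extra steps.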
Do you know how bisect works?
Does condescension help you feel better about yourself?
The article advocates using a bunch of work in progress commits—very few of which will actually work because they’re work in progress—and then landing them all on the master branch. How exactly are you supposed to binary search through a ton of broken WIP commits to find a bug? 90% of your commits “have bugs” because they never worked to begin with, otherwise they wouldn’t be work in progress!
I don’t read it that way. The article mainly advocates not worrying about commit messages, and also being willing to commit “experiments” that don’t pan out, particularly in the context of frontend design changes. That’s not the same as “not working” in the sense of e.g. not compiling.
It’s important that most commits be “working enough” that they won’t interfere with tracking down an orthogonal issue (which is what bisect is mostly for). In a compiled language that probably means they need to compile to a certain extent (perhaps with some workflow adjustments e.g. building with -fdefer-type-errors in your bisect script), but it doesn’t mean every test has to pass (you’ll presumably have a specific test in your bisect script, there’s no value in running all the tests every time).
Squashing WIP commits when you land makes sure every commit on master is an atomic operation changing the code from one working state to another.
Sure, but it also makes those changes much bigger. If your bisect ends up pointing to a 100-line diff then that’s not very helpful because you’ve still got to manually hunt through those changes to find the one that made the actual difference - at that point you’re not getting much benefit from having version control at all.
The page mentions git specifically as being vulnerable. While I’m sure that’s true, it seems highly impractical to attempt to move git away from SHA1. Am I wrong? Could you migrate away from SHA1?
[Edit: I forgot to add, Google generated two different files with the same SHA-1, but that’s dramatically easier than a preimage attack, which is what you’d need to actually attack either Git or Mercurial. Everything I said below still applies, but you’ve got time.]
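To give a feel for the gap between the two attacks, here is a toy Python experiment on a hash truncated to 16 bits. The truncation, the message formats and the attempt cap are all assumptions made so the searches finish instantly; the absolute numbers say nothing about full SHA-1, only about the shape of the difference.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """First 16 bits of SHA-1: deliberately weak so brute force is cheap."""
    return int.from_bytes(hashlib.sha1(data).digest()[:2], "big")


def find_collision():
    """Find ANY two distinct inputs with the same tiny hash (birthday search)."""
    seen = {}
    attempts = 0
    while True:
        candidate = f"msg-{attempts}".encode()
        attempts += 1
        h = tiny_hash(candidate)
        if h in seen:
            return seen[h], candidate, attempts
        seen[h] = candidate


def find_second_preimage(target: bytes, max_attempts=1_000_000):
    """Find an input matching the hash of one SPECIFIC, pre-existing target."""
    wanted = tiny_hash(target)
    for attempts in range(1, max_attempts + 1):
        candidate = f"forged-{attempts}".encode()
        if tiny_hash(candidate) == wanted:
            return candidate, attempts
    return None, max_attempts


first, second, collision_attempts = find_collision()
forged, preimage_attempts = find_second_preimage(b"file already in the repo")
print(collision_attempts, preimage_attempts)
```

With a 16-bit hash the collision search is expected to need on the order of 2^8 attempts, while matching a fixed target is expected to need on the order of 2^16. For a 160-bit hash the same reasoning gives roughly 2^80 versus 2^160 for brute force, and the published attack only shortens the first of those.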
So, first: in the case of both Mercurial and Git, you can GPG-sign commits, and that will definitely not be vulnerable to this attack. That said, since I think we can all agree that GPG signing every commit will drive us all insane, there’s another route that could work tolerably in practice.
Git commits are effectively stored as short text files. The first few lines of these are fixed, and that’s where the SHA-1 shows up. So no, the SHA-1 isn’t going anywhere. But it’s quite easy to add extra data to the commit, and Git clients that don’t know what to do will preserve it (after all, it’s part of the SHA-1 hash), but simply ignore it. (This is how Kiln Harmony managed to have round-trippable Mercurial/Git conversions under-the-hood.) So one possibility would be to shove SHA-256 signatures into the commits as a new field. Perfect, right?
Well, there are some issues here, but I believe they’re solvable. First, we’ve got a downgrade vector: intercept the push, strip out the SHA-256, replace it with your nefarious content that has a matching SHA-1, and it won’t even be obvious to older tools anything happened. Oops.
On top of that, many Git repos I’ve seen in practice do force pushes to repos often enough that most users are desensitized to them, and will happily simply rebase their code on top of the new head. So even if someone does push a SHA-256-signed commit, you can always force-push something that’ll have the exact same SHA-1, but omit the problematic SHA-256.
The good news is that while the Git file format is “standardized,” the wire format still remains a bastion of insanity and general madness, so I don’t see any reason it couldn’t be extended to require that all commits include the new SHA-256 field. I’m sure this approach also has its share of excitement, but it seems like it’d get you most of the way there.
(The Mercurial fix is superficially identical and practically a lot easier to pull off, if for no other reason than because Git file format changes effectively require libgit2/JGit/Git/etc. to all make the same change, whereas Mercurial just has to change Mercurial and chg clients will just pick stuff up.)
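A minimal Python sketch of the "extra field" idea, for the curious. The object ID calculation (SHA-1 over a type/length header plus the body) is how Git hashes objects; everything else is an assumption for illustration: the field name x-sha256, what the SHA-256 is computed over, and the author details are all invented, and no Git client understands this header.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>\\0' followed by the object body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree_id, message, extra_headers=()):
    """Serialize a commit: fixed header lines, optional extras, blank line, message."""
    lines = [
        f"tree {tree_id}",
        "author Alice Example <alice@example.com> 0 +0000",
        "committer Alice Example <alice@example.com> 0 +0000",
    ]
    lines.extend(f"{name} {value}" for name, value in extra_headers)
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


empty_tree = git_object_id("tree", b"")

# The commit as an older client would write it.
plain = commit_body(empty_tree, "initial commit")

# Hypothetical scheme: a SHA-256 of the plain commit, stored as an extra header.
strong = hashlib.sha256(plain).hexdigest()
extended = commit_body(empty_tree, "initial commit", [("x-sha256", strong)])

plain_id = git_object_id("commit", plain)
extended_id = git_object_id("commit", extended)
print(plain_id)
print(extended_id)
```

The extra line is covered by the SHA-1 like any other byte of the commit, so the two IDs differ and the field survives round trips through tools that ignore it. Nothing in the format makes the field mandatory, though, which is the downgrade problem described above.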
It’s also worth pointing out that in general, if your threat model includes a malicious engineer pushing a collision to your repo, you’re already hosed because they could have backdoored any other step between source and the binary you’re delivering to end-users. This is not a significant degradation of the git/hg storage layer.
(That said, I’ve spent a decent chunk of time today exploring blake2 as an option to move hg to, and it’s looking compelling.)
Edit: mpm just posted https://www.mercurial-scm.org/wiki/mpm/SHA1, which has more detail on this reasoning.
Plenty of people download OSS code over HTTPS, compile it and run the result. Those connections are typically made using command line tools that allow ancient versions of TLS and don’t have key pinning. Being able to transparently replace one of the files they get as a result is reasonably significant.
Right, but if your adversary is in a position that they could perform the object replacement as you’ve just described, you were already screwed. There were so many other (simpler!) ways they could own you it’s honestly not worth talking about a collision attack. That’s the entire point of both the linked wiki page and my comment.
That said, since I think we can all agree that GPG signing every commit will drive us all insane, there’s another route that could work tolerably in practice.
It is definitely a big pain to get gpg signing of commits configured perfectly, but now that I have it setup I always use it and so all my commits are signed. The only thing I have to do now is enter my passphrase the first time in a coding session that I commit.
Big pain? Add this to $HOME/.gitconfig and it works?
[commit]
    gpgsign = true
Getting gpg and gpg-agent configured properly and getting git to choose the right key in all cases even when sub keys are around were the hard parts.
That’s exactly what I did.
Sorry, to rephrase: mechanically signing commits isn’t a big deal (if we skip past all the excitement that comes with trying to get your GPG keys on any computer you need to make a commit on), but you now throw yourself into the web-of-trust issues that inevitably plague GPG. This is in turn the situation that Monotone, an effectively defunct DVCS that predates (and helped inspire) Git, tried to tackle, but it didn’t really succeed, in my opinion. It might be interesting to revisit this in the age of Keybase, though.
I thought GPG signing would alleviate security concerns around SHA1 collisions but after taking a look, it seems that Git only signs a commit object. This means that if you could make a collision of a tree object, then you could make it look like I signed that tree.
Is there a form of GPG signing in Git which verifies more than just the commit headers and tree hash?
You are now looking for a preimage collision, and the preimage collision has to be a fairly rigidly defined format, and has to somehow be sane enough that you don’t realize half the files all got altered. (Git trees, unlike commits, do not allow extra random data, so you can’t just jam a bunch of crap at the end of the tree to make the hash work out.) I’m not saying you can’t do this, but we’re now looking at SHA-1 attacks that are probably not happening for a very long time. I wouldn’t honestly worry too much about that right now.
That said, you can technically sign literally whatever in Git, so sure, you could sign individual trees (though I don’t know any Git client that would do anything meaningful with that information at the moment). Honestly, Git’s largely a free-for-all graph database at the end of the day; in the official Git repo, for example, there is a tag that points at a blob that is a GPG key, which gave me one hell of a headache when trying to figure out how to round-trip that through Mercurial.
Without gpg signing, you can get really bad repos in general. The old git horror story article highlights these issues with really specific examples that are more tractable.
Though, I don’t want to start a discussion on how much it sucks to maintain private keys, so sorry for the sidetrack.
I don’t see why GPG-signed commits aren’t vulnerable. You can’t modify the commit body, but if you can get a collision on a file in the repo you can replace that file in-transit and nothing will notice.
Transparently replacing a single source code file definitely counts as ‘compromised’ in my book (although for this attack the file to be replaced would have to have a special prelude - a big but not impossible ask).
Here’s an early mailing list thread where this was brought up (in 2006). Linus’s opinion seemed to be:
Yeah, I don’t think this is at all critical, especially since git really
on a security level doesn’t depend on the hashes being cryptographically
secure. As I explained early on (ie over a year ago, back when the whole
design of git was being discussed), the security of git actually depends
on not cryptographic hashes, but simply on everybody being able to secure
their own private repository.
the security of git actually depends on not cryptographic hashes, but simply on everybody being able to secure their own private repository.
This is a major point that people keep ignoring. If you do one of the following:
then the argument that SHA3, or SHA256 should be used over SHA1 simply doesn’t matter.
And here’s the new thread after today’s announcement
(the first link in Joey Hess’s e-mail is broken, should be https://joeyh.name/blog/entry/sha-1/ )
sidenote: i think they look a bit like the classic plan9 fonts :)
Plan 9 fonts were also designed by B&H. Plan 9 uses Lucida Sans Unicode and Lucida Typewriter as default fonts. Lucida Sans Unicode, with some minor alterations, was renamed as Lucida Grande, the original system font on OS X, replaced only recently by Helvetica Neue. It’s funny that several people say this reminds them of Plan 9, but not OS X :-).
However, these fonts are more similar to the Luxi family of fonts (also from B&H) than the Lucida family.
Personally, I am going to continue programming (in acme, of course) using Lucida Grande (yes, I use a proportional font for programming).
What do you like in acme, compared to other editors (vim, Emacs, Atom, Visual Studio Code, Sublime Text…)?
Executable text, mutable text (including in win terminal windows), mouse chording, and mouse support in general, structural regexp, integrates well with arbitrary Unix tools, tiled window management, no distracting fluff; no options, no settings, no configuration files, no syntax highlighting, no colors.
Acme is by far the most important tool I use. If it were to disappear from the face of the earth, the first thing I would do is reimplement acme. Luckily, it would not take me very long as acme has very few features to implement, it relies on abstractions, not features.
A good demo: http://research.swtch.com/acme
To expand on that, I think macOS uses San Francisco UI nowadays. Helvetica Neue didn’t last long.
Indeed. AFAIK Helvetica Neue was only used by macOS 10.10 - it was replaced with (Apple-designed) San Francisco in 10.11.
It’s funny that several people say this reminds them of Plan 9, but not OS X :-).
well, i’ve never really used os x ;)
I loved the classic Plan 9 pelm font. The enormous and curvaceous curly brackets are still a wonder.
How well does Mercurial work with git servers? Does the hg-git bridge work properly in general? Do you use it at work? Most of my client-related work is done on git and I don’t want to screw things up.
I know some people use hg-git, but how well it works depends heavily on your code review workflow. It gets kind of thorny around the edges when you want to edit history.
I’ve been tinkering with other ideas to try and make it more pleasant, but nothing real has materialized.
This is a bit of an hg FAQ. Here are some responses from a recentish time when this was asked in HN:
In short: easier to use, has some powerful features that git doesn’t have, such as revsets, templating, tortoisehg, giant semi-centralised repos, and changeset evolution
Performance wise is it fast enough to deal with code bases with 100k or more lines? I have read some comments stating that it is not very fast.
In general, yes, it’s extremely fast. 100k lines is fine. Mercurial itself is almost 100k lines (ignoring tests, which adds more), and I’d classify that as small for what hg is used for by FB and Mozilla.
The repo I work in, mozilla-central, has around 19 million lines I believe and it is very fast. I’m sure Facebook has a similar number if not more.
I work for Facebook.
Facebook’s mercurial is faster than Git would be on the same repository, but that’s largely because of tools like watchman hooking in to try to make many operations operate in O(changes), instead of O(reposize). It’s still very slow for many things, especially when updating, rebasing, and so on.
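A toy Python illustration of the O(changes) versus O(reposize) distinction. This is not watchman's API or anything from Mercurial: the dictionaries stand in for the recorded state and the working copy, and the `touched` set is a made-up substitute for whatever a file-watching service reports.

```python
def status_by_full_scan(recorded, working_copy):
    """Compare every tracked file: work grows with repository size."""
    checked = 0
    modified = []
    for path, recorded_state in recorded.items():
        checked += 1
        if working_copy.get(path) != recorded_state:
            modified.append(path)
    return sorted(modified), checked


def status_from_watcher(recorded, working_copy, touched_paths):
    """Only look at paths a watcher flagged: work grows with the change set."""
    checked = 0
    modified = []
    for path in touched_paths:
        checked += 1
        if path in recorded and working_copy.get(path) != recorded[path]:
            modified.append(path)
    return sorted(modified), checked


# Hypothetical repository with 100,000 tracked files, one of them edited.
recorded = {f"src/file{i}.py": "v1" for i in range(100_000)}
working_copy = dict(recorded)
working_copy["src/file42.py"] = "v2"

# file7 was touched (say, saved without changes); file42 was really edited.
touched = {"src/file42.py", "src/file7.py"}

full_result, full_checked = status_by_full_scan(recorded, working_copy)
fast_result, fast_checked = status_from_watcher(recorded, working_copy, touched)
print(full_checked, fast_checked)
```

Both functions report the same modified file, but one does 100,000 comparisons and the other does 2. That only helps operations that can be phrased in terms of "what changed", which fits the remark above that updating and rebasing are still slow.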
Your comment made me curious, so I ran cloc over my local copy of mozilla-central. By its count there are 18,613,213 lines of code in there at the moment; full breakdown here.
Yes. For example, facebook’s internal repository that is hundreds of gigabytes is run on hg. For really huge repositories (much, much bigger than 100k lines) you can use some of the tricks they have for making things like “hg status” very fast for such a huge repository.
IIRC, Linux was kept in bk for a while before Linus got tired of it and wrote up git.
How has BitKeeper progressed over time?
What advantages does it have over git, bzr, darcs, Mercurial, etc.?
Linux was in bk, under a no-cost closed-source license for the kernel devs. Bitkeeper prohibited attempts to clone/reverse engineer it. A dev reverse engineered it by telnetting to the server port and typing ‘help’. Bitkeeper revoked the license. Linus coded git in the next few weeks.
Linus coded git in the next few weeks.
Let’s not forget that hg was also released within a couple of weeks to replace bk.
Writing a vcs within a few weeks isn’t a task that only Linus can do. ;-)
Just to add more details, Linus was happy using bk. He worked in the same office as Andrew Tridgell. Andrew didn’t use bk and hadn’t agreed to any EULA. Andrew began to reverse engineer the bk protocol (by sniffing network traffic in his office iirc). Linus asked him to stop doing it. He refused. Linus was forced to write git (and called Andrew an ass iirc).
Any source for this?
This mostly lines up with stories I’ve heard from people that were present in the kernel community at the time, for what it’s worth. I’ve only ever gotten it as an oral history though, so I can’t really provide any concrete evidence beyond what JordiGH offers in terms of “search the LKML”.
Most of the drama was public on mailing lists, but it’s kind of hard to find. Look at LKML around April 2005 and earlier.
Here’s some of the blowback: https://web.archive.org/web/20060328061810/http://www.realworldtech.com/forums/index.cfm?action=detail&PostNum=3322&Thread=2&entryID=49312&roomID=11
It’s mostly from memory from reading Slashdot and osnews at the time. The parts I’m not 100% certain of have iirc next to them.
The website has a “Why?” page that tries to answer some of those questions.
BK/Nested allows large monolithic repositories to be easily broken up into any number of sub-repositories.
“I see you have a poorly structured monolith. Would you like me to convert it into a poorly structured set of micro services?” - Twitter
How can the code “just happen to be owned by Google”?
Author works at Google and is using his work computer to work on this project?
He wouldn’t necessarily have to be using his work computer :(
Google claims ownership of work done on personal time with personal resources?
That’s incredibly shitty of them, if so.
It’s being done on 20% time, from what I understand.
There’s a process to get the company to formally disclaim ownership of things, but then you’re pretty heavily restricted in terms of when you can work on it. If you don’t care about ownership, just getting an OSS license on something is the simpler path by a wide margin.
If it’s useless enough then the process is easy :-)
Shitty, perhaps, but also not uncommon.
Not uncommon, but I normally associate the practice with companies that don’t “get” Open Source, or why devs might pursue side-projects and what their personal IP means for their careers in general.
I wouldn’t normally associate those attitudes with Google. And since a lot of developers refuse to sign agreements signing personal IP over to their employer, I’m surprised to hear Google requires it, given how popular they have been among developers as a “good” employer.
Is anyone using Mercurial instead of Git? I thought about switching to Mercurial once, but now it seems the project is slowly dying. Are there benefits?
Mercurial development is not dead at all:
The userbase is dwindling, but the development, if anything, is speeding up.
I use it almost exclusively for my personal projects.
I’ve found that Mercurial’s plugin system lets you build any workflow you want straight into source control. I also don’t think Mercurial is dying off, just that GitHub has really pushed Git up and nobody has tried to do something similar for Mercurial.
There’s a couple of people at bitbucket who care about really pushing the envelope with what Mercurial can do. Sean Farley is rolling out Evolve for select bitbucket beta-testers upon request.
Any public information on this change?
I don’t think so, no. Feel free to stop by the #bitbucket or #mercurial channels on freenode to ask questions.
That’s good to hear. I use Mercurial on all my personal projects and strongly prefer it to Git, but reading the blog posts and announcements from Atlassian, it’s really felt like the development velocity there has much more been on the Git side of Bitbucket.
I started using Mercurial for work, and have since grown to prefer it over Git, in large part because of its extensibility, but also its ease of use. Mercurial makes more conceptual sense to me and is easy to figure out from the cli/help alone. I rarely ever find myself Googling how to do something.
I still like Git though, and it’s likely better for people who don’t like tinkering with their workflows.
Lots of people, including some big names (e.g., Facebook). I find git’s merging more reliable, but prefer hg’s CLI. They both get the job done.
I’d love to know about cases where you find git’s merging to be more reliable. Samples would be awesome, so we can figure out what’s tripping you up.
It’s a known issue.
Sort of. It’s not a known issue that BidMerge (note that we’ve shipped BidMerge, which is an improvement over ConsensusMerge as a concept) produces worse results than Git. I really meant it when I said I’d appreciate examples, rather than handwaving. :)
I was using hg pre-3.0 (via Kiln). The problem that BidMerge is intended to solve is the problem which gave us so much trouble. I can’t speak to how well BidMerge would have fixed that, as the company is no longer in business.
Fair enough. It should be pretty well solved then. Thanks for responding!
It may well have technical advantages, but if you’re working on a project that other people will one day work on, I’d strongly urge you to use git. Being able to use a familiar tool will be far more valuable to other contributors. Look at e.g. Python, which chose mercurial years ago but has recently decided to migrate to git.
Given the size of the repository, it’s not clear that Git would be significantly better or different.
In all the really big repos I’ve used, a limit gets hit and some wacky customizations are applied. The alternative being that you just have to put up with the sluggishness.
Facebook actually hit git’s limit a while back and contributed patches, etc. to Mercurial so it could handle their repo. Really interesting stuff. But, stemming from that observation and other experiences, I am a superfan of breaking up repos in DVCSs. I maintain a mercurial extension to coordinate many repos in a friendlier fashion than hg subrepos (guestrepo!).
I’m kind of persuaded that dvcs is a smell at a stereotypical company though, I think there’s room for an excellent central VCS out there.
I think where we’re heading with Mercurial over the long term is a set of tools that makes doing centralized-model development painless with DVCS tools, while retaining most of the benefits (smaller patches, pushing several in a group, etc) of a DVCS workflow. I don’t think it’s a smell at all.
As for splitting repositories, there are definitely cases where it makes sense, but there’s also a huge benefit to having everything be in one giant repository.
(Disclaimer: I work on source control stuff for a big company, with a focus on Mercurial stuff whenever possible.)
FWIW, I use git with mozilla-central and find it a much more pleasing experience than hg (which I still export to when pushing to shared remote repos). That said, it is also what I am more familiar with, although I did use hg exclusively for a year or so.
I really enjoy having everything in the game repo for many reasons, such as the lack of syncing overhead, but it does tend to push the performance limits of version control.
I’m interested in this but I’m hung up on the bespoke ‘Fair Source’ license. You mention that it is meant to be used as Fair Source ___ where blank is the number of users before you have to start paying. But I don’t see a user limit anywhere on the site. Without that limit specified can it be assumed to be infinite? It’s hard to sell new licenses inside an environment where the lawyers have already taken on the GPL vs LGPL vs BSD vs MIT vs APLv2 and drawn lines on what they want to risk litigation on.
Thanks for the question. The use limit (15) for self-hosted Sourcegraph is specified in the LICENSE file: https://src.sourcegraph.com/sourcegraph@master/.tree/LICENSE. Sourcegraph.com is free to use for everyone.
We worked with a well-known open-source lawyer to draft Fair Source. If you’re using Sourcegraph for a team of above 15 people (and paying us to do so), then it would be a standard commercial license. Fair Source enables us to make the source code publicly available and to let teams with fewer than 15 users try it out both free as in freedom and free as in beer.
So, I think you’d raise a lot fewer hackles if you totally reworded your elevator pitch on fair.io:
The Fair Source License functions just like an open-source license—up to a point. Once your organization hits the license’s specified user limit, you will pay a licensing fee to continue using the software.
It’s not at all open-source. You’re restricting my right to redistribute, I can’t meaningfully sell my modified version, etc. It’s shared source with free for N users. Maybe something like:
The Fair Source License grants everyone the ability to see the source code and makes the license free for a limited number of users. It attempts to offer some of the benefits of open source software while retaining the ability to profit from a codebase.
I have nothing against trying to sell software - but when you (intentionally or not) make it look like you’re being “open source” you’re going to have a bad time.
Excellent to know. I should’ve checked the source… :) I looked at your lawyer’s blog and they do seem to be legit which is a selling point. I would say that I wouldn’t have a hard time selling this to the powers that be.
That said I’d really like to see C/C++ support in srclib. Thanks for the response.
C/C++ is on the roadmap in the next couple of months. Shoot me an email at firstname.lastname@example.org and we can let you know when that’s ready :)
In the meantime, feel free to check out our code analysis library, which is completely open source: https://srclib.org
I get what they’re going for, but defining things as loosely as by “people using the code” is fairly meaningless. What if I have zero users because people don’t use a particular system? Can I then install it on 1000 servers or only 25? What if the code (on a different system) sends an email alert to 1000 people, are they all “users” or just the person that actually interacts with the system?
Also, can I split my company into 25-person groups to get around the 25-person limit?
Oddly this was sort of what I was thinking: have Venn diagram groups of teams (which actually fits with Conway’s law) and just end-run around the user limit w/ a containerized sourcegraph for each team.
This connects to another issue with new/unfamiliar licenses: the concern that the protections or guarantees they claim to provide are not actually ensured. The advantage of a familiar license is that the language has been vetted by a number of entities, and you can rest reasonably assured that it does what it claims. I do not know if this particular license has any problems with the assignment of rights or anything like that, but I do not have the same level of assuredness I would with a more usual license.
I think part of what journeysquid was trying to point out is that this license is worded so imprecisely that it’s impossible to know when you’ve violated the terms.
Maybe all of us should be a bit more loud mouthed and get ourselves heard?
Because I don’t want to be an hg island in an ocean of git, I keep talking about hg long after most of my audience has left the building.
What are the advantages of Mercurial over Git? I’ve never used Mercurial, but have seen a number of very enthusiastic users of it online. What about it inspires such a strong preference?
It’s a variety of things, but in broad strokes (for me, anyway):
A specific thing that looks minor, but went from “neat toy” to “how did I do work without this” in a matter of months: https://selenic.com/hg/help/revsets
Some of those are minor (terms are easy-ish to relearn), and some are major (evolution was a significant productivity improvement for me when I worked it into my workflow about 3 years ago, and we’re on a good path towards working out the lingering problems and shipping it more broadly).
Then there’s some architectural decisions in the codebase. It matters less to typical end-users, but it’s made it feasible for Facebook to do some really neat work around lazy-loading file content, and for some related work that I’ve been doing elsewhere.
Off the cuff, mercurial does the job right, whereas git is a sea of hackery.
Whenever I collaborate with a junior-to-medium-experienced git user, they screw up - regularly, repeatedly, and often badly. This does not happen to nearly the same extent in hg.
I still need to rewrite this response for the Mercurial FAQ, because you just asked a FAQ: