1. 7

    I just wanted to say that this article was well-written. I had been meaning to learn about Pijul and this was a good starting point.

    Pedantic: I think there is a small typo where there should be a line-break before the pijul record command to separate it from pijul ls.

    1. 4

      Thx very much! It was very fun to learn about Pijul and work on the article, so glad it was useful. The typo should be fixed as well ;)

    1. 2

      On the topic of #5 (running commands w/o exiting vim), this is certainly a way to do it, but in my experience it’s a lot easier to hit Ctrl+Z to background the vim job and do my stuff in the shell. That said, this command is super useful when combined with :r. For example, :r!ls runs ls and pastes its output into your file. It’s a nice way of pulling data into somewhere where you can screw around with it, a little bit quicker than dropping to the CL and dumping the content into a file where you can edit it.

      Otherwise great tips!

      1. 1

        Thank you and good points! When I first started using the command line I remember learning about backgrounding tasks, but for whatever reason it didn’t become a long-term habit for me. I can’t really explain why though. Nice point about adding the command output into the file to play around with!

      1. 2

        Some of these I didn’t know, so it got my upvote.

        1. 1

          Glad to hear it :D

        1. 3

          Surprised they didn’t include bat, one of my favourite “modern” tools.

          1. 1

            I just installed this after some recommendations in the Reddit comments and so far it seems really cool!!

            1. 2

              If you like it, check out other tools by the author - fd, hyperfine, hexyl, etc

              https://github.com/sharkdp?tab=repositories

          1. 3

            What about performing sentiment analysis on all those commit logs?

            1. 2

              Haha I’d think most commit messages would be a little too dry/robotic to get a good sentiment reading on? What do you think?

              1. 2

                I’m reminded of once reading about a team that set up their computers to take a webcam snapshot of the developer’s expression (face) when a git conflict occurred.

                1. 1

                  Well… I was bored the other day which led me to just posting this… https://lobste.rs/s/0zxoap/suggested_improvements_for_tool

                1. 3

                  Another place where the use of imperative mood makes sense is in option descriptions that you get in --help output for commands. It’s often the shortest form, not only in English but also many other languages. And other forms can be clumsier with regard to whether the command, the option or the human is doing the action. It allows you to leave out the subject of the sentence but still have a valid sentence.

                  1. 2

                    Yes. And this reminds me of a comment from Reddit:

                    “You wouldn’t name a function madeWidget or addedBarToFoo. You’d say make or add. Because you’re not describing what you did. You’re describing what the thing you made does. A commit is a thing you made. Every time someone applies it, it’s going to do something”

                  1. 5

                    My top requirement for a commit message is that I be able to understand it a month from now.

                    1. 2

                      Hehe yes that is certainly an important requirement!

                    1. 3

                      Given the free-for-all nature of most GitHub projects, I’m actually very surprised the number you came up with is as high as it is. I would have expected much worse (I’m a strong supporter of following this guideline). If anything there are a lot you might be missing, since projects that do have style requirements for commits often also have prefixes (such as Conventional Commits). The most likely projects to stick to mood rules are also the most likely to not make it into these numbers, which just makes the stat even more impressive. Only 45% to go!

                      1. 1

                        Agreed. I touched on the prefixes in my article - I think there is a good chance that they are the biggest source of error in my calculation and would lead to 44% being a low estimate. I may try to add in a regex to account for many of the common prefix formats.
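
One way the prefix idea could look, as a rough sketch only. The article's actual query is not shown in this thread, so the pattern below is an assumption loosely modelled on the Conventional Commits shape (type, optional scope in parentheses, optional "!", then a colon). The names PREFIX and first_word are hypothetical.

```python
import re

# Assumed prefix shape (hypothetical, loosely following Conventional Commits):
#   type, optional "(scope)", optional "!", then ":" and any whitespace.
PREFIX = re.compile(r"^[a-z]+(\([^)]*\))?!?:\s*", re.IGNORECASE)


def first_word(message):
    """Return the lowercased first word of the subject line, ignoring one prefix."""
    stripped = message.strip()
    if not stripped:
        return ""
    subject = stripped.splitlines()[0]
    # Drop a single leading prefix such as "feat(parser): " if one is present.
    subject = PREFIX.sub("", subject, count=1)
    words = subject.split()
    return words[0].lower() if words else ""


if __name__ == "__main__":
    for msg in ["feat(parser): add array support", "fix: handle empty input", "Update README"]:
        print(repr(msg), "->", first_word(msg))
```

The word returned here would then go through whatever verb check the analysis already uses. Note that this pattern also strips things like "Update: ..." or a leading "http:", so it would need tuning against real data.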

                      1. 10

                        title should say “GitHub.” the precision is worth three characters.

                        1. 2

                          Fair point.

                        1. 4

                          How many have more than 1 line of commit message?

                          1. 4

                            From my testing, I did find that there were about 2 million commit messages with only a single present-tense imperative verb, like “Update”, or “Commit”. Also, I did notice a lot of commit messages with multiple lines while testing stuff out, but can’t give you an exact amount at the moment.

                            I would answer your question by writing/running a new query, but I racked up about $500 of charges playing with Google BigQuery yesterday, so trying to get that sorted before I run anything else lol…

                            1. 5

                              Those aren’t necessarily imperative verbs - “update” and “commit” are both perfectly good English deverbal nouns.

                              1. 2

                                Haha very true. I guess at that point it comes down to the intent of the developer, which seems impossible to gauge. Imagining a frustrated developer at his wit’s end commanding his computer to “UPDATE!” or “COMMIT!” is amusing though…

                          1. 3

                            You can restrict the commits searched to initial commits with array_length(parent). Playing with that column, apparently some commits manage to have dozens of parents. I don’t even know how you manage that.

                            select lower(trim(message)) as message, count(*)
                            from `bigquery-public-data.github_repos.commits`
                            where (array_length(parent)=0)
                            group by message
                            order by count(*) desc
                            limit 100;
                            

                            First few values:

                            message                                                            f0_
                            initial commit                                                     1957292
                            first commit                                                       151151
                            init                                                               39375
                            initial commit.                                                    36600
                            initial                                                            17894
                            initial import                                                     14737
                            create readme.md                                                   11510
                            init commit                                                        9692
                            update license.md                                                  6606
                            first                                                              6034
                            first commit.                                                      5688
                            initial version                                                    5325
                            create license.md                                                  3968
                                                                                               3908
                            inital commit                                                      3854
                            initial import.                                                    3459
                            create gh-pages branch via github                                  3371
                            initial release                                                    3348
                            initial checkin                                                    3194
                            initial commit to add default .gitignore and .gitattribute files.  2967
                            initial revision                                                   2676
                            :boom::camel: added .gitattributes & .gitignore files              2200
                            :neckbeard: added .gitattributes & .gitignore files                2198
                            first version                                                      2193
                            :octocat: added .gitattributes & .gitignore files                  2159
                            :space_invader: added .gitattributes & .gitignore files            2154
                            :confetti_ball: added .gitattributes & .gitignore files            2154
                            init project                                                       2150
                            :tada: added .gitattributes & .gitignore files                     2139
                            :circus_tent: added .gitattributes & .gitignore files              2134
                            :lollipop: added .gitattributes & .gitignore files                 2079
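
For anyone who wants to try the same filter without running up a BigQuery bill, here is a small local sketch of what the query does. It assumes each commit is a dict with a "message" string and a "parent" list, mirroring the two columns used above. The function name top_root_messages is made up for illustration.

```python
from collections import Counter


def top_root_messages(commits, limit=100):
    """Count normalised messages of commits that have no parents.

    commits: iterable of dicts with 'message' (str) and 'parent' (list of hashes).
    This record shape is an assumption mirroring the columns used in the query.
    """
    counts = Counter(
        c["message"].strip().lower()   # same idea as lower(trim(message))
        for c in commits
        if len(c["parent"]) == 0       # same idea as array_length(parent)=0
    )
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = [
        {"message": "Initial commit\n", "parent": []},
        {"message": "initial commit", "parent": []},
        {"message": "first commit", "parent": []},
        {"message": "Fix bug", "parent": ["abc123"]},
    ]
    print(top_root_messages(sample))
```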
                            
                            1. 3

                              “apparently some commits manage to have dozens of parents. I don’t even know how you manage that.”

                              This is called an “octopus merge” if you want to search for more information. You may also be interested in this article about octopus merges in the Linux kernel

                              1. 2

                                FYI I just updated my article to include the array_length(parent)=0 filter. Thanks for the input!

                                1. 2

                                  You’re welcome :)

                                  Why did you do this: AND LENGTH(TRIM(LOWER(message))) > 0? Surely the empty commit message is still a valid commit message?

                                  1. 1

                                    Hehe yes it was to filter out the empty commits. You’re technically right, but I was more interested in the actual text in the initial commit messages, although like you mention it is worth noting that empty messages are up there. However, it’s less clear that the empty commit messages are actually initial commit messages since they could come from detached head states with no parents.

                                    Also, in case you’re curious, I just wrote a post using a similar method to try and answer What % Of Git Commit Messages Use The Imperative Mood?

                                2. 2

                                  Haha just reading this - and it’s funny you mention this because I realized the same way of identifying initial commits in a discussion on the Reddit thread yesterday. I think this way is certainly more accurate than the method I used (just looking through the top counts and picking ones that looked like initial commit messages). I will most likely update my post to reflect this method.

                                1. 6

                                  If it wasn’t Initial commit I was going to call shenanigans.

                                  1. 0

                                    lolz

                                  1. 2

                                    Very good article! Are you going to talk about PlasticSCM? I’ve used it in the past because the company is from my city in Spain.

                                    1. 1

                                      Thank you! Glad you enjoyed it. I actually haven’t heard of PlasticSCM but I will look into it!

                                    1. 8

                                      Nice article. I cut my teeth on Apple’s internal Projector system; dunno when it was introduced, but it was already there when I started in 1991, and was used until the late 90s when everything was gradually imported into this “CVS” tool the new NeXT overlords brought with them. Projector was CVS-like. What I remember most was its terrible merging: it originally had no 3-way merge, so all competing changes had to be merged by hand! My AppleScript co-worker Wm. Cook got frustrated enough to write a 3-way merge script (in MPW shell) which eventually became a standard part of the workflow though it was never integrated into the tool itself.

                                      When I was briefly at Sun in 1997-98 I was introduced to a system they’d built atop SCCS, that was distributed in the same sense as modern 3rd generation tools. I thought it was genius the way you could have your own repo on your local machine, and how commits could be successively pushed into dev/integration/build servers. Back at Apple I told my co-workers about this awesome idea, but resigned myself to CVS and then SVN. So I was very happy when the 3rd-gen systems like Monotone, Darcs and Mercurial started to appear in the wild in the early 00s.

                                      1. 4

                                        Very cool! Glad you enjoyed it! I find it very interesting to hear about the internal solutions that companies devise to fill the tooling gaps in their workflows.

                                        Luckily CVS is before my time so I never had to experience the pain of a brutal merge with it on a real project (and it sounds like Projector was even more painful). I just set up some test projects with CVS, played with it, peeked under the hood, and consulted my buddy Teknikal_Domain to get a feel for how it works.

                                        Despite the negative sentiment (generally frustration) that the older tools tend to evoke, I was surprised to find that most of the features of most of the tools seem to work very well (at least on my trivial test projects).

                                        I had the luxury of starting my VCS learning with Git once it was already a well-formed project. I think newer developers tend to discount (or be completely unaware of) the impact that the older tools had on this field. My eyes were opened by the influence (including direct integration and extension) that the “legacy systems” have on the newer tools. Expressing that view became an (initially unplanned) purpose of the article.

                                        1. 2

                                          I wonder, just out of curiosity: Do you think there’s any merit to the older systems like CVS and SVN? Is there anything more that we could learn from them, or for the most part are they obsolete and their secrets exhausted?

                                          1. 5

                                            Overall, I don’t think so. I would never go back!

                                            I remember a few times thinking an innovation in a new system was a bad idea, but I changed my mind after using it. Like, I didn’t like how in SVN a file didn’t have a consecutively-numbered history anymore; and the “staging area” feature of git seemed needlessly overcomplicated. I learned better.

                                            I don’t find the pre-3rd gen systems interesting, because they’re not distributed. What I’m fascinated by is propagating content (documents, discussions, databases…) across a decentralized network, because I think that’s the future of the open Internet. Modern VCSs are obviously good at that, but unfortunately they’re optimized for source code (directory hierarchies of smallish line-based text files), and the amount of metadata they keep is excessive for many use cases where it’s not critical to keep a full revision history.

                                            1. 3

                                              “What I’m fascinated by is propagating content (documents, discussions, databases…) across a decentralized network”

                                              May I suggest you take a look at IPFS if you haven’t already? That sounds like something you’d enjoy playing around with.

                                            2. 2

                                              I sometimes use first-generation version control, such as RCS, for files only I work on (configuration files, html documents, etc.). The main benefit over git is that I can have multiple independently versioned files in one directory.

                                              Emacs has a great interface for interacting with it (vc), so I don’t have to bother with the specific commands, but still can easily browse the history, create blames and add changes.

                                              1. 4

                                                A lot of my servers have RCS guarding config files with strict locking disabled, just so I always have the ability to undo to a clean copy, and once something works I can actually describe the change out-of-band instead of making my config 80% comments as to what piece does what, why, the history of it all…

                                                If you’re going to do that, fun tip: create an RCS/ directory, and RCS will put all its ,v files in there instead of in the same directory, which makes things cleaner. Especially when I have 20 different files of VCL (Varnish) and then 20 other files that are the exact same to my tired eyes.

                                                1. 2

                                                  I just use Ansible to manage those config files, with the Ansible playbooks and templates in a Mercurial repo. In my experience it works far better because you have everything for a system in one place.

                                                  1. 3

                                                    I looked at Ansible and Chef (and still can’t really decide which one I think is “better”), but in the end, my, err… organically-grown network is just a bit too much of an unorganized mess to properly set that up. I decided that next time I tear it down and do a full rebuild (which will be done one day), then I’ll start with those tools from the ground up instead of trying to mash them into an already existing system that really doesn’t want to change.

                                          1. 4

                                            For “internals” I find it quite superficial. There is nothing about Darcs scalability problems which Pijul claims to fix.

                                            1. 6

                                              Me-ow! Maybe you can get a refund?

                                              I found the level of detail about right for a high-level historical overview. I’m sure I could dig up more information if I want… for instance this “interleaved deltas” thing SCCS used sounds fascinating.

                                              1. 4

                                                It is, from a conceptual standpoint. Unlike the successive (or “reverse”) deltas of RCS, SCCS can construct any revision in about the same amount of time because it’s only dependent on the size of the history file. Since RCS successively deltas revisions, the farther back in history you go, the more you have to “un-delta” to extract a revision, and the longer that takes to perform.

                                                I know Wikipedia has an article on the interleaved deltas, if you want a starting point.
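
To make the cost difference concrete, here is a toy model of both storage schemes. It is a simplified sketch under stated assumptions, not how RCS or SCCS actually lay out their files: history is strictly linear, deltas are line-based edit scripts computed with Python's difflib, and the "weave" tags each line with the revision that added it and the revision that removed it. The class names ReverseDeltaStore and WeaveStore are made up for illustration.

```python
from difflib import SequenceMatcher


def make_delta(src, dst):
    """Edit script turning src into dst: a list of (start, end, replacement_lines)."""
    ops = SequenceMatcher(a=src, b=dst, autojunk=False).get_opcodes()
    return [(i1, i2, dst[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_delta(src, delta):
    out = list(src)
    for i1, i2, repl in reversed(delta):  # back to front so earlier indices stay valid
        out[i1:i2] = repl
    return out


class ReverseDeltaStore:
    """RCS-like toy: full text of the newest revision plus deltas going backwards."""

    def __init__(self, revisions):
        self.head = list(revisions[-1])
        # deltas[k] turns revision k+1 into revision k
        self.deltas = [make_delta(revisions[k + 1], revisions[k])
                       for k in range(len(revisions) - 1)]

    def checkout(self, rev):
        text, applied = list(self.head), 0
        for k in range(len(self.deltas) - 1, rev - 1, -1):
            text = apply_delta(text, self.deltas[k])
            applied += 1  # older revisions need more deltas applied
        return text, applied


class WeaveStore:
    """SCCS-like toy: one weave of [line, added_in, removed_in] entries."""

    def __init__(self, revisions):
        self.weave = [[line, 0, None] for line in revisions[0]]
        for rev in range(1, len(revisions)):
            # weave positions of the lines visible in the previous revision
            live = [i for i, entry in enumerate(self.weave) if entry[2] is None]
            ops = SequenceMatcher(a=revisions[rev - 1], b=revisions[rev],
                                  autojunk=False).get_opcodes()
            for tag, i1, i2, j1, j2 in reversed(ops):
                if tag == "equal":
                    continue
                for i in live[i1:i2]:
                    self.weave[i][2] = rev  # line disappears in this revision
                at = live[i1] if i1 < len(live) else len(self.weave)
                self.weave[at:at] = [[line, rev, None] for line in revisions[rev][j1:j2]]

    def checkout(self, rev):
        # One pass over the whole weave, whichever revision is requested.
        return [line for line, added, removed in self.weave
                if added <= rev and (removed is None or removed > rev)]


if __name__ == "__main__":
    revs = [["a", "b", "c"], ["a", "b2", "c", "d"], ["a", "c", "d"], ["x", "a", "c", "d", "e"]]
    rcs_like, sccs_like = ReverseDeltaStore(revs), WeaveStore(revs)
    for r in range(len(revs)):
        text, applied = rcs_like.checkout(r)
        print(r, text, "deltas applied:", applied, "weave agrees:", sccs_like.checkout(r) == text)
```

In this model the reverse-delta checkout does more work the older the revision is, while the weave checkout always scans the same list once, which matches the trade-off described above.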

                                                1. 2

                                                  Lol and thanks! Yes there is always a balance to strike between depth/breadth/accessibility/length. I tried to appeal to the technical side of the uber-nerd without shunning the interested novice, and added a pinch of historical context.

                                                2. 3

                                                  Appreciate the feedback. I will look into that Darcs item you mentioned and consider adding a note about it. If you have any other suggestions I’d love to hear them.

                                                1. 4

                                                  What is the purpose of this post, other than simply pasting the original header file? Even the comments were already in the original header file. Why wouldn’t you simply link to the original header file, if there is nothing to be added?

                                                  For those who can read C, the original git commit (which is actually extremely simple): https://github.com/git/git/tree/e83c5163316f89bfbde7d9ab23ca2e25604af290

                                                  1. 6

                                                    The purpose of the post is to get people curious about looking under the hood at Git’s code (which was very interesting to me). It is also to help clarify an aspect of how Git’s code works - the header file. The original header file does have comments, but they tend to be on the technical side. I expanded on those to help (esp newer folks) better understand how it works and to provide some context for the structures that are created.

                                                    As you mention, another approach would have been to write out my comments as points in the article itself and link back to the original version. But it helped me learn to go through and document the code in this way (esp in the context of the rest of the codebase), so I decided to make that available.

                                                    1. 5

                                                      My apologies, I was too quick with my judgement. You actually did add comments in the source code. Well done, they felt like they were there already.

                                                      1. 2

                                                        No problem, and thank you :)

                                                  1. 4

                                                    I wrote this a long time back about the evolution of version control systems. I think it is a bit broader than this one, while perhaps not as deep.

                                                    1. 2

                                                      Just read it. This is an excellent article - thanks for sharing. The descriptions are extremely clear and well written. I think yours gets a little more into how the design principles of each RCS affect the usage, whereas mine touches a little more on how/where the revision data is stored and what it looks like sitting on the filesystem.

                                                      1. 2

                                                        Thanks! much appreciated.

                                                    1. 5

                                                      “In this article, we provided a technical comparison of some historically relevant version control systems. If you have any questions or comments, feel free to reach out to jacob@initialcommit.io.”

                                                      No Fossil. sniff

                                                      I reached out (since I was feeling open – ahem! – free)

                                                      1. 6

                                                        Hey appreciate you reaching out @lettucehead! I poked around on the Fossil website and the integrations look pretty sweet. I installed it and will hopefully get some time to play around with it this week. Hopefully will add a section into the blog post in the near future.

                                                        p.s. I’m just learning the ropes here, but I submitted a hat request as the creator of the Initial Commit site.

                                                        1. 1

                                                          nice!!!