1. 4
  1. 6

    If it wasn’t “Initial commit”, I was going to call shenanigans.

    1. 0

      lolz

    2. 3

      You can restrict the search to initial commits with array_length(parent) = 0, since an initial (root) commit has no parents. Playing with that column, apparently some commits manage to have dozens of parents. I don’t even know how you manage that.

      select lower(trim(message)) as message, count(*)
      from `bigquery-public-data.github_repos.commits`
      where array_length(parent) = 0  -- root commits only (no parents)
      group by 1                      -- group on the normalized message
      order by count(*) desc
      limit 100;
      

      First few rows of the result:

      message                                                            count
      initial commit                                                     1957292
      first commit                                                       151151
      init                                                               39375
      initial commit.                                                    36600
      initial                                                            17894
      initial import                                                     14737
      create readme.md                                                   11510
      init commit                                                        9692
      update license.md                                                  6606
      first                                                              6034
      first commit.                                                      5688
      initial version                                                    5325
      create license.md                                                  3968
      (empty message)                                                    3908
      inital commit                                                      3854
      initial import.                                                    3459
      create gh-pages branch via github                                  3371
      initial release                                                    3348
      initial checkin                                                    3194
      initial commit to add default .gitignore and .gitattribute files.  2967
      initial revision                                                   2676
      :boom::camel: added .gitattributes & .gitignore files              2200
      :neckbeard: added .gitattributes & .gitignore files                2198
      first version                                                      2193
      :octocat: added .gitattributes & .gitignore files                  2159
      :space_invader: added .gitattributes & .gitignore files            2154
      :confetti_ball: added .gitattributes & .gitignore files            2154
      init project                                                       2150
      :tada: added .gitattributes & .gitignore files                     2139
      :circus_tent: added .gitattributes & .gitignore files              2134
      :lollipop: added .gitattributes & .gitignore files                 2079
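      For anyone who wants to try the same idea offline, here is a minimal Python sketch of what the query computes. It assumes each commit is a dict with hypothetical "message" and "parent" keys mirroring the table's columns; Python's strip() only approximates BigQuery's TRIM, so this illustrates the logic rather than reproducing the counts above.

```python
from collections import Counter


def top_initial_messages(commits, limit=100):
    """Count normalized messages of root commits, most frequent first.

    Each commit is assumed (hypothetically) to be a dict with a "message"
    string and a "parent" list of parent hashes.
    """
    counts = Counter(
        c["message"].strip().lower()   # roughly lower(trim(message))
        for c in commits
        if len(c["parent"]) == 0       # array_length(parent) = 0
    )
    return counts.most_common(limit)   # order by count desc, limit N


# Hypothetical sample data.
commits = [
    {"message": "Initial commit", "parent": []},
    {"message": "  initial commit \n", "parent": []},
    {"message": "First commit", "parent": []},
    {"message": "Initial commit", "parent": ["a1b2c3"]},  # has a parent: excluded
    {"message": "Fix typo", "parent": ["d4e5f6"]},
]

print(top_initial_messages(commits))
# [('initial commit', 2), ('first commit', 1)]
```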
      
      1. 3

        apparently some commits manage to have dozens of parents. I don’t even know how you manage that.

        This is called an “octopus merge”, if you want to search for more information. You may also be interested in this article about octopus merges in the Linux kernel.
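        To make the parent counts concrete: a commit with no parents is a root commit (usually the initial commit), one parent is an ordinary commit, two parents is a normal merge, and more than two is an octopus merge. Here is a small Python sketch of that classification plus a tally of parent counts, assuming the same hypothetical dict shape with a "parent" list:

```python
from collections import Counter


def commit_kind(parents):
    """Classify a commit by how many parent hashes it lists."""
    n = len(parents)
    if n == 0:
        return "root"           # no parents: typically an initial commit
    if n == 1:
        return "regular"
    if n == 2:
        return "merge"
    return "octopus merge"      # three or more parents


def parent_count_histogram(commits):
    """Map number of parents -> number of commits with that many parents."""
    return Counter(len(c["parent"]) for c in commits)


# Hypothetical sample data; parent values are made-up short hashes.
commits = [
    {"parent": []},
    {"parent": ["a1"]},
    {"parent": ["a1", "b2"]},
    {"parent": ["a1", "b2", "c3", "d4"]},
]
print([commit_kind(c["parent"]) for c in commits])
# ['root', 'regular', 'merge', 'octopus merge']
print(sorted(parent_count_histogram(commits).items()))
# [(0, 1), (1, 1), (2, 1), (4, 1)]
```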

        1. 2

          FYI I just updated my article to include the array_length(parent)=0 filter. Thanks for the input!

          1. 2

            You’re welcome :)

            Why did you add this condition?

            AND LENGTH(TRIM(LOWER(message))) > 0

            Surely an empty commit message is still a valid commit message?

            1. 1

              Hehe, yes, it was to filter out the empty commit messages. You’re technically right, but I was more interested in the actual text of initial commit messages, although, as you mention, it is worth noting that empty messages rank near the top. However, it’s less clear that the empty commit messages are actually initial commit messages, since they could come from detached HEAD states with no parents.
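              The condition being asked about keeps only commits whose message still has characters after trimming; lowercasing cannot turn a non-empty string into an empty one, so it does not change the outcome. A rough Python equivalent, using a hypothetical helper name has_text:

```python
from typing import Optional


def has_text(message: Optional[str]) -> bool:
    """Rough analogue of LENGTH(TRIM(LOWER(message))) > 0.

    A missing message (None, standing in for SQL NULL) is dropped too,
    because a comparison with NULL is never true in a WHERE clause.
    """
    if message is None:
        return False
    # lower() is omitted: it cannot change whether the trimmed string is empty.
    return len(message.strip()) > 0


messages = ["Initial commit", "", "   ", "\n", None, "init"]
print([m for m in messages if has_text(m)])
# ['Initial commit', 'init']
```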

              Also, in case you’re curious, I just wrote a post using a similar method to try to answer “What % Of Git Commit Messages Use The Imperative Mood?”

          2. 2

            Haha, just reading this, and it’s funny you mention it, because I hit on the same way of identifying initial commits in a discussion on the Reddit thread yesterday. I think this approach is certainly more accurate than the method I used (just looking through the top counts and picking the ones that looked like initial commit messages). I will most likely update my post to reflect it.