The math is a bit strange:
So this can’t all be code, right? About 2 lines per file and maybe about 29 files per commit?
The full quote: “The Google codebase includes approximately one billion files and has a history of approximately 35 million commits spanning Google’s entire 18-year existence. The repository contains 86TB of data, including approximately two billion lines of code in nine million unique source files.”
So actually about 222 lines per unique source file, and perhaps about 4 commits per file.
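For anyone who wants to check the arithmetic, here is a quick sketch using only the approximate figures from the quote. The variable names are mine, and the results are only as precise as the rounded numbers in the article.

```python
# Approximate figures as quoted from the article.
total_files = 1_000_000_000      # all files, including copies and non-code
commits = 35_000_000             # commits over the repository's history
lines_of_code = 2_000_000_000    # lines of code
source_files = 9_000_000         # unique source files

ratios = {
    "lines per file (all files)": lines_of_code / total_files,
    "files per commit (all files)": total_files / commits,
    "lines per unique source file": lines_of_code / source_files,
    "commits per unique source file": commits / source_files,
}

for name, value in ratios.items():
    print(f"{name}: {value:.1f}")
```

Dividing by the 9 million unique source files instead of the 1 billion total files is what makes the numbers look like a normal codebase.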
The linked article goes on to say that the count of over 1 billion files comes from including “source files copied into release branches, files that are deleted at the latest revision, configuration files, documentation, and supporting data files”.
I remember reading an article that talked about how there are a lot of automated processes also committing things to their repo besides developers, so if that’s true I assume many of the files are artifacts of some kind.
Plus there are probably plenty of non-code files like images as well.