One of the maintainers of Libraries.io here :wave:
Today we’ve released around 200 million lines of open source metadata in CSV form, which we’ve been indexing over the past two years. This includes dependency information for 25 million open source repositories from GitHub, GitLab and Bitbucket, mapping out a huge dependency graph for open source software.
The actual CSV download is available on Zenodo: https://zenodo.org/record/808273
Further documentation on the site itself: https://libraries.io/data
All we need now is for somebody to analyze the data ;-)
They’re starting to: https://www.slideshare.net/tommens/towards-laws-of-software-ecosystem-evolution-an-empirical-comparison-of-seven-software-packaging-ecosystems
Have you considered putting the dataset on Kaggle?
Looks like they have a max upload size of 500 MB of uncompressed CSV, and the Libraries.io release is ~25 GB uncompressed!
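At that size the files probably won't fit in memory on a typical laptop, so streaming them row by row is one way to get started. Here is a minimal sketch, assuming a CSV with a header row; the `count_by_column` and `count_file` helpers are hypothetical names of mine, and any column name you pass (for example `"Platform"`) is a guess that should be checked against the documentation at https://libraries.io/data.

```python
import csv
from collections import Counter
from typing import Iterable


def count_by_column(rows: Iterable[str], column: str) -> Counter:
    """Stream CSV text line by line and tally the values in one column."""
    counts: Counter = Counter()
    # DictReader reads the header from the first line and yields one dict per row,
    # so only a single row is held in memory at a time.
    for record in csv.DictReader(rows):
        counts[record[column]] += 1
    return counts


def count_file(path: str, column: str) -> Counter:
    """Tally one column of a CSV file on disk without loading the whole file."""
    # newline="" is what the csv module recommends when opening files.
    with open(path, newline="", encoding="utf-8") as handle:
        return count_by_column(handle, column)
```

Calling `count_file(path, column)` on one of the extracted files would then give a per-value row count, as long as the column actually exists in that file.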
25 million seems a bit tail-heavy, to the extent that it’s not representative of the software people actually use.