Today we’re publishing another Libraries.io open data release with over 311 million rows of metadata about open source projects and the network of dependency data that connects them all.
This project, like others I’ve found, doesn’t seem to handle Python dependencies properly. For example, Flask 0.12 is listed as having no dependencies, but actually it depends on Werkzeug, Jinja2, click, and itsdangerous. [1]
I think this is because these sites are grabbing package JSON data from PyPI, but many (most?) packages don’t declare their dependencies there. As far as I know, the only way to accurately resolve dependencies for Python packages is to:
1. Grab the Wheel (if there is one) and inspect the metadata.json file; otherwise
2. Download the source distribution and actually install the package. This is very often necessary, as many projects (or older releases) don’t have a Wheel.
I’ve been working on a project that actually has the correct dependency graph for Python libraries. I’ve had to follow the approach above. It’s not quite ready to show the world, but I’m hoping it’ll be interesting for people.
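For what it’s worth, the Wheel-inspection step is only a few lines, assuming the wheel declares its dependencies as Requires-Dist headers in its .dist-info/METADATA file (`wheel_requires` is just an illustrative name):

```python
# Rough sketch: read a Wheel's declared dependencies without installing it.
# Assumes the wheel ships a *.dist-info/METADATA file in the core-metadata
# format (Requires-Dist headers); wheel_requires is a hypothetical helper.
import zipfile
from email.parser import Parser

def wheel_requires(wheel):
    """Return the Requires-Dist entries declared in a wheel file (or [])."""
    with zipfile.ZipFile(wheel) as whl:
        meta_name = next(name for name in whl.namelist()
                         if name.endswith(".dist-info/METADATA"))
        meta = Parser().parsestr(whl.read(meta_name).decode("utf-8"))
    return meta.get_all("Requires-Dist") or []
```

Since zipfile accepts a file object as well as a path, the same function works on a wheel streamed straight from PyPI without writing it to disk.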
[1] https://libraries.io/pypi/Flask/0.12
Yeah, Python dependencies are not easily machine readable, and for the moment I’m trying to avoid executing setup.py files downloaded from the internet on the Libraries.io servers. Any help contributing better Python support would be great.
That’s wise - I’ve seen all sorts of shenanigans in those files. Just importing some of them causes attempted sudo operations.

I’m not particularly familiar with the python world, but this sounds like the perfect use-case for containers, no?
Note I said containers not “docker”. I believe what you want is a quick “spin up $distro, install $package, analyse installed deps” flow, which imo would suit lxc/lxd perfectly.
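Once the package is installed inside such a throwaway container or VM, the “analyse installed deps” step itself can be sketched in a couple of lines, assuming the pip install has already happened (`installed_requires` is a made-up name):

```python
# Sketch of the "analyse installed deps" step, run after `pip install <pkg>`
# inside the disposable container/VM: importlib.metadata reads the
# Requires-Dist entries that the install wrote into the .dist-info records.
from importlib import metadata

def installed_requires(dist_name):
    """Requires-Dist entries of an installed distribution, or [] if none."""
    return metadata.requires(dist_name) or []
```

Unlike parsing setup.py, nothing untrusted runs at analysis time - the untrusted execution is confined to the install step inside the disposable environment.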
I’ve hacked something similar together over here: https://github.com/librariesio/pydeps
You’re right - I solved this issue by using Docker.
Yup. The place I work at (shameless plug: https://fossa.io) does this using ephemeral Docker containers.
We scan projects to check if they’re compliant with the licenses of their open source libraries. To do this, we need to compute the dependency graph of a project. For most build systems (the exceptions are usually NPM and Golang tools), this means running a full build to execute any arbitrary build scripts.
If you only do static analysis of package manifests, you tend to overreport and underreport – you’ll miss packages brought in by build scripts, and you’ll have extra packages (or extra versions of packages) that are included in the manifest but might be unused, optimised out by the build system, or brought in by version constraint solver weirdness.