I can’t reproduce this. I’m running the exact same commands locally, and it doesn’t appear to run the code in collections.py. Is this windows specific behaviour or something?
In general I don’t find the idea that generating docs involves running arbitrary code surprising, build systems often involve running arbitrary code, and generating docs often involves running build systems. It being python, I wouldn’t even be that surprised if the whole thing was implemented via reflection. I was wondering if I could get the same behaviour with something like python3 -m http.server though, because that would be approaching surprising behaviour.
In general I don’t find the idea that generating docs involves running arbitrary code surprising,
I misunderstood this originally. The problem is not that it runs arbitrary code to generate the docs (if you’re importing a package, you’re letting the author of that package [and, by extension, the docs for that package] run arbitrary code anyway, because Python does not have a capability model). The problem is that the docs tool reads files matching a specific name in the current directory and executes them. If someone manages a drive-by download exploit on your browser (I think Safari is still the only mainstream browser that requires you to confirm per site that you want to allow it to download files) then they can drop a file like this in your downloads directory, and if you run the docs command in your downloads directory then you’re owned. Fortunately, it only checks the current directory and not arbitrary parents (as the git vulnerability a month or two ago did), so dropping a file like this in ~/Downloads doesn’t exploit you if you look at docs in ~/Downloads/SomePythonPackage-1.2.3.4/.
Yeah, what this really boils down to is “if there’s a foo.py in the working directory and you run python foo.py or anything else that imports foo, you will get the working directory’s foo.py”.
Which is one of those deep tensions between making a thing discoverable/learnable (working directory being on the import path is huge for that) versus trying to lock it down as much as possible. And is getting into an area where it’s hard to really have the language stop you – Python could maybe refuse to run if it detects it’s being invoked in a directory matching common download/home dir names, or change import behavior silently, but now you get confusing inconsistency in how it works, and no amount of “are you sure you want to trust this directory?” popups will actually help the people who are most likely to need the help, since they’ll probably just click through those.
As some folks have noted, Python has a command-line flag that lets you explicitly decide to minimize the import path, which maybe is the way forward for some tutorials and other beginner/first-time materials. Or maybe it’s a thing that needs to be solved at the operating system level.
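The flag isn't named above. Assuming it is -I (isolated mode, available since Python 3.4) or the newer -P from Python 3.11, here is a rough sketch that compares the import path a fresh interpreter starts with, with and without -I. The helper name startup_path is made up for illustration.

```python
import json
import os
import subprocess
import sys


def startup_path(*flags):
    """Return sys.path as seen by a fresh interpreter started with `flags`."""
    env = dict(os.environ)
    # Drop PYTHONSAFEPATH so the unflagged run shows the default behaviour.
    env.pop("PYTHONSAFEPATH", None)
    result = subprocess.run(
        [sys.executable, *flags, "-c",
         "import json, sys; print(json.dumps(sys.path))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # With -c, the default path normally starts with '' (the current directory).
    print("default :", startup_path()[:2])
    # In isolated mode that entry should be absent.
    print("isolated:", startup_path("-I")[:2])
```

In a tutorial this would translate to telling people to run python -I -m whatever, at the cost of also ignoring PYTHONPATH and the user site directory.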
Unrelated: this is also why I and several other people strongly advocate for a code repository layout with a src/ directory top-level, and any modules/packages inside that directory. If the modules are top-level, it’s very easy to trick yourself into thinking your packaging process works because you’re likely running it from the root directory, which implicitly puts all that stuff on the import path. Using a src/ (or similar name) directory means you actually have to get the packaging right in order to successfully install/test.
The problem is that the docs tool reads files matching a specific name in the current directory and executes them.
I thought the issue was that running python3 -m foo would run foo.py (or similar, don’t know specifics off the top of my head) – that is, this is nothing to do with the docs tool itself. Am I mistaken?
Running python -m foo will run whatever foo module is found first, starting with the current working directory. The same is true of running python -m pydoc foo.
The specific “exploit” shown here is more like
Module foo imports standard library module collections
I manage to get a malicious file named collections.py into your current directory and convince you to run python -m pydoc foo
The import collections inside foo gets resolved to the current directory’s collections.py, so that’s the file that gets imported. If it has any import-time side effects, they execute.
It’s a bit convoluted to actually pull off, because generally you need to convince someone to run python from their downloads directory or something like that.
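The steps above can be tried out harmlessly. This is only a sketch: it uses colorsys as an arbitrary stand-in for collections (my choice, since it is a small stdlib module that is unlikely to be loaded at startup), and the "malicious" file just prints a marker. The function name is hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_shadow(module_name="colorsys"):
    """Run `python -c "import <module_name>"` in a directory that contains
    a same-named .py file, and return whatever the child printed."""
    with tempfile.TemporaryDirectory() as workdir:
        # Stand-in for the dropped file: its only import-time side effect
        # is printing a marker.
        Path(workdir, module_name + ".py").write_text(
            'print("local file executed")\n'
        )
        env = dict(os.environ)
        env.pop("PYTHONSAFEPATH", None)  # make sure safe-path mode is off
        result = subprocess.run(
            [sys.executable, "-c", "import " + module_name],
            cwd=workdir, env=env, capture_output=True, text=True,
        )
        return result.stdout


if __name__ == "__main__":
    print(run_with_shadow() or "local file was NOT picked up")
```

If the marker shows up, the local file won over the stdlib one. If it doesn't, the module was probably already imported before the current directory was consulted, or the path was restricted some other way.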
I tried it on void linux and mac os. With python3.11 on both systems. EDIT: Oops, python3.10.8 on mac, I checked the version number in a terminal with ssh open (but did fail to reproduce on the actual mac).
Your docker repro works for me, and installs python3.9. Maybe the behaviour has changed in more recent versions of python?
Continuing weirdness: I cannot reproduce with Debian’s python 3.9.2, but if I install 3.10.6 (using pyenv) I can finally reproduce what no_gravity is seeing. But I cannot reproduce it using Debian’s system python (which he seems able to do on Ubuntu).
I’m going to stop messing around with this now, but there seem to be other factors at work here.
# I installed python 3.10.6 using pyenv and made that the local python
telemachus(digitalocean) wtf$ python3 --version
Python 3.10.6
telemachus(digitalocean) wtf$ python3 -m http.server
P0wned
Could not import runpy module
# Now I've gone back to the system python3
telemachus(digitalocean) wtf$ rm .python-version
telemachus(digitalocean) wtf$ python3 --version
Python 3.9.2
telemachus(digitalocean) wtf$ python3 -m http.server
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
^C
Keyboard interrupt received, exiting.
I also cannot reproduce—not on macOS 12.6.1 with python 3.10.8 and not on Debian 11 with python 3.10.5. (My shell is bash on both systems, though I doubt that matters.)
I can finally reproduce this, but only with some pythons and in some cases. I don’t understand this at all. In any case, I’m glad to learn about the larger issue.
Yes, if you write the python command by hand and are aware of the issue, you can probably mitigate it.
The tricky thing is that the python call might be somewhere in a shellscript you use.
The issue actually came up when an irc user reported his computer goes bananas when he cds into a certain dir. Turned out he was using a tool that executes some Python on every cd. In that dir, there was a “types.py”. And the shellscript ran some Python that imported “types”.
Fair enough, though (weirdly) I still cannot reproduce the original example or your example with http.server. It’s not that I disbelieve you, but I don’t understand why I cannot reproduce this.
I have tried Python installed by Debian (via apt), MacPorts, and pyenv (pyenv on macOS and on Debian). The results are not consistent. That is, sometimes, a pyenv-python ignores collections.py and other times a pyenv-python reads the local file and shows the vulnerability. So far, I have not been able to get a Python installed by Debian or MacPorts to read the local file and show the vulnerability.
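For comparing interpreters, something like the following might help narrow down what differs. It is a guess at the relevant variables, not a confirmed explanation: my assumption is that if a module such as collections is already in sys.modules by the time your code runs (for example because startup code imported it), a local collections.py is never looked at. The function name and the choice of fields are mine.

```python
import os
import sys


def startup_report(names=("collections", "types")):
    """Collect settings that plausibly affect whether a local file can shadow
    a stdlib module in this interpreter."""
    return {
        "version": sys.version.split()[0],
        "executable": sys.executable,
        "sys.path[0]": sys.path[0] if sys.path else None,
        "isolated": bool(sys.flags.isolated),
        # sys.flags.safe_path only exists on Python 3.11+.
        "safe_path": bool(getattr(sys.flags, "safe_path", False)),
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
        # True means the module was imported before this code ran.
        "preloaded": {name: name in sys.modules for name in names},
    }


if __name__ == "__main__":
    for key, value in startup_report().items():
        print(key, "=", repr(value))
```

Running the same file with each of the apt, MacPorts and pyenv interpreters and diffing the output would at least show whether the path or the preloaded modules differ between them.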
If using an older version of Python (I still run py36-py38), an alternative solution is to add an extra import hook (via sys.meta_path, but an extra FileFinder in sys.path_hooks might also be enough) before the defaults to avoid this behavior.
Why does this happen? When running “import foo”, what python (approximately) does is the following:
Python looks through its sys.meta_path entries to see if any of them can handle importing foo (for example, if foo involves a dynamic shared object, you’d need an entry that calls dlopen).
How does the entry check if it can handle the import? It does a few checks (is foo compiled within the default libpython.so or is it frozen bytecode?) or sometimes just attempts the import outright with a try-except fallback.
The last sys.meta_path entry is usually the one that walks through the files and folders on your filesystem, and this is where the whole PYTHONPATH/PYTHONSAFEPATH gimmick comes into play: Python checks the entries of sys.path (which is influenced by the env vars) one-by-one for foo.py, and the first entry in sys.path is usually "." i.e. the current directory. This is why when you have a types.py in your current directory and a script does import types, you get the local file instead of the types module from the stdlib.
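To see the lookup described above on your own interpreter, a small sketch like this lists the sys.meta_path entries and reports where a top-level name would be loaded from without importing it. The helper name describe_import is made up.

```python
import importlib.util
import sys


def describe_import(name):
    """Return the origin a top-level import of `name` would resolve to,
    or None if no finder can locate it. Nothing new gets imported."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    for finder in sys.meta_path:
        print("finder:", finder)
    print("sys.path[0] =", repr(sys.path[0]) if sys.path else None)
    for name in ("sys", "types", "collections"):
        print(name, "->", describe_import(name))
```

One caveat: for a module that is already imported, find_spec just reports the spec of the loaded module, so this tells you what this process got, not what a fresh process in another directory would get.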
Therefore, to avoid this mistake, we can add an entry at the start of sys.meta_path or sys.path_hooks that checks the “safe” locations first, before punting to the later entries that use the local directory.
To avoid the inverse of this mistake (i.e. I want to import my local foo.py but the name clashes with the stdlib), I try from . import foo (which only works inside a package), but usually I just rename the local file :P
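A minimal sketch of the "check the safe locations first" idea, not the hook from the thread: it assumes the safe locations are the stdlib directory reported by sysconfig plus its lib-dynload subdirectory (a CPython-on-Unix layout assumption), and the class and function names are invented for illustration.

```python
import importlib.abc
import importlib.machinery
import os
import sys
import sysconfig


class StdlibFirstFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder: try the stdlib directories before sys.path."""

    def __init__(self):
        paths = sysconfig.get_paths()
        candidates = (
            paths["stdlib"],
            # Assumption: extension modules live here on Unix-like CPython.
            os.path.join(paths["platstdlib"], "lib-dynload"),
        )
        self._dirs = [d for d in candidates if os.path.isdir(d)]

    def find_spec(self, fullname, path=None, target=None):
        if "." in fullname:
            # Submodules are located through their parent package's __path__.
            return None
        spec = importlib.machinery.PathFinder.find_spec(fullname, self._dirs)
        # Ignore namespace-package matches (origin is None) and fall through.
        if spec is None or spec.origin is None:
            return None
        return spec


def install():
    """Put the finder at the front of sys.meta_path (once)."""
    if not any(isinstance(f, StdlibFirstFinder) for f in sys.meta_path):
        sys.meta_path.insert(0, StdlibFirstFinder())
```

This has to run before the import you want to protect, so it only helps if you control startup (for example via sitecustomize). Returning None for anything not found in the stdlib directories leaves built-in, frozen and third-party modules to the default finders.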
It works for me on Debian 11 and Ubuntu 22. And it also works for “python3 -m http.server”.
Which OS do you use? Can you try running it in a Debian container?
When I do, it also works for me:
I wonder what other variables are at play.
It works for me with Python 3.10.6:
Python 3.11 adds an interpreter flag (-P) and an environment variable (PYTHONSAFEPATH) that you can use to prevent this behavior.
How did you install Python?
I’ve stubbed my toe on this “feature” of Python’s import system enough times to write a custom import hook: https://github.com/ahgamut/cosmopolitan/blob/importer-cosmo/third_party/python/Lib/importlib/_bootstrap.py#L1089
The older -I flag is a bit more comprehensive and was updated to imply -P.
I’m guessing someone has “.” in their PYTHONPATH.
This is a feature…