1. 30
  1. 5

    I found many of these surprising. The os.path.join behavior feels like a potent footgun, regardless of the fact that the documentation is clear. If I thought it’d be accepted, I’d be tempted to offer a patch to throw a ValueError any time a leading separator is found after the first entry.

    And changing os.makedirs the way they did probably warrants a new name to force people using old code with new python to actually take a look at the documentation.

    The parse_qsl item feels like another surprising footgun.

    And the path traversal bug in tempfile.NamedTemporaryFile seems far too serious to have had a patch submitted well over 3 years ago still sitting un-applied. Mostly because that’s just the kind of badness that people who reach for this API are seeking to avoid.

    This list was really what it says on the tin. Now I need to go look at all my os.path.join calls and all my os.makedirs calls. Bleh.

    1. 1

      Please do! As you pointed out those are some surprising footguns and I also don‘t see any valid, non changeable use cases.

    2. 4

      However, when code is optimized, all assert statements are ignored.

      Oh my! Thanks for sharing, this is good to keep in mind.

      1. 1

        actually removing the assert is the only thing that optimisation does

        1. 4

          No, it also set the __debug__ variable to False.


      2. 1

        This is really excellent and should be required reading for Python programmers. I’m astonished by the the os.path.join and assert-stripping ones.

        I assume asserts aren’t extra expensive vs if-then-throw so it’s just so surprising to me that they get “optimized” out.

        1. 2

          It looks as if Python is exactly the same as C here. Asserts in C are in a macro that is defined away if NDEBUG is defined, which it is in release builds. The core idea is that asserts are not error handling. They are a debugging tool that lets you fail-stop if something has violated your assumptions and then you can fix the program. If a condition might occur in real-world use then you should properly handle it with something that reports and error to the caller and allows them to recover. If it can’t happen without a bug then you should fix the bugs in debug builds and then the tests are redundant in release builds. Whether this is a good idea is debatable. Note that, even though the conditional branch for the assert is fairly cheap, calculating the expression that you assert is true may be very expensive.

          The os.path.join thing feels like a special case of a language-agnostic security issue: If you are treating paths as strings at any point, then you need to be careful. They’re subject to parsing bugs (are you sure you parse them the same way as the OS?), resource-limit bugs (are you constructing a path that the kernel won’t handle?) and race conditions (is this prefix still the prefix it was when you validated it?). This is unavoidable on a lot of systems, though the *at system calls on POSIX systems mean that you can now provide a safer version, it’s just that most languages don’t. The C++ Filesystems TS is particularly embarrassing because it failed to learn from the prior 40 years of experience here.

          The tempfile thing is also part of this. The root cause here is treating paths as strings. The same is true for the zip slip attack. If the destination were not a string, but a capability to the directory into which things were being extracted, then this kind of attack couldn’t happen.