Way, way back when Django was still an internal project at a little newspaper in Kansas, its original “ORM” was actually a code generator – you’d define your model classes, then run a script that would output a set of modules containing all the code to query them, work with them, etc.
The original developers got talked out of that approach before it was open-sourced, but the ORM wasn’t actually fully rewritten. It continued to be a code generator, but instead of dropping a bunch of Python files on your filesystem, it generated the module objects and their contents in-memory, and then inserted them into sys.modules to make them importable.
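The technique reads roughly like this minimal sketch (not Django's actual code; the module name and generated source are made up for illustration):

```python
import sys
import types

# Pretend this string was produced by a code generator from model definitions.
source = """
def all_objects():
    return ["example row"]
"""

# Build a module object in memory, execute the generated code inside it,
# and register it in sys.modules so ordinary imports can find it.
mod = types.ModuleType("generated_models")
exec(compile(source, "<generated>", "exec"), mod.__dict__)
sys.modules["generated_models"] = mod

# From here on, any code in the process can simply do:
import generated_models
print(generated_models.all_objects())  # ['example row']
```

The point is that `import` consults `sys.modules` first, so a module never has to exist on disk at all.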
This stuck around until Django 0.95, which rewrote the whole ORM (and that’s why Django 0.95 was called the “magic removal” release). Custom template tag libraries continued to hack import paths for a little while longer. The modern, post-0.95 Django ORM instead does things with metaclasses, though the metaclass itself does less than people generally expect.
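A field-collecting metaclass in the style people associate with ORMs can be sketched in a few lines (illustrative only, not Django's real `ModelBase`; `Field` and `Article` are made-up names):

```python
class Field:
    """Placeholder for an ORM column declaration."""

class ModelMeta(type):
    def __new__(mcs, name, bases, namespace):
        # Collect class attributes that are Field instances at
        # class-creation time, so queries can consult them later.
        fields = {k: v for k, v in namespace.items() if isinstance(v, Field)}
        cls = super().__new__(mcs, name, bases, namespace)
        cls._fields = fields
        return cls

class Article(metaclass=ModelMeta):
    title = Field()
    body = Field()

print(sorted(Article._fields))  # ['body', 'title']
```

That registry-building step really is most of what the metaclass does; the heavy lifting lives elsewhere.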
I’m worried that a de-facto move away from dynamic stuff in the Python ecosystem, possibly motivated by those who use Python only because they have to, and just want to make it more like the C# or Java they are comfortable with, could leave us with the very worst of all worlds.
This resonates a lot with me.
I do use type hints in most of my new Python code, largely for documentation purposes. I don’t run a type checker or enforce any sort of “static typing”. Most of the non-documentation utility I find in type hints comes from runtime things: libraries like Pydantic which can automatically derive validation and serialization rules just from declaring a list of fields with type hints.
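The mechanism such libraries build on can be sketched with only the stdlib (a toy, nothing like Pydantic's actual implementation; `User` and `validate` are made up):

```python
from dataclasses import dataclass
from typing import get_type_hints

@dataclass
class User:
    name: str
    age: int

def validate(obj) -> list[str]:
    """Return the fields whose values don't match their declared hints."""
    errors = []
    # Type hints are ordinary runtime data, readable via get_type_hints().
    for field, hint in get_type_hints(type(obj)).items():
        if not isinstance(getattr(obj, field), hint):
            errors.append(field)
    return errors

print(validate(User(name="ada", age=36)))    # []
print(validate(User(name="ada", age="36")))  # ['age']
```

This only handles plain classes as hints, but it shows why hints are useful even with no type checker in sight.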
And as Luke has mentioned before, and as I’ve explained here on lobste.rs before, Python’s type hints as enforced by checkers really do represent a separate language with significantly different semantics (and I have a litany of other complaints about the Python type-hinting ecosystem that maybe I’ll write up one day). I understand a lot of people feel “forced” to write Python and don’t like dynamically-typed languages, but the solution to that is to get comfortable with dynamic typing, not to try to change Python until it looks like the statically-typed language you would have preferred.
I spent a good hour today trying to type check a class decorator, unsuccessfully, until I realized that, since I was already using a slightly less strict version of --strict for mypy, I could just not add type hints at all and everything would still work and type check outside of it.
I like types, they are good, and help me catch bugs before even running things. But Python’s type system is really not all there once you start to even approach any level of dynamic shenanigans.
You can type callbacks, and do some cool shit with protocols, but metaclasses and user defined generics have been a world of pain, in my experience.
But Python’s type system is really not all there once you start to even approach any level of dynamic shenanigans.
I was going to complain that it isn’t even very good at some static typing shenanigans, but it seems like they might have added support for recursive types recently. I might have to try it out again.
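For reference, a recursive type alias looks like this (recent mypy versions accept this form; the `depth` helper is just for illustration):

```python
from typing import Union

# A JSON value, defined in terms of itself via string forward references.
Json = Union[dict[str, "Json"], list["Json"], str, int, float, bool, None]

def depth(value: Json) -> int:
    """Nesting depth of a JSON-like value."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0

print(depth({"a": [1, {"b": 2}]}))  # 3
```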
Huh, I’d have thought that was the same as a forward reference, which has been around for a while. Seems like it isn’t, or at least it’s a special case that requires specific handling.
It’s easy to just go about your day, cranking out perfectly fine and boring-looking Python code to solve business needs, and forget just HOW MUCH WEIRDNESS you can do with Python.
Descriptors, metaclasses, class decorators, dunder methods, every single ORM, how unittest works, how pytest works, how freaking namedtuple is implemented. Under the hood of business logic, Python can get pretty freaky.
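As a reminder of how small those building blocks are, here is a minimal descriptor (a made-up `Positive` validator, in the spirit of ORM fields):

```python
class Positive:
    """A descriptor: an object that intercepts attribute access."""

    def __set_name__(self, owner, name):
        self.name = name  # called automatically at class-creation time

    def __get__(self, instance, owner=None):
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError(f"{self.name} must be positive")
        instance.__dict__[self.name] = value

class Order:
    quantity = Positive()

order = Order()
order.quantity = 3
print(order.quantity)  # 3
```

Properties, ORM columns, and much of attrs/dataclasses machinery are variations on this protocol.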
I’ll preface by stating that I am probably in the “just want to make it more like the C# or Java they are comfortable with” camp (even though I’ve mostly used Python in my work): I just really like the (very limited) language that is “type hinting compatible Python”. (Proper support for kwargs would be nice though!)
I think the type checking and the metaprogramming magic approaches are not mutually exclusive though, and while I’ve not had much time to think about it, I feel that:
Metaprogramming magic is good for library code: then the magic can be hidden away behind a simple and documented interface (possibly type-hinted too!). I’d also say the tolerance for dark magic depends on how well established the library is: I have no issue including pytest in every Python project I use, but the PonyORM examples are a bit scarier: what happens to my codebase if the PonyORM project dies? Forking something so involved looks like a massive undertaking.
For application code, sticking to what can be type-checked seems sufficient, and will minimize surprises for future maintainers. This does not mean you can’t do any metaprogramming in an application’s codebase, just separate it from the business logic and glue code and clearly identify it as a lib.
And then there are the debugging tools, which IMO are the best places to go nuts on metaprogramming: I don’t need to understand them and don’t depend on them too much; what I want is maximum explorability at the point in time where I’m using them.
In other languages, if you want to mock out “all date/time access” or “all filesystem access”, you may end up with a lot of tedious and noisy code to pass these dependencies through layers of code, or complex automatic dependency injection frameworks to avoid that. In Python, those things are rarely necessary, precisely because of things like time-machine and pyfakefs — that is, because your entire program can be manipulated at run-time.
This, in its basic form of mock.patch, is one of the most underappreciated superpowers of Python, which keeps it free of dependency injection and thus actually readable.
I have very mixed feelings about patching vs. dependency injection.
On the one hand yes, patching makes things a lot simpler.
On the other hand, if you want to test something that has date/time access:
Either you mock.patch the specific function your implementation is using, and your test is very dependent on the exact implementation of your code (plus IIRC you can’t patch datetime.now directly because it’s defined in C).
Or you use a third party library that mocks all date/time access functions, your test is now less brittle but you’ve added yet another library to your codebase.
With dependency injection, one would “simply” define a TimeManager class, make all date/time access go through this class, and swap it out for a mock during tests. That’s a way bigger upfront cost, but maybe it’s simpler in the long term? (You would be screwed if using a library that accesses date/time by itself, though.)
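A sketch of that approach (TimeManager is this comment's hypothetical, not a real library):

```python
from datetime import datetime

class TimeManager:
    """All date/time access in the application goes through this."""
    def now(self) -> datetime:
        return datetime.now()

class FakeTimeManager(TimeManager):
    """Test double returning a fixed instant."""
    def __init__(self, fixed: datetime):
        self.fixed = fixed

    def now(self) -> datetime:
        return self.fixed

def is_weekend(clock: TimeManager) -> bool:
    return clock.now().weekday() >= 5  # Saturday=5, Sunday=6

# Production code passes TimeManager(); tests pass a fake.
print(is_weekend(FakeTimeManager(datetime(2024, 1, 6))))  # True (a Saturday)
```

The upfront cost is visible: every function that touches time now takes a clock parameter.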
I’ve used the latter (dep-injection) approach in Golang, only because there was no nice way to mock/patch datetime interactions.
Something like https://pypi.org/project/pytest-freezegun/ makes this so easy in Python I’d really question trying to write your own TimeManager class. You’d have to constantly remind developers to use it in new code paths (instead of just importing/using the regular datetime).
It would definitely look out-of-place in Python, and in practice I’ve always used freezegun in projects where I’m not the only dev.
But there’s something about needing a library to do something as dumb as mocking out date/time access that feels a bit off to me. And that’s only for this specific use case: I end up needing a lib to mock out a lot of stuff (boto3 for AWS access has its own mocks for example), they all have their own way of doing things.
Again, I do recognize that this goes against Python conventions; I’m just not in complete agreement with the convention here and think the DI side of the trade-off could be explored a bit more.
Consider the fact that Python has had support for keyword arguments since as long as I can remember, and for keyword-only arguments since Python 3.0. But typing.Callable has zero support for them, meaning they can’t be typed in a higher-order context.
For what it’s worth, I do not do much Python these days, but I think this is wrong. Callable does not support kwargs, but you can use them in a higher-order context by using a Protocol implementing __call__.
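A sketch of that suggestion (Renderer, render_all, and shout are made-up names):

```python
from typing import Protocol

class Renderer(Protocol):
    # A callback protocol: unlike Callable, this can name a keyword-only
    # argument with a default.
    def __call__(self, text: str, *, uppercase: bool = False) -> str: ...

def render_all(items: list[str], render: Renderer) -> list[str]:
    return [render(item, uppercase=True) for item in items]

def shout(text: str, *, uppercase: bool = False) -> str:
    return text.upper() if uppercase else text

print(render_all(["a", "b"], shout))  # ['A', 'B']
```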
This works even better when combined with PyPy. For the simple and common cases, under CPython 3.11 my benchmarks show a solution using fluent-compiler is about 15% faster than GNU gettext, while under PyPy it’s more than twice as fast. You should take these numbers with a pinch of salt, but I am confident that the result is not slow, despite having far more advanced capabilities than GNU gettext, which is not necessarily true for my first implementation.
How much faster or slower is just… not compiling? I found (while writing the first compile-to-python implementation of werkzeug’s URL builder, which generated artisanal bytecode) that it was pretty much impossible to measure a difference between compile-to-python and interpret-with-python under PyPy, because you’re doing work PyPy was going to try to do anyway.
Apart from compile-to-python, I think most of the techniques mentioned in this article are terrible things that nobody should ever do. This does include “being an ORM”; don’t @ me. I will also note that people manage to do a lot of these terrible things in Java, which is often not considered dynamically typed, and the prevalence of awful metaprogramming bullshit in Java seems to be the main reason everyone hates it.
How much faster or slower is just… not compiling? I found (while writing the first compile-to-python implementation of werkzeug’s URL builder, which generated artisanal bytecode) that it was pretty much impossible to measure a difference between compile-to-python and interpret-with-python under PyPy, because you’re doing work PyPy was going to try to do anyway.
Good question. In this case, my micro-benchmarks show my compiler implementation to be between 2-5x faster than the interpreter for typical cases, under PyPy. With CPython, it’s about 6-10x faster than the interpreter.
I care more about CPython myself. I also do care about the fact that, whichever Python versions you are using, this speedup means the implementation goes from being quite significantly slower than the status quo (GNU gettext), to being closely comparable or faster in most cases.
These are admittedly microbenchmarks, but in a Django app (my use case), I know from experience that these things do add up. For example, response times can be limited and dominated by things like localization functions called from templates - because you can call them many times in a single template, as you do with translatable strings.
For sure. The motivation for the werkzeug URL builder was that url_for calls were responsible for 10% of the CPU usage of an entire deployment. So I definitely don’t think it’s worthless—for library functions that are amenable and called pervasively, one could almost argue it’s a duty to consider it.
But that does raise another question: why is it so hard? Python’s dynamism doesn’t seem to help much here: a compiler is available, but its optimizer is quite limited, and using it is often as much work as generating bytecode directly.
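For concreteness, the general compile-to-Python move under discussion can be sketched as generating source for a specialized function and exec-ing it (a toy template compiler; all names are made up):

```python
import re

def compile_template(template: str):
    # Translate "{name}" placeholders into a Python f-string, then exec the
    # generated source into a real function: the template is parsed once,
    # at "compile" time, not on every render call.
    body = re.sub(r"\{(\w+)\}", r"{kwargs['\1']}", template)
    source = f"def render(**kwargs):\n    return f{body!r}"
    namespace = {}
    exec(compile(source, "<template>", "exec"), namespace)
    return namespace["render"]

render = compile_template("Hello {name}, you have {count} messages")
print(render(name="ada", count=3))  # Hello ada, you have 3 messages
```

The generated function is ordinary bytecode, so CPython runs it at full speed and PyPy can JIT it like any other Python.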
That was my experience, too. In a Java project, once you needed to reach for java.lang.reflect, you knew you were in for a bad day, even to do things that were absolutely trivial in Python (like map a string-keyed dictionary onto an object with properties of the same names as the keys).
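The Python version of that example really is trivial (Config and the dict contents are made up for illustration):

```python
class Config:
    pass

data = {"host": "localhost", "port": 5432}

# Map a string-keyed dict onto an object's attributes: a two-line loop,
# no reflection API required.
config = Config()
for key, value in data.items():
    setattr(config, key, value)

print(config.host, config.port)  # localhost 5432
```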
I feel like a Machiavellian puppetmaster who tricked everybody into showing me cool hyperprogramming stuff.
This is a reply to https://lobste.rs/s/yog9fh/i_am_disappointed_by_dynamic_typing
Really interesting stuff, thanks!
Thanks for the correction - I haven’t tried it but it does look like that would work. Mypy docs: https://mypy.readthedocs.io/en/stable/protocols.html#callback-protocols
That’s because metaprogramming in Java stinks, like, a lot. At least that was my personal impression last time I tried it.
It’s much more natural in Python.
Great post! As a result, “Jupyter guy” is now my superhero alias…