1. 26
    1. 14

      It’s fun to see how different languages handle the same patterns. My favorite, Clojure, relies on plain maps as its core data structure and explicitly rejects advice like this:

      Not only do dicts allow you to change their data, but they also allow you to change the very structure of objects. You can add or delete fields or change their types at will. Resorting to this is the worst felony you can commit to your data.

      1. 7

        I haven’t coded that much Clojure (although I keep tabs on it), but as someone who jumps back and forth between Python and JavaScript I’ve noticed the same distinction.

        I think the reason dicts are not used as much in Python as objects are in JavaScript and maps in Clojure is that dicts feel like second-class citizens. First of all, dict access is awkward: compare foo["bar"] with e.g. foo.bar or (:bar foo); both JS and Clojure have affordances for literal key access that are missing in Python.

        Secondly, tooling support (autocompletion etc.) has historically been very poor for dicts. This has sort of changed with the introduction of typed dictionaries, but then you have to wrestle with the awkward dict syntax again.

        Ultimately I think this often works to Python’s detriment—for example, translating Python classes to a wire protocol often involves quite a lot of boilerplate. Luckily there are libraries like Pydantic that provide good solutions to this problem, but it’s still not as seamless as e.g. serializing Clojure maps.
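        To make the access-syntax point concrete, here is a small sketch (standard library only; the names are invented) of bracket access next to the attribute access that JS objects get for free:

        ```python
        from types import SimpleNamespace

        d = {"bar": 1}
        ns = SimpleNamespace(bar=1)

        print(d["bar"])  # 1 -- bracket-and-string access; quotes required
        print(ns.bar)    # 1 -- attribute access, the analogue of JS foo.bar
        ```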

        1. 6

          It’s funny you mention JS, because if I needed an arbitrary key-value map I would not just use literal keys, but rather a Map. Consider what happens if you try to use a key named 'toString' or 'hasOwnProperty'.

          I also don’t think you should just be shoving your in-memory representation over the wire unless it’s extremely simple, especially in situations where you might need to change it and your client and your server might get out of sync in terms of versions.

        2. 4

          for example, translating Python classes to a wire protocol often involves quite a lot of boilerplate. Luckily there are libraries like Pydantic that provide good solutions to this problem, but it’s still not as seamless as e.g. serializing Clojure maps.

          I guess I don’t really get this one.

          Pydantic is the hot new kid on the block, sure, but if you’re building networked services this stuff is table stakes and has been for years and years. If you use Django there’s DRF serializers. If you don’t use Django there’s Marshmallow. In both cases the tooling can auto-derive serialization and deserialization and at least basic type-related validation from whatever single class is the source of truth about your data’s shape, whether it’s an ORM (Django or SQLAlchemy) model, or a dataclass or whatever.

          So I literally cannot remember the last time I had to write “quite a lot of boilerplate” for this. Maybe if I were one of the people I see occasionally who insist they’ll never ever use a third-party framework or library? But that seems like a problem with the “never use third-party” stance, not with the language or the ecosystem.

      2. 5

        At the same time, I understand Clojure has this inclination toward keys that aren’t just any old string, but are namespaced and meaningful. I wonder if Clojure programs at a certain complexity would still translate from wire format maps to domain model maps?

        1. 3

          At the same time, I understand Clojure has this inclination toward keys that aren’t just any old string, but are namespaced and meaningful.

          And even with that, sometimes it can be very hard to get your bearings when you’re jumping in at a random piece of code. Figuring out what the map might look like that “should” go into a certain function can be very difficult.

          I wonder if Clojure programs at a certain complexity would still translate from wire format maps to domain model maps?

          I work on a complex codebase and we use a Malli derivative to keep the wire format stable and version the API. The internal Malli model is translated to JSON automatically through this spec, and it also ensures that incoming data is well-formed. It’s all rather messy and I’m not sure if I wouldn’t prefer manual code for this because Malli is quite heavy-handed and its metaprogramming facilities are hard to use and badly documented.

      3. 5

        The advice in the article is wrong for Python as well. Dicts are not opaque, it’s wrapping them in bespoke custom classes that makes data opaque. I should probably blog about it, because there’s much more I want to say than fits in a comment.

        1. 10

          Dicts are not opaque, it’s wrapping them in bespoke custom classes that makes data opaque.

          Dicts aren’t opaque in the sense of encapsulation, but they’re opaque in the sense of making it harder on the developer trying to figure out what’s going on.

          If I’m working with a statically-typed codebase (via mypy), I can search for all instances of a given type. I can also look for all accesses of a given field on that type. It’s not possible to usefully do this with a dict, since you’re using dicts everywhere. You also can’t say “field X has type Y, field Z has type Q” unless you use TypedDict and then at that point you don’t gain anything from not using a real class.

          Similarly, I can look at the definition for the class and see its fields, methods, and docstrings. You can’t do that with a dict.

          I’ve been working with a codebase at $WORK that used dicts everywhere and it was a huge pain in the ass. I’ve been converting them to dataclasses as I go and it’s a lot more convenient.
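          As a hypothetical sketch of that conversion (the User name and fields are invented), the shape of the change looks like:

          ```python
          from dataclasses import dataclass

          # Before: a plain dict; nothing records which keys exist or their types.
          user = {"name": "Ada", "age": 36}

          # After: the same data as a dataclass; the fields are searchable,
          # and a checker like mypy can track every access of .age.
          @dataclass
          class User:
              name: str
              age: int

          typed_user = User(**user)
          print(typed_user.age)  # 36
          ```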

          1. 1

            You might be interested in TypedDict (described in PEP-589) and the additions to TypedDict in PEP-655.

            1. 1

              I’ve used TypedDict as a transitional measure while doing the dicts-to-dataclasses thing; it was definitely super helpful there.

            2. 1

              I’m not sure why TypedDict exists. You may as well opt for a dataclass or pydantic. Maybe it’s useful for typing **kwargs?

              1. 2

                The primary idea being that if a dictionary is useful in a given circumstance, then a dictionary with type assertions is often even more useful. The motivation section of the PEP expands on that a little.
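                For illustration (assuming Python 3.8+, where typing.TypedDict is available; the Movie name is invented):

                ```python
                from typing import TypedDict

                class Movie(TypedDict):
                    title: str
                    year: int

                # At runtime this is still a plain dict; the annotations exist so a
                # static checker (mypy, pytype, ...) can flag wrong keys or value types.
                m: Movie = {"title": "Blade Runner", "year": 1982}
                print(type(m) is dict)  # True
                ```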

              2. 1

                it exists to help add type checking support to code that needs to pass dicts around for whatever reason (e.g. interop with legacy libraries)

        2. 2

          Please post the article here if you do write it, cuz I don’t know much about Python best practices.

          1. 1

            I will, although I can’t really claim those practices are “best” in any official sense. I only know they work best for me :-)

    2. 7

      namedtuples?!

      Inherently immutable.

      Fail on missing or extra kwargs at initialization:

      from collections import namedtuple

      NamedTupleAB = namedtuple('NamedTupleAB', 'a b')
      data_dict = {'a': 1, 'b': 2, 'c': 3}

      ab_tuple = NamedTupleAB(**data_dict)  # TypeError: unexpected keyword argument 'c'

      Faster than a frozen dataclass.

      1. 4

        Orrr, dataclasses. Where you also get type checking.

        from dataclasses import dataclass
        
        @dataclass(frozen=True)
        class NamedTupleAB:
            a: int
            b: int
        
        ab_tuple = NamedTupleAB(a=1, b=2)
        
        1. 1

          It is slower than a namedtuple.

          1. 1

            Why are you using Python if you care about the minor performance difference between NamedTuple and dataclass?

        2. 1

          Surprise! You DON’T get type checking with dataclasses! (py3.10)

          @dataclass(frozen=True)
          class MyDataClass:
              a: int
              b: int
           
          mdc = MyDataClass(a='what', b='tf')
          
          print(mdc)  # -> MyDataClass(a='what', b='tf')
          
          1. 3

            If you use competent tooling, you certainly do.

          2. 1

            In the Python world, “type checking” typically refers to static type checking via a third-party tool like mypy or pytype, not runtime checking. There are libraries that enforce type annotations at runtime via validators (I think pydantic does this), but dataclasses do not check field values at runtime.

            If you run pytype over your code snippet, it will catch and report the error.
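            As a hand-rolled sketch of runtime enforcement (not how pydantic works internally, just an opt-in __post_init__ hook; the CheckedAB name is invented):

            ```python
            from dataclasses import dataclass, fields

            @dataclass(frozen=True)
            class CheckedAB:
                a: int
                b: int

                def __post_init__(self) -> None:
                    # dataclasses store annotations but never enforce them;
                    # this hook raises at construction time instead.
                    for f in fields(self):
                        if not isinstance(getattr(self, f.name), f.type):
                            raise TypeError(f"{f.name} must be {f.type.__name__}")

            CheckedAB(a=1, b=2)  # constructs fine
            # CheckedAB(a='what', b='tf') would raise TypeError here
            ```

            Note this relies on the annotations being real type objects rather than strings (i.e., no `from __future__ import annotations`).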

    3. 5

      What’s wrong with dicts?

      I think the points the author made are acceptable trade-offs for faster development speed.

      It does not matter that refactoring a function is hard when each iteration takes a lot less time vs trying to create a concrete OOP structure around things.

      So pick the right tradeoffs that meet your business needs. Don’t refactor out of fear that ‘things will be hard in the future’, because YAGNI.

      1. 5

        It does not matter that refactoring a function is hard when each iteration takes a lot less time vs trying to create a concrete OOP structure around things.

        I’ve never once seen a codebase — edit: maintained by more than one person — where untyped kwargs were net beneficial to product velocity vs. typed arguments.

        1. 3

          And I’ve seen (and worked on) several like that. Anecdote vs. anecdote?

          P.S. Those code bases also had 100% test coverage (or close to it), which I’m also hearing is nigh impossible and is too much effort.

          1. 3

            Yeah, tests are a tool to reduce risk, and it’s rare that 100% test coverage delivers benefits commensurate with its costs.

        2. 2

          I have seen million/billion-dollar companies built with little to no typing: Perl, Python, Ruby, etc.

          Yes, adding types helps make the code base more maintainable, but that’s not my point. My point is that you want to trade off that maintainability against velocity depending on your business needs:

          • Big-enterprise codebases should have types, as their business needs lean toward reliability and maintenance costs start to add up.
          • But in a startup environment, where you have to build the airplane while flying it, time-to-market is key. Loose typing enables quicker iterations so these small companies can focus on their goals.

          This is why we see many companies opt into a gradual-typing approach as a transition bridge once their code base hits a certain size; Facebook’s Pyre and Stripe’s Sorbet are good examples. But as always, it’s a tradeoff that should be decided based on the business context.

          1. 1

            I have seen million/billion-dollar companies built with little to no typing: Perl, Python, Ruby, etc.

            Me too!

            Yes, adding types helps make the code base more maintainable, but that’s not my point. My point is that you want to trade off that maintainability against velocity depending on your business needs:

            My claim — which is definitely arguable — is that the inflection point where lack-of-typing switches from helping velocity to hurting velocity, is (a) when the codebase has more than 1 developer, or (b) when the codebase grows beyond O(1k) SLoC, whichever comes first. By which I mean: a lot sooner than you might think.

            1. 2

              I think that claim is definitely arguable and varies depending on context.

      2. [Comment removed by author]

      3. 3

        You don’t have to do everything all OOP to use dataclasses or pydantic or whatever. You just write out the types for each field.

    4. 3

      There is a pattern I enjoy for side-effecting code, where I pass a dict through the functions being called. In so doing, the side-effecting code essentially becomes a dict-in, dict-out proposition, and I can inject replacements for side-effecting functions.

      Then I just prepare a map in my tests and pass it in, and compare to the map I get out. It’s all very tidy.
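      A minimal sketch of that pattern (all names invented for illustration):

      ```python
      # The context dict carries both data and the side-effecting functions,
      # so tests can swap the real effects for fakes.
      def send_welcome(ctx):
          ctx["send_email"](ctx["user"], "Welcome!")
          return {**ctx, "welcomed": True}

      # In a test: prepare a map, pass it in, compare the map that comes out.
      sent = []
      ctx_out = send_welcome({
          "user": "ada@example.com",
          "send_email": lambda to, body: sent.append((to, body)),  # fake effect
      })
      print(ctx_out["welcomed"])  # True
      print(sent)                 # [('ada@example.com', 'Welcome!')]
      ```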

    5. 1

      I always thought that Python 5 was a joke, but now I’m not so sure… https://www.youtube.com/watch?v=BvECNQRrjCY

      I definitely feel like OP would be ready for Haskell, this post just screams “Maybe using types, and immutable data as the default is a good idea”.

      1. 1

        The problem with Haskell is we’re usually stuck with the tools in-use at our jobs.

        1. 1

          Yeah, we’re all still using punch cards, because no one ever tried something new.

          This is a non-reason to me. There are plenty of good reasons not to choose a language, but “we aren’t using it” is an awful one. Someone decided to use C one day at their company, another decided to use C++, then Java, then Scala, then Swift, then… Haskell is nothing special, and being a compiled language has some advantages, like not needing to carry around something like the JVM.

          1. 2

            Haskell has a reputation. There’s a reason it isn’t in widespread use.