1. 69
  1.  

  2. 18

    If you want to write code this way, why would you even choose to use Python at all? I wouldn’t use typing for any of these examples. It’s overkill. Just write some tests and spend your energy on something more important.

    1. 21

      Python has a large and rich ecosystem, and is a popular choice for starting new projects. New projects often eventually grow into large projects, where switching languages would require a significant amount of rewrite. Today, when you just wrote it, remembering whether find_item returns None or throws a KeyError may be easy to keep in your head, but as a system grows and expands (especially to FAANG-scale Python codebases), or must be worked on by more than one engineer, strict typing is a must. Almost all Python code at my company is strictly typed, and we have millions and millions of lines of it. It truly makes things much easier in the long run, in exchange for typing out twelve extra characters before you push your commit.
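As a hedged illustration of the find_item point: a signature like the one below records the "returns None, never raises KeyError" contract where both readers and a type checker can see it. find_item, Item, and inventory are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    # Hypothetical record type used only for this illustration.
    name: str
    price: int


def find_item(items: dict[str, Item], name: str) -> Optional[Item]:
    """Return the matching item, or None if it is absent (never raises KeyError)."""
    return items.get(name)


inventory = {"apple": Item("apple", 3)}

item = find_item(inventory, "pear")
# A checker such as mypy rejects `item.price` at this point, because `item`
# may be None; the explicit check below narrows it to `Item`.
if item is not None:
    print(item.price)
else:
    print("not found")
```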

      1. 4

        As a counterpoint, I’ve built a 1MLOC+ codebase from zero in a “FAANG-scale” monorepo. More than half of it was Python without typing. We had no issues static typing would have solved. The arguments in favor of static typing seem worth it on the surface, but in practice you aren’t gonna need it unless it’s critical for performance gains (like Cython or Numba).

        1. 24

          FWIW, I’ve had the exact opposite experience in the same situation (untyped Python “FAANG-scale” monorepo with millions of lines of code). I think static typing would have helped understand what code meant.

          At one point, I wanted to modify a function that took an argument called image. But what attributes does that image have? It was passed by function after function, and so I spent a bunch of time tracing up the callstack until I found a place where the image value was generated. From there I looked at the attributes that that image had, and went on my merry way…except that the function that I wanted to modify was using duck typing, and there were multiple image types, each with different sets of attributes.

          If static typing was in use, it’d be obvious what attributes I could access on the image. Less because of compile-time/static checking, but more because it would make it easier to figure out what functions were doing. Yes, this is a problem of documentation, but if I want functions to be documented with what attributes should be passed into them, we might as well do it in a computer-understandable way, so that the computer can check our work.
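One hedged way to write down "what attributes does that image have" without giving up duck typing is typing.Protocol: the annotation lists only the attributes the function relies on, and any object providing them is accepted structurally. HasSize, RasterImage, Thumbnail, and aspect_ratio are hypothetical names for this sketch.

```python
from dataclasses import dataclass
from typing import Protocol


class HasSize(Protocol):
    # The attributes the function actually relies on; any object that
    # provides them satisfies the protocol, no inheritance required.
    width: int
    height: int


@dataclass
class RasterImage:
    width: int
    height: int
    pixels: bytes  # extra attributes are fine


@dataclass
class Thumbnail:
    width: int
    height: int


def aspect_ratio(image: HasSize) -> float:
    return image.width / image.height


print(aspect_ratio(RasterImage(1920, 1080, b"")))
print(aspect_ratio(Thumbnail(4, 2)))
```

Both image types pass the checker even though neither mentions HasSize, which keeps the duck-typed design while documenting its requirements.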

          1. 4

            In my experience, this comes from expecting to work in dynamic codebases the same way as in static ones. In a dynamic codebase I’d set a breakpoint, enter the REPL, and see what the image actually has. This may seem roundabout, but in practice you see more than you can with types: not only can I see the properties, I can play with them and test calling methods on the image with the relevant information.

            1. 8

              Except that duck typing means that what the image has could change from run to run, and if you need to access something outside the existing (assumed) interface, you might make a bad assumption.

              1. 2

                This isn’t a direct answer, but I think both these questions miss the bigger picture of what we want to build towards:

                Can we have both a dynamic, inspectable runtime and development tools that catch bugs during development with minimal effort? Type hints are a halfway solution; the future I dream of is a fully inferred static analysis system (idk if types are enough) that can understand the most dynamic code of Python/Ruby/JS/Clojure and let us know of potential problems we’ll encounter. Current gradual type systems are too weak in this regard: they don’t understand the flow of code, only “a may have x properties”.

                For example:

                a = {x: 10}
                useX(a)

                a = {y: 20}
                useY(a)

                Looking at this code, it’s clear this won’t crash, yet our type systems fail to understand shapes over time. The best we can currently do is {x: number} | {y: number}, which requires a check at each location.

                Can we imagine a future where our tools don’t prescribe solutions, but trust you to do what you want and only point out what will fail?

                All this being said, this may be bad code, but bad code that works is better than bad code that doesn’t work. This could also enable possibilities we can’t dream of.

                And then what we traditionally call the “pair programming compiler”, à la Elm, could become lint rules.

                1. 4

                  I mean, Typescript does have an understanding of the execution of code, where it can do type narrowing, and it allows you to write your own type predicates for narrowing types down.

                  Shapes over time is definitely a thing typescript can do, though it can take a bit of convincing.

          2. 8

            More than half of it was Python without typing. We had no issues static typing would have solved.

            Every time I’ve had this discussion with someone it turns out they had tons of issues that static typing would have solved, they just didn’t realize it, or they were paying massive costs elsewhere like velocity or testing.

            That said, Python’s type system is kind of shit so I get not liking it.

            1. 5

              We had no issues static typing would have solved

              How do you know this?

              I work with a large progressively-typed codebase myself, and we frequently find thousands of latent bugs when we introduce new/tighter types or checks.

              1. 1

                We had good test coverage, leveraging techniques like mutation testing (link1 link2). The other half of the codebase used static types and didn’t have a higher level of productivity because of it. Once you have such a large and complex codebase, the fundamental issues become architectural, things like worst-case latency, fault tolerance, service dependency management, observability, etc.

          3. 13

            Why would I write tests when I have types? :^)

            1. 5

              Serious answer: types are, among other things, like very concise unit tests (I seem to recall a comment like “Types are a hundred unit tests that the compiler writes for you”, but I can’t find it now), but some bug might still slip through even a strong static algebraic dependent liquid flow quantitative graded modal type system, and tests are another level of defense-in-depth (and I don’t think any language has such a hypothetical type system — I’d like to see the one that does!).

              1. 3

                I remember seeing someone (I think Hillel) explain that the sensible way to check whether a really complicated static analysis thingy (such as a very complicated static type signature) actually says what you think it says is to try applying it to some obviously-wrong code that it ought to reject and some obviously-right code that it ought to accept.

                The idea of unit testing your static types is greatly amusing to me and feels like an obviously good idea.

                1. 2

                  Don’t think I ever said this. It’s a really good idea though!

                  1. 2

                    Hm must’ve misremembered, sorry. Cheers! :)

                2. 1

                  (Here are some links for anyone wondering what a strong static algebraic dependent liquid flow quantitative graded modal type system would be.)

                3. -1

                  A+ trolling but also the author might agree.

                  1. 7

                    Saying that you don’t need types as long as you have tests is A+ trolling as well.

                    1. 1

                      I’m only talking about Python, not in general.

                      1. 6

                        “Talking about Python” is general enough, given how big and diverse the use cases and contexts involving Python are.

                4. 6

                  If you’re forced to write Python, perhaps because you have a large existing Python codebase that would be prohibitive to port to another language, setting up a typechecker and using type annotations is an investment of energy that will greatly pay off over time in terms of minimizing bugs. I do agree that it would be better to not write code in Python at all, though, and choose a language that gives you better tools for managing your types from the get-go.

                  1. 5

                    Having written an immense amount of Python and uplifting a massive commercial Python 2 codebase to fully type-hinted Python 3: there’s something to be said for being able to drop into a REPL, experiment, mess around, prototype, import your existing modules, mess around some more, and then lean on a static type checker once things are more solidified. It’s a nice workflow for exploratory programming.

                    1. 3

                      Yes, I think adding types once an interface has completely solidified and has many dependencies could make sense because at that point it’s mature and you care more about stability than velocity. But starting with type hints when you’re first authoring a file undermines the advantage that Python provides for exploratory programming. That is what I’m against.

                    2. 3

                      As a more ops-focused person, Python is still the language of choice for a lot of teams I’ve worked in. I’ve used patterns like this when we needed to write code that the team could easily pick up and maintain, but which also needed more explicit safety guarantees than the Python of 5 years ago might be expected to give you.

                      More concretely, why test for “this never happens” when you can simply make sure it can’t happen and then test that?
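A minimal sketch of “make sure it can’t happen, then test that”, assuming the state in question is a free-form string: an Enum leaves no room for typo’d or invented values inside the program, so the only place left to test is the boundary where raw input is parsed. DeployState and next_state are hypothetical names.

```python
from enum import Enum


class DeployState(Enum):
    # Hypothetical states; only these three values can exist.
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"


def next_state(state: DeployState) -> DeployState:
    # No "unknown string" branch is needed: a checker rejects any
    # argument that is not a DeployState member.
    if state is DeployState.PENDING:
        return DeployState.RUNNING
    return DeployState.DONE


# Parsing external input is the one place invalid values can appear,
# and Enum lookup rejects them up front with ValueError.
state = DeployState("pending")
print(next_state(state))
```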

                      1. 1

                        I don’t see anything un-pythonic about any of the examples there except if you consider typing in general to be so (which, fair…).

                        This is if you want to program Python in a very structured somewhat anal-retentive way.

                      2. 6

                        If I understand correctly, this is about avoiding common pitfalls with dynamic types in Python, and making code more robust, by adding types everywhere. Where it differs from Rust is that even with all these types, the Python interpreter will still happily run whatever it is given (while the Rust compiler will complain loudly). So one still needs to rely heavily on mypy and pyright.
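A small sketch of that difference: CPython stores annotations as metadata but does not check them, so the call below runs, and only an external checker such as mypy or pyright would flag it. double is a hypothetical function.

```python
def double(n: int) -> int:
    return n * 2


# The interpreter does not enforce annotations: this call runs and
# returns "abab". Only a static checker reports the mismatch.
print(double("ab"))  # type: ignore[arg-type]

# Annotations are just metadata stored on the function object.
print(double.__annotations__)
```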

                        1. 10

                          It’s not only about type annotations. Even in a fully dynamically typed unannotated language, code might be more or less prone to runtime type errors. A good example here is null-safety.

                          Python is not null (None) safe — it occasionally happens that you try to call a method on something, and something turns out to be None.

                          Python is more null-safe than Java. Java APIs tend to happily return null, while Python in general prefers to throw. For example, looking up a non-existing element in a dictionary returns null in Java, and throws KeyError in Python. Python behavior prevents silent propagation of nulls, and makes the bugs jump out.

                          Erlang is null-safe. It is dynamically typed, but, e.g., map lookup functions return an {ok, Value} tuple on success and the error atom on failure. This forces the call site to unpack an optional even in the happy case, signaling the possibility of nulls.

                          1. 9

                            Python and Java are safe in a type safety sense with respect to None/null. The behavior is well defined (raise/throw an exception). Your complaint is about API design not language design.

                            For contrast, in C/C++ it is not defined what happens if you dereference a NULL pointer, and that makes it unsafe. The NULL constant might be something other than 0x0. Dereferencing it could return a value, kill the process with a segmentation fault, or do something else.

                            1. 6

                              Yes, this is mostly a question of API design, rather than a question of language design (though, to make Erlang API convenient and robust to use, you need to have symbols and pattern-matching in the language).

                              Whether null-unsafety leads to UB is an orthogonal question.

                              1. 1

                                The NULL constant might be some thing else than 0x0.

                                https://en.cppreference.com/w/c/types/NULL

                                The macro NULL is an implementation-defined null pointer constant, which may be

                                • an integer constant expression with the value 0
                                • an integer constant expression with the value 0 cast to the type void*
                                • predefined constant nullptr (since C23)

                                https://en.cppreference.com/w/c/types/nullptr_t

                                nullptr_t has only one valid value, i.e., nullptr. The object representation of nullptr is same as that of (void*)0.

                                1. 3

                                  The integer literal 0 when used as a pointer is a valid way to specify the null pointer constant, but it doesn’t mean the representation of null pointers is all-zero bits.

                                  https://en.cppreference.com/w/cpp/language/zero_initialization

                                  A zero-initialized pointer is the null pointer value of its type, even if the value of the null pointer is not integral zero.

                                  https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf#%5B%7B%22num%22%3A649%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C-27%2C816%2Cnull%5D

                                  7.20.3.1 The calloc function

                                  Synopsis

                                  1 #include <stdlib.h>

                                  void *calloc(size_t nmemb, size_t size);

                                  Description

                                  2 The calloc function allocates space for an array of nmemb objects, each of whose size is size. The space is initialized to all bits zero.261)

                                  Returns

                                  3 The calloc function returns either a null pointer or a pointer to the allocated space.

                                  261) Note that this need not be the same as the representation of floating-point zero or a null pointer constant.
                                  1. 1

                                    Everything that you say is true with respect to the de jure standard. The de facto standard is somewhat different and a platform where null is not 0 will break a lot of assumptions. In particular, C’s default zero initialisation of globals makes it deeply uncomfortable for null to not be a zero bit pattern.

                                    For CHERI C, we considered making the canonical representation a tagged zero value (i.e. a pointer that carries no rights). This has some nice properties, specifically the ability to differentiate null from zero in intptr_t and a cleaner separation of the integer and pointer spaces. We found that this was far too invasive a change to make in a C/C++ implementation that wanted to be able to handle other values.

                                    Similarly, AMD GPUs have the problem that their stack starts at address 0, so 0 is a valid non-null pointer. I proposed that they lower integer to pointer and pointer to integer casts as adding and subtracting one. This would let their null pointer representation be -1. This broke a lot of assumptions even in a fairly constrained accelerator system.

                                    In particular, C has functions like memset that write bytes. C programmers expect to be able to fill arrays with nulls by using memset. They expect to be able to read addresses by aliasing integers and pointers. They expect to be able to stick objects in BSS and find that they are full of nulls. They expect the return from calloc to be full of nulls, even if the standard does not require it.

                                    My favourite bit of the standard’s description of null is that a constant expression 0 cast to a pointer may compare not equal to a dynamic value of zero converted to null. If a and b are integers and a is constant, a can compare equal to b, but a cast to a pointer would compare not equal to b cast to the same pointer type. That’s such a bizarre condition that I doubt anyone writing C has ever considered whether their code is correct if it can occur.

                                    In general, I am happy that WG14 and WG21 try hard to avoid leaking implementation details into the language, but there are two places where I think that the ship has sailed: bytes (char units) are octets, and null is represented by zero. So much C/C++ code assumes these two things that any implementation that changes them may technically be C or C++, but it won’t run more than a tiny fraction of code written in either language.

                                    1. 1

                                      Oh, I agree that de facto C code almost universally assumes the null pointer representation to be zero when initializing data with memset or calloc, for better or worse (my C code does too). And in practice assuming nulls are zero is rather convenient. Though machines with non-zero null pointers do (or did) exist (https://c-faq.com/null/machexamp.html)

                                      In particular, C’s default zero initialisation of globals makes it deeply uncomfortable for null to not be a zero bit pattern.

                                      Pointers are initialized to the null pattern in the default zero initialization, so I’m not sure what you mean. Are you saying that it is inconvenient for null representation to be non-zero because then the compiler couldn’t put the zero-initialized global in .bss? I agree that’s a good point in favor of having a zero null in practice, but it wouldn’t be so dire, an implementation could put the global data in .bss to be zero-initialized by the loader and then manually initialize the pointer fields, for example with a special pseudo-relocation. Though I don’t know if there’s any implementation that does anything like that, and that doesn’t solve the problem for calloc/memset.

                                      1. 1

                                        Pointers are initialized to the null pattern in the default zero initialization, so I’m not sure what you mean. Are you saying that it is inconvenient for null representation to be non-zero because then the compiler couldn’t put the zero-initialized global in .bss?

                                        Yes, and the mid-level optimisers in most compilers really, really like that assumption. If they need to fill in another bit pattern, that causes problems. More importantly, it’s surprisingly common to put huge data structures in BSS. Every platform except Windows was very happy for snmalloc to put a 256 GiB structure full of pointers in BSS and then have the OS lazily allocate pages as needed. If null were non-zero, then suddenly that would require a 256 GiB binary. We are quite an extreme example of this, but I’ve seen variants of it elsewhere.

                                        implementation could put the global data in .bss to be zero-initialized by the loader and then manually initialize the pointer fields, for example with a special pseudo-relocation

                                        That’s basically what I did for early versions of CHERI/Clang and it has some really bad pathological cases in some common codebases.

                                        1. 1

                                          This reminded me of MM_BAD_POINTER. I wonder whether its marginal utility is practically zero, or even negative due to the .bss considerations, now that NTVDM is rather long in the tooth. That’s the main reason I’m aware of for having the null page be dereferenceable (the VM86 IVT was situated there). Even then, who knows: with all the speculative execution mitigations, I’ve lost track of whether a 32-bit Windows driver running code on behalf of an NTVDM process would actually be null-dereference-exploitable.

                          2. 6

                            This is exactly how I use Python, and it’s a breeze to program in! The benefits of a strong type system with the ability to ignore it when trying things out is a really nice combination if one does not need performance.

                            Small note: the assert False at the end of a match over a union type can be replaced by assert_never; this ensures the match is exhaustive.

                            1. 4

                              The proper solution is to return a strongly typed object with named parameters that have an attached type. In Python, this means we have to create a class.

                              No love for TypedDict?

                              1. 4

                                If you don’t have to use dicts due to external constraints, they’re pretty strictly inferior.

                                1. 2

                                  Why, because they lack methods?

                                  1. 2

                                    Also because ‘x bracket quot myfield quot bracket’ is more (and more annoying) to type than ‘x dot myfield’.

                                    1. 2

                                      I would never let something so insignificant and macro-able impact my opinion on a feature

                                      1. 2

                                        Editor features around fields (ex: find usages of this field) won’t work as well too.
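A small side-by-side sketch of the trade-off discussed here: a TypedDict is still a plain dict at runtime (handy for JSON payloads or other external constraints), while a dataclass gives attribute access and methods. PointDict, Point, and norm_squared are hypothetical names.

```python
from dataclasses import dataclass
from typing import TypedDict


class PointDict(TypedDict):
    # Still a plain dict at runtime; keys are checked only statically.
    x: int
    y: int


@dataclass(frozen=True)
class Point:
    x: int
    y: int

    def norm_squared(self) -> int:
        # A class can carry behaviour alongside its fields.
        return self.x * self.x + self.y * self.y


pd: PointDict = {"x": 3, "y": 4}
p = Point(3, 4)

print(pd["x"], type(pd) is dict)
print(p.x, p.norm_squared())
```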

                              2. 3

                                This looks a lot like any statically typed language with some functional flair. I mean, the article is basically about types and ADTs.

                                1. 3

                                  ADTs (enums with data) work really well in Rust, and it’s a widely-applicable feature that could easily work in other languages. I’m surprised other languages haven’t added them yet.

                                  1. 4

                                    I think everyone has actually added an equivalent already:

                                    • Java has sealed classes
                                    • Python has union types
                                    • TypeScript has union types
                                    • C++ has std::variant and visit
                                    • Kotlin has sealed classes
                                    • Dart has sealed classes
                                    • Swift has Rust enums basically
                                    • C# doesn’t seem to have exhaustiveness checking out of the box, but it does have pattern-matching at least, and you can simulate sealed class with a private constructor.
                                    • Haskell is in the same boat as C# — pattern matching without exhaustiveness
                                    • Go! I think Go genuinely doesn’t have an equivalent feature! Though iirc there are some plans to use interface type sets to do this.
                                    1. 5

                                      Haskell is in the same boat as C# — pattern matching without exhaustiveness

                                      GHC has it under -fwarn-incomplete-patterns which is part of -W, so it’s hardly the same boat as C#.

                                      1. 2

                                        What about C and — one you use — Zig?

                                    2. 1

                                      A similar problem to exhaustive matching is exhaustive initialization – making sure a variable is being set by all branches.

                                      Problematic pattern:

                                      while True:
                                          if …:
                                              foo = …
                                          else:
                                              # Say someone adds a branch that forgets to set foo.
                                              pass
                                      
                                          use(foo) # BUG: May use the foo value from previous iteration!
                                      

                                      In C and Rust, the compiler will complain if we use a variable without initializing it in all branches. So just don’t initialize it:

                                      while (true) {
                                          Foo foo; // Declare, but don't initialize!
                                          if (…) {
                                              foo = …
                                          } else {
                                              // Say someone adds a branch that forgets to set foo.
                                          }
                                          use(foo); // -Werror=uninitialized: "foo" may be used uninitialized
                                      }
                                      

                                      In C++, default constructors get in the way of this pattern. But just wrap it in a lambda, and now you have an expression to assign from:

                                      while (true) {
                                          Foo foo = [&]() {
                                              if (…) {
                                                  return …
                                              } else {
                                                  // Say someone adds a branch that forgets to return a Foo.
                                              }
                                              // -Werror=return-type: Control reaches end of non-void function
                                          }();
                                          use(foo);
                                      }
                                      

                                      This is even easier in Rust, since scopes, if expressions, and match expressions are, well, expressions already. Python only has if and match statements. I haven’t seen a good way to do this in Python yet, other than breaking it up into separate functions.
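A hedged sketch of the separate-function route in Python. For a simple two-way choice, Python’s conditional expression (a if cond else b) already gives an expression to assign from. For longer branching, a helper with a declared return type works: mypy reports “Missing return statement” if some path falls off the end, roughly the Python counterpart of -Werror=return-type. Foo, make_foo, and use are hypothetical stand-ins for the names in the examples above.

```python
class Foo:
    # Hypothetical stand-in for the `Foo` type in the examples above.
    def __init__(self, label: str) -> None:
        self.label = label


def make_foo(flag: bool) -> Foo:
    # Every branch must return a Foo. If someone adds a branch that
    # forgets to, mypy flags this function with "Missing return statement".
    if flag:
        return Foo("from if")
    else:
        return Foo("from else")


def use(foo: Foo) -> None:
    print(foo.label)


for flag in (True, False):
    foo = make_foo(flag)  # rebound on every iteration, never stale
    use(foo)
```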