Threads for benji

  1. 13

    It’s fun to see how different languages handle the same patterns. My fav, Clojure, relies on plain maps as its core data structure and explicitly rejects paragraphs like this:

    Not only do dicts allow you to change their data, but they also allow you to change the very structure of objects. You can add or delete fields or change their types at will. Resorting to this is the worst felony you can commit to your data.

    1. 7

      I haven’t coded that much Clojure (although I keep tabs on it), but as someone who jumps back and forth between Python and JavaScript I’ve noticed the same distinction.

      I think the reason dicts are not used as much in Python as objects in JavaScript and maps in Clojure is that dicts feel like second-class citizens; first of all, dict access is awkward. Compare foo[“bar”] with eg foo.bar or (:bar foo)—both JS and Clojure have affordances for literal key access which are missing in Python.

      Secondly, tooling support (autocompletion etc) has historically been very poor for dicts. This has sort of changed with the introduction of typed dictionaries, but then you have to wrestle with the awkward dict syntax again.

      Ultimately I think this often is to Python’s detriment—for example, translating Python classes to a wire protocol often involves quite a lot of boilerplate. Luckily there are libraries like Pydantic that provide good solutions to this problem, but it’s still not as seamless as eg serializing Clojure maps.

      1. 6

        It’s funny you mention JS because if I needed an arbitrary key-value map I would not just use literal keys, but rather a Map. Consider what happens if you try to use a key named ‘toString’, or ‘hasOwnProperty’.

        I also don’t think you should just be shoving your in-memory representation over the wire unless it’s extremely simple, especially in situations where you might need to change it and your client and your server might get out of sync in terms of versions.

        1. 4

          for example, translating Python classes to a wire protocol often involves quite a lot of boilerplate. Luckily there are libraries like Pydantic that provide good solutions to this problem, but it’s still not as seamless as eg serializing Clojure maps.

          I guess I don’t really get this one.

          Pydantic is the hot new kid on the block, sure, but if you’re building networked services this stuff is table stakes and has been for years and years. If you use Django there’s DRF serializers. If you don’t use Django there’s Marshmallow. In both cases the tooling can auto-derive serialization and deserialization and at least basic type-related validation from whatever single class is the source of truth about your data’s shape, whether it’s an ORM (Django or SQLAlchemy) model, or a dataclass or whatever.

          So I literally cannot remember the last time I had to write “quite a bit of boilerplate” for this. Maybe if I were one of the people I see occasionally who insist they’ll never ever use a third-party framework or library? But that seems like a problem with the “never use third-party”, not with the language or the ecosystem.

        2. 5

          At the same time, I understand Clojure has this inclination toward keys that aren’t just any old string, but are namespaced and meaningful. I wonder if Clojure programs at a certain complexity would still translate from wire format maps to domain model maps?

          1. 3

            At the same time, I understand Clojure has this inclination toward keys that aren’t just any old string, but are namespaced and meaningful.

            And even with that, sometimes it can be very hard to get your bearings when you’re jumping in at a random piece of code. Figuring out what the map might look like that “should” go into a certain function can be very difficult.

            I wonder if Clojure programs at a certain complexity would still translate from wire format maps to domain model maps?

            I work on a complex codebase and we use a Malli derivative to keep the wire format stable and version the API. The internal Malli model is translated to JSON automatically through this spec, and it also ensures that incoming data is well-formed. It’s all rather messy and I’m not sure if I wouldn’t prefer manual code for this because Malli is quite heavy-handed and its metaprogramming facilities are hard to use and badly documented.

          2. 5

            The advice in the article is wrong for Python as well. Dicts are not opaque, it’s wrapping them in bespoke custom classes that makes data opaque. I should probably blog about it, because there’s much more I want to say than fits in a comment.

            1. 10

              Dicts are not opaque, it’s wrapping them in bespoke custom classes that makes data opaque.

              Dicts aren’t opaque in the sense of encapsulation, but they’re opaque in the sense of making it harder on the developer trying to figure out what’s going on.

              If I’m working with a statically-typed codebase (via mypy), I can search for all instances of a given type. I can also look for all accesses of a given field on that type. It’s not possible to usefully do this with a dict, since you’re using dicts everywhere. You also can’t say “field X has type Y, field Z has type Q” unless you use TypedDict and then at that point you don’t gain anything from not using a real class.

              Similarly, I can look at the definition for the class and see its fields, methods, and docstrings. You can’t do that with a dict.

              I’ve been working with a codebase at $WORK that used dicts everywhere and it was a huge pain in the ass. I’ve been converting them to dataclasses as I go and it’s a lot more convenient.

              1. 1

                You might be interested in TypedDict (also described in PEP-589) and the additions to TypedDict in PEP-655.

                1. 1

                  I’ve used TypedDict as a transitional measure while doing the dicts-to-dataclasses thing; it was definitely super helpful there.

                  1. 1

                    I’m not sure why TypedDict exists. You may as well opt for a dataclass or pydantic. Maybe it’s useful for typing **kwargs?

                    1. 2

                      The primary idea being that if a dictionary is useful in a given circumstance, then a dictionary with type assertions is often even more useful. The motivation section of the PEP expands on that a little.

                      1. 1

                        it exists to help add type checking support to code that needs to pass dicts around for whatever reason (e.g. interop with legacy libraries)
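
                        A minimal sketch of that interop case. The names `UserPayload`, `build_payload` and `legacy_send` are hypothetical, made up for illustration; `legacy_send` stands in for an untyped library call that only accepts plain dicts.

```python
from typing import TypedDict


class UserPayload(TypedDict):
    """Shape of the dict; the fields are hypothetical."""
    name: str
    age: int


def legacy_send(payload):
    # Stand-in for an old, untyped library function that expects a plain dict.
    return ",".join(f"{k}={v}" for k, v in sorted(payload.items()))


def build_payload(name: str, age: int) -> UserPayload:
    # A type checker such as mypy can flag a missing or misspelled key here.
    return {"name": name, "age": age}


payload = build_payload("Ada", 36)
# At runtime a TypedDict value is an ordinary dict, so no conversion is needed.
result = legacy_send(payload)
```

                        The checking happens only in the type checker; nothing is validated at runtime.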

                  2. 2

                    Please post the article here if you do write it, cuz I don’t know much about Python best practices.

                    1. 1

                      I will. Although I don’t have any official claim at those practices being officially “best”. I only know they work best for me :-)

                1. 3

                  Caveat: I don’t have a lot of experience with C (or C++).

                  I had the impression from reading various internet discussions that null-terminated strings were considered a mistake. After some searching, I found multiple impassioned defenses of them on Stack Overflow. This gives me more context and understanding for why null-terminated strings were chosen for C, but doesn’t provide any reason why almost no language since uses them.

                  What is Zig’s rationale for using null terminated strings?

                  1. 4

                    Zig has support for both null-terminated and non-null-terminated strings. The []const u8 type, which is the convention for strings, is non-null-terminated. The default type for a string literal is *const [N:0]u8. This can then coerce into a []const u8, which is a slice. Null-terminated strings are useful for C interop, but slices are very useful also.

                    1. 3

                      As someone who only knows a little of Zig, my guess is that the decision is a consequence of Zig’s origin. Zig is meant to be a better C. C uses null-terminated strings and (nearly) every C library does. Therefore, supporting them in an essential way seems hard to get away from.

                      1. 3

                        EDIT: looks like g-w1 actually knows: Zig has both kinds of strings, and the null-terminated ones are for C interop.

                      2. 1

                        Relying on the null terminator causes problems because calculating lengths (and doing bounds checks on random access) is O(n). C used null terminators because space was very constrained. A length field the same size as a null byte (as Pascal used) limited strings to 255 characters, which caused a lot of problems. If you have a 32-bit or 64-bit size field, you’re typically not losing much (especially if you do short-string optimisation and reserve a bit to indicate whether the short string is embedded in the space used by the size and pointer).
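
                        To make the cost difference concrete, here is a rough Python sketch of the two layouts. The function names are invented for this example, and the one-byte prefix is only a simplified model of the Pascal-style layout.

```python
def c_strlen(buf: bytes) -> int:
    """Length of a null-terminated string: scan to the first 0 byte, O(n)."""
    n = 0
    while buf[n] != 0:
        n += 1
    return n


def pascal_strlen(buf: bytes) -> int:
    """Length of a length-prefixed string: read one byte, O(1)."""
    return buf[0]


def to_pascal(text: bytes) -> bytes:
    # A one-byte length field can describe at most 255 bytes of payload.
    if len(text) > 255:
        raise ValueError("too long for a one-byte length prefix")
    return bytes([len(text)]) + text


c_str = b"hello\x00"
p_str = to_pascal(b"hello")
```

                        Both buffers here take six bytes; the difference is only in how the length is found.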

                        In contrast, having the null terminator can make C interop easier because you don’t need to copy strings to convert them to C strings. How much this matters depends a lot on your use case. Having the null terminator can cause a lot of problems if you have one inconsistently. For example:

                        $ cat str.cc
                        #include <string>
                        #include <cstring>
                        #include <iostream>
                        
                        int main()
                        {
                                std::string hello = "hello";
                                auto hello_null = hello;
                                hello_null += '\0';
                                std::cout << hello << " == " << hello_null << " = " << (hello == hello_null) << std::endl;
                                std::cout << "strlen(" << hello << ".c_str()) == " << strlen(hello.c_str()) << std::endl;
                                std::cout << "strlen(" << hello_null << ".c_str()) == " << strlen(hello_null.c_str()) << std::endl;
                        }
                        $ c++ str.cc && ./a.out
                        hello == hello = 0
                        strlen(hello.c_str()) == 5
                        strlen(hello.c_str()) == 5
                        

                        Converting a C++ standard string to a C string implicitly strips the null terminator (it’s there, you just can’t see it), which means that strlen(x.c_str()) and x.size() will be inconsistent.

                        The biggest mistake that a string library can make is coupling the string interface to a string representation. A contiguous array of bytes containing a UTF-8 encoding is fine for a lot of uses of immutable strings, but what happens if you want to iterate over grapheme clusters (or even Unicode code points)? If you do this multiple times for the same string then you can do it much more efficiently if you cache the boundaries with the string. For mutable strings, there are a lot more problems. Consider adding a character to the middle of a string with the contiguous-array representation. It’s an O(n) operation in the length of the string, because you have to reallocate and copy everything. With a model that over-allocates the buffer then it’s O(n) in the length of the tail of the string, with periodic O(n) copies when the buffer is exhausted (amortised to something better depending on the policy). With a twine-like representation, insertion can be cheap but indexing may be more expensive. The optimal string representation depends hugely on the set of operations that you want to perform. If your string operations aren’t abstracted over the representation then there’s pressure to use a non-optimal representation.
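
                        A toy illustration of that trade-off, in Python rather than C++. The `Twine` class below is a made-up, heavily simplified segmented string (a flat list of chunks), not any real library’s representation.

```python
class Twine:
    """Toy segmented string: a list of chunks instead of one contiguous buffer."""

    def __init__(self, text: str = "") -> None:
        self.chunks = [text] if text else []

    def __len__(self) -> int:
        return sum(len(c) for c in self.chunks)

    def __str__(self) -> str:
        return "".join(self.chunks)

    def insert(self, pos: int, text: str) -> None:
        """Split only the chunk containing pos; other chunks are not copied."""
        if pos < 0 or pos > len(self):
            raise IndexError("insert position out of range")
        if not text:
            return
        offset = 0
        for i, chunk in enumerate(self.chunks):
            if pos <= offset + len(chunk):
                cut = pos - offset
                pieces = [chunk[:cut], text, chunk[cut:]]
                # Replace the one affected chunk with its non-empty pieces.
                self.chunks[i:i + 1] = [p for p in pieces if p]
                return
            offset += len(chunk)
        self.chunks.append(text)  # only reached when there are no chunks yet

    def __getitem__(self, index: int) -> str:
        """Indexing walks the chunk list, so it costs more than a flat array."""
        if index < 0:
            raise IndexError("negative indexes are not supported in this sketch")
        for chunk in self.chunks:
            if index < len(chunk):
                return chunk[index]
            index -= len(chunk)
        raise IndexError("index out of range")


t = Twine("hello world")
t.insert(5, ",")
```

                        After the insert, `t.chunks` holds three pieces and `str(t)` joins them back together. A real implementation would use a tree of segments so that lookups do not degrade linearly with the number of chunks.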

                        Objective-C did this reasonably well. Strings implement a small set of primitive methods and can implement more efficient specialised versions. The UText interface in ICU is very similar to the Objective-C model, with one important performance improvement. When iterating over characters (actually, UTF-16 code units), implementations of UText have a choice of providing direct access to an internal buffer or to a temporary one. With a twine-like implementation, you can just update the pointer and length in the UText to point to the current segment, whereas with NSString you need to copy the characters to a caller-provided buffer.

                      1. 1

                        What is a layer?

                        1. 3

                          On many keyboards there is a mechanism such that some or all of the keys can change behavior based on which “layer” is active. The layer can be changed by holding down a key or pressing a key. In this way, “shift” is somewhat like a layer change key—it swaps from the “lower-case” layer to the “upper-case” layer.

                          On very small keyboards, layers are required to generate numbers and F-keys because there aren’t enough physical buttons to represent them.

                          1. 12

                            Make is definitely my favourite of the usual unix tools, and one I’d recommend all beginners learn about. I’m betting most people think it’s only a build system for C, and even when it’s used, sometimes it’s only as a task runner. It works for any command that takes files as input and produces a file as output, and you get incremental and (if you define the tasks right) parallelized builds for free! And if you don’t like make itself, there’s a ton of language-specific clones like Rake or Jake that support most of its features. I just wish it could deal better with tasks that produce multiple files; not even the clones usually support that.

                            I’m using Jake on a project where I have to parse HTML and JS files and do some compilin’ and interpretin’; it’s important that I have very clear data provenance from the source files because it’s copyrighted material and I want to be able to distribute the project without including any of the incriminating data, so having a tool like make to keep track of all the inputs and outputs is very useful. JS also has some great libraries for parsing both HTML and JS, and it would be really awkward to wrap the functions that do these transformations as command line programs, and with Jake I can just call the functions in the tasks.
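
                            For anyone who hasn’t looked at how the incremental part works, the basic rule is small enough to sketch in Python: rebuild when the output is missing or older than any input. The function name and the file names below are invented for the example, and this leaves out everything else make does (dependency graphs, pattern rules, parallelism).

```python
import os
import tempfile
from typing import Iterable


def needs_rebuild(target: str, sources: Iterable[str]) -> bool:
    """True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "page.html")   # hypothetical input file
    out = os.path.join(tmp, "page.json")   # hypothetical output file
    with open(src, "w") as f:
        f.write("<p>hi</p>")
    missing = needs_rebuild(out, [src])    # output does not exist yet

    with open(out, "w") as f:
        f.write("{}")
    # Set modification times explicitly so the demo is deterministic.
    os.utime(src, (1000, 1000))
    os.utime(out, (2000, 2000))
    fresh = needs_rebuild(out, [src])      # output is newer than the input

    os.utime(src, (3000, 3000))
    stale = needs_rebuild(out, [src])      # input changed after the output
```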

                            1. 2

                              As a fellow lover of Make, I’ve been getting some good use out of remake lately and thought you might like it if you hadn’t seen it.

                              1. 2

                                I just wish it could deal better with tasks that produce multiple files, not even the clones usually support that.

                                In make 4.3:

                                • New feature: Grouped explicit targets

                                Pattern rules have always had the ability to generate multiple targets with a single invocation of the recipe. It’s now possible to declare that an explicit rule generates multiple targets with a single invocation. To use this, replace the “:” token with “&:” in the rule. To detect this feature search for ‘grouped-target’ in the .FEATURES special variable. Implementation contributed by Kaz Kylheku kaz@kylheku.com

                                1. 1

                                  I agree make is great! I use it to run my backup. (There are different portions of my hard disk that need to be backed up in a certain order.)

                                1. 2

                                  Very nice writeup, especially the trick with NoReturn!

                                  I also find type narrowing extremely useful for error handling: you can use Union[Exception, T] as the result type and ‘pattern match’ with isinstance, which works at runtime and is checkable by mypy. In addition, the covariant Union type lets you use it with generators and consume the results without extra boilerplate. I write more about it here
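
                                  A small sketch of what that looks like in practice. The function names are made up for the example; the point is only that the exception is returned rather than raised, and isinstance narrows the type in each branch.

```python
from typing import Iterator, List, Union


def parse_int(raw: str) -> Union[Exception, int]:
    """Return the parsed value, or the exception instead of raising it."""
    try:
        return int(raw)
    except ValueError as e:
        return e


def parse_all(raws: List[str]) -> Iterator[Union[Exception, int]]:
    # A generator can yield successes and failures in the same stream.
    for raw in raws:
        yield parse_int(raw)


values: List[int] = []
errors: List[Exception] = []
for res in parse_all(["1", "x", "3"]):
    if isinstance(res, Exception):
        errors.append(res)   # narrowed to Exception in this branch
    else:
        values.append(res)   # narrowed to int in this branch
```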

                                  1. 2

                                    So go(lang) in Python? 😉

                                    1. 4

                                      go returns a Tuple[Exception, T]

                                      1. 2

                                        Yep. I’m describing the Go approach here

                                        1. 1

                                          Thanks for the article karlicoss. We were just discussing this pattern the other day and we ended up reaching similar conclusions. The only difference in our case was that we wanted to return T along with the exception. This makes the “pattern” of the return type a bit more messy. We need to keep exploring.

                                    2. 1

                                      You mean returning an Exception instead of raising it?

                                      1. 1

                                        ah yes, sorry! Too late to edit now :(

                                    1. 21

                                      My 2 cents. The other day, I wanted to test a new feature of Hugo that has not been released:

                                      https://github.com/gohugoio/hugo/pull/6771

                                      Hugo doesn’t have a CI, so the only option was building it myself. This might not sound like a big deal, but I am on Windows. On Windows, you typically have problems building projects, as the developers often don’t even test on Windows. This leads to compile and dependency errors.

                                      Hugo is a big project (the Zip file is 12 MB), so I was pretty sure I would have some trouble with this. But I didn’t. I just followed the instructions:

                                      git clone git://github.com/gohugoio/hugo
                                      cd hugo
                                      go install
                                      

                                      and it took a while, but not a single error. I have built many projects with C, C++, C#, D, Nim and others over the years, and this is the first time I have had this experience with this large of a project. The closest to this I think is FFmpeg, but even with that you needed to install dependencies else something would be missing from the output or simply fail. I had a similar experience with Rust, where I needed a new build:

                                      https://github.com/getzola/zola/issues/893

                                      except Rust just failed spectacularly, because it seems one of the dependency crates uses C on the backend, and has poor or no support for Windows:

                                      https://github.com/compass-rs/sass-rs/issues/63

                                      Go lets me get work done. I can focus on the code, where with other languages, I find myself getting distracted with the tooling or build process.

                                      1. 15

                                        Every time I write code in a language other than Go, I’m startled by how weak the tooling is. After writing all Go for about two years at a job, I switched to a place that has almost exclusively Python projects. To say that I miss go fmt is the king of all understatements. Also, after Go, I don’t see why every programming language doesn’t just come up with a native way of running tests à la go test. It’s weird to have a series of codebases where the command to execute their test suites is only the same because it’s placed behind a standardized make target.

                                        1. 16

                                          Maybe it’s just the ecosystems I play in, but it appears that language tooling is converging on this pattern though.

                                          Rust:

                                          • cargo fmt
                                          • cargo test

                                          DotNet:

                                          • dotnet format
                                          • dotnet test

                                          Elixir:

                                          • mix format
                                          • mix test
                                          1. 11

                                            Zig, too:

                                            • zig fmt
                                            • zig [build] test
                                            1. 2

                                              That’s awesome, thank you for your response! I know about prettier for Javascript (and to their credit, it really makes absolutely no sense to have a formalized code formatter for a language with no official interpreter). I also knew about cargo (slipped my mind), but am firmly outside the .NET/Elixir ecosystems, so glad to see that happening.

                                              1. 7

                                                There’s also pyfmt for python, and pytest. I think pytest came before go, though.

                                            2. 5

                                              Black (https://github.com/psf/black) is the Python equivalent of go fmt.

                                          1. 18

                                            It’s nice to see someone else abusing Python for the sake of fun. I’ve blogged in the past about many hacks, including: let, attempting to make call/cc, worlds, pattern matching with with, and dispatching with with. Basically, yes, yes, yes, more of this kind of stuff! It makes languages really fun.

                                            1. 6

                                              I’ll join this party :) I figured out how to make Rust-like macros in Python by stuffing things in type-annotations, which you can read here.

                                              1. 1

                                                I don’t write much Python these days, but as a Schemer interested in macros, I used to try out all sorts of stuff. There was one really good attempt: MetaPython which hasn’t had a release since 2009. I think that was the one I felt worked best, so if you’re still interested, you might play with it.

                                                Also, this is awesome! And, I don’t know much rust, but I did not realize that what I would call “pragmas” are powered by macros (in hindsight this makes total sense!), making them accessible for all sorts of hackery and wizardry. Thanks for sharing!

                                                1. 2

                                                  … but as a Schemer interested in macros

                                                  Do you know about Hy?

                                                  1. 1

                                                    I do! A long time ago I had a similar project called Ruse, which aimed to be a compliant Scheme on top of Python 2, which fizzled before I prepped it for release. I’m happy that someone else, independently, thought the idea of a Lisp targetting Python was good. :)

                                              2. 3

                                                It sounds like you’d enjoy my (now quite old) blog post about abusing encodings in Python: http://benjiyork.com/blog/2008/02/programmable-python-syntax-via-source.html

                                                1. 3

                                                  Thanks, that is amazing.

                                                  Very relatedly, I blogged about source files using the built-in rot13 codec with a starting comment #encoding: rot13, so that all the source code that follows is encoded: https://frederik-braun.com/rot13-encoding-in-python.html
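
                                                  For illustration, the same codec can be applied to a string through the standard codecs module. This sketch only shows the text transform, not the source-file encoding declaration itself.

```python
import codecs

source = 'print("hello")'
# rot13 is a text-to-text codec, so it goes through codecs.encode/decode
# rather than str.encode.
encoded = codecs.encode(source, "rot13")
decoded = codecs.decode(encoded, "rot13")
```

                                                  Since rot13 is its own inverse, decoding the encoded text gives back the original source.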

                                                2. 1

                                                  I think you might enjoy the complete works of Oleg Kiselyov, full of mind-bending trickery in Scheme, ML and others. I sometimes wish I could just set aside a year and thoroughly study and understand what Oleg is publishing.

                                                  1. 2

                                                    I sometimes wish I could just set aside a year and thoroughly study and understand what Oleg is publishing.

                                                    I sometimes ask the question, legitimately, “What would Oleg do?” – Yes, aware. But similar to you, completely understudied due to time.

                                                1. 2

                                                  Can someone confirm that except Exception: is bad practice for the use cases given in the article?

                                                  1. 2

                                                    It’s not bad if you really need what that construct does.

                                                    On the other hand, if you’re just being lazy and you could find out the exact exception(s) that you want to handle, you should do that instead.
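
                                                    A rough sketch of the two situations. Both functions are hypothetical examples, not taken from the article.

```python
import json
from typing import Any, Callable


def load_config(text: str) -> Any:
    """Catch only the failure we know how to handle."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Malformed input falls back to an empty config; other errors surface.
        return {}


def run_plugin(plugin: Callable[[], Any]) -> Any:
    """A boundary where any failure must be contained, so the broad catch is deliberate."""
    try:
        return plugin()
    except Exception as e:
        # Still lets KeyboardInterrupt and SystemExit propagate, since those
        # do not derive from Exception.
        return f"plugin failed: {e!r}"


bad_config = load_config("{bad")
plugin_result = run_plugin(lambda: 1 / 0)
```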

                                                  1. 1

                                                    s/invreased/increased/

                                                    1. 2

                                                      Thank you!

                                                    1. 5

                                                      I’m working on a hobby programming project that I’m very excited about: A build tool.

                                                      1. 3

                                                        Any plans for features you want to include? or is it to recreate something as a learning experience?

                                                        1. 5

                                                          My aim is to replace make. I have the core algorithm done, it takes a very plain tab separated format as input. I think there needs to be a good UI to describe builds. I’m proving it by building existing projects with it.

                                                          1. 10

                                                            Within a few weeks of writing Make, I already had a dozen friends who were using it.

                                                            So even though I knew that “tab in column 1” was a bad idea, I didn’t want to disrupt my user base.

                                                            So instead I wrought havoc on tens of millions.

                                                            (from https://beebo.org/haycorn/2015-04-20_tabs-and-makefiles.html)

                                                            1. 6

                                                              The side-note is even better:

                                                              Side note: I was awarded the ACM Software Systems Award for Make a decade ago. In my one minute talk on stage, I began “I would like to apologize”. The audience then split in two - half started laughing, the other half looked at the laughers. A perfect bipartite graph of programmers and non-programmers.

                                                      1. 4

                                                        The irony of his manufacturing analogy is delicious.

                                                        1. 1

                                                          I like split, ortho-linear (or rather, staggered columnar) keyboards, but I also like having lots of keys (5 rows, 14 columns), so I use a Diverge (2, the 3 is out now) from UniKeyboard: https://unikeyboard.io/product/diverge/

                                                          I modified mine to have a fixed base and mounted a trackball in the middle: https://imgur.com/AIW3vzy

                                                          1. 1

                                                            Recording two comedy shows for producing my buddy’s album!

                                                            Recording two (1, 2) podcasts because they are fun!

                                                            I don’t program for fun anymore and it really doesn’t bother me.

                                                            1. 2

                                                              As an after-hours audio guy, I wish you good cables and low noise floors.

                                                            1. 1

                                                              For a very well-done video walk-through, see https://www.youtube.com/watch?v=a9xAKttWgP4

                                                              1. 4

                                                                This person misses the point of their own collection.

                                                                1. 5

                                                                  Sounds like a language that doesn’t try to waste my time :) Speaking of which, i think it might be interesting to think about what an “iteration” in the language might look like - something to address “I changed some stuff, what is the result?”.

                                                                  Now, there are people who use “tests” for this purpose, but that term comes with a lot of baggage, much of which results in my time being wasted, at least in the short term. So perhaps it’s possible to factor out the quality-assurance/maintainability/code-monkey-metrics aspects and find something that a) is in-between a REPL and a test suite and b) has a simple, baggage-free term for a name.

                                                                  Perhaps “examples”? As in, some sort of collection of inputs and their previous/expected/possible results together with short-term time-saving automation. Could be as simple as recording stdout on each run and showing a diff on the next re-run. Or something very clever with generators and fuzzing.

                                                                  (Edit) Adding on to the diff idea, having useful “diffs” for (nested) maps, lists, and primitive types out-of-the-box might already go a long way.
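
                                                                  A quick sketch of the diff part in Python, assuming results are JSON-like (nested maps, lists, primitives). The function names are made up, and storing the previous run on disk is left out.

```python
import difflib
import json
from typing import Any, List


def snapshot(value: Any) -> str:
    """Stable text form of nested maps, lists, and primitives."""
    return json.dumps(value, indent=2, sort_keys=True)


def diff_runs(previous: Any, current: Any) -> List[str]:
    """Line diff between two recorded results; empty when nothing changed."""
    return list(difflib.unified_diff(
        snapshot(previous).splitlines(),
        snapshot(current).splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    ))


before = {"user": {"name": "ada", "tags": ["a", "b"]}, "count": 1}
after = {"user": {"name": "ada", "tags": ["a", "c"]}, "count": 1}
changes = diff_runs(before, after)
```

                                                                  Sorting keys keeps the output stable between runs, so only real changes show up in the diff.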

                                                                  1. 3

                                                                    I think this is a fantastic idea. I can’t speak for other developers, but even when I don’t write full tests for a piece of code, I always run it on some example input, in a REPL or a scratch executable file, and “eyeball” the results to see if they match my expectations. Even though these “examples,” as you call them, don’t come with a machine-checkable criterion for their correctness, they are still valuable in their own right as a first step towards validating that a codebase behaves as intended, and also serve as a kind of executable documentation that future developers can use to understand how a project’s APIs work in practice. Current tooling seems to lack first-class support for this kind of example code, and in some cases (e.g. REPLs) actually actively works to ensure it never lasts beyond a single session, but I think it would be valuable to treat it as an important persistent artifact of the development process; an asset to stand alongside code, documentation, and tests.

                                                                    1. 4

                                                                      I take this approach much further than this. And make every single key on my keyboard a custom modifier key. 🙂

                                                                      1. 1

                                                                        That keyboard mapping is something else. I’m not sure it would work for me, but I’m going to keep that idea in my pocket.

                                                                      1. 14

                                                                        “A perfect keyboard would look something like this”

                                                                        [an image of a keyboard with a spacebar that’s 6x as big as every other key]

                                                                        I think … maybe we could do better than that?

                                                                        1. 4

                                                                          My 1u space key suggests you might be right.

                                                                        1. 1

                                                                          Nice!

                                                                          I do something similar, but the other way around: I select a file (and optional line number) and then switch to my editor and hit a hotkey that parses the current selection to find the file and line and navigate there.
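
                                                                          The parsing half is roughly this, sketched in Python. It assumes the selection looks like path or path:line (optionally followed by a column and message, as compilers and grep -n print); the names are invented for the example, and paths containing colons or spaces are not handled.

```python
import re
from typing import Optional, Tuple

# Assumed format: "path", "path:line", or "path:line:col: message".
_LOCATION = re.compile(r"(?P<path>[^:\s]+)(?::(?P<line>\d+))?")


def parse_location(selection: str) -> Optional[Tuple[str, Optional[int]]]:
    """Extract (path, line) from selected text; line is None when absent."""
    m = _LOCATION.match(selection.strip())
    if m is None:
        return None
    line = m.group("line")
    return m.group("path"), (int(line) if line else None)


with_line = parse_location("src/main.py:42")
without_line = parse_location("  README.md ")
compiler_style = parse_location("src/main.py:42:7: error: oops")
```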

                                                                          1. 3

                                                                            Before writing this I was awkwardly opening files by hand, which was slow, error-prone, and awful, though I refused to install language-specific, or even editor-specific, plugins just to add this sort of simple functionality.