Threads for hankstenberg

  1. 6

    Regarding the issue with the “shape of data”, you may want to have a look at Clojure Spec. But if you ask me, you can skip it and just take the incredible library malli. Not only is it a fantastic schema engine where types are arbitrary predicates instead of a narrow selection of hard-coded categories like “string”, with much better composability than in e.g. JSON Schema. It is also programmatically extensible and has some great features: generating data based on a schema, conversion from and to JSON Schema, function schemas, and so on.

    1. 4

      Thank you! I will have to check out malli. I’m not sure specs are exactly the solution that I’m looking for. In many languages, it’s pretty easy to determine the exact return type as it’s declared as part of a function definition. In Lisps, this is not the case, and Clojure is no different in this regard. Functions in Lisps don’t explicitly define a return (via a statement) or return type, which can also change depending on logic. The tradeoffs for this are interesting, because it means the code is less cluttered and potentially easier to just read, but understanding the interface to a function (i.e. how to use it, and what it’s used for) effectively requires either good docs or reading and understanding the function itself. Other languages, such as heavily type-hinted Python, Java, C, Go, etc. all have functions explicitly return a value of a defined shape, so just glancing at a function and having a rough understanding of what it does is generally easier. Whether that’s a good thing or not is, I think, up for debate: there is value in forcing the user of an interface to actually understand the function, but it may slow progress down unnecessarily in some cases. This could also be interpreted as a general problem dynamically typed languages have, but I think the implicit return construct makes dealing with it a little worse.

      1. 4

        In my experience as a Scheme (and back in the day, Ruby) programmer, it’s not so much the implicit return that makes it difficult to figure out the types, but the way everything is super generic. There’s the sequence abstraction which works for lists, vectors, lazy sequences and even maps and sometimes strings. So if you’re reading a method’s code that only uses these abstract operations, you have no idea what goes in and only a vague idea of what comes out of that method. With Scheme, for example, you’d see (string-ref x 1) and immediately know that argument x must be a string. With Clojure, you’d see (nth x 1) and be none the wiser. Of course, it allows for code that’s more generic so you could use the same code with different types of inputs, but in many cases that genericness isn’t important and is actively hindering your understanding.
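        A tiny sketch of that point (hypothetical helper name): because nth accepts vectors, lists, strings and lazy sequences alike, the function body reveals nothing about its argument’s type.

        ```clojure
        ;; Generic: nothing here tells the reader what x is
        (defn second-item [x]
          (nth x 1))

        (second-item [10 20 30]) ;; => 20
        (second-item '(1 2 3))   ;; => 2
        (second-item "abc")      ;; => \b
        ```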

        Couple this with the practice of using maps everywhere (which can have any ad-hoc set of attributes), and it gets pretty murky what’s going on at any point in the code. When you’re reading a method, you have to know what the map looks like, but you don’t know unless you trace it back (or put in a printf). Compare this with user-defined records (which Clojure has, but which aren’t typically used as much), where the slots are all known in advance and always present: it’s much easier to read code that operates on them, because whenever a value is extracted, you can derive the type simply from the fact that an accessor method is called.
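        To illustrate (made-up names), compare an ad-hoc map with a record: with the record, the available slots are visible at the definition site, and keyword access still works the same way.

        ```clojure
        ;; Ad-hoc map: nothing in the function tells you which keys exist
        (defn display-name [user]
          (str (:first-name user) " " (:last-name user)))

        ;; Record: the slots are declared up front and always present
        (defrecord User [first-name last-name email])

        (display-name (->User "Ada" "Lovelace" "ada@example.com"))
        ;; => "Ada Lovelace"
        ```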

        Malli or spec are a good way to introduce “sync points” in your code at which maps are checked for their exact contents, but I’ve found that more useful for constraint checking at the input/output boundaries. Doesn’t help that much while you’re writing the main code that actually operates on your types. Especially with Malli, I’ve had to remove some internal checks due to performance issues when using validate.

        1. 2

          I totally see where you’re coming from. Personally I came from Java when I discovered Clojure and I also sorely missed the type system.

          Clojure is much more about abstracting behavior, and it actually matters a lot less what exactly the shape of the data is in a certain context, as long as you know you have the guarantees you need in your current context. It’s not considered good style to write operations that only work with a super specific data structure. It’s actually the same in Java, where this can be done using interfaces, with the drawback of being limited to one interface at a time. There is actually something like an inverse of the drawbacks of dynamic typing in the static-typing world too, and that is the global scope of type names. If you limit yourself to a narrow set of global types, you usually pass around way too much data and/or behavior. The more specific you get, the harder it becomes to properly name things, because you need to differentiate everything from everything and you end up with a zoo of poorly named stuff. Dynamic typing, on the other hand, allows you to be terse and contextual, with the drawback of having to be quite disciplined about making the context understandable.

          What I love about malli is that you can actually defer some pretty tedious-to-model logic that would otherwise bloat your code base to the schema engine. Let’s say you have a medical questionnaire where the gender is asked for and, if the gender is “female”, then the data should contain the answer to the question “pregnant yes/no”. And you need to validate the data in the back-end and update the database. Malli allows you to write a schema in which contextual dependencies between single data points can be captured. Letting the schema handle this allows you to write much better code that doesn’t need to know such details. Maybe the actual logic left is then just to apply a JSON merge patch, completely independent of the specifics of the data at hand.
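          A minimal sketch of such a schema using malli’s :multi dispatch (the field names are made up for the example); only the “female” branch requires the pregnancy answer:

          ```clojure
          (require '[malli.core :as m])

          ;; Dispatch on :gender; everything else falls through to the default branch
          (def questionnaire
            [:multi {:dispatch :gender}
             ["female" [:map
                        [:gender [:= "female"]]
                        [:pregnant :boolean]]]
             [::m/default [:map [:gender :string]]]])

          (m/validate questionnaire {:gender "female" :pregnant true}) ;; => true
          (m/validate questionnaire {:gender "female"})                ;; => false
          (m/validate questionnaire {:gender "male"})                  ;; => true
          ```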

      1. 3

        In my experience, the issue is usually distributed state and the absence of constraints. In the synchronous SoA world, the data flow often has no clear direction and the result is an explosion of complexity that requires more and more complex tools to manage it. In my current job we have a single sophisticated service for state management (ES/CQRS due to regulatory requirements) that does just that. And every other (of about 20) services is either preparing data to be sent to it or reacting to state changes happening in it. For every pair of services that interact with each other, it’s always clear which direction the data is going and there are no loops. For me it’s the first time I actually have a good feeling about a distributed architecture and the reason is simplicity through tough constraints.

        1. 5

          We just started using it, it’s pretty neat! No more updating of redundant diagrams: you can define subsystems once and then reference them. And the results really look nice. The only thing that would be even better would be a way to build the diagrams based on Terraform files. I guess the main problem here is that in reality there are too many types of relationships.

          1. 3

            Would “Command sourcing” be a fitting name for it? It looks interesting. What kind of problem would this be a good solution for?

            1. 2

              I think “Command sourcing” is a great name, it brings “Event sourcing” to mind as both analogy and contrast. Perhaps it’s a better name than “Memory Image Pattern”.

              The kind of problems MIP is a good solution for include:

              • Complex business logic
              • Tight deadlines
              • Low latency or realtime requirements
              • Frequently changing requirements that traditionally would result in laborious database schema changes in production
              • Frequently changing requirements that makes it hard to know what kind of data is useful
              • A need to handle historical data and metadata
              • Any combination of the above

              But it’s probably not a good solution if you have any of the following:

              • A compute-intensive or very data-intensive application
              • Requirements for very high throughput
              • Requirements to purge/forget data
              • Very complex integration interfaces
            1. 7

              One of my favorite talks on software development and problem solving in general. Closely related to Peter Naur’s “Programming as Theory Building”. And reminiscent of Leslie Lamport’s famous observation:

              “I believe the best way to get better programs is to teach programmers how to think better. Thinking is not the ability to manipulate language; it’s the ability to manipulate concepts.”