Threads for viebel

  1. 3

    Sounds like an interesting book. Would love to hear from others what their experiences with this are.

    1. 2

      The original ‘Data Oriented Design’ book was a good read. I wonder if this might be a more approachable introduction? The techniques are great, but that’s not a surprise: it’s relational design.

      1. 3

        Author of the book here: The book is about “Data-Oriented Programming” not about “Data-Oriented Design”.

        1. 4

          Ah, I hadn’t read closely enough. The naming is deceiving. How is this different from functional programming, or maybe just “program like you’re writing Standard ML”?

          1. 1

            DOP embraces generic data structures like string maps to model domain data. DOP separates between data representation and data validation.

            1. 1

              Okay. I can see how that comes out of the Clojure community. More like how you do FP in Erlang than ML.

              1. 1

                Exactly!

                In the book, the character that unveils the secrets of DOP presents himself as a Clojure developer.

    1. 8

      I’m not sure I understand forcing things into classes when literally all the methods are staticmethod. If the point is purity and always passing in all needed input data without needing to carry internal state (and the classes have no members aside from their methods anyway), why not just replace the class with a set of functions in a module?

      1. 9

        It’s a readability thing:

        Please note that examples will use the classes+static method form in order to make this article readable. In a production code, the modules+functions form is the way to go.

        1. 4

          I was thinking the same thing. It’s probably just for pseudo namespacing within a single file instead of 3 separate modules. Total guess though.

          Edit: Also hey @ubernostrum! I’ve followed your for years via Django (maybe since 0.96 days), but hadn’t seen you in a few years. Just noticed that was you. Thanks for all your work.

          1. 4

            Indeed, in Python, having a module with a set of functions would be more appropriate.

            I guess that the reason the author of the article used Python classes is related to the fact that in the book, I have illustrated the separation between code and data using JavaScript classes instead of JavaScript modules.

            My idea was to illustrate how separation between code and data could be applied even in languages where everything has to be a class (e.g. Java). But in languages that support module (or namespace) like Python, JavaScript and Clojure, I think it’s better to avoid classes for aggregating functions.

            1. 2

              Besides readability (classes can act as a namespacing construct within a module), you can use classes to define environmental data (ie. secrets, urls, host/ports, etc), in a pattern like so:

              class MyAPI:
                 def __init__(self, environment):
                    self.environment = environment
              
                 def perform_action( self, arg ):
                    ....      
              

              and then

              api = MyAPI(...)
              api.perform_action(...)
              

              The key problem with DOP is that if you’re doing concurrent state updates, you end up with two distinct state branches that need to be merged. This is a slow operation unless the transforms are incremental (consume the previous state, a data atom, and produce a delta on the previous state). If Python gets immutable data structures and differential dataflow primivitives, then this will be much easier and performant!

              1. 2

                Indeed managing concurrent mutations with mutable data structures is not efficient. In the book, I am devoting Chapters 4 and 5 to illustrate how immutability make it more efficient using Lodash FP.

                I would love to see a similar implementation in Python. Maybe pyrsistent can help?

                1. 1

                  What do you mean by efficient? I suspect that the kernel and the JVM runtime would be massively slower if they couldn’t use atomics and compare-and-exchange primitives.

                  1. 1

                    Could you elaborate a bit?

              2. 1

                it tends to get a lot of pushback, but i personally like the use of classes as namespaces for a bunch of related functions.

                1. 2

                  I agree. It can be nice if you want to write something in a single file and still have some namespacing.

              1. 1

                Nowadays, efficient persistent data structures are available in most programming languages.

                But not C++ nor Rust … ?

                1. 4
                  1. 1

                    Thanks! This looks great.

                  2. 2

                    There are definitely plenty for Rust.

                    https://lib.rs/search?q=Persistent

                  1. 3

                    Confusingly, what is discussed in the post is not the same as data oriented design: https://en.m.wikipedia.org/wiki/Data-oriented_design

                    1. 1
                      1. 7

                        It is, but so is steering a ship, and yet no captain would steer right into the reefs.

                    1. 1

                      Their server is overloaded.

                      Here is the archived (non-interactive) version: https://web.archive.org/web/20210829092736/https://blog.klipse.tech/golang/2021/08/29/blog-go.html

                      1. 2

                        That’s the main differentiator of Klipse: It’s client side => No scalabilty and stability issues!

                      1. 4

                        Ooh, nifty! There’s Common Lisp support, so I might explore setting this up for PAIP.

                        1. 2

                          What is PAIP?

                          1. 4

                            Paradigms of Artificial Intelligence Programming (1992), by Peter Norvig, is an advanced Lisp text. I’ve worked, off and on, on fixing up the online copy; it’s a work in progress.

                            I was going to leave a breadcrumb on the relevant issue, but Klipse was already mentioned there.

                        1. 2

                          What are the security implications?

                          1. 7

                            No new implications. The website executes code which you provide and executes it locally. You already trust it with externally provided code, so nothing changes.

                            1. 3

                              In case Klipse would be used on a blogging platform like medium, it might allow a malicious blogger to run arbitrary code on a medium page. That’s why blogging platform forbids script tags et al… But on your own blog, there are no security issues.

                            1. 4

                              See also Immer, a similar, popular library. A comparison:

                              immutability-helper
                              import update from 'immutability-helper';
                              
                              const newData = update(data, {
                                x: {y: {z: {$set: 7}}},
                                a: {b: {$push: [9]}}
                              });
                              
                              Immer
                              import produce from 'immer';
                              
                              const newData = produce(data, draftData => {
                                draftData.x.y.z = 7;
                                draftData.a.b.push(9);
                              });
                              
                              1. 2

                                I’m a big fan of immer. One of the other things you can do is get it to output the changes in (near) standard JSONPatch format.

                                1. 1

                                  Could you share an example of how to produce a patch with immer?

                                  1. 2

                                    There’s some good doc on it here: https://immerjs.github.io/immer/patches/

                                    1. 1

                                      Very cool!

                                2. 1

                                  How does immer avoids deeply copying the source object?

                                  1. 2

                                    I think draftData is similar to a Proxy that records what you try to do, and the docs and source code of immer seem to support that, though I have not checked much.

                                    1. 1

                                      Could someone explain in plain english how does immer leverage Proxy to implements strucural sharing?

                                      1. 2

                                        When you say draftData.x.y.z = 7, draftData intercepts the accesses to x, y and z, recording the path and the value 7. After the drafting function returns, it knows where to make copies as needed to support the modifications requested.

                                  1. 2

                                    I don’t see what any of this has to do with functional or object-oriented programming. It seems more to do with referential transparency, which is not a distinguishing feature of functional programming languages (though it is admittedly more common among functional languages than others).

                                    1. 1

                                      I agree. But I think it that in OO, it’s very hard to embrace referential transparency.

                                    1. 4

                                      This has led to the quip “a functional programmer knows the value of everything and the cost of nothing”.

                                      1. 17

                                        whereas a Pascal programmer knows the Wirth of everything.

                                        1. 5

                                          Whereas Europeans generally pronounce his name the right way (‘Nick-louse Veert’), Americans invariably mangle it into ‘Nickel’s Worth.’ This is to say that Europeans call him by name, but Americans call him by value.

                                          – Introduction by Adriaan van Wijngaarden at the IFIP Congress (1965).

                                        2. 2

                                          Since the discovery of efficient persistent data structures and Clojure implementation of HAMTs, FP doesn’t suffer from performance hit like it used to be with LISP when every piece of data was represented with an immutable list.

                                          1. 2

                                            I agree, I write Haskell for money and the cost of computing a value is much lower than advertised!

                                        1. 4

                                          Math is value oriented programming.

                                          Oh and here we go again with the flawed analogies. So let’s dig deeper.. a deterministic view of the universe will be able to have a function describing the state of any particle/wave as a function of time. Another view is that things are scholastically mutating with time. Functions are not even primitive though if you consider them a special case of relations between sets.. and so on. Haskell does just fine with lambda calculus and category theory on the extreme end.

                                          One of the reasons why a program written in functional programming is less complex than an object-oriented program is because, as we have just seen, it is more complex to define the sameness of objects than the sameness of values.

                                          It’s less complicated because it tries its best to decouple code from state.

                                          1. 1

                                            When we decouple code from state, most of our code manipulates state.

                                            1. 1

                                              State is unavoidable but it’s best to contain it than spread it around like a toddler throwing spaghetti. Imagine writing a sql operation that saved state between calls? Yuck.. except that’s what ORM does in a sense :P

                                          1. 4

                                            The cost of the mug example is a great illustration of the ways that mental shortcuts can lead to big problems when we’re thinking about software. The problem I see starts with the simplification that a mug is just a description and a price. In other words:

                                            type Mug = (Description, Price)
                                            

                                            or if we want to use more fundamental types

                                            type Mug = (String, Integer)
                                            

                                            The author then raises the perfectly reasonable question “What if we want to change the price of the mug”. So far so good, but the next step the author makes is an error caused by conflating two different things. The author says “of course the attribute shouldn’t change! it’s the same mug!” but of course that’s completely wrong. We’ve already defined Mug to be, by definition, the product of a price and a description.

                                            Instead, we have implicitly moved to a second definition of a Mug that has some other defining attribute. Now, our price is in fact some function of time or some other global property, and the identifier of items in the shop:

                                            type Mug' = String
                                            mugPrice : Mug -> Integer
                                            

                                            Equality isn’t actually broken, or even unexpected in this view of the world, it’s simply that we’ve switched into an alternative definition of Mug and price in the second part of the description.

                                            1. 2

                                              Author here: If you read carefully the article you’ll see that I didn’t write: “a mug is a description and a price” but “a mug has a description and a price. It makes a big difference.

                                              An object has a state. An object is not a state.

                                            1. 1

                                              I wish they were persistent, though I suppose using the spread operator can work

                                              1. 1

                                                In the proposal, they mention that the spread operator should work.

                                                const proposal2 = #{
                                                  ...proposal,
                                                  title: "Stage 2: Record & Tuple",
                                                };
                                                

                                                For tuples, they mention .with():

                                                const measures = #[42, 12, 67, "measure error: foo happened"];
                                                const correctedMeasures2 = measures.with(3, -1);
                                                
                                                1. 1

                                                  I noticed that - it will work but not as elegantly as persistent structures would. Though, in theory, I don’t see any reason why a vendor couldn’t make Tuple#with persistent underneath.

                                                  1. 1

                                                    Why won’t it be as elegant as persistent structures?

                                              1. 2

                                                I like this library because it allows us to enjoy the benefits of data immutability without the need to use non-native data structures (e.g. Immutable.js).

                                                1. 2

                                                  My favourite piece from the article:

                                                  There are patterns in C to write object-oriented code but we chose C++.

                                                  There are patterns in C++ for managing object lifecycles but we chose the JVM.

                                                  There are patterns in Java for managing concurrency but we chose persistent data structures

                                                  1. 2

                                                    Great article!

                                                    I am curious to hear thoughts regarding the application Data-Oriented Programming approach to other programming languages.

                                                    1. For languages like JavaScript, we need to enforce data immutability
                                                    2. For languages like Haskell, we need to represent data with hash maps
                                                    3. For languages like Java, we need to separate code from data

                                                    There is no question that it is possible to do so.

                                                    The real question is: would the community adopt it?

                                                    1. 4

                                                      Are there implementations of persistent/immutable collections in Swift, similar to what we have natively in Clojure or via libraries like Immutable.js in JavaScript or Paguro in Java?

                                                      1. 5

                                                        If I take your meaning, yes. But the implementation is different from the way a Java class would have to enforce its own immutability. Collections in Swift are generally aimed at preventing shared mutable state. They have a way to provide mutability, but those mutations aren’t shared with other users of the collection.

                                                        Swift collections like Array are generally not classes but struct types, with value semantics. Assignment copies the value, not a reference to it. Mutations are disallowed through any constant variable, which is the idiomatic normal way to make any local variable or property. Behind the scenes there is a reference to a backing store, so that copies are cheap and mutations can grow the collection, and the value can remain a compile-time constant size. But mutations are still possible through a mutable variable, and in that case the backing store will be copied on write when it isn’t uniquely owned by just one copy already. So it’s a private mutation, not a shared one. The Array type didn’t have to assert immutability; rather it had to implement a copy on write pattern.

                                                        Finally, in the upcoming Swift Concurrency implementation of the actor model, only copyable data types can pass across actor isolation boundaries.

                                                        1. 3

                                                          I don’t think this is what was being asked, and I think the correct answer is “no, Swift doesn’t have persistent collections”:

                                                          The core property of persistent collections is that “mutating” operations are asymptotically as cheap as mutable collections, by retaining the unchanged parts instead of copying everything.

                                                          So if your COW collection (like array) copies everything on a change, it isn’t persistent.

                                                            1. 1

                                                              Yes, please read it.

                                                              1. 3

                                                                So if your COW collection (like array) copies everything on a change, it isn’t persistent.

                                                                Your claim is that “if your collection copies everything on a change, it isn’t persistent”. This is directly disputed by the wikipedia article where it states COW is a method for creating Persistent Data Structures. Inefficent, yes. But a method nevertheless.

                                                                1. 0

                                                                  [citation needed]

                                                                  1. 1

                                                                    There is a naive scheme to make any data structure persistent. This scheme performs the operations exactly as they would have been performed in an ephemeral setting but before each update operation it makes new copies of all input versions. Then it performs the update on the new copies.

                                                                    Kaplan, Haim (2001). “Persistent data structures”. Handbook on Data Structures and Applications https://cs.uwaterloo.ca/~imunro/cs840/Kaplan_persistent-survey.pdf

                                                                2. 1

                                                                  Reading it, I think you’re pointing to this part:

                                                                  This is an inefficient technique because the entire backing data structure must be copied for each write

                                                                  But I’m not sure how being labeled an inefficient technique for persistent data structures makes it not a persistent data structure at all.

                                                                  In Swift’s case, there’s an attempt to mitigate that cost, which is that copying only happens on write when the backing data structure is not yet uniquely owned, after which the new copy is uniquely owned, and further mutations do not copy again. In effect, what the programmer sees as a collection is a history snapshot of the collection. The backing structure of items is preserved not for every step of history, but for every step that some collection still has a handle to.

                                                                  1. 1

                                                                    Because it’s as if you called cutting the power cord to your workstation “garbage collection”.

                                                                    Yes, technically all memory has been cleaned up, but everyone knows that this is not what people have in mind when using the term.

                                                                    Wholesale copying the complete array on change is certainly not what people have in mind when talking about persistent data structures.

                                                                    1. 1

                                                                      Ha, ok. Well I don’t doubt you. It just seems like there are strict and loose interpretations of the term at play here. I’ll encourage you to contribute changes to that Wikipedia article since you’re familiar with the research.

                                                              2. 1

                                                                If I understand what you mean, in Swift there is a copy of the buffer, but it’s a shallow copy, not a deep copy. In that sense it doesn’t copy everything on a change. But there are two cases, and I wonder whether you’d call each case a persistent collection. Before a mutation of a collection whose backing store is not uniquely owned:

                                                                If the items are class objects, then the backing store of many pointers is copied. The objects themselves are shared.

                                                                If the items are value types, like integers, strings, collections, or algebraic data type values, then they’re stored inline in the backing store, and indeed they’re all copied up to but not past any reference property they contain. So mutating an array of sets, for instance, will copy the array’s backing store of many sets, but not the sets’ backing stores. Again those are shared.

                                                                1. 1

                                                                  No, that’s not what is meant by persistent data structures. Here is an introduction to the topic with a few examples.

                                                                  1. 2

                                                                    I think I’m reading here that under lazy evaluation, the work entailed by preserving history across many mutations is reduced by evaluating once at the end. Retaining the unchanged parts is a matter of adding interstitial thunks through which you’d observe the result of changes, including undisturbed original items. I can see how that doesn’t involve a copy across the length of the structure, and I imagine the work is instead on the order of the number of mutations.

                                                                    In Swift, the work entailed by preserving history across many mutations is reduced by copying once for the first local change, after which point the structure is uniquely owned and can be mutated in place. Swift also idiomatically discourages mutation in the first place. When the need arises, though, it will have to copy those N items.

                                                                    Swift doesn’t pretend to be a functional language, it just points us that direction from an imperative starting line.

                                                                    1. 1

                                                                      Persistent data structures do not require lazy evaluation or being a functional language, see HAMTs and its usage as an example for that.

                                                                      I think it’s fair to say that Swift simply does not offer anything in this regard.

                                                              3. 2

                                                                Interesting!

                                                                Could you explain what happens in Swift in terms of mutations and copies when we write something like this pseudocode:

                                                                var m = bigNestedMap
                                                                m["b"]["c"] = 2
                                                                
                                                                1. 2

                                                                  This is more complicated than it seems :) First, for map (dictionary) type in Swift, subscript will return optional, thus, m[“b”][“c”] won’t compile. You can, however, do: m["b"]!["c"] = 2. In this case, if “b” exists, it will modify m such that m["b"]! will contain the new “c” key. If “b” doesn’t exist, it will panic. This works because m is mutable, and m["b"]!["c"] will make a mutating func subscript(key: String) { set { } } call, which can mutate m["b"]!.

                                                                  To make it is easier in Swift in case “b” doesn’t exist, Swift support this syntax: m["b", default: [:]]["c"] = 2, it will create an empty dictionary in-place if cannot find with key “b”.

                                                                  Note that this behavior is different if you do:

                                                                  var m = bigNestedMap
                                                                  var mb = m["b", default: [:]]
                                                                  mb["c"] = 2
                                                                  

                                                                  In the above case, you are not making in-place mutating func call, hence, m won’t change.

                                                                  (Actually, thinking about it again, I don’t think that I explained well why for your case, it will mutate m, this does better job at it I think: https://github.com/apple/swift-evolution/blob/main/proposals/0165-dict.md)

                                                                  1. 1

                                                                    Happy to. First, Swift subscripts are like computed properties in many languages: They have get and set functions that you implement. The difference from a computed property is that a subscript’s two functions take an index argument too. So a single subscript assignment x[1] = 2 is equivalent to a hypothetical x.put(1, 2). A subscript read x[1] is like x.get(1).

                                                                    Subscript assignment on a struct is like any other mutating function on a struct: it’s allowed only if the struct value is in a mutable variable. That applies to both the major and minor levels here.

                                                                    We can think of your example case as:

                                                                    var m = bigNestedMap
                                                                    
                                                                    // Make a copy of m["b"], sharing its backing store using the copied reference within
                                                                    var n = m.get("b") 
                                                                    
                                                                    // Replace n's backing store with a shallow copy if it is not uniquely owned by n, then replace its item "c" with 2
                                                                    n.put("c", 2)
                                                                    
                                                                    // Replace m's backing store with a shallow copy if it is not uniquely owned by m, then replace its item "b" with n
                                                                    m.put("b", n)
                                                                    

                                                                    In the end, the number of collection backing stores has either increased by zero, one, or two depending on whether anybody else still has a copy of the originals’ backing stores.

                                                                    If m’s backing store was copied, then its items were shallow copied. Being collections, they each share their backing stores with the items in the original backing store of m.

                                                                    I’ve pretended the optional nil case doesn’t exist just to keep it simple. Dictionaries use optionals for the result of a read, but arrays just bounds-check and crash. Maybe this example is more accurate if you imagine it’s the Array type instead.

                                                                    As you would guess, this is just a naive implementation and optimizations are possible, such as the detection of unique ownership (perhaps bigNestedMap is a local that’s never used again). But in general, mutation should only affect your own view of the collection.

                                                                    1. 2

                                                                      The behaviour you describe reminds me structural sharing via path copying (See here).

                                                                      A couple of questions:

                                                                      1. Is it exactly the same as path copying?
                                                                      2. What do you mean by “backing store”?
                                                                      3. You describe how structs work. Does dictionaries (aka hash maps) work in the same way?
                                                                      1. 2

                                                                        Answering out of order:

                                                                        1: Yes, I’d call it pretty similar to that!

                                                                        3: Yes. Dictionaries are also structs in Swift, just like arrays and sets and other basic collections. All of them use the pattern I’m attempting to describe.

                                                                        2: A collection type, being a struct (a value type) and not a class (not reference type), is allocated inline with its owner, or on the stack if it’s a local variable. That allocation must be a fixed size regardless of adding items to the collection. So the struct will have a property that is a reference (a fixed size pointer) to a buffer on the heap where items are stored. That buffer is what I’m calling a backing store. Each array or dictionary or set will have one of those containing its items, and when the collection is passed around, the struct value containing the reference is copied but the buffer is therefore shared. Only when a mutation would alter a buffer that’s referenced by more than one struct value is the buffer copied. That’s a shallow copy, so if it’s for instance an array of ints, the ints will be copied then. If it’s an array of arrays, the child arrays will be copied but their buffers are again shared.

                                                                      2. 1

                                                                        Swift subscripts are like computed properties in many languages: They have get and set functions that you implement. The difference from a computed property is that a subscript’s two functions take an index argument too.

                                                                        Rather weird btw, that they still use the [], unlike more modern languages…

                                                                        1. 2

                                                                          Do you have a language in mind you’re comparing to?

                                                                1. 1

                                                                  What are common use cases - beside machine learning - that need loading SQL data in memory?

                                                                  1. 2

                                                                    I suspect there’s a whole lot of business processes that do this: generating reports, loading data into another system, etc..

                                                                    1. 1

                                                                      Why not relying on SQL queries? Why would we prefer to fetch the rows and query inside the application?

                                                                  1. 3

                                                                    However, over time, in my experience, the suitability-to-task of these statically typed class models for information systems programming was quite low, and the benefts of the type checking minimal, especially in addressing the number one actual problem faced by programmers: the overwhelming complexity inherent in imperative, stateful programming

                                                                    This is probably the most controversial opinion in the entire paper. It’s a catalyst for many decisions made in Clojure and demonstrates the importance of saying “this is the problem I’m trying to solve.”

                                                                    “The overwhelming complexity inherent in imperative, stateful programming” - does Clojure help solve it? I’d say so. Simply by making immutability a core assumption. Is it the “number one problem”? What is a better solution than Clojure’s solution? The existence of Clojure at least helps push these questions forward.

                                                                    1. 1

                                                                      The programming paradigm that Clojure embraces is “Data-Oriented programming”. I believe that in 2021, this paradigm is applicable even in statically-typed languages. It requires discipline though. But the fact that Clojure’s implementation of efficient immutable data structures has been ported over the years to many languages is helpful.

                                                                    1. 1

                                                                      I am giving an online talk about Data-Oriented programming in Java. I am excited about the opportunity to cross the boundaries between static and dynamic language and to initiate fruitful discussion between the two approaches.