Are there implementations of persistent/immutable collections in Swift, similar to what we have natively in Clojure or via libraries like Immutable.js in JavaScript or Paguro in Java?
If I take your meaning, yes. But the implementation is different from the way a Java class would have to enforce its own immutability. Collections in Swift are generally aimed at preventing shared mutable state. They have a way to provide mutability, but those mutations aren’t shared with other users of the collection.
Swift collections like Array are generally not classes but struct types, with value semantics. Assignment copies the value, not a reference to it. Mutations are disallowed through any constant variable, which is the idiomatic normal way to make any local variable or property. Behind the scenes there is a reference to a backing store, so that copies are cheap and mutations can grow the collection, and the value can remain a compile-time constant size. But mutations are still possible through a mutable variable, and in that case the backing store will be copied on write when it isn’t uniquely owned by just one copy already. So it’s a private mutation, not a shared one. The Array type didn’t have to assert immutability; rather it had to implement a copy on write pattern.
Finally, in the upcoming Swift Concurrency implementation of the actor model, only copyable data types can pass across actor isolation boundaries.
I don’t think this is what was being asked, and I think the correct answer is “no, Swift doesn’t have persistent collections”:
The core property of persistent collections is that “mutating” operations are asymptotically as cheap as mutable collections, by retaining the unchanged parts instead of copying everything.
So if your COW collection (like array) copies everything on a change, it isn’t persistent.
So if your COW collection (like array) copies everything on a change, it isn’t persistent.
Your claim is that “if your collection copies everything on a change, it isn’t persistent”. This is directly disputed by the wikipedia article where it states COW is a method for creating Persistent Data Structures. Inefficent, yes. But a method nevertheless.
There is a naive scheme to make any data structure persistent. This scheme performs the operations exactly as they would have been performed in an ephemeral setting but before each update operation it makes new copies of all input versions. Then it performs the update on the new copies.
This is an inefficient technique because the entire backing data structure must be copied for each write
But I’m not sure how being labeled an inefficient technique for persistent data structures makes it not a persistent data structure at all.
In Swift’s case, there’s an attempt to mitigate that cost, which is that copying only happens on write when the backing data structure is not yet uniquely owned, after which the new copy is uniquely owned, and further mutations do not copy again. In effect, what the programmer sees as a collection is a history snapshot of the collection. The backing structure of items is preserved not for every step of history, but for every step that some collection still has a handle to.
Ha, ok. Well I don’t doubt you. It just seems like there are strict and loose interpretations of the term at play here. I’ll encourage you to contribute changes to that Wikipedia article since you’re familiar with the research.
If I understand what you mean, in Swift there is a copy of the buffer, but it’s a shallow copy, not a deep copy. In that sense it doesn’t copy everything on a change. But there are two cases, and I wonder whether you’d call each case a persistent collection. Before a mutation of a collection whose backing store is not uniquely owned:
If the items are class objects, then the backing store of many pointers is copied. The objects themselves are shared.
If the items are value types, like integers, strings, collections, or algebraic data type values, then they’re stored inline in the backing store, and indeed they’re all copied up to but not past any reference property they contain. So mutating an array of sets, for instance, will copy the array’s backing store of many sets, but not the sets’ backing stores. Again those are shared.
I think I’m reading here that under lazy evaluation, the work entailed by preserving history across many mutations is reduced by evaluating once at the end. Retaining the unchanged parts is a matter of adding interstitial thunks through which you’d observe the result of changes, including undisturbed original items. I can see how that doesn’t involve a copy across the length of the structure, and I imagine the work is instead on the order of the number of mutations.
In Swift, the work entailed by preserving history across many mutations is reduced by copying once for the first local change, after which point the structure is uniquely owned and can be mutated in place. Swift also idiomatically discourages mutation in the first place. When the need arises, though, it will have to copy those N items.
Swift doesn’t pretend to be a functional language, it just points us that direction from an imperative starting line.
This is more complicated than it seems :) First, for map (dictionary) type in Swift, subscript will return optional, thus, m[“b”][“c”] won’t compile. You can, however, do: m["b"]!["c"] = 2. In this case, if “b” exists, it will modify m such that m["b"]! will contain the new “c” key. If “b” doesn’t exist, it will panic. This works because m is mutable, and m["b"]!["c"] will make a mutating func subscript(key: String) { set { } } call, which can mutate m["b"]!.
To make it is easier in Swift in case “b” doesn’t exist, Swift support this syntax: m["b", default: [:]]["c"] = 2, it will create an empty dictionary in-place if cannot find with key “b”.
Note that this behavior is different if you do:
var m = bigNestedMap
var mb = m["b", default: [:]]
mb["c"] = 2
In the above case, you are not making in-place mutating func call, hence, m won’t change.
Happy to. First, Swift subscripts are like computed properties in many languages: They have get and set functions that you implement. The difference from a computed property is that a subscript’s two functions take an index argument too. So a single subscript assignment x[1] = 2 is equivalent to a hypothetical x.put(1, 2). A subscript read x[1] is like x.get(1).
Subscript assignment on a struct is like any other mutating function on a struct: it’s allowed only if the struct value is in a mutable variable. That applies to both the major and minor levels here.
We can think of your example case as:
var m = bigNestedMap
// Make a copy of m["b"], sharing its backing store using the copied reference within
var n = m.get("b")
// Replace n's backing store with a shallow copy if it is not uniquely owned by n, then replace its item "c" with 2
n.put("c", 2)
// Replace m's backing store with a shallow copy if it is not uniquely owned by m, then replace its item "b" with n
m.put("b", n)
In the end, the number of collection backing stores has either increased by zero, one, or two depending on whether anybody else still has a copy of the originals’ backing stores.
If m’s backing store was copied, then its items were shallow copied. Being collections, they each share their backing stores with the items in the original backing store of m.
I’ve pretended the optional nil case doesn’t exist just to keep it simple. Dictionaries use optionals for the result of a read, but arrays just bounds-check and crash. Maybe this example is more accurate if you imagine it’s the Array type instead.
As you would guess, this is just a naive implementation and optimizations are possible, such as the detection of unique ownership (perhaps bigNestedMap is a local that’s never used again). But in general, mutation should only affect your own view of the collection.
3: Yes. Dictionaries are also structs in Swift, just like arrays and sets and other basic collections. All of them use the pattern I’m attempting to describe.
2: A collection type, being a struct (a value type) and not a class (not reference type), is allocated inline with its owner, or on the stack if it’s a local variable. That allocation must be a fixed size regardless of adding items to the collection. So the struct will have a property that is a reference (a fixed size pointer) to a buffer on the heap where items are stored. That buffer is what I’m calling a backing store. Each array or dictionary or set will have one of those containing its items, and when the collection is passed around, the struct value containing the reference is copied but the buffer is therefore shared. Only when a mutation would alter a buffer that’s referenced by more than one struct value is the buffer copied. That’s a shallow copy, so if it’s for instance an array of ints, the ints will be copied then. If it’s an array of arrays, the child arrays will be copied but their buffers are again shared.
Swift subscripts are like computed properties in many languages: They have get and set functions that you implement. The difference from a computed property is that a subscript’s two functions take an index argument too.
Rather weird btw, that they still use the [], unlike more modern languages…
Are there implementations of persistent/immutable collections in Swift, similar to what we have natively in Clojure or via libraries like Immutable.js in JavaScript or Paguro in Java?
If I take your meaning, yes. But the implementation is different from the way a Java class would have to enforce its own immutability. Collections in Swift are generally aimed at preventing shared mutable state. They have a way to provide mutability, but those mutations aren’t shared with other users of the collection.
Swift collections like Array are generally not classes but struct types, with value semantics. Assignment copies the value, not a reference to it. Mutations are disallowed through any constant variable, which is the idiomatic normal way to make any local variable or property. Behind the scenes there is a reference to a backing store, so that copies are cheap and mutations can grow the collection, and the value can remain a compile-time constant size. But mutations are still possible through a mutable variable, and in that case the backing store will be copied on write when it isn’t uniquely owned by just one copy already. So it’s a private mutation, not a shared one. The Array type didn’t have to assert immutability; rather it had to implement a copy on write pattern.
Finally, in the upcoming Swift Concurrency implementation of the actor model, only copyable data types can pass across actor isolation boundaries.
I don’t think this is what was being asked, and I think the correct answer is “no, Swift doesn’t have persistent collections”:
The core property of persistent collections is that “mutating” operations are asymptotically as cheap as mutable collections, by retaining the unchanged parts instead of copying everything.
So if your COW collection (like array) copies everything on a change, it isn’t persistent.
https://en.wikipedia.org/wiki/Persistent_data_structure#Copy-on-write
Yes, please read it.
Your claim is that “if your collection copies everything on a change, it isn’t persistent”. This is directly disputed by the wikipedia article where it states COW is a method for creating Persistent Data Structures. Inefficent, yes. But a method nevertheless.
[citation needed]
Kaplan, Haim (2001). “Persistent data structures”. Handbook on Data Structures and Applications https://cs.uwaterloo.ca/~imunro/cs840/Kaplan_persistent-survey.pdf
Reading it, I think you’re pointing to this part:
But I’m not sure how being labeled an inefficient technique for persistent data structures makes it not a persistent data structure at all.
In Swift’s case, there’s an attempt to mitigate that cost, which is that copying only happens on write when the backing data structure is not yet uniquely owned, after which the new copy is uniquely owned, and further mutations do not copy again. In effect, what the programmer sees as a collection is a history snapshot of the collection. The backing structure of items is preserved not for every step of history, but for every step that some collection still has a handle to.
Because it’s as if you called cutting the power cord to your workstation “garbage collection”.
Yes, technically all memory has been cleaned up, but everyone knows that this is not what people have in mind when using the term.
Wholesale copying the complete array on change is certainly not what people have in mind when talking about persistent data structures.
Ha, ok. Well I don’t doubt you. It just seems like there are strict and loose interpretations of the term at play here. I’ll encourage you to contribute changes to that Wikipedia article since you’re familiar with the research.
If I understand what you mean, in Swift there is a copy of the buffer, but it’s a shallow copy, not a deep copy. In that sense it doesn’t copy everything on a change. But there are two cases, and I wonder whether you’d call each case a persistent collection. Before a mutation of a collection whose backing store is not uniquely owned:
If the items are class objects, then the backing store of many pointers is copied. The objects themselves are shared.
If the items are value types, like integers, strings, collections, or algebraic data type values, then they’re stored inline in the backing store, and indeed they’re all copied up to but not past any reference property they contain. So mutating an array of sets, for instance, will copy the array’s backing store of many sets, but not the sets’ backing stores. Again those are shared.
No, that’s not what is meant by persistent data structures. Here is an introduction to the topic with a few examples.
I think I’m reading here that under lazy evaluation, the work entailed by preserving history across many mutations is reduced by evaluating once at the end. Retaining the unchanged parts is a matter of adding interstitial thunks through which you’d observe the result of changes, including undisturbed original items. I can see how that doesn’t involve a copy across the length of the structure, and I imagine the work is instead on the order of the number of mutations.
In Swift, the work entailed by preserving history across many mutations is reduced by copying once for the first local change, after which point the structure is uniquely owned and can be mutated in place. Swift also idiomatically discourages mutation in the first place. When the need arises, though, it will have to copy those N items.
Swift doesn’t pretend to be a functional language, it just points us that direction from an imperative starting line.
Persistent data structures do not require lazy evaluation or being a functional language, see HAMTs and its usage as an example for that.
I think it’s fair to say that Swift simply does not offer anything in this regard.
Interesting!
Could you explain what happens in Swift in terms of mutations and copies when we write something like this pseudocode:
This is more complicated than it seems :) First, for map (dictionary) type in Swift, subscript will return optional, thus, m[“b”][“c”] won’t compile. You can, however, do:
m["b"]!["c"] = 2
. In this case, if “b” exists, it will modify m such thatm["b"]!
will contain the new “c” key. If “b” doesn’t exist, it will panic. This works because m is mutable, andm["b"]!["c"]
will make amutating func subscript(key: String) { set { } }
call, which can mutatem["b"]!
.To make it is easier in Swift in case “b” doesn’t exist, Swift support this syntax:
m["b", default: [:]]["c"] = 2
, it will create an empty dictionary in-place if cannot find with key “b”.Note that this behavior is different if you do:
In the above case, you are not making in-place
mutating func
call, hence, m won’t change.(Actually, thinking about it again, I don’t think that I explained well why for your case, it will mutate
m
, this does better job at it I think: https://github.com/apple/swift-evolution/blob/main/proposals/0165-dict.md)Happy to. First, Swift subscripts are like computed properties in many languages: They have get and set functions that you implement. The difference from a computed property is that a subscript’s two functions take an index argument too. So a single subscript assignment
x[1] = 2
is equivalent to a hypotheticalx.put(1, 2)
. A subscript readx[1]
is likex.get(1)
.Subscript assignment on a struct is like any other mutating function on a struct: it’s allowed only if the struct value is in a mutable variable. That applies to both the major and minor levels here.
We can think of your example case as:
In the end, the number of collection backing stores has either increased by zero, one, or two depending on whether anybody else still has a copy of the originals’ backing stores.
If m’s backing store was copied, then its items were shallow copied. Being collections, they each share their backing stores with the items in the original backing store of m.
I’ve pretended the optional nil case doesn’t exist just to keep it simple. Dictionaries use optionals for the result of a read, but arrays just bounds-check and crash. Maybe this example is more accurate if you imagine it’s the Array type instead.
As you would guess, this is just a naive implementation and optimizations are possible, such as the detection of unique ownership (perhaps
bigNestedMap
is a local that’s never used again). But in general, mutation should only affect your own view of the collection.The behaviour you describe reminds me structural sharing via path copying (See here).
A couple of questions:
Answering out of order:
1: Yes, I’d call it pretty similar to that!
3: Yes. Dictionaries are also structs in Swift, just like arrays and sets and other basic collections. All of them use the pattern I’m attempting to describe.
2: A collection type, being a struct (a value type) and not a class (not reference type), is allocated inline with its owner, or on the stack if it’s a local variable. That allocation must be a fixed size regardless of adding items to the collection. So the struct will have a property that is a reference (a fixed size pointer) to a buffer on the heap where items are stored. That buffer is what I’m calling a backing store. Each array or dictionary or set will have one of those containing its items, and when the collection is passed around, the struct value containing the reference is copied but the buffer is therefore shared. Only when a mutation would alter a buffer that’s referenced by more than one struct value is the buffer copied. That’s a shallow copy, so if it’s for instance an array of ints, the ints will be copied then. If it’s an array of arrays, the child arrays will be copied but their buffers are again shared.
Rather weird btw, that they still use the
[]
, unlike more modern languages…Do you have a language in mind you’re comparing to?