I can’t tell if this is parody. Immutable collections need very careful thought to avoid quadratic behaviour. Trees can be made immutable fairly easily with spine replacement and reference counting, but arrays require a deep copy. The recommendation to use String instead of StringBuffer will cause every modify operation to be linear in the size of the strings (a + b is linear in the length of a and b), requires new allocation, and increases GC pressure. For a lot of use cases, that’s fine - it’s way off your critical path and no one cares - but if it’s anywhere near a performance-critical portion then it can be very bad. It can also lead to DoS vulnerabilities if the attacker can control the rate of updates. If I can force you to append one character at a time to a string, rather than doing a bulk update, then I can probably make you spin in the GC and drop connections. Any time that setting the TCP window size to 1 is all that’s needed for a DoS, you might need to rethink your abstractions.
When talking about functional programming languages, there’s a difference between “zero mutations in memory” and “the language interface presents as immutable”. You can present the language constructs as immutable, but when actually operating on them, merge the operations together and, if possible, just reuse the same memory space. Essentially that makes your language easy to rewrite into SSA form, which you can then run optimizations on. The motivation here is essentially to make programming simpler.
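The quadratic cost described above is easy to demonstrate with a minimal sketch (class and method names here are illustrative, not from the discussion): appending one character at a time via String concatenation copies the whole string on every append, while StringBuffer mutates an internal buffer in place.

```java
// Cost difference discussed above: `s + "x"` allocates a fresh String and
// copies all of s every iteration (O(n^2) total work, n short-lived
// allocations for the GC), while StringBuffer grows a buffer geometrically
// (amortized O(n) total work).
public class AppendCost {
    // O(n^2) total: each concatenation copies the accumulated string.
    static String naiveBuild(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s = s + "x";
        }
        return s;
    }

    // O(n) amortized: no per-append copy of the accumulated contents.
    static String bufferedBuild(int n) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = naiveBuild(1000);
        String b = bufferedBuild(1000);
        if (!a.equals(b)) throw new AssertionError("results differ");
        System.out.println(a.length()); // prints 1000
    }
}
```

Same result either way; the difference is entirely in allocation behaviour, which is exactly what an attacker controlling the update rate can exploit.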
This is true for a language like Haskell, but something like Scala is a lot more constrained by the need to run on the JVM. The JVM type system isn’t expressive enough to allow you to check for aliasing and so you need to make copies defensively in a lot of cases.
Surely that’s only significant at the edges, when doing Java FFI? (Granted, a big part of the draw of targeting the JVM is that you get to take advantage of its optimiser and ecosystem.)
The JVM is a very dynamic environment. It allows class loading, reflection, and a few other things that make static analysis hard. You encounter these at any class (or, often, method) boundary. You can do things like spine replacement in tree data structures, but unless you can statically prove that an object never escapes to another thread (which, with the Java type system, is almost impossible once you’ve stored a pointer to it on the heap) then you must use atomic operations for the updates and you can’t lazily thunk on traversal. In a language like Haskell, you have a lot more freedom to do this kind of thing because the heap type system lets you reason about concurrent mutation (i.e. there normally isn’t any unless the VM inserted it, and in that case the VM can trivially track where it will happen and for which types). If you have a type system that guarantees no concurrent mutation then you can do a lot of optimisations that you can’t with a more general type system.
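For concreteness, here is a minimal single-threaded sketch of the spine-replacement (path-copying) idea mentioned above, with illustrative names: inserting into a persistent binary search tree allocates new nodes only along the root-to-leaf path and shares every untouched subtree with the old version. It is exactly this kind of structure where the escape and aliasing guarantees discussed here determine which further optimisations (lazy thunks, non-atomic updates) are safe.

```java
// Persistent BST via path copying ("spine replacement"): every field is
// final, so old versions are never mutated; insert returns a new root that
// shares all subtrees off the modified path.
public class PersistentTree {
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Returns a NEW root; the old root still describes the old tree.
    static Node insert(Node n, int key) {
        if (n == null) return new Node(key, null, null);
        if (key < n.key) return new Node(n.key, insert(n.left, key), n.right);
        if (key > n.key) return new Node(n.key, n.left, insert(n.right, key));
        return n; // key already present: no copying at all
    }

    static boolean contains(Node n, int key) {
        while (n != null) {
            if (key == n.key) return true;
            n = key < n.key ? n.left : n.right;
        }
        return false;
    }

    public static void main(String[] args) {
        Node v1 = insert(insert(insert(null, 5), 2), 8);
        Node v2 = insert(v1, 3);                  // new version of the tree
        System.out.println(contains(v1, 3));      // false: old version unchanged
        System.out.println(contains(v2, 3));      // true
        System.out.println(v1.right == v2.right); // true: right subtree shared
    }
}
```

Only O(depth) nodes are copied per insert; the catch, as noted above, is that proving the shared nodes are never concurrently mutated is what the JVM’s type system can’t express.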
Clojure has a talk on how they do this without breaking the (memory) bank.
Nice catch on immutability as a vulnerability causing OOM.
So why shouldn’t collections also be immutable by default, just like String?
Because basically everyone needs to be able to put string keys into a hashmap sometimes, whereas putting any other kind of collection into a map key is sort of weird.
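String works as a map key precisely because its hashCode can never change after insertion; a mutable collection used as a key can go stale. A small illustrative sketch of the failure mode (data values are arbitrary):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A key's hashCode() is consulted at insertion time to choose a bucket.
// Mutating a List that is in use as a key changes its hashCode(), so the
// entry stays filed under the stale bucket and lookups silently miss it.
// String keys can't go stale because String is immutable.
public class MutableKey {
    public static void main(String[] args) {
        Map<List<Integer>, String> map = new HashMap<>();
        List<Integer> key = new ArrayList<>(List.of(1, 2, 3));
        map.put(key, "value");

        System.out.println(map.get(key)); // value

        key.add(4); // mutate the key object after insertion
        // The same object now hashes to a different bucket:
        System.out.println(map.get(key)); // null
        // A fresh [1, 2, 3] reaches the right bucket, but equals() against
        // the mutated stored key [1, 2, 3, 4] fails:
        System.out.println(map.get(List.of(1, 2, 3))); // null
    }
}
```

The entry is still in the map (size() is 1) but is effectively unreachable, which is why mutable-by-default collections make poor keys.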
People find something that seems to work well in one context, and then rush to advertise the universal brilliance of the same.
Clojure is nice. Its (pretend) immutable collections are well done, and not as horribly inefficient as a naive implementation of the same. But it’s still dramatically slower than using Java mutable collections.
On the other hand, Clojure wouldn’t be Clojure without its persistent data structures, so the trade-off is quite reasonable within the overall context. However, that does mean that when maximum performance is a critical goal, Clojure (and FP in general) is automatically eliminated from a significant set of problem domains.
Very rarely is that sort of performance worth the cost of unintended mutations and dealing with that class of bugs.
Sometimes it is though.
I like both models. I don’t generally have any perf issues with functional programming (although I know that I am leaving something on the table), and I don’t generally have any “mutation problems” with non-persistent collections.
People often prefer one over the other. Most probably whatever they’re used to.
I’m not sure that getting rid of mutable structures will help many people avoid bugs, but I accept that possibility. The real issue is that many developers do not understand concurrency (whether or not they think that they do) – which is a valid argument for the use of persistent data structures.