Repeated string concatention is surprisingly difficult to optimize, and it doesn’t surprise me that these graphs show some of the engines chose to adapt their strategy once it gets large enough. I do have to wonder about the ones where the cost grows faster in the large-string case.
This would be much more straightforward if implementations could assume there would only ever be right-concatenation, which is the common case when building up a string of HTML or JSON. In that case, you can keep reallocating the same contiguous buffer, to twice the size each time. Left-pad is a pessimal task for implementations that have gone that route…
I’d be astonished if anyone can fully explain, without a reference, the performance characteristics of strings in C++, Java, Python, or PHP (languages where I’m confident that I have no idea :)). No points for explaining the Haskell String type, because the answer there is pathological but straightforward. :)
The post points out that the fast-linear approach results in high memory consumption, so I imagine those are implementations that switch to a more memory-efficient (with slower but more ) representation once strings get above a certain size.
I believe in Java Strings are just arrays of UTF-16 code units, so they have the performance characteristics of arrays (which are not simple, particularly if you’re asking about real-world performance where cache behaviour is very important, but they’re not simple is a way that’s not specific to Strings) in terms of the number of code units (which is as good a number as any to measure by, honestly - I mean no-one can talk about the performance of strings in terms of the number of grapheme clusters which is arguably the “correct” measure but also a really impractical one). Python is similar although they’re utf32 code units i.e. codepoints rather than utf-16 code units (and I bet the behaviour of Python’s sequences is much less specified than in Java). C++ I don’t actually know but I’d bet that the performance characteristics are all implementation-defined or worse.
Ah! Optimizing for memory consumption indeed explains that, thank you.
Thanks for the summary of Java and Python. :) I suppose that does answer the question. In a way it makes me wish I’d asked about allocators instead of strings, but that would have been much harder to answer since it’s almost certainly every (implementation, platform) for itself.
I agree with you, for sure, about code units being the appropriate way to talk about UTF-16 performance.
[Comment removed by author]
Knowing the algorithm tells you the asymptotic behaviour but that can be surprisingly useless - the constant factors from e.g. cache usage patterns can easily end up being multiple orders of magnitude (and you simply can’t understand those without understanding the details of the hardware - reasoning about the abstract Haskell/Ruby machine won’t tell you anything)
Have you ever switched to a custom sorting function because the one in your standard library was inappropriate?
Not personally. I would imagine that database implementations (including key-value stores) are the only software that needs to do this, these days. Knuth has an entire chapter on external sorts, optimized for different situations involving data that doesn’t fit in memory, but in the modern era we no longer write one-off storage backends with that level of complexity. I guess it’s nice to remember that computing can make real advancements sometimes…
If I know a sort is constrained somehow (i.e. a known interval of integers) I’ll use a radix sort or similar.
Radix sort is kinda amazing when one can get away with it. :)
O(n) sorting is so sweet but I’ve never hit the case where my data was simultaneously that constrained and sorting it was a bottleneck.
I’ve occasionally had that case (constrained data, sorting is a bottleneck), though not often. The last time I remember it was when I was doing some stuff with the Netflix Prize dataset, which is a giant set of movie ratings, each of them an integer 1-5. For some machine learning algorithms (e.g. decision trees), sorting the data by an axis is a bottleneck. Although even in that case, careful fiddling with data representation to be more cache-friendly made a much bigger difference to my runtime than radix sort did.
Yes. For suffix array construction, or when I require a stable sort and the standard library uses a non-stable sort.
The quadratic case can be easily avoided by using a non-dumb process to pick a pivot: random, median-of-three, etc.
The bigger problem with quicksort is usually that it’s not stable.
You should probably test this before throwing out the stdlib sort, though.
These days most, if not all, standard libraries use smarter versions of quicksort to eliminate the O(n^2) case, or at least make it a lot harder to hit. A lot of them don’t even use quicksort any more, but other algorithms (like timsort) with similar performance and guaranteed worst case O(n log n) behavior.
I know python, my go-to language, uses timsort. Timsort is complex enough that I don’t really know what is is doing under the covers at any point in time, however.
I wonder about variation in the length of the padded string, not just the padding. My intuition tells me that sometimes you’ll be adding a short prefix to a much longer string.
prefix = ch
while (++i < len)
prefix = prefix + ch
str = prefix + str
Still quadratic, but less obviously inefficient.