The GC needs twice as much memory to copy all live data from one half to another “empty” half during the copying phase. So any Haskell program will require at least twice as much memory as you actually use.
This was a surprising detail to me. If a Haskell program is using 2 GB of RAM, is 1 GB really sitting there unused most of the time?
This is standard for anything using a copying GC. But anything GC’d (in the scanning sense, let’s ignore ARC) will in general necessarily have a higher high-water mark than anything that immediately returns memory for reuse.
Essentially, the high-water mark for a non-GC environment is live objects + allocator caches, vs. a GC environment, which is live objects + uncollected dead objects + allocator caches (let’s just consider the to-space to be a cache).
Consider the following trivial example of where you hit problems (this is intentionally an extreme/worst case to demonstrate this specific issue):
loop 10000 times:
    value = some allocation
    do enough stuff to prevent lowering....
    release the allocation, or stop using it if using GC
I’ve intentionally called out prevention of lowering, as that’s in principle available to GC and non-GC languages, and also not helpful for this example :D
For a non-GC environment there is typically some variation of a free list (we’ll just ignore the management of size classes, etc. for now):
allocate:
    if free list is empty:
        perform system allocations and populate the free list
    result = remove entry from the free list
    return the result

deallocate(value):
    insert value into the free list
This means that what the allocator ends up doing is simply producing the same piece of memory over and over again - a literal free-list implementation would simply see freelist.push(value) on dealloc followed immediately by value = freelist.pop(), so after the initial system allocation the memory use is constant.
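Just to make that shape concrete (and, somewhat ironically, in Haskell, which is itself GC’d), here is a toy free list using IORef. The names (FreeList, allocate, deallocate, systemAlloc) are mine for illustration, not from any real allocator:

import Data.IORef

-- A toy free list: deallocate pushes a block, allocate pops one, so a tight
-- allocate/deallocate loop keeps recycling the same block after the first
-- "system" allocation.
newtype FreeList a = FreeList (IORef [a])

newFreeList :: IO (FreeList a)
newFreeList = FreeList <$> newIORef []

allocate :: IO a -> FreeList a -> IO a
allocate systemAlloc (FreeList ref) = do
  blocks <- readIORef ref
  case blocks of
    []       -> systemAlloc                  -- free list empty: ask the "system"
    (b : bs) -> writeIORef ref bs >> pure b  -- reuse a recycled block

deallocate :: FreeList a -> a -> IO ()
deallocate (FreeList ref) block = modifyIORef ref (block :)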
A GC’d environment in the general case doesn’t have the benefit of knowing immediately that an object (in the memory sense) has become dead, so there is no deallocate function; instead the allocate function looks like this (in theory):
allocate:
    deallocate all the dead objects
    if there's no free memory:
        get more memory from the OS
    return some part of the available memory
This does work, and would match the memory usage of the explicit allocate/deallocate environment. The problem is in the “deallocate all the dead objects” part. This requires finding all the live objects in the heap, which is not only more expensive than simply removing something from a free list but in the general case scales with the live heap size. For example, this:
some list = empty list
loop N times:
    insert a new object into the list
would become an O(N^2) algorithm, which is simply not tenable. So instead collection is delayed so we can amortize the cost of scanning (finding the still-live objects); that deferral means that the high-water memory usage of a GC environment is inherently going to be higher than that of an environment with explicit management of memory liveness.
This is part of what leads to GC having a much stronger heap size vs. CPU time coupling as well - the higher the proportion of live objects in the heap, the longer collection takes, and the more frequently collection is needed. Things like generational collection work by devoting a portion of the total heap to short-term allocations; courtesy of the generational hypothesis, the proportion of live memory in the youngest portion of the heap is generally much lower than overall, so collection of that region is fast. That lets you have a potentially smaller heap but also a faster collection phase, as you’re scanning fewer objects in the common case.
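If you want to watch that coupling in a real program, GHC exposes its GC counters through GHC.Stats (the program has to be run with +RTS -T for them to be populated). A minimal sketch - the field names below are from recent base versions, so check the docs for yours:

import Control.Monad (when)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Churn through some allocation, then report how much work the GC did.
-- Run with: ./prog +RTS -T
main :: IO ()
main = do
  print (sum [1 .. 1000000 :: Int])
  enabled <- getRTSStatsEnabled
  when enabled $ do
    s <- getRTSStats
    putStrLn ("collections:       " ++ show (gcs s))
    putStrLn ("major collections: " ++ show (major_gcs s))
    putStrLn ("max live bytes:    " ++ show (max_live_bytes s))
    putStrLn ("mutator CPU (ns):  " ++ show (mutator_cpu_ns s))
    putStrLn ("GC CPU (ns):       " ++ show (gc_cpu_ns s))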
Nice article, and timely, as I’m just playing around with a parser in Haskell and just ran into a problem with memory consumption.
I think this jumps to recommending bang patterns too quickly. I’m just a dabbling Haskeller, but I feel like a better solution to the bang pattern example would be using foldr instead of the recursive function call:
add :: [Int] -> Int
add = foldr go 0
  where
    go :: Int -> Int -> Int
    go x acc = acc + x
That’s basically what I did to solve the memory consumption in my parser. It’s clearer too. I think I first wrote the parser before I really understood how to use folds so now I have to re-write a lot of it using foldr.
In the document, https://hackage.haskell.org/package/base-4.17.0.0/docs/GHC-List.html, the only place it mentions space usage is in foldr', and it says:
Note that if the function that combines the accumulated value with each element is strict in the accumulator, other than a possible improvement in the constant factor, you get the same O(n) space cost as with just foldr.
If you want a strict right fold in constant space, you need a structure that supports faster than O(n) access to the right-most element, such as Seq from the containers package.
Use of this function is a hint that the [] structure may be a poor fit for the task at hand. If the order in which the elements are combined is not important, use foldl' instead.
so the document says you still have O(n) space in this function.
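For what it’s worth, the foldl' version of the example above is a one-liner and runs in constant space, since the accumulator is forced at every step:

import Data.List (foldl')

-- Strict left fold: the accumulator is forced at each step, so summing a list
-- needs only constant extra space (at the cost of combining left-to-right
-- rather than right-to-left).
add :: [Int] -> Int
add = foldl' (+) 0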
I can’t pretend I really understand it. I just figured my manual recursion was causing problems and then decided on foldr because of https://wiki.haskell.org/Foldr_Foldl_Foldl'#Rules_of_Thumb_for_Folds. It did seem to solve the memory problems in my program. Maybe it was just a fluke or I’ve deferred them somehow.
That is exactly the problem with Haskell, and the exact reason OP is even a thing. I guess in simple cases, GHC’s strictness analysis is good enough to avoid space leaks. But once there are a few lazy data structures, the monads you lego together become perfect landfills for thunks, so much so that GC ends up taking 90% of the CPU time.
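A classic shape of this is a lazy accumulator threaded through a monad. A small sketch, assuming the mtl package (and note that with -O, GHC’s strictness analysis may well optimize the lazy version here anyway - exactly the “simple cases” caveat):

import Control.Monad.State.Strict (execState, modify, modify')

-- Both functions count to n by threading an Int through State. With modify
-- the new state is left as a growing chain of (+ 1) thunks that is only
-- forced at the very end; modify' forces the state at every step instead.
lazyCount, strictCount :: Int -> Int
lazyCount   n = execState (mapM_ (\_ -> modify  (+ 1)) [1 .. n]) 0
strictCount n = execState (mapM_ (\_ -> modify' (+ 1)) [1 .. n]) 0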
I actually didn’t know about GHC’s strictness analysis. I’ll look into that, thanks.
I’ve dabbled with Elm and Purescript as well, which are eagerly evaluated, but somehow even with these problems and all the other warts I’m still gravitating towards Haskell for recreational programming.
When I was learning Haskell (I never finished learning), the biggest problem I had was space leaks, and I literally littered bangs everywhere. I don’t have the courage to revisit my old programs, but if I remember correctly, I used all of those: the State monad, the Writer monad, IORef, STRef, Maybe, …
I really wonder if those types were your problem or it was more about how you did recursion (see my other comment about how I failed to do it properly). Recommending not to use Maybe seems a bit overkill to me. It’s one of the really nice things about Haskell. Maybe a more experienced Haskeller could comment?