That statement doesn’t make sense, because C has no notion of a “map” or a “key”. The handmade hash table in this code simply casts the pointer to a big-enough int and uses that as both the key and the hash code.
C, being portable to a fault, can be compiled to e.g. run on top of a Lisp runtime; in such cases, C can malloc() (by allocating a Lisp object), but there is no notion of a “memory address” (i.e. uintptr_t) that is accessible to C. Go, of course, simply doesn’t support those environments.
In practice, almost all modern systems have something like a linear memory space; even CHERI (with its capability-/bounds-carrying pointers) has a fairly-conventional-looking uintptr_t.
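For readers wondering what that looks like in practice, here is a tiny hedged sketch of the pointer-to-integer trick; the table layout, power-of-two assumption, and mixing step are illustrative choices, not the post’s code:

```c
#include <stddef.h>
#include <stdint.h>

/* Where uintptr_t exists, a pointer converts to an integer that can serve as
 * both the table key and the hash code. */
static size_t slot_for(const void *p, size_t table_size /* power of two */) {
    uintptr_t k = (uintptr_t)p;   /* round-trips back to the same pointer */
    k ^= k >> 12;                 /* fold high bits down; the low bits of
                                     aligned pointers are mostly zero */
    return (size_t)(k & (table_size - 1));
}
```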
I wouldn’t have come up with the intrusive map idea to also amortize the temp allocations; that’s clever!
The solution I was thinking about before reading the whole post is:
Compute the length of the list, n, and allocate 2 arrays of n elements of type node*
Iterate over the input, recreating the list with new nodes
To check whether you’ve already made a new copy of an old node, use those arrays as a map: whenever you alloc a new node, append the old pointer to the first array and the new one to the second. That way, finding the index of an old pointer in the first array lets you look up the new pointer at the same index in the second array.
Example arrays for the map {old-a: new-a, old-b: new-b}: [ old-a old-b ] and [ new-a new-b ] (see the C sketch below).
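For concreteness, here is a minimal C sketch of that two-array scheme. The node type and every name in it (node, ref, copy_list) are assumptions made for illustration, not taken from the post, and the old->new lookup is a plain linear search:

```c
#include <stdlib.h>

/* Assumed node type: a singly linked list where each node also carries an
 * arbitrary `ref` pointer into the same list (or NULL). */
typedef struct node {
    struct node *next;
    struct node *ref;
    int value;
} node;

node *copy_list(node *head) {
    /* Pass 1: count the nodes so both arrays can be sized up front. */
    size_t n = 0;
    for (node *p = head; p; p = p->next) n++;
    if (n == 0) return NULL;

    /* Parallel arrays acting as the old->new map. */
    node **olds = malloc(n * sizeof *olds);
    node **news = malloc(n * sizeof *news);

    /* Pass 2: copy the spine, recording each (old, new) pair at the same index. */
    size_t i = 0;
    node *new_head = NULL, **tail = &new_head;
    for (node *p = head; p; p = p->next, i++) {
        node *q = malloc(sizeof *q);   /* error handling omitted for brevity */
        q->value = p->value;
        q->next = NULL;
        q->ref = NULL;
        olds[i] = p;
        news[i] = q;
        *tail = q;
        tail = &q->next;
    }

    /* Pass 3: translate each ref by finding the old pointer's index. A linear
     * search keeps the sketch short; this lookup is the part worth optimizing. */
    for (i = 0; i < n; i++) {
        if (!olds[i]->ref) continue;
        for (size_t j = 0; j < n; j++) {
            if (olds[j] == olds[i]->ref) { news[i]->ref = news[j]; break; }
        }
    }

    free(olds);
    free(news);
    return new_head;
}
```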
From there, the problem becomes optimizing the map implementation. I would’ve stopped there, because the ideal data structure probably depends on what size of inputs you expect.
For large inputs, splitting the arrays into a structure you don’t have to allocate in a single go saves the initial traversal of the input. A trie is great since it also makes the search for old->new nodes more efficient. And I was surprised that so little code was enough, and would love a more in-depth explanation of it!
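Since more depth was asked for: below is a rough, hedged sketch of one shape a pointer-keyed trie map can take, branching on two bits of the (lightly mixed) pointer value per level. Because nodes are allocated as you go, there is no up-front sizing and thus no initial pass over the input. It only shows the general structure; the post’s version is intrusive and amortizes these allocations, which this sketch does not attempt.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct ptrie {
    struct ptrie *child[4];
    void *key;   /* old node */
    void *val;   /* its copy */
} ptrie;

/* Return the value already stored for `key`, or store `val` and return it. */
static void *ptrie_upsert(ptrie **root, void *key, void *val) {
    uintptr_t bits = (uintptr_t)key;
    bits ^= bits >> 16;                /* cheap mixing so aligned pointers spread out */
    while (*root) {
        if ((*root)->key == key) return (*root)->val;
        root = &(*root)->child[bits & 3];
        bits >>= 2;
    }
    ptrie *t = calloc(1, sizeof *t);   /* error handling omitted */
    t->key = key;
    t->val = val;
    *root = t;
    return val;
}
```

Lookup and insert share one walk down the trie, which is what keeps the old->new search cheap for large inputs.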
For smaller inputs, do the opposite: use a single alloc of 2*n*sizeof(node*) and split it yourself into 2 arrays. The idea being that an array will give better CPU cache utilization compared to a trie, and 1 less alloc might make a perceptible difference.
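A minimal sketch of that single-allocation split, with a made-up helper name; the two halves of one malloc’d block serve as the two arrays:

```c
#include <stdlib.h>

struct node;   /* the list node type, as in the sketches above */

/* Hypothetical helper: one allocation backs both halves of the old->new map.
 * Free the `olds` pointer once to release both arrays. */
static int alloc_map(size_t n, struct node ***olds, struct node ***news) {
    struct node **block = malloc(2 * n * sizeof *block);
    if (!block) return -1;
    *olds = block;       /* slots [0, n)  : original pointers */
    *news = block + n;   /* slots [n, 2n) : the corresponding copies */
    return 0;
}
```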
If you’re really motivated you can do both and choose the algorithm based on the input size! Like most languages’ standard sort functions do.
This mapping array also occurred to me. However, with “sparse” ref pointers (i.e., not every list entry actually has a pointer to another, or most entries point to the same one), that could be somewhat wasteful.
That’s some seriously dense code. Worthy of studying exactly what it does. It’s perhaps not immediately clear how it works, but very clever!
It feels like you could do some variant of Cheney copying or Mark-Sweep pointer reversal here.
The first approach under “Solution by temporary mutation” sounds similar.
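For the curious, here is a hedged sketch of one member of that family: a forwarding-pointer scheme in the spirit of Cheney’s collector, where each original’s next pointer temporarily forwards to its copy. Like the post’s temporary-mutation approach it modifies the input mid-flight and restores it before returning; the node type and names repeat the earlier sketch and none of this is the post’s actual code.

```c
#include <stdlib.h>

typedef struct node {
    struct node *next;
    struct node *ref;
    int value;
} node;

node *copy_list_interleaved(node *head) {
    if (!head) return NULL;

    /* 1. Interleave: A -> A' -> B -> B' -> ...  The original's next pointer
     *    now acts as a forwarding pointer to its copy. */
    for (node *p = head; p; p = p->next->next) {
        node *q = malloc(sizeof *q);   /* error handling omitted */
        q->value = p->value;
        q->next = p->next;             /* copy points at the original's old successor */
        q->ref = NULL;
        p->next = q;                   /* original forwards to its copy */
    }

    /* 2. Fix up ref pointers: the copy of p->ref is p->ref->next. */
    for (node *p = head; p; p = p->next->next) {
        if (p->ref) p->next->ref = p->ref->next;
    }

    /* 3. Unweave: restore the original list and link the copies together. */
    node *new_head = head->next;
    for (node *p = head; p; ) {
        node *q = p->next;             /* the copy of p */
        p->next = q->next;             /* restore original next */
        p = p->next;
        q->next = p ? p->next : NULL;  /* link copy to the next copy */
    }
    return new_head;
}
```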
Go lets you use pointers as map keys. It’s weird that C doesn’t guarantee the ability to do that. How likely is that problem in practice?