tl;dr find strongly connected components, hash all nodes within a component (I don’t know how the author expects this to work deterministically), then form a hash tree of the components
If this is actually intended for a social graph as the example suggests then I’m afraid it will very quickly degenerate into a case where almost everyone is part of the same SCC.
It sounds like Data.Graph gives that determinism:
Note that the <+> operator is not commutative, so the order in which we do the fold is important. Fortunately, the Data.Graph library takes care of that for us. The nodes returned in a component are sorted in the order we need.
Well, conceptually a graph is a set of nodes and a set of two-tuples representing edges. But in code the sets are probably represented as list (with fixed iteration order).
So I guess you have “determinism” across runs of the program because the input doesn’t change. But if you provided the same graph with the nodes in a different order then I think the output of the ‘flatten’ operation will change as well.
What is the “canonical” order of nodes in a SCC anyway?
I just realized, while you can’t find a canonical order for nodes, you can very well find one for hashes over the primitive fields. They are just integers. You can then replace all references by indices into this sorted list and then hash the whole thing.
I kinda found the implementation, in all its gory details, detracted from the simplicity of the concept here.
I appreciate the effort that went into the post — a lot! — but the idea really got lost in the Haskell here, imo.
Why is there a component step? The process could be to take the primitive fields and generate the primitive hash of each node. In the dependent fields, each node refers to the primitive hash of the nodes it depends on. The primitive fields (or hash) and dependent fields are hashed together to create the dependent hash. Only the dependent hashes are available to the outside system.
I’ve been mentally doodling boxes and arrows for a minute without spotting a way this would fail to capture a change or introduce spurious changes. But I am not a crypto expert and maybe there’s a neat corner case I’m missing someone could suggest?
Consider these two graphs:
G1: A = (1,2,B), B = (3,4,C), C=(5).
G2: A = (1,2,B), B = (3,4,C), C=(999).
In the dependent fields, each node refers to the primitive hash of the nodes it depends on.
A’s dependent field shall now refer to B via B’s primitive hash which is the same in G1 and G2.
So your dependent hash of A will be the same even though a resource reachable from A has changed.
Yep, that punctures it, thanks for the counter-example. :)