Usability and complexity are more existential threats to IPFS than any lack of privacy from content-addressing. Hashing the hash seems like just another tacked-on afterthought.
the Hypercore Protocol (formerly known as the Dat Protocol)

Oh. I guess a rename was needed because “dat” is too short and ungoogleable, but “hypercore” sounded a bit like one of those ponzi-coin names at first.
I’ve read the article and I’m still a bit confused. How does a hash-of-a-hash solve the privacy issue? If I have some content in a CAS (hash -> content), someone can query the CAS for hash and figure out that content is in there. Why would a hash of a hash change things in any way? Is there something else hashed besides the original hash (like a password)? If yes, how does one then avoid having multiple copies of the same thing, something that CAS excels at?

Maybe all of this is answered in the original Hypercore protocol, but I couldn’t infer it from the article.
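For concreteness, the flow I have in mind is roughly this (a toy sketch of a content-addressable store, not the actual IPFS or Hypercore API):

    import hashlib

    store = {}  # hash -> content

    def put(content: bytes) -> str:
        h = hashlib.sha256(content).hexdigest()
        store[h] = content           # identical content always lands at the same key
        return h

    def get(h: str) -> bytes:
        return store[h]              # anyone who knows the hash can ask for the content

    h = put(b"hello world")
    assert get(h) == b"hello world"
    # If h is also what gets announced to the network, anyone who can compute
    # or obtain h learns that this store holds that content.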
No, this flow doesn’t change by adding a second hash.
What changes is that now the hashes listed in the DHT are useless for retrieval (you need to know the first hash). So you can’t just read the DHT, check what someone is seeding, and get their files.
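A sketch of that split, with made-up names (Hypercore’s actual key derivation differs in detail, but the shape is the same):

    import hashlib

    def content_hash(data: bytes) -> bytes:      # 1st-order hash: needed to request/verify the data
        return hashlib.sha256(data).digest()

    def discovery_hash(chash: bytes) -> bytes:   # 2nd-order hash: the only thing announced to the DHT
        return hashlib.sha256(chash).digest()

    data = b"some file"
    first_order = content_hash(data)
    dht_key = discovery_hash(first_order)

    # The DHT only ever maps dht_key -> [peer addresses]. Scraping the DHT gives
    # you dht_key and a list of peers, but you can't invert the hash to recover
    # first_order, and without first_order you can't ask those peers for the file.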
This article implies that the “innovation” is simply hashing the file hash, which provides precisely zero privacy. The attackers for distributed storage like this aren’t interested in “what files does Dave have”; they’re interested in “who is serving this file I want gone so I can attack them”.
This must be some new meaning of “precisely” or “zero” that I wasn’t aware of.
This is not fair to the article or the technology. Keeping others from knowing what files Dave has is a privacy dimension people care about. It may not be the dimension you most care about, but it seems like an acceptable tradeoff to me. Content Addressable Stores can never prevent someone from finding out who has known content, since it’s core to how they operate. If that is what you want you’ll have to use a different technology entirely. But if you want to keep people outside of your sharing group from knowing what you are sharing, then this seems like an improvement over the existing approach.
In what possible way does this “keep others from knowing what files you have”? You need to know the hashes in advance to know what files are being offered. Hashing those numbers again is not a barrier to this.
I’m assuming there is some other change that increases privacy, and I’m trying to find out what that is, because hashing the hashes can’t be it.
With a hypercore setup they can’t use the DHT to download your files and inspect them. If they are searching for a specific file that they already have, then they can find out if you also have it. But they can’t just look at the list of DHT hashes for you and then get copies of them to see what they are.
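That is, the check only works in one direction. Something like this (a sketch; dht.lookup is a hypothetical call, not a real API):

    import hashlib

    sha = lambda b: hashlib.sha256(b).digest()

    known_file = b"a file I already have"
    dht_key = sha(sha(known_file))     # recompute the 2nd-order hash myself

    # peers = dht.lookup(dht_key)      # hypothetical: confirms who is seeding it
    # But starting from an unknown dht_key scraped off the DHT, there is no way
    # back to the 1st-order hash, so I can't fetch the file to see what it is.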
No?? The DHT is literally a giant list of “who offers what hash”.
Per the OP article, the point is: when the DHT is a list of hashes of hashes (“2nd order” hashes) but retrieving a file still requires the regular hash (“1st order”), the DHT stops letting you discover new files. It stops being “who offers what hash, and what hashes even exist” and starts being “assuming I already know a hash, whom can I get it from”.
Thanks, this was the information I didn’t manage to grok from TFA.
I stopped after reading the first sentence.
Nothing wrong with them asking nicely.
While this is a clever trick, it’s not an innovation. During the E era in the 90s, the object address hashes were called Swiss numbers or “swissnums” and were analogized to Swiss bank account numbers. Since leaking a swissnum is undesirable, swissnums were often referred to only by their hash, which was considered safe for export; these were called “exportnums”.
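A minimal sketch of the same idea (illustrative only, not E’s actual interfaces):

    import hashlib

    swissnum = b"unguessable-capability-to-an-object"   # knowing it grants access
    exportnum = hashlib.sha256(swissnum).hexdigest()    # safe to publish or compare

    # You can hand out exportnum to say "I mean *that* object" without giving
    # anyone the authority that the swissnum itself confers.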
Did the author or someone else open an issue or thread in some official IPFS venue (bug tracker? forum?) to discuss it there?