A bit of context: Microsoft developed PhotoDNA to identify illegal images like CSAM – NCMEC maintains a database of PhotoDNA signatures, and many companies use this service to identify and remove these images.
Microsoft claims: "A PhotoDNA hash is not reversible, and therefore cannot be used to recreate an image."
This project shows that this isn’t quite true: machine learning can do a pretty good job of reproducing thumbnail-quality images from a PhotoDNA signature.
There have been some previous posts about PhotoDNA that reverse-engineered the algorithm and claimed it is reversible, but as far as I know there was no public demonstration of this.
That’s interesting. Does that imply Apple’s NeuralHash is also reversible to some extent?
I haven’t tried it, so I can’t say for sure. Perceptual hashes basically have to leak some information about their input (because they are set up so that d(x, x') small => d(h(x), h(x')) small). But NeuralHash uses a CNN and outputs a 96-bit hash, while PhotoDNA computes image statistics and outputs a 1152-bit hash, so I expect NeuralHash hashes would reveal less information and be harder to invert.
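To make the locality argument a bit more concrete, here is a minimal sketch using a toy "average hash". This is an assumption chosen for illustration: it is neither PhotoDNA nor NeuralHash, and the 8x8 images and helper names (`average_hash`, `coarse_reconstruction`) are hypothetical. It only shows the general idea: similar inputs give nearby hashes, and for the same reason the hash by itself already gives away a coarse picture of the input.

```python
import random

def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set when the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(a, b):
    """Number of bit positions where two hashes differ, i.e. d(h(x), h(x'))."""
    return sum(x != y for x, y in zip(a, b))

def coarse_reconstruction(bits, width):
    """Rebuild a two-level image from the hash alone; this is the information the hash leaks."""
    return [[255 if b else 0 for b in bits[i:i + width]]
            for i in range(0, len(bits), width)]

rng = random.Random(0)

# Hypothetical 8x8 grayscale image: a horizontal gradient, dark on the left, bright on the right.
image = [[32 * x for x in range(8)] for _ in range(8)]

# A slightly perturbed copy (each pixel moved by at most 3 levels), standing in for x'.
noisy = [[min(255, max(0, p + rng.randint(-3, 3))) for p in row] for row in image]

# An unrelated random image for comparison.
other = [[rng.randint(0, 255) for _ in range(8)] for _ in range(8)]

h_image = average_hash(image)
h_noisy = average_hash(noisy)
h_other = average_hash(other)

print("distance to perturbed copy:", hamming(h_image, h_noisy))
print("distance to unrelated image:", hamming(h_image, h_other))

# The first row recovered from the hash alone: dark left half, bright right half.
print(coarse_reconstruction(h_image, 8)[0])
```

The toy hash here has 64 bits and gives back a two-level sketch of the gradient. A real perceptual hash is much less direct than this, but the intuition is the same: the more bits it keeps about image statistics, the more an inversion model has to work with.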