
The follow-up blog post is also interesting: http://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset/



    Interesting to see a mathematical take on this. I don’t think it is a practical improvement (yet), because it opens up a hard operational-security problem: you need to prevent anyone from running so many queries against the data that the noise averages out.
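    To illustrate why the query count matters, here is a minimal sketch. It assumes a Laplace-noise query mechanism (the standard differential-privacy construction) in which every query draws fresh noise. The function names, epsilon, sensitivity and the fare value are hypothetical and are not taken from the linked posts. Averaging many noisy answers converges on the true value.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_query(true_value, epsilon, sensitivity, rng):
    # Each call draws *fresh* noise, as in a per-query Laplace mechanism.
    return true_value + laplace_noise(sensitivity / epsilon, rng)


def averaging_attack(true_value, epsilon, sensitivity, n_queries, seed=0):
    # An adversary repeats the same query and averages the answers;
    # the noise has mean 0, so the average approaches the true value.
    rng = random.Random(seed)
    answers = [noisy_query(true_value, epsilon, sensitivity, rng)
               for _ in range(n_queries)]
    return sum(answers) / len(answers)


if __name__ == "__main__":
    true_fare = 42.0  # hypothetical value
    for n in (1, 100, 10_000):
        estimate = averaging_attack(true_fare, epsilon=0.1,
                                    sensitivity=1.0, n_queries=n)
        print(n, round(estimate, 2))
```

    With scale 10, a single answer has a standard deviation of about 14, but the mean of 10,000 answers has a standard deviation of about 0.14. This is why the number of queries has to be limited.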

    I would be curious to see a similar mathematical treatment of the related approach: choosing and storing a fuzz factor for each update.

    That would remove the problem of many queries uncovering the true values, since every query would see the same stored fuzz factor. The new problem is keeping the data useful: if many updates come through, the accumulated fuzz factor could drift far from the true value. You could keep it from drifting too far by biasing the random factor against the accumulated drift, which should work as long as an adversary can’t trigger updates with handpicked values.
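    A minimal sketch of the stored-fuzz idea follows, under stated assumptions. The class, its parameters and the specific bias rule are hypothetical. The bias shrinks the old offset by a factor `pull` before adding fresh uniform noise. With `max_step` as the largest single step, this keeps the offset within `max_step / pull` indefinitely, whereas an unbiased random walk would keep drifting. The sketch does not address the handpicked-update attack mentioned above.

```python
import random


class FuzzedValue:
    """Hypothetical store that keeps one persistent fuzz offset per value.

    The offset changes only when the underlying value is updated, so
    repeated reads return the same answer and cannot be averaged away.
    """

    def __init__(self, true_value, max_step, pull=0.5, seed=0):
        if not 0 < pull <= 1:
            raise ValueError("pull must be in (0, 1]")
        self._rng = random.Random(seed)
        self._max_step = max_step
        self._pull = pull
        self._true_value = true_value
        # Initial offset lies within [-max_step, max_step] <= drift_bound.
        self._offset = self._rng.uniform(-max_step, max_step)

    def update(self, new_true_value):
        self._true_value = new_true_value
        # Assumed bias rule: shrink the old offset toward zero by `pull`,
        # then add fresh uniform noise. If |offset| <= max_step / pull, then
        # |(1 - pull) * offset + step| <= (1 - pull) * max_step / pull + max_step
        # = max_step / pull, so the bound holds after every update.
        step = self._rng.uniform(-self._max_step, self._max_step)
        self._offset = (1 - self._pull) * self._offset + step

    def read(self):
        # Every query sees the same stored offset until the next update.
        return self._true_value + self._offset

    @property
    def drift_bound(self):
        return self._max_step / self._pull


if __name__ == "__main__":
    v = FuzzedValue(true_value=42.0, max_step=5.0, seed=1)
    print(v.read() == v.read())  # True: repeated reads are identical
    for fare in range(1000):
        v.update(float(fare))
    print(abs(v.read() - 999.0) <= v.drift_bound + 1e-9)  # True
```

    The trade-off sits in `pull`: values near 1 keep the offset tight but make successive offsets nearly independent, while values near 0 preserve more continuity but allow a wider drift.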