
2.

    @seantallen, my impression is that Storm doesn’t guarantee that you will only process a piece of data once. How did you make the PUTs idempotent? Regarding the problem you ran into with high latency when talking to many services that have low median latency but poor tail latency, I think this talk by Jeff Dean describes some pretty useful strategies for dealing with it.

    1.

      You can get exactly-once semantics with Storm using Trident -> https://github.com/nathanmarz/storm/wiki/Trident-tutorial

      Anyway, that doesn’t really matter in this case: the odds of the last bolt persisting to Couchbase and then failing to ack the RabbitMQ message (thereby removing it permanently from the work queue) are negligible, and that level of possible data ‘corruption’ is considered acceptable. Should it ever become a problem, our solution would be a daily batch job to refresh the views, while continuing to use Storm for real-time updates.
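A minimal sketch of why this pipeline tolerates redelivery (not the authors' actual code; the dict and list stand in for Couchbase and the RabbitMQ queue): because the final write is a PUT keyed by a stable id, replaying a message just overwrites the same document with the same value.

```python
# Stand-ins for the real infrastructure (hypothetical names).
store = {}   # Couchbase stand-in: key -> denormalized view document
queue = []   # RabbitMQ work-queue stand-in

def handle(message):
    # PUT by stable key: processing the same message twice writes the
    # same document again, so a redelivered message is harmless.
    store[message["view_key"]] = message["view_body"]

queue.append({"view_key": "job:42:applications", "view_body": {"count": 3}})
while queue:
    msg = queue.pop(0)
    handle(msg)
    # ack(msg) would go here in a real consumer; a crash between the
    # write and the ack only causes a redelivery and a repeated PUT.
```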

      The problem we were solving was heavy read times. Given that we have orders of magnitude more reads than writes, we decided to tackle it using a denormalized view. Further, the time a read would take was effectively unbounded, because it required more and more requests the more heavily used a resource (in this case, ‘applications to a job’) was. A moderately popular job posting could result in hundreds of reads under the old system, over and over, on each view of that page. That is a ton of work. Latency of the individual services wasn’t the issue; the lack of scalability due to that unbounded nature was.
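The unbounded-read problem above can be illustrated with a toy model (hypothetical data and function names, not the actual system): the old path issues one fetch per application, so cost grows with popularity, while the denormalized view is a single bounded lookup.

```python
# Old model: each page view fans out to one fetch per application.
applications = {"job:42": ["app:%d" % i for i in range(300)]}

def read_old(job_id, fetch):
    # Cost is O(number of applications): unbounded as a job gets popular.
    return [fetch(a) for a in applications[job_id]]

# New model: a denormalized view maintained at write time,
# read back with a single O(1) lookup regardless of popularity.
views = {"job:42": {"applicant_count": 300}}

def read_new(job_id):
    return views[job_id]
```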

      As to idempotency of PUTs: in theory you could manage to apply to the same job twice, but it is so unlikely that we don’t worry about it. We do validation checks at the time of job application to verify that you haven’t already applied; if you have, we don’t create a new application, we just return info about the existing one. It is possible in theory for two applications to be stored at the same time if we change data stores, but at the moment we have a constraint on the database table that guarantees only one per jobseeker and job. (There are additional in-code checks above that; the DB is the last-resort enforcer.)
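The last-resort database constraint described above can be sketched like this (using sqlite3 for illustration; the post doesn't name the actual database, and the schema and function names are assumptions): a unique constraint on (jobseeker, job), with the insert falling back to returning the existing row on conflict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE application (
        id INTEGER PRIMARY KEY,
        jobseeker_id INTEGER NOT NULL,
        job_id INTEGER NOT NULL,
        UNIQUE (jobseeker_id, job_id)  -- at most one application per pair
    )
""")

def apply_to_job(jobseeker_id, job_id):
    """Create an application, or return the existing one on a duplicate."""
    try:
        cur = conn.execute(
            "INSERT INTO application (jobseeker_id, job_id) VALUES (?, ?)",
            (jobseeker_id, job_id))
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # Unique constraint fired: hand back the existing application
        # instead of creating a second one.
        row = conn.execute(
            "SELECT id FROM application WHERE jobseeker_id=? AND job_id=?",
            (jobseeker_id, job_id)).fetchone()
        return row[0]

first = apply_to_job(7, 42)
second = apply_to_job(7, 42)  # duplicate apply: same id comes back
```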