    Fun problem!

    Tracking both edge/delta and level/state triggers for count data seems a bit needless, maybe? When I have systems like this, I try to reduce everything to level triggers. Each thing that manages connections could emit timestamped, labeled integers representing its current counts to some central place, or into some hierarchy that feeds a central place. The total count for a label is then the sum of the most recent integer from each source for that label, ignoring reports older than some deadline. Failures can just be dropped: a source that dies stops reporting, and its last count ages out on its own.
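    A minimal sketch of that central aggregator, assuming each source sends `(reporter_id, label, count, timestamp)` tuples. The names `CountAggregator`, `report`, `total`, and `max_age` are hypothetical, chosen only for illustration. It keeps just the latest level per source and label, so a dead source needs no explicit cleanup.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CountAggregator:
    """Hypothetical central store for level-triggered count reports."""

    max_age: float  # seconds; reports older than this are treated as dead
    # Latest (timestamp, count) per (reporter_id, label).
    _latest: Dict[Tuple[str, str], Tuple[float, int]] = field(
        default_factory=dict, init=False
    )

    def report(self, reporter_id: str, label: str, count: int, ts: float) -> None:
        key = (reporter_id, label)
        prev = self._latest.get(key)
        # A new level replaces the old one; late, out-of-order reports are ignored.
        if prev is None or ts >= prev[0]:
            self._latest[key] = (ts, count)

    def total(self, label: str, now: float) -> int:
        deadline = now - self.max_age
        # Sum the most recent count from each source, skipping stale sources.
        return sum(
            count
            for (_reporter, lbl), (ts, count) in self._latest.items()
            if lbl == label and ts >= deadline
        )


if __name__ == "__main__":
    agg = CountAggregator(max_age=30.0)
    agg.report("node-a", "websocket", 120, ts=100.0)
    agg.report("node-b", "websocket", 80, ts=105.0)
    agg.report("node-a", "websocket", 110, ts=110.0)  # replaces node-a's level
    print(agg.total("websocket", now=120.0))  # 110 + 80
    print(agg.total("websocket", now=140.0))  # node-b has aged out
```

    Because every report is a full level rather than a delta, a lost or duplicated message only matters until the next report arrives, and a crashed node simply drops out of the sum once it passes `max_age`.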

      I’m not very familiar with the edge/level terminology. I found this blog post enlightening, and thought others might too: http://gengnosis.blogspot.com/2007/01/level-triggered-and-edge-triggered.html.

      This is a good point you make. In fact, we also count connections in the way you describe for a different use case: to calculate usage for billing/limiting purposes.

      I wasn’t the one who designed the system described in the blog post, but I assume it grew organically from a simpler system (like the one I described at the start of the post). I think this organic evolution probably explains the deltas+cleanup approach. It may not be the simplest or most elegant design, but the current implementation has worked well for us in practice.

      I’m trying to think of a downside of the approach you suggest. The only things that come to mind are that there will be some lag between the connection counts on the nodes and the aggregated sum, and that there might be some wasted events when counts don’t change often, since each node keeps reporting the same value. I think in practice neither would be a significant issue for the requirements of our system.