
Information has become a key commodity for most service providers. Analyzing streams of data efficiently and in real time has become increasingly important for supporting new products and applications.

This paper outlines a novel abstraction for performing incremental stream processing based on Computational Conflict-free Replicated Data Types (C-CRDTs). C-CRDTs are replicated objects that can be updated concurrently without coordination to perform a computation and still converge to a consistent state that reflects all contributions.
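The convergence property described above can be illustrated with a minimal sketch of a word-count object. This is not the paper's C-CRDT design: the class name `WordCountCRDT`, its methods, and the state-based merge (a per-replica, per-word maximum, the same idea as a grow-only counter) are assumptions chosen for illustration. Each replica only ever increments its own entry, so merges never conflict and can be applied in any order, any number of times.

```python
from collections import Counter
from typing import Dict


class WordCountCRDT:
    """Hypothetical state-based word-count object in the spirit of a C-CRDT.

    contributions[r][w] holds how many times replica r has counted word w.
    Merging takes the per-replica, per-word maximum, which is commutative,
    associative and idempotent, so all replicas converge to the same state.
    """

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.contributions: Dict[str, Counter] = {replica_id: Counter()}

    def add_text(self, text: str) -> None:
        # Local update: only this replica's own entry grows, no coordination needed.
        self.contributions.setdefault(self.replica_id, Counter()).update(text.split())

    def merge(self, other: "WordCountCRDT") -> None:
        # Combine another replica's state; each entry only grows, so max is safe.
        for rid, counts in other.contributions.items():
            mine = self.contributions.setdefault(rid, Counter())
            for word, n in counts.items():
                mine[word] = max(mine[word], n)

    def value(self) -> Counter:
        # The computed result reflects every replica's contribution.
        total: Counter = Counter()
        for counts in self.contributions.values():
            total.update(counts)
        return total


if __name__ == "__main__":
    a = WordCountCRDT("a")
    b = WordCountCRDT("b")
    a.add_text("to be or not to be")  # processed concurrently on replica a
    b.add_text("to do")               # processed concurrently on replica b
    a.merge(b)
    b.merge(a)
    print(a.value() == b.value())     # True: both replicas converged
    print(a.value()["to"])            # 3
```

Because the merge is idempotent, re-delivering a state (for example after a retry) leaves the result unchanged; the trade-off is that each replica stores one entry per contributing replica.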

Results obtained with a preliminary prototype show that C-CRDTs have the potential to match, and even improve on, the computational throughput of a state-of-the-art stream processing system.


    I wonder how these could tie in with Resilient Distributed Datasets in Spark? It seems as though they could have addressed some of the possible improvements mentioned in section 2.

    I’m also curious how C-CRDTs clean themselves up over time. Because state only grows monotonically, memory use could become an issue. Could TTLs be set, or could garbage collection be triggered somehow?

    Is it just me, or is the bolt/spout terminology in section 6 (Word Count) backwards?