Looks like a solid paper, and analysis thereof. Some key insights:
when storing time series keys (the timestamps), you can save a lot of space by encoding them as a first timestamp followed by timestamp offsets (deltas), since points usually arrive at regular intervals and the deltas are small
when storing time series values, you can save a lot of space by realizing sequential data points tend not to be volatile… e.g. a “writes per minute” series is more likely to be 100, 99, 101, etc. than it is to be 100, 999999, 1, etc. This means you can encode the current value by XORing it with the prior value: similar values share most of their bits, so the result is mostly zeros and takes very little space to store
together with in-memory caching for recent data that eventually persists to HBase or a similar distributed store, you have yourself real-time operational metrics that are fast and space efficient
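To make the first two insights concrete, here is a minimal Python sketch of both ideas. The function names are made up for illustration and are not taken from the paper, and the sketch only shows the transforms themselves: a real encoder would go on to bit-pack the small deltas and the runs of zero bits, which is where the space saving actually comes from.

```python
import struct


def delta_encode(timestamps):
    """Return [first, d1, d2, ...], where each d is the gap to the previous timestamp."""
    if not timestamps:
        return []
    return [timestamps[0]] + [cur - prev for prev, cur in zip(timestamps, timestamps[1:])]


def delta_decode(encoded):
    """Rebuild the original timestamps by summing the deltas back up."""
    out = []
    total = 0
    for i, d in enumerate(encoded):
        total = d if i == 0 else total + d
        out.append(total)
    return out


def float_bits(x):
    """Reinterpret a float as its 64-bit IEEE 754 pattern (as an int)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_float(b):
    """Inverse of float_bits."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def xor_encode(values):
    """Return [bits of first value, then each value's bits XORed with the previous value's bits]."""
    bits = [float_bits(v) for v in values]
    if not bits:
        return []
    return [bits[0]] + [cur ^ prev for prev, cur in zip(bits, bits[1:])]


def xor_decode(encoded):
    """Undo xor_encode by XORing each entry with the running previous bit pattern."""
    out = []
    prev = 0
    for i, x in enumerate(encoded):
        prev = x if i == 0 else prev ^ x
        out.append(bits_float(prev))
    return out


if __name__ == "__main__":
    ts = [1000, 1060, 1120, 1181]
    print(delta_encode(ts))  # small numbers after the first entry

    vals = [100.0, 100.0, 101.0]
    for x in xor_encode(vals):
        print(f"{x:064b}")  # repeated or similar values give mostly-zero words
```

A repeated value XORs to exactly zero, and a nearby value differs in only a few bits, so most of each 64-bit word is zeros that a bit-level encoder can skip.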