1. 2

  2. 5

    This post has a lot of huge claims, and I have no reason to doubt them. I just do not know. Has anyone here used TimescaleDB who can share their experience?

    1. 4

      As Postgres continues to mature, there are two main features of TimescaleDB that have endured. The biggest one is autopilot around partitioning: if you know you are going to have a large dataset keyed on time, TimescaleDB out of the box does a lot for you by intelligently partitioning your data (“hypertables” and “chunks”), including configurable retention policies. Without it, you need to figure out how to set up and maintain partitions and clean up old data yourself, which is feasible on an RDBMS but is work, and you’ll likely never reach the level of effort Timescale has put into their approach. The other thing is extensions to SQL that make certain timeseries analysis easier, like downsampling and gapfilling. That is further extended with the timescaledb-toolkit extension.
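
      A minimal sketch of what that looks like in practice, using made-up table and column names (the functions themselves are TimescaleDB’s):

          CREATE TABLE conditions (
            time        timestamptz NOT NULL,
            device_id   text NOT NULL,
            temperature double precision
          );

          -- Turn the table into a hypertable; TimescaleDB partitions it into time-based chunks automatically.
          SELECT create_hypertable('conditions', 'time');

          -- Configurable retention: drop chunks older than 90 days on a schedule.
          SELECT add_retention_policy('conditions', INTERVAL '90 days');

          -- The SQL extensions mentioned above: downsample to hourly buckets and fill gaps.
          SELECT time_bucket_gapfill('1 hour', time) AS bucket,
                 device_id,
                 avg(temperature) AS avg_temp
          FROM conditions
          WHERE time > now() - INTERVAL '1 day' AND time <= now()
          GROUP BY bucket, device_id;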

      I think they overplay compression a bit; it can be done relatively well on other DBs with block-level compression (ZFS, btrfs). It should be more efficient to do at the data level, but it’s not really a game changer IMO.
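
      For comparison, turning on TimescaleDB’s native compression is a couple of statements (reusing the assumed conditions hypertable from the sketch above), while the block-level route is a filesystem setting such as zfs set compression=lz4 on the dataset:

          -- Enable native columnar compression, segmented by device and ordered by time.
          ALTER TABLE conditions SET (
            timescaledb.compress,
            timescaledb.compress_segmentby = 'device_id',
            timescaledb.compress_orderby   = 'time DESC'
          );

          -- Compress chunks once they are older than 7 days.
          SELECT add_compression_policy('conditions', INTERVAL '7 days');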

      The other thing is that managed distributed DBs like Cloud Spanner and Aurora eat into some of this space as well. Basically, the storage mechanics there become someone else’s problem.

      1. 6

        Data compression can be done a lot better when the compressor knows it’s dealing with timeseries data. 90% might not be realistic for rapidly changing data, but delta compression can really deliver those gains when the data changes relatively slowly, especially when combined with other general-purpose compression algorithms.
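
        As a rough illustration in plain SQL (the metrics table here is hypothetical): for a slowly changing series sampled at a fixed interval, the deltas are small and the delta-of-deltas are mostly zero, which run-length and simple integer-packing schemes then compress extremely well.

            -- Compute first and second differences of a slowly changing series.
            WITH deltas AS (
              SELECT time,
                     value,
                     value - lag(value) OVER (ORDER BY time) AS delta
              FROM metrics
            )
            SELECT time,
                   value,
                   delta,
                   delta - lag(delta) OVER (ORDER BY time) AS delta_of_delta
            FROM deltas;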

        1. 1

          With regards to compression, TimescaleDB has details on this:

          https://www.timescale.com/blog/building-columnar-compression-in-a-row-oriented-database/

          …TimescaleDB achieves these compression rates by deploying best-in-class algorithms for compressing various types of data. We employ the following algorithms (and will allow users to choose the algorithm in future releases):

          - Gorilla compression for floats
          - Delta-of-delta + Simple-8b with run-length encoding compression for timestamps and other integer-like types
          - Whole-row dictionary compression for columns with a few repeating values (+ LZ compression on top)
          - LZ-based array compression for all other types

          We extended Gorilla and Simple-8b in order to handle decompressing data in reverse order, which allows us to speed up queries that use backwards scans. For super technical details, please see our compression PR.

          I do not think the above is possible with block-level compression. I would also think they use dictionary encoding, as most column-oriented DBs do, for low-cardinality columns: replacing ‘verbose’ but low-cardinality values with small integers and then mapping back to the actual values as needed, as in the toy sketch below.
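
          A toy version of that dictionary idea in plain Postgres, with a made-up readings table and a low-cardinality status column:

              -- Build a lookup of the few distinct values and give each a small integer code.
              CREATE TABLE status_dict (
                code   smallint PRIMARY KEY,
                status text UNIQUE NOT NULL
              );

              INSERT INTO status_dict (code, status)
              SELECT (row_number() OVER (ORDER BY status))::smallint, status
              FROM (SELECT DISTINCT status FROM readings) s;

              -- Store the narrow code instead of the repeated text; join back to
              -- status_dict whenever the original value is needed.
              SELECT r.time, d.code
              FROM readings r
              JOIN status_dict d USING (status);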