Nice savings! I’m a little confused as to the use of PubSub for the import/export process though. IIUC Elasticsearch already has existing methods to migrate indexes. Would it have been even cheaper to use these methods and cut the PubSub cost out of the migration process?
JSON logs are likely to be very compressible because: 1) logs share many of the same JSON keys (e.g. kubernetes.pod_name), which can be deduplicated, and 2) values of common log fields (e.g. kubernetes.labels.namespace) tend to recur very often.
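A quick way to check this on your own data is to compress a batch of log lines and compare sizes. The sketch below uses made-up records (only the field names kubernetes.pod_name and kubernetes.labels.namespace come from the comment above; everything else is hypothetical) and Python's standard gzip module, so the exact ratio will differ from real logs.

```python
import gzip
import json

# Hypothetical log records: repeated keys and frequently repeated values,
# similar in shape to Kubernetes JSON logs.
logs = [
    {
        "kubernetes": {
            "pod_name": f"api-{i % 5}",
            "labels": {"namespace": "production"},
        },
        "level": "info",
        "message": f"request {i} handled",
    }
    for i in range(1000)
]

# Newline-delimited JSON, as a log shipper would typically batch it.
raw = "\n".join(json.dumps(record) for record in logs).encode("utf-8")
compressed = gzip.compress(raw)
ratio = len(raw) / len(compressed)

print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.1f}x")
```

The repeated keys and values are what the compressor exploits, so batching more lines together generally helps.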
Awesome. How small is each message that you’re compressing? If < 64 KB, consider using zstd in dictionary mode. You can train a dictionary on a corpus of representative messages, then use that dictionary during compression and decompression. This lets zstd achieve unusually good compression on small messages.
zstd is also great for larger blobs: it typically produces smaller output than gzip while being faster to compress and faster to decompress. It’s used with great success throughout many internal AWS services.