1. 17
  1. 5

    The data warehousing approach is to store data in layers and preserve the raw sources as the initial source layer. That layer is often just files, but you can use a database if you keep it lightweight (e.g. by storing just an id, some metadata, and the JSON in a blob). Ideally, you can then rebuild your entire data warehouse from the source layer.

    Additional access layers are then created as needed, so you can have views over multiple sources (e.g. a single view of all the messages you have written, with a common schema).
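
    A minimal sketch of the layering, assuming SQLite as the lightweight source layer. The table names (`raw_items`, `messages`), the two sources (`chat`, `mail`) and their payload fields are all made up for illustration; real exports will look different.

    ```python
    import json
    import sqlite3


    def create_source_layer(conn):
        # Source layer: just an id, a little metadata, and the untouched JSON.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS raw_items ("
            " id INTEGER PRIMARY KEY,"
            " source TEXT NOT NULL,"
            " fetched_at TEXT NOT NULL,"
            " payload TEXT NOT NULL)"
        )


    def store_raw(conn, source, fetched_at, payload):
        # The payload is stored as-is; nothing is interpreted at this stage.
        conn.execute(
            "INSERT INTO raw_items (source, fetched_at, payload) VALUES (?, ?, ?)",
            (source, fetched_at, json.dumps(payload)),
        )


    # Hypothetical per-source mappings onto one common message schema.
    def _from_chat(p):
        return {"sent_at": p["ts"], "body": p["text"]}


    def _from_mail(p):
        return {"sent_at": p["date"], "body": p["subject"] + "\n" + p["content"]}


    NORMALIZERS = {"chat": _from_chat, "mail": _from_mail}


    def rebuild_messages(conn):
        # Access layer: dropped and rebuilt from the source layer every time.
        conn.execute("DROP TABLE IF EXISTS messages")
        conn.execute(
            "CREATE TABLE messages"
            " (raw_id INTEGER, source TEXT, sent_at TEXT, body TEXT)"
        )
        rows = conn.execute(
            "SELECT id, source, payload FROM raw_items ORDER BY id"
        ).fetchall()
        for raw_id, source, payload in rows:
            normalize = NORMALIZERS.get(source)
            if normalize is None:
                continue  # unknown source: the raw row stays, it is just not exposed
            msg = normalize(json.loads(payload))
            conn.execute(
                "INSERT INTO messages VALUES (?, ?, ?, ?)",
                (raw_id, source, msg["sent_at"], msg["body"]),
            )
        conn.commit()


    conn = sqlite3.connect(":memory:")
    create_source_layer(conn)
    store_raw(conn, "chat", "2020-01-02T00:00:00",
              {"ts": "2020-01-01T10:00:00", "text": "hello"})
    store_raw(conn, "mail", "2020-01-02T00:00:00",
              {"date": "2020-01-01T09:00:00", "subject": "Hi", "content": "How are you?"})
    rebuild_messages(conn)
    messages = conn.execute(
        "SELECT source, sent_at, body FROM messages ORDER BY sent_at"
    ).fetchall()
    ```

    Since `messages` is derived only from `raw_items`, changing a normalizer and calling `rebuild_messages` again is enough to regenerate the access layer without touching the raw data.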

    1. 3

      I have a problem with this sentence:

      > to access the data, implement bindings on your favorite programming language that extract the necessary bits in runtime

      Code ages. I am really not sure if my code in $language will still run 5 years down the road. It’s likely (except if I were using JavaScript...).

      On the other hand, the code I would hypothetically have used to extract the data, put it into a DB, and then READ it might not work anymore either.

      Overall I find myself neither agreeing nor disagreeing a lot… has this really ever been a problem for anyone? Or is it just me who would put it into a DB and still keep the original data if it’s not too big? :) And if it’s too big, I might transform it specifically to grab the stuff I’m interested in…

      1. 1

        Thanks, I agree, data format resilience is actually a good point. Although if you kept the raw data, hopefully the code wouldn’t go so stale that fixing it would take a long time.

        To be honest, it’s hard to estimate how big of a problem it is – sadly (for me), not that many people are trying to export their data and meaningfully use it afterwards. I’ve just seen projects that try to automate data imports and put everything in a giant database, so I wanted to share my experience with it :)

        1. 1

          Yeah, I’m interested in some stats (which I usually post in my end-of-year review blog posts), but it’s usually self-gathered stuff that can easily be tracked manually. I’m not interested in most of the data I produce or that anyone produces about me.