Smart meters in the EU are - by mandate - moving to 15-minute intervals, so it’s actually already tighter than the 30 minutes mentioned at the start of the talk. The meters themselves, if you add your own hardware to interface with them physically like we do, report every couple of seconds or more frequently.
I’m intrigued by how the TigerBeetle model could work here - I like how they constrain the schema options - but I’m not sure how we would use it for our domain. The problem is that the data we see isn’t “meter x reported y kWh at timestamp”. The data we see, which comes from the local distribution grid operators, is more like:
“meter x reported y kWh at timestamp 1, quality level z”
“wait no, meter x reported a kWh at timestamp 1, same quality level”
days later
“ok ignore that, meter x reported b kWh at timestamp 1, quality level official”
months later
“wait, our mistake, that was actually meter p, the data you got before was for the apartment below, our technician wrote the meter ids down wrong, here is the data for meter x”
In other words: our data model is like an append-only ledger, but for each value we track four attributes: (meterid, value_timestamp, received_timestamp, quality). A series of rules around quality levels and receive order determines what the current value is for any given 15-min period and meter.
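For concreteness, here is a rough sketch of that shape in Python - with the caveat that the quality names, their ranking, and the “latest received wins within a quality level” tie-break are my assumptions standing in for the operators’ actual rules:

    from dataclasses import dataclass
    from datetime import datetime

    # Hypothetical quality ranking; the real precedence rules come from the
    # grid operators, this ordering is only an assumption for the sketch.
    QUALITY_RANK = {"estimated": 0, "provisional": 1, "official": 2}

    @dataclass(frozen=True)
    class Reading:
        meter_id: str
        value_timestamp: datetime     # which 15-minute interval the value is for
        received_timestamp: datetime  # when the operator delivered it to us
        quality: str
        kwh: float

    def current_value(ledger: list[Reading], meter_id: str, slot: datetime) -> Reading | None:
        """Resolve one (meter, 15-min slot) cell from the append-only ledger:
        highest quality wins; within a quality level, the latest-received wins."""
        candidates = [r for r in ledger
                      if r.meter_id == meter_id and r.value_timestamp == slot]
        if not candidates:
            return None
        return max(candidates,
                   key=lambda r: (QUALITY_RANK[r.quality], r.received_timestamp))

Corrections like the ones above then just become more appended rows; nothing is ever updated in place.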
I imagine you’d model the quality levels as separate ledgers, but how would you track the extra timestamp?
TigerBeetle is not a general-purpose database; it’s designed for financial transactions. I shared the video because I think it is inspiring engineering, not to suggest using it for everything :)
Yes, but my exact use case was the example application they used at the beginning of the talk, hence my interest :)
Nice coincidence! You should ask Joran on Twitter or by email if that could work for your use case. He is usually very responsive.
Funnily enough, I think I would use something inspired by a custom CRDT. You “just” need to give the merge more complex rules so that it takes the quality level into account.
Basically you build a merge algebra, something like the sketch below. The “distributed” part here is more about time than space, but it is still applicable IMHO.
And if distributed systems have taught us anything, it is that time and space are quite linked in these systems ;)
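Roughly what I have in mind, as a sketch: a last-writer-wins-style register whose ordering also weighs the quality level, so that replicas - or replays of the same feed - converge on the same value regardless of arrival order. The quality names and ranking are placeholders, not the operators’ actual rules:

    from dataclasses import dataclass
    from datetime import datetime

    # Placeholder ranking; a real merge algebra would encode the operators' rules.
    QUALITY_RANK = {"estimated": 0, "provisional": 1, "official": 2}

    @dataclass(frozen=True)
    class MeterCell:
        kwh: float
        quality: str
        received_timestamp: datetime

    def merge(a: MeterCell, b: MeterCell) -> MeterCell:
        """Merge two states for one (meter, slot) cell: higher quality wins,
        then the later received_timestamp. The order is total and the pick is
        deterministic, so the merge is commutative, associative and idempotent,
        which is what makes it CRDT-like."""
        def key(s: MeterCell):
            # kwh as the final tie-break keeps the result deterministic even
            # when quality and received_timestamp are identical
            return (QUALITY_RANK[s.quality], s.received_timestamp, s.kwh)
        return a if key(a) >= key(b) else b

Late corrections and out-of-order delivery then stop being special cases - they are just more states being merged in.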
I think that TigerBeetle made the right move to specialise their database, which gives them more control and performance. However, I do not agree that this is the path generic OLTP databases should follow.
OLTP DBs are complex, no question about that. Classic OLTP DBs like Postgres and MySQL can be tough to scale. Similarly to TigerBeetle, some companies decide to specialise and use much more scalable key-value stores. However, this is not a general trend, and I don’t think it will become one.
SQL has a huge ecosystem and allows pretty rapid development cycles. Sacrificing that is a big ask and I am not sure anybody would be up for that.
Rather than trying to fix the medium/abstraction layer, we should fix the tool itself. Postgres has a fantastic developer experience, but its design is ancient: in a world where backend developers routinely use coroutines, it still relies on a process per connection.
There is a reason why companies like Neon and CockroachDB popped up saying “Postgres compatibility, but with proper scaling now”. I believe it is a move in the right direction.