You really don’t want to use Datomic. Doing history via stored procedures in PGSQL is nicer and more efficient.
I’ve gotten a really bad feeling about Datomic when I’ve looked at it before, but I’ve never actually read through enough docs/played with it enough to form an opinion. Would you mind listing why not to use Datomic? My guesses (and they’re just that) are that the performance ought to be atrocious, the amount of client-side logic should place a lot of load on network traffic and allow for utterly different behaviors between different client libraries, and scaling horizontally/sharding should be incredibly painful, but I emphatically do not know enough about the details to know whether these are valid concerns, or whether they’re addressed sanely elsewhere.
Datomic doesn’t change anything about how you scale because you still have a storage layer you’re writing to behind the transactor. What it does do is add unpredictable client caching behavior, query thrashing of the cache, and a slow-as-fuck transactor on top of whatever storage backend you’re using.
This on top of not having basic, obvious features any database should have like the ability to set a fucking timeout for a query.
Scale doesn’t matter if you’re 10,000x slower and less reliable than the competition, even though Datomic doesn’t actually do anything about scale.
Mostly what I hear about Datomic is either praise or FUD, but your concerns are very thoughtful.
The client-side logic problem is “solved” by only having one client, the JVM one. For any other language you have to use an HTTP API.
I would love to see sharding, and in one case in particular where I’ve used Datomic, it would be dead simple since all my entities were structured under a “root” type entity (an event, like a conference etc) so one could shard on that. It is a bit annoying that if I have a long-running transaction for one event, it would block writes for all other events while it is processing, and I know that they do not share any data (read: do not need consistency). One could use multiple databases, but then you would have to juggle multiple HornetQ connections.
Hey, author here :) That sounds super interesting. When I’ve tried doing stuff myself, it was via transactions and a lot of manual work, where I typically ended up with a versioning scheme with strong ties to my table layout, and I really had to think about what I wanted versioned and what I didn’t. Do you have any more information about how one would go about doing it with stored procedures in PGSQL?
The SPs themselves aren’t interesting.
The trick to making JOINs not be insensibly slow is not to record single dates for events, but to use ranges. Make an update into an update/insert which caps off the valid_to of the previous state and inserts a new record that has a valid_from of now() and valid_to of lolwhenever.
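A minimal sketch of that update/insert pattern, for anyone who wants to poke at it. This is not the stored procedure described above: it uses Python's bundled `sqlite3` instead of PGSQL so it runs anywhere, and the table `account_history`, its columns, the `set_balance` helper, and the `FOREVER` sentinel are all made up for illustration. Timestamps are plain integers to keep the output deterministic.

```python
import sqlite3

# Hypothetical "still current" sentinel; stands in for an open-ended valid_to.
FOREVER = 2**63 - 1


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE account_history (
               account_id INTEGER NOT NULL,
               balance    INTEGER NOT NULL,
               valid_from INTEGER NOT NULL,
               valid_to   INTEGER NOT NULL
           )"""
    )
    return conn


def set_balance(conn, account_id, balance, now):
    """Record a new state without overwriting the old one."""
    with conn:  # single transaction: cap the old row, then insert the new one
        conn.execute(
            "UPDATE account_history SET valid_to = ? "
            "WHERE account_id = ? AND valid_to = ?",
            (now, account_id, FOREVER),
        )
        conn.execute(
            "INSERT INTO account_history VALUES (?, ?, ?, ?)",
            (account_id, balance, now, FOREVER),
        )


conn = open_db()
set_balance(conn, 1, 100, now=10)
set_balance(conn, 1, 250, now=20)
rows = conn.execute(
    "SELECT balance, valid_from, valid_to "
    "FROM account_history ORDER BY valid_from"
).fetchall()
print(rows)
```

After the two calls the table holds both states: the old row capped at 20 and the new row open-ended. In PGSQL the same two statements would presumably live inside one stored procedure, with real timestamps instead of integers.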
Lots of fintech companies (including GS) use this approach happily.
But seriously, don’t use Datomic.
So if I understand you correctly, you end up running queries where you pass in a timestamp that is used for range queries so you only get the records where the timestamp is within from/to? So when you change a record, you copy it and give it the appropriate from/to range?
Pretty much. There’s deeper cleverness you can engage in here, but this solution was several orders of magnitude faster than Datomic with a trivial impl anyway.
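For the read side, here is a rough sketch of what such an "as of" query could look like once a JOIN is involved. Again this is an illustration under assumptions, not the production code being discussed: it uses `sqlite3` rather than PGSQL, the two tables (`sample_history`, `tech_history`), the `as_of` helper, and the sample rows are invented, and ranges are treated as half-open (`valid_from <= ts < valid_to`).

```python
import sqlite3

FOREVER = 2**63 - 1  # hypothetical "still current" sentinel

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sample_history (
        sample_id INTEGER, status TEXT, tech_id INTEGER,
        valid_from INTEGER, valid_to INTEGER);
    CREATE TABLE tech_history (
        tech_id INTEGER, name TEXT,
        valid_from INTEGER, valid_to INTEGER);
    """
)
conn.executemany(
    "INSERT INTO sample_history VALUES (?, ?, ?, ?, ?)",
    [(1, "received", 7, 10, 30), (1, "analyzed", 7, 30, FOREVER)],
)
conn.executemany(
    "INSERT INTO tech_history VALUES (?, ?, ?, ?)",
    [(7, "A. Smith", 0, 20), (7, "A. Jones", 20, FOREVER)],
)

# Each table in the join gets its own range predicate on the same timestamp.
AS_OF = """
SELECT s.status, t.name
FROM sample_history s
JOIN tech_history t
  ON t.tech_id = s.tech_id
 AND t.valid_from <= :ts AND :ts < t.valid_to
WHERE s.sample_id = :sample_id
  AND s.valid_from <= :ts AND :ts < s.valid_to
"""


def as_of(conn, sample_id, ts):
    """Return the joined state of one sample as it stood at time ts."""
    return conn.execute(AS_OF, {"sample_id": sample_id, "ts": ts}).fetchall()


print(as_of(conn, 1, 15))
print(as_of(conn, 1, 25))
print(as_of(conn, 1, 35))
```

The cost is visible in the query: every table that takes part has to repeat the range check, so the history scheme does leak into each query you write.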
With Datomic you get a snapshot of the entire database (relations included) as of any transaction time. I don’t think you can achieve this in PGSQL without knowing the database structure and having it affect all of the queries / subqueries you write. Also, while history is a cool feature of Datomic, it also has a pretty unique scaling model, adds schema back to NoSQL ideas, supports dynamic attribute creation, represents everything (including schema) as data, and has a powerful query language (Datalog).
Also, while history is a cool feature of Datomic, it also has a pretty unique scaling model, adds schema back to NoSQL ideas, supports dynamic attribute creation, represents everything (including schema) as data, and has a powerful query language (Datalog).
Why are you database-splaining the product to somebody who’s used it in production and written libraries for it?
It’s really slow and poorly engineered. NONE OF THE BUZZWORDS MATTER. None of them.
None. None at all. A poorly engineered product is poorly engineered no matter what the design is. Datomic is a labor-sink for a consulting company. We even tried to pay them to fix the obvious functionality gaps on contract and they wouldn’t do it.
Please do not reply to me with more of their marketing vomit.
I’ve used and designed event-based stores and historical databases a few times throughout my career. Datomic was the worst I’ve used by far.
I would love to read more about your experiences! Do you have any blog posts or something around? If not, can you write some? :) And the more specific, the better!
Sad to hear, Datomic and the Datalog queries seem so interesting.
Was davidhampgonsalves supposed to assume you’ve used it in production and written libraries for it? I don’t get the snark.
That wasn’t marketing vomit, it was from my own experience with Datomic, and it was also negating your claim that storing versions on rows in your tables gets you functionality comparable to Datomic’s (personal performance claims aside).
my own experience with Datomic
I built the backend to a LIMS from scratch (with coworkers, not alone) that went into production in a CLIA certified laboratory. We were legally obligated to overwrite/delete no data and be able to recall the history of anything that passed through our lab upon demand by inspectors.
That we used Datomic was 99% my fault, otherwise my coworkers wouldn’t have heard of it. Yes, fault. It was a huge mistake and I should’ve listened to my coworkers. We spent ~6 months after the initial build-out trying to paper over Datomic’s problems, including alternating between desperate begging and offering to throw money at Cognitect to fix their bullshit. That was when we realized the product was a labor dump for when they didn’t have contracts for all their people.
What’d you do?
Can I suggest you write up your experiences/problems calmly? In this whole thread you’re throwing a lot of anger, swearing and ranting around. I’d very much appreciate seeing a clear, detached, credible writeup of the problems in a blog post or similar, and would likely find that a lot more convincing.
Datomic has unlimited horizontal read scale due to the library executing the queries and immutable mirrors of data. I am not sure if PGSQL can do that, though I don’t doubt PGSQL will be faster in other cases.
Datomic has unlimited horizontal read scale due to the library executing the queries and immutable mirrors of data.
Yeah this is nonsense and doesn’t really matter because there’s still a storage backend you’re querying. The client cache is not a panacea. You’d be shocked how slow that shit gets when it keeps churning the client cache to troll the data.
You can’t even bounce the fucking client if it hangs on a stuck query (this happens a lot) without restarting the entire JVM.
I am not sure if PGSQL can do that, though I don’t doubt PGSQL will be faster in other cases.
Is there anything published on implementing some of the concepts that this article discusses?
Yeah, at least some of them:
Live merging of transaction log: BigTable, they call it a “merged view of the sequence of SSTables and the memtable” http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
One is linked in the article: The performance characteristics of OLTP databases, and how much overhead locking etc. adds http://nms.csail.mit.edu/~stavros/pubs/OLTP_sigmod08.pdf
Info about how in VoltDB, transactions are queued and submitted wholesale instead of round-trip, can be found in many different places in their docs, here is one: https://docs.voltdb.com/UsingVoltDB/DesignProc.php