
  2. 22

    Am I the only one that:

    • loves the idea of Datomic (especially the time/immutability model),
    • wants the creators to be compensated for their ideas and work,
    • but still feels incredibly uncomfortable with the closed-source nature of a database/datastore in the 21st century?
    1. 4

      I’m with you. The typical model is open core, with premium add-ons for enterprise. You can also license open-source software to enterprises; many actually prefer to pay a company to be responsible for what they depend on. Finally, some companies offer hosting or cloud containers for their solutions. And all of that comes on top of the usual support and service revenue for OSS.

      So, yeah, I think they could make it work profitably with the core product being open source. Quite a few companies do. I can’t guarantee that, though. Proprietary is still the safest route for monetizing software.

      1. 3

        Look at all the free-software politics I’m not doing?

        1. 5

          Look at all the free-software politics I’m not doing?

          Code dumps are not the preferred way to release open source, but they are still infinitely better than nothing.

        2. 2

          This is why I’m pretty excited about datahike. It might actually turn into an open-source, Datomic-like database.

          1. 2

            Very cool! I had seen datascript before, but it’s nice to see this address my point about feeling uncomfortable with a non-free database.

        3. 4

          On HN someone said: “This will be a tongue-in-cheek comment, but there’s another thing Datomic isn’t making you do either: GDPR compliance.”

          Immutable data stores are great but the world wants some level of mutability. I’ll link to the comment and responses if anyone is interested.

          1. 7

            I’m probably biased, but I think Datomic’s model of deletion is perfect for GDPR.

            When you delete something permanently, we call it “excision.” (As in “cutting out”.) After the excision, the data is gone, gone, gone. Any old storage segments that held the excised data get reindexed and garbage collected.

            But, we record the event of the excision as its own transaction. So there’s a permanent record of what got deleted (by the matching criteria) and when. And like any transaction, the excision transaction can have extra attributes attached like who did it and in response to what document number, etc.

            With any other database, once data is deleted, you don’t know that it ever existed and you don’t know who deleted it, when, or why.
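
            (For reference, on-prem Datomic requests excision through the :db/excise attribute in an ordinary transaction. A minimal sketch; the :audit/* attributes are hypothetical schema you’d define yourself:)

            ```clojure
            (require '[datomic.api :as d])

            ;; Excise every datom about one entity (e.g. for a GDPR erasure request).
            ;; The request is itself a transaction, so we can annotate the transaction
            ;; entity with audit metadata recording who asked and why.
            @(d/transact conn
               [{:db/id (d/tempid :db.part/user)
                 :db/excise user-eid}                  ; entity whose datoms get cut out
                {:db/id (d/tempid :db.part/tx)         ; the transaction entity itself
                 :audit/requested-by "dpo@example.com" ; hypothetical audit attributes
                 :audit/reason "GDPR erasure request"}])
            ```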

            1. 1

              The page @mfeathers linked to says that excision is very expensive, but it’s unclear what that means in practice. Do you have any guidance on that?

              1. 1

                Excision does require a full scan of the log, plus a full index job. Depending on the size of your database that can take a while. Because this has to be done atomically, the transactor can’t do anything else while that excision runs.

                This is for the on-prem version of the product. I don’t know how the cloud version does it… it may be friendlier to throughput there.

                1. 2

                  The way you describe it, excision sounds like it could be so expensive as to not be viable in production.

                  1. 1

                    EDIT: Sorry for the wall of text… I wanted to say a bit more than “you have to design for it.”

                    I have seen installations where we had to get creative to work around three- or four-hour excision times, but I’ve also seen installations where it took a couple of minutes. Even on the low end, though, it requires design work to handle those delays.

                    There’s a cluster of related design techniques to achieve high throughput with Datomic. I’m still learning these, even after 6 years with the product. But it turns out that designing for stability under high throughput makes you less sensitive to excision time.

                    Mostly it comes down to the queue. Datomic clients send transactions to the transactor via a queue. (This is actually true for any database… most just don’t make the queues evident.) Any time you look at the result of a transaction, you’re exposed to queuing time. “Transaction” here specifically means changing data, not queries. Those are unaffected by excision or the tx queue.

                    I design my systems to start with a DB value that I capture at the beginning of a request. That means I freeze a point in time and all my queries are based at that point in time. This would be similar to a BEGIN TRAN with repeatable read isolation. Then while processing a request, I accumulate all the transaction data that I want to submit. At the end of the request, I make a single transaction out of that data so all the effects of the request happen atomically.
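
                    (A minimal sketch of that request pattern in Clojure; every function and attribute name here is hypothetical:)

                    ```clojure
                    ;; `d` is datomic.api; `build-tx-data` is a pure function you'd write.
                    (defn handle-request [conn request]
                      (let [db      (d/db conn)                 ; freeze a basis: every query
                                                                ; below sees this point in time
                            tx-data (build-tx-data db request)] ; accumulate datoms, no side effects
                        ;; one transaction at the end, so all effects land atomically
                        (d/transact conn tx-data)))             ; returns a future (see below)
                    ```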

                    When I call the transact function, I get back a future. I pass that future off to an in-memory queue (really a core.async channel, if you’re a Clojurist.) A totally different thread goes through and checks the futures for system errors.

                    All this means that even if the tx-queue is slow or backed up, I can keep handling requests.
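
                    (A sketch of that hand-off using core.async; the channel and error handler are made-up names:)

                    ```clojure
                    (require '[clojure.core.async :as async])

                    (def tx-futures (async/chan 1024))   ; in-memory queue of pending tx futures

                    ;; Request thread: enqueue the future and return immediately, so a slow
                    ;; or backed-up transactor never blocks request handling.
                    (defn submit-tx! [conn tx-data]
                      (async/>!! tx-futures (d/transact conn tx-data)))

                    ;; A separate thread derefs each future and surfaces system errors.
                    (async/thread
                      (loop []
                        (when-some [fut (async/<!! tx-futures)]
                          (try @fut
                               (catch Exception e
                                 (report-tx-failure! e)))  ; hypothetical error reporting
                          (recur))))
                    ```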

                    As a separate mechanism, I’m also exploring the idea of splitting databases by calendar period, the way you’d roll a SALES table over each year and keep SALES_2016, SALES_2017, etc. as history. Since I can query across multiple databases quite easily, that keeps my working set smaller.
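
                    (Datomic’s Datalog accepts multiple database values as query inputs, which is what makes that split workable. A sketch with hypothetical attribute and connection names:)

                    ```clojure
                    ;; Join across two yearly databases: customers who bought in both years.
                    (d/q '[:find ?customer
                           :in $sales-2016 $sales-2017
                           :where
                           [$sales-2016 _ :sale/customer ?customer]
                           [$sales-2017 _ :sale/customer ?customer]]
                         (d/db conn-2016)
                         (d/db conn-2017))
                    ```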

                    1. 1

                      All this means that even if the tx-queue is slow or backed up, I can keep handling requests.

                      Can you? For example, say a web request is updating my Tinder profile while an excision to remove old GPS coordinates is running, and that excision takes 3 minutes. My request will hang for 3 minutes, right? So while you might technically be correct, from a UX perspective you’re not continuing to handle requests. Or did I misunderstand your description? If I understand you correctly and you were pitching this technology to me, I would probably reject it. I can’t have multi-minute write outages in my super important most popular product ever.

                      like you’d roll a SALES table over each year and keep a history of SALES_2016, SALES_2017,

                      I haven’t used Datomic, so maybe the model is so good that putting up with things like this is worth it, but I really dislike having to pick a sharding strategy up front (should I do years? months? weeks? how do I know? how expensive is it to change after I decide?). Certainly most databases have pretty miserable tradeoffs here, though. Also, is excision confined to a single DB, or does it run across all DBs?

            2. 5
              1. 1

                That was one of my points against blockchains: encumbrance from pollution attacks on the repos. I had ideas for dealing with it, but each had tradeoffs. Tricky paradox to address.

              2. 3

                Datomic is really great; I just want to use Go or Python, and those don’t work so well with it IIRC :(

                1. 6

                  And there’s a startup idea for you. ;)

                  1. 1

                    When did you last try those? I’m asking because there’s a new “client” API that might make it easier to access.