1. 20

  2. 8

    So many problems here. The really big one though: data and schema integrity. As an app developer, I can reasonably assume that I’m the only one messing with my database. The schema will be the one I added, the data will conform to the implicit assumptions of my app because my app is the one that wrote the data.

    This proposal trashes all that. The database goes from trusted store to “under user control and should be assumed malicious”. I’d have to check the schema was still the schema all the time and put lots of checks on data coming back from it. Or, I could just have my own database and not do any of those things.

    1. 3

      An easy way to do this would be to let the user perform read-only actions against your database. This is already doable for most mobile apps that use sqlite: you take the db, and query it with whatever you want.

      Admittedly, the OP wants both read and write privileges. If you think about it, this is no different from how the file system works today. Your program has no control over other programs running in your computer, and can’t prevent others from poking and screwing around with your files.

      I think shared databases could work with the caveat that there are several sandboxing mechanisms. For example, other apps can’t write to the same tables you’re writing, but only to isolated snapshots.

      If you squint, this could be no different than, say, a git repo, where multiple users can modify the same source by creating branches in the document tree. We already use this model for the source code of a program, so it’s not much of a leap to consider that we could also use this for the application data.

      1. 3

        If I (app developer) are the admin for the DB and can make sure the user only has read-only, sure. This however is much less useful, as the user loses the benefits of centralisation (e.g. single system to have to talk to). Also, I’ve then got to manage database connections as a fully secured service and make very sure that I’m careful about what users do/do not have access to at that level, as opposed to the more usual “DB access is wide open to anyone authorised and privileges are managed at app level” option.

        Alternately, if we have MyData services managing the “user is read-only” thing, I then have to make sure I trust all of them. Including the ones where the user themselves is the admin of the service, and then we’re back to square one.

        1. 2

          Access control in a DB is still easier than building out and supporting a custom REST API for 3rd parties to use. Plus, with a bit of work, an extension could be added to a Postgres or MySQL or whatever DB to allow untrusted connections that are sandboxed and granted minimal privilege, minimizing the chance of misconfiguration. Plus, it’s not like this is a totally unheard of idea – see, e.g. datasette which exposes read-only query access to a SQLite database and prevents DOSing by limiting the amount of time a query can run.

      2. 2

        Yes, this sounds like way more work for every single app developer than it would be to add export/export features or even a sync api to your app. A better solution (IMO) would be instead of hosting the data, host a hub for syncing data and schemas, and provide a metric fuckton of client libraries. But you’re still left with the problem of convincing app devs to add support for your thing, and on top of that convincing somebody to pay for it.

        1. 1

          I don’t buy this. First off, read only access on its own would be massively helpful. If apps did that, we wouldn’t have to deal with broken, crappy export tasks and could query the data however we want (without having to pull it down through the thin, brittle straw of a REST API). Massively helpful, with minimal support required by the original developers. I’m a huge fan of this – unfortunately it’ll never happen because most apps have a vested interest in user lock-in.

          It’s also possible to allow writes by putting the model code and integrity constraints in the database itself. If the model code were written in stored procedures, raw inserts/updates were disabled, and more stringent query time/access constraints were put in place than what is normally used for an app user, this could work. 99% of apps work off of one database anyway, so it’s not like it would put significantly more load on it. If there’s concern about bad actors DOSing the system with bad queries or data writes to fill up the disk, just sandbox each query, limiting the amount of time it can run or disk it can use before killing it.

          Technically, this approach makes a lot of sense, but once again, the reason it’ll never happen is because nobody will give up the keys to the kingdom. There’s no technical magic that makes Twitter/Facebook/YouTube/pick your social media platform worth sticking with – they just own your data. The business model is your data, so they’ll never let you have direct access to it.

          1. 3

            If there’s concern about bad actors DOSing the system with bad queries or data writes to fill up the disk, just sandbox each query, limiting the amount of time it can run or disk it can use before killing it.

            Hmm. I could learn enough c to patch postgres and hope my changes don’t break its data integrity guarantees (and setup database-level views/permissions so that users can’t read each others users private data), or I could focus on building something useful enough that someone feels motivated to help pay my bills.

            Hell, I don’t even trust my datastores being publicly routable, much less accessible - for all I know the connection handshake is a vector for abuse.

            the reason it’ll never happen is because nobody will give up the keys to the kingdom

            People build using the technologies and techniques they already know. As an industry we can’t even adopt exotic correctness techniques like “Get enough sleep and limit working hours”. Inertia is enough to explain this.

        2. 3

          I think there’s something to this. It’s easy to punch holes in new paradigms. The holes aren’t necessarily flaws – they’re only flaws if expectations remain constant.

          Writes to databases would benfit from software-enforced contracts mediating access such that data are of the correct types, sizes, and other constraints that we implicitly rely upon for our software to operate correctly.

          Schema changes would require a migration path from one versioned schema to another (including migrating the data to new schemas, and probablly an intra-schema sync for backward compatibility). Some features from the new schema would inevitably be unavailable to the old.

          There would be a multitude of difficult problems to solve, but if you looked at the path to our current software stacks from 1995, it would be easy to say that the vision of today is too fraught with complicated problems.

          1. 2

            That idea is pretty similar to what https://remotestorage.io/ tries to solve. One location with all your data for all your (web) apps to use…