1. 6
  1. 1

    I’m really curious how implementing a NoSQL database’s interface on top of a traditional RDBMS performs. Has anyone tried benchmarking FerretDB against MongoDB?

    1. 2

      I’ve heard of projects using JSON columns in Postgres extensively, using it effectively as a NoSQL database. Apparently it’s faster too, but I’d love hard numbers over hearsay I’ve heard.

      1. 3

        I am using Postgres JSONB extensively, I think, in practically all of the tables, (may be few that I am forgetting)

        Database and production usage are not significant in size to share benchmarks.

        Here are some rules I follow (may be this can help others) :

        In the system, usually a table contains several ‘classes’ of fields:

        • there are separate fields for things like: IDs, update_time, status, row-life-cycle

        • a JSONB field for access control (ac.). storing things like row owned by [list of user Ids], legal jurisdiction(s), if necessary, and a few other things. Those things generally help with zero-trust access control, by enabling our PEP (Policy enforcement points) to do row level filtering of data efficiently.

        • A row can belong to ‘operation’ data or to ‘model data’. A row in operation data category usually contains separate JSONB fields for each of the ‘entities’ that represent the ‘business relation’ a given row is representing. Operation rows, also, always have fields to enable sharding, and row-lifecycle indicator (a row can be active | can be archived | can be deleted). This kind of row-lifecycle indicator field allows us to tell the indexes to ignore ‘archived’ and ‘to-be-deleted’ rows – so that they do not pollute our indices. Sharding and row-lifecycle fields – have to be their own fields, not in JSONB.

        • Early in design I ran Postgres’s explain plans to make sure I can see all possible ‘table scans’. If I saw a table scan I would decide if I need to add a GIN [1] index to a specific nested JSONB field, or cache data on the application side, or ‘duplicate/denormalize’ a JSONb field into its own field (I do not remember I was ever forced to do that, though).

        • The database access is only through APIs so the generic ‘let me just fetch data any way I want’ – are not allowed. But the query APIs are reasonably ‘composable’ (in some critical areas), so as the system grows, the APIs (and therefore database access) do not need to ‘redesigned’ and ‘rechecked’ for performance, that often.

        • No triggers are allowed in the system. Postgres’s other features dealing with full text searches [2] applied to JSOB text fields are leveraged too. It works well.

        All in all JSONB feature, GIN indexes by now, seem to be very very mature, so I do not feel ‘unequipped’ compared to a document oriented database.
        I think a hybrid approach (where in one table JSOB and typical fields are used) is liberating, and efficient.

        [1] https://pganalyze.com/blog/gin-index [2] https://www.postgresql.org/docs/12/functions-textsearch.html

        1. 1

          I don’t have hard numbers but we have one of these at work. And by “one of these” i mean “most data is thrown into a single jsonb column on a single table.” Would not recommend that structure under any circumstances, but Postgres’ JSONB columns have generally been surprisingly good.

          However, good indices become important much faster than with a traditional table and you end up needing indices on derived fields (e.g. an index on cast(data->>‘foo’ as date) for date queries) much more frequently. Postgres has an index type that lets you quickly query for all rows that have a key present, so we end up (ab)using that a lot to filter result sets without needing a special index for every query.

      2. 0

        I believe the project was originally named MangoDB.

        1. 1

          yup it was