1. 29
  1.  

  2. 15

    For anyone curious, some other options for storing graph/tree data are:

    1. 1

      I also just learned about https://age.apache.org/ from the comments on the original post!

    2. 10

      I think this kind of demonstrates exactly why people don’t want to use Postgres as a graphdb (although I have advocated to many that they may want to). JOINs get complicated really quickly and it’s just not a very natural expression.

      Consider in GraphQL a “join” is just {

      Instead of:

      SELECT id, data
      FROM nodes
      JOIN edges ON nodes.id = edges.next_node
      WHERE edges.previous_node = 1;
      

      One just writes:

      {
          id,
          edges {
              id
          }
      }
      

      edges is a join. We’re saying “for the top level object expand the edge edges to give us the next level objects”. We don’t have to think of a join key or even that a join is happening at all, the relationship is very native.

      Graph query languages tend to make this joining first class.

      This is why JanusGraph, or Hasura, or Prisma, or others, have an appealing model. It basically creates a graph language frontend that plugs into your database backend.

      1. 1

        I don’t believe the target audience for this post is graph query language users, rather graph query language authors who might implement on top of Postgresql.

      2. 8

        He doesn’t mention performance implications at all, especially for large (>1m) datasets and I have no idea. Can someone comment on that?

        1. 2

          Not very good. You’d need to add 3 to 6 indexes at the very least, see https://docs.aws.amazon.com/neptune/latest/userguide/feature-overview-data-model.html (SPO can be seen as a ENE, edge-node-edge).

          1. 2

            Node-edge-node, of course, not edge-node-edge!

            1. 2

              Isn’t a graphdb going to build those same indices?

              1. 2

                Sure, but the approach in this submission on top of Postgresql didn’t create such indices, hence my scepticism regarding large dataset performance.

          2. 8

            I love that this topic keeps reappearing, it’s like people discovering array or stack languages. People discovering neat things (even if it’s not new to you) and engaging anyone that sits still for too long should be encouraged.

            1. 5

              I’d like to point out that you don’t have to store the edges in a dedicated edge table to traverse the graph using recursive queries. Using a typical foreign key in the “node” table will do perfectly well.

              1. 4

                Sure, you can represent a graph in postgres. You can even run a very simple BFS. But that’s pretty much where it ends, unless you start writing procedural postgresql code.

                For starters, the given BFS implementation will run until it’s completely exhausted the graph. To limit the reach of the search you have to use window functions, which will return incorrect results for any graph that contains loops. (i.e. it’s not just a DAG)

                Need to find the path between two points in the graph? Can’t be done with normal SQL.

                Need heuristics? Best-first search? A*? Bidirectional? SQL can’t do any of that. You’ll have to write procedural code and implement them yourself.

                Just to be clear, I think it’s very valid to use postgresql to hold graph structures, if you have a library or plugin that handle these features for you. But either way, if your problem is graph heavy, using a graph database is probably a much better choice.

                1. 3

                  Also Apache AGE (A Graph Extension) provides for defining graphs in PostgreSQL and querying with Cypher or a mix of Cypher and SQL. I’ve not used this but it’s on my list to at least try it out.

                  1. 4

                    I’ve played with this on multiple occasions. It feels clunky and unnatural in comparison to plain CYPHAR as you’d expect. It also moves very slowly and is missing a lot of useful features. Can’t see it changing anytime soon tbh.

                  2. 2

                    PG also has Ltree extension (trees not graphs though)

                    https://patshaughnessy.net/2017/12/13/saving-a-tree-in-postgres-using-ltree

                    Works well, at least for relatively static data (I used it for taxonomies)

                    1. 1

                      I’ve said this elsewhere, but fun Neo4j lore is that Johan originally wrote it with Postgres as the storage engine! Just a little Java lib for doing graphs in PG, now all grown up :)