1. 2

    Aggressive use of hypermedia pays too much in terms of traversal cost. Aggressive use of GraphQL would require the implementation of a graph database engine at the endpoint of every GraphQL-enabled service, bringing in even less tractable issues of traversal and query optimization in a situation where that’s really not what you need.

    1. 3

      You don’t need a graph database; PostgreSQL can do just fine.

      1. 1

        Or you can have an intermediate tier that sources data from existing APIs (like Falcor). The data source is by-the-by as long as you have something that can parse and run GQL queries.

        I’m not sure what the advantages are of creating a GQL API over creating a SQL API (which my gut feeling says would be a bad idea), but the two ideas seem functionally equivalent.

        1. 1

          I don’t mean that you need to be running a literal graph database; instead, I mean that the queries you can perform via GraphQL need the same optimizations and calculations that a graph database requires. I.e. if there are multiple paths to a target, you have to choose the optimal one, or else pay a sometimes-large performance penalty.

      1. 3

        Hi! I’m finding this super interesting. Something like 80% of the work to get a working prototype of an app is installing a DB, coding the ORM plus a REST layer to access it from the outside, and packaging it. Getting it into a container is even more convenient.

        I’ve read the PostgREST docs for the motivation but since you’re the author of the docker wrapper I’d appreciate if you can expand on it a bit more. Did you develop this to scratch your own itch? Which use cases do you have in mind?

        A quick question, I’m not an expert on Postgres. Does the starter kit provide any kind of redundancy or are all containers, well, self-contained? (REST + DB for each instance, data is not shared)

        1. 4

          @carlesfe Thanks for the questions. First off, a small correction: although I’ve written a lot of the core code in PostgREST and have been involved in the project for almost 2 years, I am not the author; Joe Nelson is.

          Before I got involved in the project (2015) I had a similar prototype in Lua which I started developing, as you said, to “scratch an itch”. I was looking around for something similar, came across PostgREST, liked that it is in Haskell, and decided to contribute to it rather than develop a separate thing. Another contributing factor was the emergence of GraphQL (there were only rumors back then), and I thought I could get PostgREST to a point where it could be the base for a GraphQL backend server (which I did :) https://subzero.cloud/ )

          The use cases are any API that is primarily talking to a database (reading and writing data), which I think is the majority (other actions like talking to 3rd-party systems, sending emails, notifications … are still possible).

          This starter kit is first of all a “dev env” setup to make it easier to iterate on your PostgREST-based project, which is a major obstacle for new users. When going to production, your db will be in something like RDS while the other components (openresty/postgrest) will run as containers (or not). OpenResty and PostgREST are stateless, so you can run as many instances as you like for redundancy.

        1. 3

          I don’t think this is a bad pattern overall, but it strikes me as an admission of API design failure. With an API built on something like GraphQL, Falcor, etc, you can get round trips down dramatically in a generic fashion for most operations. You can still have FE-specific BEs (or BFFs if you like), but they can probably be pretty thin templating systems or a bundle of custom “mutations” that accompany the generic query endpoint. If your queries become too large/complex, the query part of the BFF system becomes mostly just stored procedures / query cache, or a generic pre-render service, rather than a complete service of its own.

          1. 3

            Exposing an API as powerful as GraphQL to strangers on the internet seems scary to me.

            • What happens when a miscreant crafts a query that joins the biggest set in the DB against itself multiple times? e.g. user { friends { friends { friends { friends { friends { name } } } } } }.
            • What do I do when I find my iPhone app is accidentally emitting a really inefficient query but it will take days at minimum for the App Store to approve the update and every existing user to install it?
            • What’s the likelihood that any given full-blown GraphQL implementation has serious bugs in it just because the attack surface is so large?

            With BFF:

            • There’s no way to submit queries to the BE other than what the BFF decides to generate.
            • Inefficient queries made by the BFF can be fixed and rolled out immediately because they don’t run on anyone else’s devices. Inefficient requests made by the FE are still a problem but hopefully less likely since now you’ve got a pretty clear design guideline that each screen or user interaction should send a small fixed number of requests to the BFF.
            • The attack surface of the BFF can be as small as implementing the specific set of things that you want to do.
            1. 1

              What happens when a miscreant crafts a query that joins the biggest set in the DB against itself multiple times?

              This means you designed a bad API if you allowed joins on your biggest dataset; you can make the exact same mistake in REST too, it’s not a GraphQL problem. It’s not like GraphQL forces you to expose all possible relations between all entities that have them: you could create a schema where it’s only possible to do user { friends { name } } and no deeper.
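Besides restricting the schema, the server can enforce a maximum query depth at runtime. Here is a minimal Python sketch of that guard; the names are illustrative and it counts raw brace nesting as a stand-in for walking the parsed AST, which is what a real GraphQL library would do:

```python
# Reject GraphQL queries whose selection sets nest too deeply.
# A real implementation would walk the parsed AST; counting braces
# on the raw query text is a rough stand-in for illustration.

MAX_DEPTH = 3

def query_depth(query: str) -> int:
    """Return the deepest brace-nesting level of the query text."""
    depth = max_depth = 0
    for ch in query:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return max_depth

def check_query(query: str) -> None:
    """Raise if the query nests deeper than the configured limit."""
    d = query_depth(query)
    if d > MAX_DEPTH:
        raise ValueError(f"query depth {d} exceeds limit {MAX_DEPTH}")

check_query("{ user { friends { name } } }")  # depth 3: accepted
```

The pathological friends-of-friends query from above has depth 5 and would be rejected before it ever reaches the database.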

              What do I do when I find my iPhone app is accidentally emitting a really inefficient query

              Why are you finding this in production? The queries do not magically get generated by the app in production; you actually wrote them down in a string, just like you would write SQL. But even if this happens, you add special code to one of your resolvers to intercept this particular query (you can inspect the AST) and take a more optimal approach until you fix your app.
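That interception can be as simple as matching the known-bad query and routing it to a hand-tuned handler. A hedged sketch (function and query names are made up; normalizing whitespace stands in for comparing parsed ASTs):

```python
# Intercept a known-inefficient query shipped in an old client build and
# route it to a hand-optimized handler until the app update rolls out.
import re

def normalize(query: str) -> str:
    """Collapse whitespace so textual variants of the same query match."""
    return re.sub(r"\s+", " ", query).strip()

def optimized_friends_handler(variables):
    # Hand-tuned replacement, e.g. one JOIN instead of nested resolvers.
    return {"data": "optimized result"}

def generic_execute(query, variables):
    # Normal resolver-based execution path.
    return {"data": "generic result"}

OVERRIDES = {
    normalize("{ user { friends { friends { name } } } }"): optimized_friends_handler,
}

def execute(query, variables=None):
    handler = OVERRIDES.get(normalize(query))
    if handler is not None:
        return handler(variables)
    return generic_execute(query, variables)
```

The server-side fix ships immediately, without waiting on App Store review.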

              What’s the likelihood that any given full-blown GraphQL implementation has serious bugs in it just because the attack surface is so large?

              Why is the attack surface larger than any other API? Are you talking here about something like graphql-js or sangria, or about systems that use these libs?

              1. 1

                Why are you finding this in production?

                I honestly don’t know where to start with this.

                Perhaps you work on literally any kind of high-value dataset and can’t just have a copy of production to test against?

                Why is the attack surface larger then any other API?

                It’s designed to let you re-use disparate elements of your API. See the thread last week: ‘most security issues come from the intersection of two or more features’.

                1. 1

                  I get the part about not having access to the actual production dataset, but the specific discussion there was more about the size of the dataset and queries against that size, and you can generate any size for testing. Also, the comment was related to an iPhone app, and the app could have been tested against the live dataset before submitting to the App Store. Maybe another way to put it is: yes, you can’t test everything before going to production, but the specific question/problem raised most certainly can be discovered before going to production.

                  In relation to attack surface, if you are talking about the possibility of mounting DoS attacks by executing complex queries, then yes, I agree with you: a developer will need to put a lot more thought into defending the API against that, but it’s not impossible to implement. Flexibility comes at a cost. As for other types of attacks, like SQL injection, I would say GraphQL is safer because of its “types”.

                  I would also like to mention that the issue of a GraphQL API being more susceptible to DoS attacks is not a real problem for most developers/projects, so dismissing the technology on this alone would be a mistake. This is an issue only for relatively big players. On the small chance that they do get hit with a DoS, it’s probably going to be some kind of network flood and not a sophisticated attack that probes the way the API works and finds the weak spots.

                  1. 1

                    Dataset size isn’t enough; you also need a similar distribution of string prefixes / numbers for indexed fields (not RNG output).

                    I’m not dismissing GraphQL for being ‘more susceptible to DOS attacks’. I’m dismissing it for organizations where the team writing the server code can sit near the team writing the client code, because the added cost of flexibility adds too little in that situation.

                    1. 1

                      For the past 5 years I’ve worked on a product where the backend is an API, similar to GraphQL, that supports all the frontend code (SPA and mobile). The team was all in the same office, and yet I strongly disagree with your statement that this type of flexible API has little value in that situation. In fact, I would say it was the only sane way of building an API that has to support nontrivial clients; otherwise every little change in the UI would require a change in the backend as well.

              2. 1

                What happens when a miscreant crafts a query that joins the biggest set in the DB against itself multiple times?

                Just throw an exception: “too many joins” or return a partial data set or something like that. Same thing you do if somebody supplies illegal parameters to some FE specific BFF query.

                What do I do when I find my iPhone app is accidentally emitting a really inefficient query

                What do you do when you find your iPhone app is accidentally in an HTTP request loop? Write some code to detect the situation and mitigate it.

                What’s the likelihood that any given full-blown GraphQL implementation has serious bugs in it just because the attack surface is so large?

                Lower than the odds that multiple ad-hoc BFF services have serious bugs in them just because they form such a large aggregate attack surface.

                Overall, I think you’re worrying about problems you already have with any backend design. I’d much rather design, implement, and harden one such service, rather than one per frontend.

              3. 1

                If your queries become too large/complex, the query part of the BFF system becomes mostly just stored procedures / query cache, or a generic pre-render service, rather than a complete service of its own.

                In my experience the generic versions of these just aren’t quite good enough yet. An explicit BFF is sort of an admission of failure in the same way that e.g. a denormalized database schema is - but I just don’t think GraphQL, pre-rendering services etc. are mature enough to rely on yet.

              1. 2

                GraphQL seems like an API that encourages the right kind of thinking from developers – and translating GraphQL->SQL is straightforward.
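For the flat, single-table case, that translation really is close to mechanical. A toy sketch (the parser and names below are made up for illustration; real translators also handle nesting, arguments, and joins):

```python
# Toy translation of a flat GraphQL selection into a SQL SELECT,
# to show the shape of the mapping. One level only, no arguments.
import re

def graphql_to_sql(query: str) -> str:
    # Matches queries shaped like "{ user { id name email } }".
    m = re.match(r"\s*{\s*(\w+)\s*{\s*([\w\s]+?)\s*}\s*}\s*$", query)
    if not m:
        raise ValueError("unsupported query shape")
    table, fields = m.group(1), m.group(2).split()
    return f"SELECT {', '.join(fields)} FROM {table}"

print(graphql_to_sql("{ user { id name email } }"))
# SELECT id, name, email FROM user
```

The selection set becomes the column list and the root field becomes the table; the hard parts are nested selections and optimal join planning, as the rest of this thread discusses.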

                1. 1

                  While the GraphQL spec does encourage the elimination of SELECT *, the reference implementation makes things worse than ORMs. It’s not the implementation’s fault (it’s a general-purpose one), but people took it as the way to do everything. They bolt on a few simple resolvers, which of course generate a million+1 queries, and then when a newcomer asks how to fix this, everyone just says “use dataloader”. It never crosses their mind that they might need a custom execution module. So the best-case scenario you end up with is a few sequential queries which could have been a single query with joins.
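The contrast can be shown in a few lines. This sketch uses an in-memory SQLite database with made-up table names; the naive-resolver path issues one query per parent row, while the custom execution path does one JOIN and regroups on the client:

```python
# N+1 resolver pattern vs. a single JOIN, demonstrated with sqlite3.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ana'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'third');
""")

def n_plus_one():
    """Naive resolvers: 1 query for users, then 1 query per user."""
    result = []
    for uid, name in db.execute("SELECT id, name FROM users ORDER BY id"):
        titles = [t for (t,) in db.execute(
            "SELECT title FROM posts WHERE user_id = ? ORDER BY id", (uid,))]
        result.append((name, titles))
    return result

def single_join():
    """Custom execution: one JOIN, grouped into the same shape."""
    grouped = {}
    for name, title in db.execute(
            "SELECT u.name, p.title FROM users u "
            "JOIN posts p ON p.user_id = u.id ORDER BY u.id, p.id"):
        grouped.setdefault(name, []).append(title)
    return list(grouped.items())

assert n_plus_one() == single_join()
```

Both produce identical results, but the second sends one statement the query planner can actually optimize.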

                  GraphQL->SQL, you are probably thinking PostgreSQL when you say this (i know your work), but can you do it in MySQL where the json support is not that good?

                  1. 1

                    I see what you are saying about the N+1 queries. A good implementation would have to compile to JOINs.

                    As regards JSON and MySQL – I don’t think it makes too much of a difference if there is JSON support. The project I did a while back, GraphpostgresQL, did all the steps of GraphQL processing in the database, using stored procedures, so JSON processing definitely mattered for formatting results, but a production GraphQL->SQL implementation should definitely involve a proxy server, and much of the work can be done there.

                    1. 1

                      The reason I say JSON matters is that without it, if you have to execute, say, a 3-level query (joining 3 tables), you will be transferring a lot of redundant data over the wire. This is the type of query this tool creates https://github.com/stems/join-monster but the more levels, the worse the results, since there is a lot of duplication. To be efficient, the database has to format the result as JSON (the only way a database can return a tree-like response). This is the way it works in PostgREST and it works well, and its interface is powerful enough to build GraphQL on top of it (which I did :) )
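The duplication, and the JSON fix, can both be seen in a small sketch. This assumes SQLite’s JSON1 functions (json, json_object, json_group_array) are compiled in, which they are in most Python builds; PostgreSQL has richer equivalents like json_agg. Table names are made up:

```python
# Plain JOIN duplicates the parent row once per child; a JSON aggregate
# returns the tree directly in one row. Assumes SQLite's JSON1 extension.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ada');
    INSERT INTO books VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 1, 'c');
""")

# Plain JOIN: the author's name is repeated on every row over the wire.
flat = db.execute(
    "SELECT a.name, b.title FROM authors a "
    "JOIN books b ON b.author_id = a.id ORDER BY b.id").fetchall()
assert flat == [('ada', 'a'), ('ada', 'b'), ('ada', 'c')]

# JSON aggregate: one row, already shaped as a tree.
(doc,) = db.execute("""
    SELECT json_object(
        'name', a.name,
        'books', json((SELECT json_group_array(b.title)
                       FROM books b WHERE b.author_id = a.id)))
    FROM authors a
""").fetchone()
tree = json.loads(doc)
assert tree == {'name': 'ada', 'books': ['a', 'b', 'c']}
```

With more levels the flat variant multiplies out every ancestor column, while the JSON variant stays one row per root.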

                      1. 1

                        Without JSON aggregations it’s hard to pull everything up into a tree, even if there is basic JSON support, that’s true. I am not sure what MySQL’s level of support is, beyond the basics.

                1. 7

                  This. :) I often hear the phrases “the database is always the bottleneck” and “don’t put logic in the db because it does not scale”. Of course it does not scale if, instead of sending one complex query which the db can inspect, optimize, and rewrite, people send 100+ stupid queries per request and then do the joining on the other side. I am not sure why they think their code can do better than a query planner.

                  1. 2

                    I used to do that in the past where the ratio of db servers to cheap client machines was something like 1:10 or 1:15. The dbs were expensive (first Informix, then Oracle) so it made sense to do more processing on the client side. This wasn’t a web-app or something, it was data processing and we basically had to touch every byte in the db at least once. In those cases it makes perfect sense to offload the processing elsewhere.

                    1. 2

                      Sure, there are different cases, and what you are saying makes sense; I was just talking about your basic CRUD app / WordPress kind of thing, which are the vast majority. But even in your case, from what you say, the reason was that the DB was expensive and you could not afford read replicas; the reason was not that it is faster to do it in the client or that the code is simpler than a long query.

                    2. 1

                      I haven’t seen app-side joining in a while, but working with Django, people sure could use .values() more, listing only the columns they need when doing read-only queries.

                      And naturally naming the columns whenever possible.

                      It’s just too easy to get things up and running fast with ORMs and then suffer from retrofitting all that into the code.

                    1. 5

                      If the point of the API is to get data in and out of the db, you could use PostgREST (also Haskell :))

                      1. 1

                        Not a great idea since you’ll usually want to extend it, gate access, do various other things and putting that logic in your database is fraught.

                        1. 3

                          That’s a dismissive comment. Have you actually looked at PostgREST?

                          Here’s what I’ve found from two months of hacking on a project built on it:

                          • Extend it: I haven’t found myself held back while adding features like OAuth for authentication and a Places Search that talks to Nominatim. I haven’t found anything about PostgREST that makes this harder; I still write back-end code, PostgREST just takes care of the data stuff.
                          • Gate access: PostgreSQL actually has great authorization features via roles, including Row Level Security. I’m finding myself feeling much happier about the security of my data now that I’m writing rules in declarative SQL instead of manually writing app logic.
                          • Do various other things and putting that logic: I mean, sure, I have a login function as a stored procedure. The horror! It’s 8 lines of nicely-formatted plpgsql. I have an autocomplete endpoint that does some pretty smart querying as a stored procedure, which is longer, but is just a plain parameterized SQL query.

                          I won’t be surprised if I hit pain points if this project becomes a hit and grows to be huge… but from here, that day feels like a long way away, and PostgREST feels really valuable.

                          (edited to clarify a point)

                          1. 1

                            My comment was not intended as “the thing the article describes is not that good, try this”. The article describes a traditional way of building a REST API; it just happens to be in Haskell, nothing controversial about that. Since it’s traditional, of course it works and there is nothing wrong with it, but since the example shown was about how to get data in and out of the db, it’s a good place to use PostgREST. It’s also OK that people dismiss it; the docs do not explain (yet) “the big picture” and how PostgREST is only a part of the new stack, not the whole stack. The only objection I have to your comment is the part about “logic in the db” :) because that’s not dismissive of PostgREST but of databases in general (PostgreSQL). Literally decades of work have gone into implementing views / stored procedures / triggers / the role system / RLS … and you are basically saying that was all for nothing since “putting that logic in your database is fraught” :) If the logic is very closely related to the data being handled, the database is exactly the place to put it.