1. 15

  2. 13

    We hit something similar to this at reddit ages ago. reddit exposes postgres IDs to users encoded as base36 in URLs, so www.reddit.com/details/123abc, which it then decodes 123abc -> 63978744 and then does the obvious SQL query to find the post and its comments. But sometimes broken scrapers would hit something like www.reddit.com/details/wwwredditcomdetails and we’d happily decode wwwredditcomdetails -> 339490705556535069608646345136 and do the obvious SQL query. But because that giant number doesn’t fit in a bigint, it gets treated as numeric and the bigint column now needs a cast, so the query becomes a sequential scan which can never have results but takes a long time and eats up your disk bandwidth.
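    The decoding itself can be sketched like this (a minimal Python sketch, not reddit's actual code; Python's int() happens to accept base 36 directly):

    ```python
    # Sketch of the base36 decoding described above (hypothetical helper,
    # not reddit's actual code).
    def base36_decode(s: str) -> int:
        """Decode a base36 string (digits 0-9 and a-z) into an integer."""
        return int(s, 36)

    print(base36_decode("123abc"))               # 63978744
    print(base36_decode("wwwredditcomdetails"))  # 339490705556535069608646345136
    print(2**63 - 1)                             # bigint max: 9223372036854775807
    ```

    The second value is far beyond the bigint range, which is what forces the cast described above.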

    IIRC we didn’t discover this from the disk bandwidth but because having those sequential scans all of the time also held a lock on some kinds of writes so we’d have long queues of backlogged INSERTs, pending on all of those sequential scans. (Writing this out it doesn’t sound like locking that should happen with postgres’ MVCC system so take my hazy memory with a grain of salt.)

    The fix was of course simple validation which we should have been doing the whole time, but it did surprise me that postgres didn’t infer that a value outside of the possible range of the column must not have any results. It seems like when you involve casts all bets are off and it just individually runs the cast for every row to check. We were building these queries through sqlalchemy which may have been involved somehow in exactly how it spells the query it generates, but I don’t recall looking too deeply into this aspect vs just adding the validation and moving on.
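    That validation can be sketched roughly like so (a hedged sketch with hypothetical names; the key point is that anything outside the bigint range can be rejected before it ever reaches the database):

    ```python
    # Hypothetical sketch of the validation fix: reject any decoded ID that
    # cannot possibly match a bigint primary key, before running any SQL.
    BIGINT_MAX = 2**63 - 1

    def decode_id(s: str):
        """Return the decoded base36 ID, or None if it can't be a valid key."""
        try:
            n = int(s, 36)
        except ValueError:
            return None
        # An out-of-range value can never match a row; skip the query entirely.
        if not (0 < n <= BIGINT_MAX):
            return None
        return n

    print(decode_id("123abc"))               # 63978744
    print(decode_id("wwwredditcomdetails"))  # None
    ```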

    1. 2

      Writing this out it doesn’t sound like locking that should happen with postgres’ MVCC system so take my hazy memory with a grain of salt.

      Could it be that you had some other operation that tried to take an exclusive lock on the table, such as a schema migration? Then the exclusive lock would block on the sequential scans and the INSERTs would queue behind the schema migration.

      1. 2

        Yeah, though I wouldn’t expect these tables to be doing several schema migrations per second :) The details here are probably lost to the sands of time

    2. 2

      Under mitigations, you say

      One way is using bound variables.

      What do you mean by this exactly?

      1. 3

        Use PQexecParams / PQexecPrepared (or the ‘extended query’ protocol they rely on). They take a datatype for each parameter, which means you can be explicit about what those types should be, and not have one of your values implicitly upcast to something expensive. The alternative is to use PQexec (the ‘simple query’ protocol) with parameter-binding handled on the client side as an operation that just outputs an SQL string.

        1. 2

          I figured maybe that’s what they meant, but that depends a lot on the specifics of how that’s done. It’ll only work if you explicitly pass in the type, or if the driver does not specify types at all (in which case they’ll default to unspecified, which is a bit like, but not exactly the same as, putting it in a string literal). Some drivers will automatically detect the type to use from the value you pass in, which leads back to the original problem.

          For instance, the official Postgres JDBC driver’s .setObject method (which adds arguments to a prepared statement) will detect the input value’s class, and if it’s a number other than a Long, Integer or Double, it will call the .setNumber method, which helpfully tells Postgres that the argument type is NUMERIC.

          In Clojure (and probably also in JRuby), this means that if the input is a bignum it’ll behave the same way as if you plugged the large number in the query as a literal. I.e., (jdbc/query db ["SELECT pg_typeof(?)" 9223372036854775808]) returns numeric.

          If you’re using the JDBC driver more directly (e.g. from Java or Kotlin), you have to explicitly call .setLong() (for example) rather than .setObject(), otherwise you may be affected as well. Although there you usually have an explicit type anyway and won’t end up with random bignums so easily.

          1. 1

            Thanks for the information. I was not aware of the JDBC behavior with setObject. I only use JDBC for testing Sequel, I don’t have production experience with it.

            Sequel’s pg_auto_parameterize extension always forcibly casts integers to integer or bigint based on the value of the integer, so it shouldn’t be subject to this issue. Sequel’s JDBC adapter always uses setLong for integer bound variables, so it is also not subject to this issue. I also checked Sequel’s postgres adapter when using bound variables, and it is also not subject to this issue. If you don’t forcibly cast when using Sequel’s postgres adapter, PostgreSQL will implicitly cast, or raise an error if no implicit cast is possible. For example, SELECT pg_typeof($1) will result in an error because the type cannot be determined.
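            The value-based cast choice can be sketched like this (a rough Python sketch of the idea, not Sequel's actual implementation):

            ```python
            # Rough sketch of choosing an explicit Postgres cast from the value,
            # in the spirit of what pg_auto_parameterize is described as doing.
            INT4_MIN, INT4_MAX = -2**31, 2**31 - 1
            INT8_MIN, INT8_MAX = -2**63, 2**63 - 1

            def pg_int_type(n: int) -> str:
                """Pick an explicit type for an integer bound variable."""
                if INT4_MIN <= n <= INT4_MAX:
                    return "integer"
                if INT8_MIN <= n <= INT8_MAX:
                    return "bigint"
                # Out of bigint range: refuse rather than let it become numeric.
                raise ValueError("integer out of bigint range: %d" % n)

            print(pg_int_type(63978744))  # integer
            print(pg_int_type(2**40))     # bigint
            ```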

            1. 2

              Here’s how we worked around it in our Clojure codebase. We reject bigints (of both kinds! sometimes you may get a Java BigInteger, like when deserializing JSON) and provide an escape hatch so that one can still decide to serialize bigints when it’s really needed on a case-by-case basis.

              (import '(org.postgresql.util PGobject)
                      '(java.sql PreparedStatement))

              (defn bignum->pgobject
                "Wrap a BigInt (or BigDecimal) instance in a PGobject so that it can
                be passed on properly as a query argument.  This is necessary
                because we reject regular BigInts for safety reasons, and we may
                decide to do the same for BigDecimals later on too."
                [num]
                (doto (PGobject.)
                  (.setType "numeric")
                  (.setValue (str num))))

              ;; Reject both kinds of bignum: Clojure's own BigInt and Java's
              ;; BigInteger (e.g. from deserialized JSON).
              (extend-protocol jdbc/ISQLParameter
                clojure.lang.BigInt
                (set-parameter [v ^PreparedStatement _ ^long _]
                  (throw (ex-info "Serializing bigints in queries has been disabled to avoid forced sequential scans by malicious user input; if you know it's safe, use bignum->pgobject" {:value v})))
                java.math.BigInteger
                (set-parameter [v ^PreparedStatement _ ^long _]
                  (throw (ex-info "Serializing bigints in queries has been disabled to avoid forced sequential scans by malicious user input; if you know it's safe, use bignum->pgobject" {:value v}))))
            2. 1

              Yeah, I think you’re right. What it really means is “use this interface, and don’t put anything clever between it and the user that could possibly make a bad guess in the same way that the SQL parser does”.

        2. 1

          After some experimentation, I’ve discovered that it’s not just with large integers that this happens. Even with a lowly floating-point number it’ll fall back to sequential scan behaviour. I.e., if you do SELECT * FROM table_name WHERE id = 0.5 it’ll read the 0.5 as NUMERIC. Note that this even happens for integral values in numeric syntax like 2.0. But even if you explicitly pass in a type of double, it’ll fall back to sequentially scanning the table.

          This should be fun in any JSON API, even if the implementation language doesn’t support bignums…
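          A hedged sketch of what a guard in such an API might look like (hypothetical helper; in Python, json.loads hands back 0.5 and even 2.0 as floats, and bool is an int subclass):

          ```python
          import json

          # Hypothetical guard for a JSON API: only accept numbers that arrive
          # as honest integers within bigint range.
          BIGINT_MAX = 2**63 - 1

          def json_int_id(raw: str):
              """Parse raw JSON and return an in-range int ID, else None."""
              try:
                  v = json.loads(raw)
              except ValueError:
                  return None
              # json.loads yields 0.5 and 2.0 as floats; bool is an int subclass.
              if isinstance(v, bool) or not isinstance(v, int):
                  return None
              if not (0 < v <= BIGINT_MAX):
                  return None
              return v

          print(json_int_id("42"))   # 42
          print(json_int_id("0.5"))  # None
          print(json_int_id("2.0"))  # None
          ```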

          1. 2

            This is covered in the post under the “Behavior of Other Numeric Type Comparisons on PostgreSQL” heading. :)

            1. 1

              I should really work on my reading skills ;)