1. 29

https://github.com/begriffs/pg_rational

I wanted to create a custom type for precise fractions in postgres. They are useful for making user-defined row ordering (like a todo list) that you can rearrange without having to renumber surrounding items.

This is my first extension, and also the first time I’ve used C in many years. I wrote a lot of tests for this extension, and would love a code review.

Some coding questions that came to mind:

  • I use pointers in many places where I could have passed a 128-bit argument by value. Does this give a speed boost? Is it worth the messier looking code?
  • How portable is my use of __int128_t and __builtin_saddl_overflow?
  • I have guards for dangerous two’s complement operations, but do some CPUs use another representation that I ought to respect with conditional compilation?
  • In a few places I turned some loops into a goto construction. Sounds nasty, I know, but I thought it helped emphasize that the common code path is not to loop at all.
  • Coding conventions: naming, indentation, etc.
  • Any optimizations I’m missing?
  • Any bugs?

(Cross-posted on r/PostgreSQL)

  2. 12

    I use pointers in many places where I could have passed a 128-bit argument by value.

    Not very many places, just add, mul, cmp, and mediant. Both neg and simplify depend on mutating their arguments.

    Does this give a speed boost?

    No. All your call sites will be inlined and those struct accesses will be turned into direct register manipulation, whether they’re pointers or not.

    On the contrary, the preamble in each function that memcpys the PG_GETARG pointers will be slower, because you’re putting useless junk on the stack for no reason. Again, the compiler is just going to inline everything and use registers anyway, and if it decides not to, it can spill the values to the stack itself. You’d be better off doing this:

    @@ -177,6 +177,6 @@ rational_add(PG_FUNCTION_ARGS) {
       Rational x, y;
    -  memcpy(&x, PG_GETARG_POINTER(0), sizeof(Rational));
    -  memcpy(&y, PG_GETARG_POINTER(1), sizeof(Rational));
    +  x = *(Rational*)PG_GETARG_POINTER(0);
    +  y = *(Rational*)PG_GETARG_POINTER(1);
    
    -  PG_RETURN_POINTER(add(&x,&y));
    +  PG_RETURN_POINTER(add(x,y));
     }
    

    Is it worth the messier looking code?

    It’s not really messier. Change some -> to ., drop some &, add some different &, no problem.

    I have guards for dangerous two’s complement operations, but do some CPUs use another representation that I ought to respect with conditional compilation?

    There are intrinsic replacements for most of what you’re doing: GCC and Clang both provide the type-generic __builtin_add_overflow / __builtin_sub_overflow / __builtin_mul_overflow, which report overflow without you having to reason about the integer representation yourself.

    In a few places I turned some loops into a goto construction. Sounds nasty, I know, but I thought it helped emphasize that the common code path is not to loop at all.

    It’s not nasty at all, these are perfectly reasonable uses of goto.

    1. 3

      Love it! Super useful, and thanks for including the how-to, too. I just created a pull request on GitHub to fix a typo.

      1. 2

        They are useful for making user-defined row ordering (like a todo list) that you can rearrange without having to renumber surrounding items.

        Would you mind explaining how that works?

        1. 4

          Suppose I used integers to keep track of the order of todos:

          create table todos (
            prio bigserial unique deferrable initially deferred,
            what text not null
          );
          insert into todos (what) values ('A'), ('B'), ('C'), ('D');
          

          If I wanted to insert a new value E between A and B when ordering by prio, I’d have to do:

          -- this action needs to happen transactionally
          begin;
          
          -- move items ahead to make room
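          -- (the unique constraint is declared deferrable above so this shift
          --  does not hit a transient duplicate value mid-statement)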
          update todos
          set prio = prio + 1
          where prio > 1;
          
          -- add the new one
          insert into todos values (2, 'E');
          
          -- don't forget to advance the sequence too
          select nextval('todos_prio_seq');
          
          commit;
          

          Basically I have to move a bunch of items ahead to make room for E. Whereas if I number the rows with fractions I can find a place for E without moving any other items.
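
          Rough sketch of the fractional version, assuming the type is named rational and accepts fraction-style text input like '3/2' (the real syntax may differ slightly):

          create table todos (
            prio rational unique,
            what text not null
          );
          insert into todos values
            ('1/1', 'A'), ('2/1', 'B'), ('3/1', 'C'), ('4/1', 'D');

          -- E goes between A and B: any fraction strictly between their
          -- priorities works, and no existing row has to change
          insert into todos values ('3/2', 'E');

          select what from todos order by prio;  -- A, E, B, C, D

          (rational_intermediate exists to pick that in-between value for you, so you don’t have to hard-code '3/2'.)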

          1. 2

            If I understand you, then you could, e.g., set the priorities of A, …, D to be 1, 1/2, 1/3, 1/4, respectively. Then the priority of E could be (1 + 1/2) / 2 = 3/4. Generally, the priority of an item inserted between items x and y would be the average of their priorities. Super neat!

            Edit: I now see your rational_intermediate function, and realize that my approach definitely wouldn’t scale. E.g., my method to find a priority between 3/4 and 1/2 would give a denominator of 8. In the pathological case, these denominators could quickly become too big to efficiently represent.

            1. 3

              Averages were my first attempt as well. Turns out clockmakers in the 1800s were pretty ingenious, though. :)
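
              (For anyone curious: the trick is the mediant. For fractions a/b and c/d the mediant is (a+c)/(b+d), which always lands strictly between them, e.g. between 1/2 and 3/4 it gives (1+3)/(2+4) = 4/6 = 2/3, where averaging gives 5/8. Walking the Stern–Brocot tree with mediants finds the fraction with the smallest denominator in the gap, so the terms grow much more slowly than with repeated averaging, which roughly doubles the denominator each step.)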

        2. 1

          You eventually need to normalize the fractions somehow and I don’t see this in your work or documentation.

          Andrew (the author of the “User-specified ordering with fractions” wiki page) has given a query to update all fractions across the set of rows to be ordered. That query could be further simplified if you use a deferred exclusion constraint instead of a unique constraint, though he prefers the latter.
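
          Not Andrew’s query, just a minimal sketch of the idea. It assumes the type casts from text like '3/1', joins on ctid because the toy table has no primary key, and relies on the unique constraint on prio being deferrable so the single UPDATE cannot trip over itself:

          begin;
          set constraints all deferred;

          -- hand out fresh, evenly spaced priorities in the current row order
          with renumbered as (
            select ctid, row_number() over (order by prio) as rn
            from todos
          )
          update todos
          set prio = (renumbered.rn::text || '/1')::rational
          from renumbered
          where todos.ctid = renumbered.ctid;

          commit;

          That resets every row to small terms, so rational_intermediate has plenty of room to work with again.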

          1. 1

            OK, good call. I’ll implement a similar function for that. As a side note, I wonder how long it takes before the growing terms of the fractions created by rational_intermediate() begin to cause problems and need to be renormalized.