1. 33
  1. 13

    A good candidate for our new performance tag!

    Thanks for the good writeup @schneems…it shows one of the reasons I’m a little wary of ORMs and other interfaces to SQL that have a lot of magic. Very easy to shoot yourself in the foot performance-wise.

    1. 11

      Yay performance tag!

      I’m on the fence with ORMs. On one hand they do stuff like this that doesn’t make any sense (doing the validation even when fields aren’t changed). On the other hand they help me catch a bunch of other common mistakes, like SQL injection.

      Having this mistake originate from the ORM instead of my own custom written SQL actually means I can potentially fix this for everyone who uses that lib, instead of just for myself. Changing the validation code to not fire when those fields aren’t changed would be good. Better would be having the validation code assert that there is a constraint in postgres.
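      Here’s a dependency-free sketch of the “skip the check when nothing changed” idea. In Rails this would be roughly `validates :name, uniqueness: true, if: :will_save_change_to_name?`, but treat that as an assumption about the app’s model; the class below only illustrates the dirty-tracking logic.

      ```ruby
      # Minimal stand-in for ActiveModel dirty tracking: only run the
      # (expensive) uniqueness SELECT when the field actually changed.
      class Repo
        attr_reader :name

        def initialize(name)
          @name = name
          @name_changed = false
        end

        def name=(value)
          @name_changed ||= (value != @name)
          @name = value
        end

        def needs_uniqueness_check?
          @name_changed
        end
      end

      repo = Repo.new("rails")
      repo.needs_uniqueness_check?   # => false: nothing changed, skip the query
      repo.name = "rack"
      repo.needs_uniqueness_check?   # => true: run the query
      ```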

      1. 6

        Yep. Just because I’m wary of them doesn’t mean I don’t support their use where it makes sense. :)

        Ecto over in Elixir seems to have struck a good balance between “magic” and “more magic”.

        1. 6

          On the other hand they help me catch a bunch of other common mistakes, like SQL injection.

          Systems like Yesql or HugSQL also prevent SQL injections while providing a nice API for the user without having a difficult to predict ORM system under the hood. I’ve grown to like that approach quite a bit.

          1. 1

            I love MyBatis’ approach, which allows SQL templates that look similar to HugSQL’s.

          2. 5

            ORMs cause these issues for sure, but they still crop up when you write raw SQL. It’s always important to monitor your query workload for slow queries. ORMs may not be perfect, but people aren’t either. Nice work actually profiling your database usage, I wish more people would!

            Also, you’ve duplicated an index key. Index prefixes work just as well as individual indexes, so you don’t need (name, user_name) and (name). Removing that extra index will speed up inserts.

            Also also, now that you have your new index, that Rails uniqueness validation will run much faster as long as you remove the case_sensitive: false option from it. Since you lowercase the name in a before_validation hook, the index will still get used. Without it, you have to catch the Rails RecordNotUnique exception as described here under concurrency and integrity. Though if you read that section, you’ll see why you should catch it with or without the model-provided validation. If you’re feeling fancy you can add a record error like the Active Record uniqueness validation does, which will let you use the validation error the same way you would use any other.
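            A dependency-free sketch of that rescue-and-add-error pattern. The exception class and message below are stand-ins for ActiveRecord::RecordNotUnique and errors.add(:name, :taken), not the real Rails API:

            ```ruby
            # Stand-in for the driver-level unique violation that Rails
            # wraps as ActiveRecord::RecordNotUnique.
            class RecordNotUnique < StandardError; end

            def create_repo(name, existing_names)
              raise RecordNotUnique if existing_names.include?(name)
              existing_names << name
              []                                  # no errors
            rescue RecordNotUnique
              # Mirror what errors.add(:name, :taken) would produce, so the
              # caller can render it like any other validation error.
              ["Name has already been taken"]
            end

            create_repo("rack", ["rails"])        # => []
            create_repo("rails", ["rails"])       # => ["Name has already been taken"]
            ```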

            Unfortunately you’ll only be able to handle one attribute in this way. If you have multiple uniqueness constraints there isn’t a way to tell which constraint was violated with Rails. Postgres adds an extended error field with the constraint name, but unfortunately the Ruby PG library never uses it, or you’d see a call to PQresultErrorField with PG_DIAG_CONSTRAINT_NAME when the sqlstate indicates a unique_violation error.

            Even if PG did return that extended information, Rails throws away any extra information when it converts to RecordNotUnique and just returns the normal error message text.

            Adding back the validation—while keeping the unique index as a safety net and performance optimization—will show the normal pretty error message instead of throwing a uniqueness exception. In my opinion, as long as the validation isn’t slow anymore, it’s worth keeping it for the pretty error message. It also lets you have a bunch of uniqueness validations and other constraints, and Rails will display the correct error messages for each of them. Very very occasionally the validations might race and result in that RecordNotUnique exception being thrown at a user, but I personally don’t think that’s worth fixing.

            1. 3

              Regarding unnecessary indexes: I recommend you give my open source tool, active_record_doctor, a try. It identifies database performance issues before they hit production.

              @schneems, your story inspired me to add a check for an expression index when a case-insensitive validation is used. Adding it to my to-do list :-)

              1. 2

                I think you accidentally linked to this thread instead of your project: https://github.com/gregnavis/active_record_doctor

                1. 1

                  Good catch. Thank you!

            2. 4

              You can, and at a bare minimum should, prevent SQL injection by use of parameterized queries.
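              A small illustration of what parameterization buys; no live database involved, just the strings:

              ```ruby
              user_input = "x'; DROP TABLE repos; --"

              # Dangerous: interpolation splices attacker-controlled text
              # into the SQL itself.
              unsafe_sql = "SELECT * FROM repos WHERE name = '#{user_input}'"

              # Safe: the SQL text and the values travel separately; with the
              # pg gem this would be sent as conn.exec_params(sql, params).
              sql    = "SELECT * FROM repos WHERE name = $1"
              params = [user_input]
              ```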

            3. 6

              it shows one of the reasons I’m a little wary of ORMs and other interfaces to SQL that have a lot of magic. Very easy to shoot yourself in the foot performance-wise.

              The danger of not using ORMs, though, is that at some point you will map data to domain objects, you’ll want structure, and… suddenly, you’ve got your own ORM.

              That being said, I prefer systems that have a strict boundary between the domain model layer and the query layer and a rather explicit mapping phase. Which means that I’m not a huge fan of ActiveRecord.

              1. 1

                I’d go so far as to say that the domain model mapping and the query interface should be entirely independent systems.

                1. 1

                  For any larger system, yes, but for quick and dirty work, I like things that at least have some knowledge about each other to derive good defaults.

              2. 1

                IMHO it’s less about ORMs but more about ORMs that try to hide SQL from you.

                1. 1

                  This is less about the weaknesses of ORMs in general, and more about a particular feature of ActiveRecord, validates :uniqueness, that frankly shouldn’t be used at all. Its weakness to race conditions is enough reason to argue for its removal from the framework. Uniqueness constraints of any sort need to be done in the DB layer.

                2. 4

                  This site is extremely wide and horrible to read on Firefox because I need to scroll right all the time.

                  1. 4
                    1. 2

                      I’ve had the same problem. I’m not sure what’s going on.

                      1. 2

                        It’s fixed now.

                    2. 3

                      The ORM/no-ORM axis aside, it’s remarkable just how often the performance bottlenecks in live apps turn out to be something that makes you think “wow, we were doing that?” I recently had to look at a search that took 5 seconds to retrieve a couple thousand items. The majority of that time turned out to be spent repeatedly fetching an object with some search-related settings at least once per result; about half the remainder was running a query for data we rarely needed and could fetch lazily. I’d started playing with fundamental changes to how we did the search (which may still wind up being useful) but it turned out you could just tweak a few lines to cut it down to about a second.

                      (Outside of DBs, today I listened to a brief recording with Caitie McCaffrey about a bug that looked like it would triple the load on some web services for Halo. I won’t spoil it, but it was definitely a “wow, it was that?” sort of thing.)

                      It pleases me much (and makes my job easier) that RDBMSes can take a query and come up with a decent, stats-informed plan for what indexes to use, how best to deduplicate, find aggregates, etc. It is sort of a shame that, right now, they also make it surprisingly easy to write code you can’t easily see the O(n) of. Tricky problem that runs through the stack, or we wouldn’t be here talking about it. Still, everyone gets to whine a bit now and then, and there’s mine for today.

                      1. 2

                        You might have noticed that the LOWER part of the SQL query isn’t represented in my unique index. In my case, I was already normalizing the data stored, so that bit of logic was redundant.

                        Not sure I understand what they’re saying here. Why was the lower not required in the constraint? Now it looks like you can add two names with different casing.

                        Don’t you need the lower?

                        CREATE UNIQUE INDEX ON repos ((LOWER(name)));

                        Plus with this you wouldn’t need to change the model because it’d be optimized for the previous call.

                        Edit: You’re probably setting name to lower on save.

                        1. 2

                          Edit: You’re probably setting name to lower on save.

                          Before validation actually.

                        2. 1

                          This article should be exhibit A for another article on the front page right now: ‘Abstraction’ is a dirty word.

                          1. 1

                            Two things to note here:

                            1. Removing the validation causes a situation where inserting a duplicate repository name will result in a PG::UniqueViolation rather than an error that Rails knows about.
                            2. Instead of doing the normalization yourself in Ruby, just use the citext data type.
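                            For reference, a sketch of the citext route (table and column names assumed from the article’s example):

                            ```sql
                            -- With citext the column compares case-insensitively on
                            -- its own, so neither LOWER() in the index nor Ruby-side
                            -- downcasing is needed.
                            CREATE EXTENSION IF NOT EXISTS citext;
                            ALTER TABLE repos ALTER COLUMN name TYPE citext;
                            CREATE UNIQUE INDEX index_repos_on_name ON repos (name);
                            ```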
                            1. 1

                              Yep, I already handled the PG::UniqueViolation due to the way my controller action was written. I decided not to mention that aspect since the article was already running a bit long.

                              1. 1

                                This class of issues is where I find most of my performance bugs when I use Rails, et al.

                                It’s just not obvious how the ORM interacts with the database. Which is nice as an abstraction, but it causes all sorts of inefficient behavior to arise (N+1 queries, memory bugs, overeager loading, etc.)