1. 36

  2. 4

    You can, but should you?

    1. 13

      It depends!

      For something like sum(), average(), max(), or the like, the answer’s an emphatic yes: despite doing “more” on the SQL server, you’ll actually consume less RAM (tracking a single datum of a sum/average/max/min, rather than building up a full response list) and send less over the network. That’s a definite win/win.
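
      A toy illustration of that win, using Python’s built-in sqlite3; the orders table and its numbers are invented for the sketch:

```python
import sqlite3

# Invented table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders (total) VALUES (?)",
                 [(t,) for t in (19.99, 5.00, 42.50)])

# Option A: pull every row and aggregate client-side; the whole result
# set crosses the wire and sits in a Python list.
rows = conn.execute("SELECT total FROM orders").fetchall()
client_sum = sum(t for (t,) in rows)

# Option B: let the database track a single running value.
(server_sum,) = conn.execute("SELECT SUM(total) FROM orders").fetchone()

assert client_sum == server_sum  # same answer, far less data held and moved
```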

      For some of the other stuff in the article, the answer’s murkier. String operations in general, including the ones they’re mentioning (e.g. GROUP_CONCAT, which is specific to MySQL, but has equivalents in other databases) are fast in MySQL, but not necessarily in other databases.

      On the reverse side, some complex but amazingly useful queries (such as subselects) that are fast in SQLite and PostgreSQL are slow in MySQL, because MySQL’s design requires the generation of temporary tables (or at least still did as of roughly a year ago).

      PostgreSQL can likewise do some complex JSON ops server-side highly efficiently, and SQL Server can do similar stuff for XML, but I’m not sure I’d recommend doing that in, say, SQLite, even if you mechanically can (via extension methods), because it can defeat the query optimizer if not very carefully implemented.
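
      For what it’s worth, SQLite also ships a group_concat() aggregate, so the MySQL-ish behavior can be sketched portably. The table and rows below are invented, and note SQLite documents the concatenation order as arbitrary:

```python
import sqlite3

# Invented tags table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (post_id INTEGER, tag TEXT)")
conn.executemany("INSERT INTO tags VALUES (?, ?)",
                 [(1, "sql"), (1, "perf"), (2, "orm")])

# One row per post, tags collapsed server-side into a single string.
rows = conn.execute(
    "SELECT post_id, group_concat(tag, ', ') FROM tags GROUP BY post_id"
).fetchall()
```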

      So: should you? Sometimes! Unfortunately, you need to learn your database to know the answer.

      1. 3

        Also, speed concerns aside, declarative languages are nice to read and write, as the author points out at the beginning of the article.

        1. 2

          String operations in particular seem to vary widely in how easy or hard they are between database engines. I remember being amazed at how few string functions MS SQL Server has - IIRC, substring and indexOf and that’s about it. Super clumsy to do anything but the most basic things. On the other hand, PostgreSQL has a full regexp find and replace engine and enough functions to do just about anything you could want.

        2. 4

          Another benefit is that you can put all your SQL code into stored procedures, and then the rest of your code can just call that functionality. Use multiple languages and it’s still all standardized in your stored procedures.

          1. 3

            Aren’t stored procedures a config management nightmare? As in, you distribute your business logic between code and the db instance.

            1. 4

              I would argue that no business logic belongs in the stored procedures, but I’d further argue that making statements about your relations isn’t business logic. If you’re using a database as a source of truth, that’s basically all it should be worrying about: what is true about your domain, and what may be true about your domain (constraints).

              As for versioning: no, stored procedures aren’t a config management nightmare. It’s just that few organizations bothered to put any config management around them for decades. It’s not hard to implement versioned update scripts which roll the database schema forward or backwards. Honestly, it’s easier than some deployment solutions I’ve seen for application code.
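
              A minimal sketch of such versioned update scripts, in Python against SQLite; the table names are invented, and real migration tools (Flyway, Alembic, and the like) do this with more safeguards:

```python
import sqlite3

# Ordered forward migrations; script N+1 assumes scripts 1..N have run.
MIGRATIONS = [
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE customers ADD COLUMN email TEXT",
]

def migrate(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0
    # Apply each script past the recorded version, then record the bump.
    for version, script in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(script)
        conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # idempotent: already-applied scripts are skipped
cols = [r[1] for r in conn.execute("PRAGMA table_info(customers)")]
```

              Rolling backwards works the same way with a parallel list of down scripts, one per version.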

            2. 2

              Having dealt with a code base that puts lots of logic in stored procedures called by non-SQL code, I can say this can go… too far. I am sad when I see stored procedures with multiple case clauses, calling other stored procedures, recursively calling themselves, making literal strings that are eval’d as other stored procedures and SO MANY CURSORS, and there’s no point in a test on the calling code because it doesn’t do anything.

              So… there are pros and cons.

              1. 2

                > making literal strings that are eval’d as other stored procedures

                Other than that, it sounds perfectly sensible to me.

                > there’s no point in a test on the calling code

                You could still test whether the functionality works or not, right?

            3. 2

              If there are going to be multiple consumers of the data, then it’s probably a good idea to make sure that the data as stored in the DBMS is correct under the definition of correctness you’re using. That can be a strong reason to pull much of the logic around the data into the DBMS itself. There are other concerns that pull in the opposite direction, of course, and perhaps you simply are using your database as a simple persistence layer. But if you expect ad-hoc reporting, for instance, or systems that depend on the data in the database being canonical, you are definitely a strong candidate for moving the logic into the DBMS.

              1. 1

                For small data that you are more frequently reading than writing (and you care about performance), you shouldn’t.

                Because then, it makes sense to cache it. Read everything once in a blue moon (whenever it changes) instead of doing many small reads in your fast path. Your own internal data structures will easily outperform SQL: simple lookups are the low-hanging fruit; then if you need the power of a relational database, I made a simple binary-search-based framework for left-joining tables of tuples in C++, and so can you.
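
                A toy Python version of that idea, with bisect standing in for the C++ binary search; the users/emails data is invented, and duplicate right-hand keys are ignored to keep the sketch short:

```python
from bisect import bisect_left

def left_join(left, right):
    """Left-join two lists of (key, value) tuples; `right` must be
    sorted by key. Unmatched left rows get None, as in SQL LEFT JOIN."""
    keys = [k for k, _ in right]
    out = []
    for k, v in left:
        i = bisect_left(keys, k)  # O(log n) lookup per left row
        match = right[i][1] if i < len(keys) and keys[i] == k else None
        out.append((k, v, match))
    return out

users = [(1, "ann"), (2, "bob"), (3, "cid")]
emails = [(1, "ann@example.com"), (3, "cid@example.com")]  # sorted by key
joined = left_join(users, emails)
# joined -> [(1, 'ann', 'ann@example.com'), (2, 'bob', None),
#            (3, 'cid', 'cid@example.com')]
```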

              2. 3

                I think for many developers the first step would be to switch from an ORM to raw SQL. I only recently started using raw SQL instead of an ORM framework for one application, and it boosts my productivity so much. It’s not a standard CRUD application; maybe that’s the reason.

                I guess if you know your ORM very well, it can work to mix both (ORMs probably also support a method to run SQL directly?), but for me it was basically a lot of unneeded overhead (creating the classes and so on) when I only want to retrieve very specific information from my database and not typical CRUD objects.

                1. 2

                  Don’t you end up writing extra code, just in SQL? I’m all for the efficiency, but depending on the team’s fluency in SQL, it can be more inscrutable than just writing Python or Ruby.

                  1. 7

                    My understanding is that if the needed operation is a reduction or join, you should absolutely do it through SQL, so that you’re not sending more data than necessary over the network. Or, if it’s a sort, and the table is indexed (which it should be if you need sorted query results), then you should also be doing that through SQL.
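
                    A small sqlite3 sketch of the sort case; the table and column names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, starts_at TEXT)")
conn.execute("CREATE INDEX idx_events_starts ON events(starts_at)")
conn.executemany("INSERT INTO events (starts_at) VALUES (?)",
                 [("2024-03-01",), ("2024-01-15",), ("2024-02-20",)])

# Rows come back already ordered; no client-side sort needed.
ordered = [r[0] for r in
           conn.execute("SELECT starts_at FROM events ORDER BY starts_at")]
```

                    With the index in place, the database can walk idx_events_starts in order rather than sorting the result set after the fact.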

                    1. 5

                      The trade-off is largely the same as optimizations. I had a long-winded answer elsewhere, but the quick version is that if you’re doing a trivial one-off thing, it really doesn’t matter, but if you need performance, it’s worth taking a look. I had an intern this summer whose app was literally yanking all tables into RAM for most page loads, but since that meant pulling in all of about 200 kB of data, and the app was internal-only and only used for a couple hours a month, I really didn’t see a lot of value in making him redo things. On the other hand, I routinely bypassed Active Record on another project because speed and network usage really mattered.

                      1. 3

                        In my experience, that extra code in SQL will be much better in almost every dimension, perhaps because it’s a DSL suited to the task. If the team doesn’t know SQL, they should learn it – the ROI will be enormous.

                      2. 2

                        As I was learning Haskell a year or two ago, I rewrote a personal Go+Postgres web app in Haskell+Postgres. I ended up moving a ton of logic from the application layer into the database just because I felt more comfortable writing Postgres than Haskell. Now I’m more familiar with Haskell but I’ve left the application like that—querying, joining, filtering, etc. just feel more natural in SQL than they do in Haskell, at least without some kind of ORM.

                        1. 1

                          One thing this misses is testing. If I’m implementing logic in my application, I can write tests around it, fairly easily, using whatever unit testing framework is available for my language. If I’m implementing things like custom sorts in SQL, writing tests to ensure those custom sorts don’t have regressions and handle edge cases correctly is much more difficult.
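
                          For concreteness, even a simple regression test for a custom SQL sort needs a live (here in-memory) database plus fixture data, rather than a plain function call; everything below is invented:

```python
import sqlite3

# Custom sort under test: priority buckets via a CASE expression.
CUSTOM_SORT = """
    SELECT name FROM tickets
    ORDER BY CASE priority WHEN 'high' THEN 0 WHEN 'low' THEN 2 ELSE 1 END
"""

def run_query(rows):
    # Every test case has to stand up a schema and load fixtures first.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tickets (name TEXT, priority TEXT)")
    conn.executemany("INSERT INTO tickets VALUES (?, ?)", rows)
    return [r[0] for r in conn.execute(CUSTOM_SORT)]

# Edge case worth pinning down: unknown priorities land in the middle.
result = run_query([("a", "low"), ("b", "high"), ("c", "whatever")])
```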

                          1. 1

                            Wouldn’t endorse this headline’s prescriptive/scoldy tone, but a loosely related observation: occasionally your DB handles something well that you think it couldn’t possibly, making clever scalability tricks not worth it.

                            There was a query at work that had a lot of the markers of a potential perf hotspot: so-so index selectivity, bunch of complicated conditions, DB had to look at (large) full rows, could run hundreds of times a second. (The idea is to find upcoming in-person events within X miles of a given (lat,lon).) This kind of search had been a performance issue in a past project we built, so we put some tricky caching in to avoid a repeat.

                            It turned out that scanning data hot enough to be in the DB’s page cache was fast enough that the query was never really a problem. It wasn’t free, but it was cheap enough, and the DB server beefy enough, that the peak query rate wouldn’t make the DB sweat, and almost every other step we were doing (like rendering the template for the search result page) contributed much more to user-visible latency than the wait for query results. We took out the caching in front of the query and worked on other stuff instead.

                            Hard to work out a generalized moral, but maybe it’s that even when you have more than a hunch that something will be unworkably expensive, it might be worth testing it; it can be surprising what the tech stack can handle.

                            (Also interesting: much of the query runtime ended up being spent sending/receiving the results–a query returning IDs completed much faster than one returning full rows, even though the DB had to look at full rows anyway because of the WHERE conditions. Worth looking at if we do eventually become DB-bound.)

                            1. 0

                              Thanks for all the comments.

                              1. -1

                                So if I’m doing it in SQL, I’m not writing extra code? Then by which sorcery do I communicate with SQL?