1. 35
  1.  

  2. 24

    The overhead of a relational datastore is enormous; this is a large part of why NoSQL data stores are so much faster.

    I agree that ORM is an anti-pattern, however this one part is just wrong. SQL databases are extremely efficient, and if your queries are limited to SELECT * FROM table WHERE column = ? with appropriate indexes, you are going to get similar if not better performance than a NoSQL database on a single server. The performance advantages of NoSQL are all about distributed systems, using more than one server to handle higher throughput.

    1. -2

      Agreed, although there is a more subtle difference too: “with appropriate indexes”. In many (most?) NoSQL data stores, column can be any column, at any time. While in SQL you need to A) know which columns are important ahead of time B) add an index which hurts write performance and C) hope you don’t need to add indexes in the future.

      Not saying SQL is bad and NoSQL is great…they are each tools for certain jobs, but knowing which indexes you need in your table is a very critical performance point and requires careful thought.

      Edit: apparently I know nothing about other NoSQL stores. Ignore me!

      1. 12

        In many (most?) NoSQL data stores, column can be any column, at any time.

        That’s not true at all. Most NoSQL databases I can think of need explicit indexes, and many of these actually won’t let you query by an arbitrary column without a secondary index explicitly created:

        • Riak via secondary index tags
        • CouchDB via views
        • DynamoDB via global secondary indexes
        • GCE Datastore via index.yaml (though many field types are indexed implicitly, there are still indexes)
        • MongoDB via ensureIndex
        • HBase via periodic-update or dual-write secondary indexes

        Some NoSQL dbs have no secondary index support at all, like Redis. The only one I know of off the top of my head that automatically indexes everything is MarkLogic, which has a rich universal index for finding documents with XPath/XQuery.

        1. 4

          Welp, sorry about that. I’m mostly familiar with Elasticsearch and assumed other NoSQL also automatically indexed (or the equivalent) all columns.

          Thanks for the correction!

          1. 2

            ElasticSearch does that because it’s specifically for searching haha! Lots of use cases don’t involve searching.

            1. 1

              Hmm…I don’t really understand that statement. Search is just finding data that matches a criteria. That criteria can be your “stereotypical” full-text and unstructured search:

              • “Quick brown fox jumped over the moon”
              • “Quick br*”
              • (“Quick brown” AND “moon”) OR (“jumping fast spider”)

              But it can also be fully structured:

              • ID == 5
              • date > yesterday
              • price < 1000
              • category == “videos”
              • ID == 5 AND dates > yesterday AND price < 1000 AND category = “videos”

              Fundamentally it’s all the same thing. If you view search in that light…what use-cases don’t need search?

              Edit: that’s not to say that search solves everything - e.g. relations are hard for search engines.

              1. 3

                I would classify those both as “search”, I just meant that ElasticSearch is specifically for performing searches. Something like a key/value store is more about rapid access to individual items by primary key, or by simple criteria like individual columns.

                For example, a user’s account settings would be keyed by user id. There aren’t really any reasons to find account settings records by anything but user id. Or a database of images that need to be accessed fairly quickly, but also need to have rich query information like your structured example. Photos that have timestamps and albums and whatnot.

                A photo application usually separates that data into two databases, one that focuses on fast access for large chunks of data (like S3), and one that focuses on fast querying for small pieces of data like an RDBMS. It’s a use case that involves searching, but it’s inefficient to use the same set of disks to:

                • stream image data large chunks of image data
                • perform many small reads and writes to query and maintain metadata

                You end up with two database use cases for your application, one that supports rich querying of small but detailed data, and one that supports simple primary key querying of large blobs of data. Just because an application has searching doesn’t mean every database needs rich querying! =)

        2. 2

          In many (most?) NoSQL data stores, column can be any column, at any time

          in most NoSQL data stores, column can only be the primary key. Most (all) NoSQL databases are key/value stores that assign arbitrary values to keys. This means that if you want to query for a specific “column”, you need to know in advance that you might want to do that and then create a mapping from values of that “column” to the keys pointing to the data.

          Of you map/reduce the whole data set on the fly which is much more parallelizable than with SQL databases, but still requires as much total time as a sequence scan in a relational table would, but there adding an index is trivial and requires no programming at all, whereas in a NoSQL datastore, you’d have to manually write code to keep your mapping structure current or you re-run your essentially full-table-scan equivalent map/reduce task.

          If you have many, many machines at your disposal, the parallelizable map/reduce jobs might come out quicker (or at all), but you’ll need many machines, whereas with a relational database, you might not get a result at all if you have outgrown the capabilities of your database, but until then, getting to the data will be much less expensive and easier to achieve.

          Looking up data by primary key on the other hand is as expensive/performant in both NoSQL and SQL stores.

      2. 8

        What I don’t understand is why people are so afraid of SQL. It’s a pretty simple language over pretty simple concepts even at its most arcane, and nobody using an ORM is even scratching the surface of its arcana.

        Any insight? Is it just a side-effect of single-language web developers trying to stay that way?

        1. 12

          I’m not afraid of SQL, I just don’t like to write it.

          Give me

          Person.find(1)
          

          over

          SELECT * FROM people where id=1 LIMIT 1;
          

          any day. As it gets complex, this gets more and more true.

          Post.joins(:group => :memberships).where(:memberships => {:user_id => 1})
          

          is so much nicer to me than

          SELECT "posts".* FROM "posts" INNER JOIN "groups" ON "groups"."id" = "posts"."group_id" INNER JOIN "memberships" ON "memberships"."group_id" = "groups"."id" WHERE ("memberships"."user_id" = 1)
          
          1. 5

            SQL is a subpar language, but the mismatch of ORM and relational models still lives. I wish there were more popularized Tutorial D databases. I also wish Tutorial D were better, though it’s pretty good.

          2. 1

            I don’t think is fear, but the way companies are organised. Historically developers reported to apps hierarchy and DBAs to infrastructure teams. None of them where interested to collaborate. DBAs tried to ignore developers requests, and developers tried to solve their own problems without messing too much with the DB.

            Today with devops is slightly different, but not much better.

          3. 10

            This x1000. I cannot express the amount of harm I have seen ORMs do over the years, both in code quality and performance.

            1. 4

              i agree that if you use an ORM you trade in problems for others.

              the article generalizes the different orm patterns that exist. not all orm are created equal.

              since all data can be “relational in nature” his solution to “Encapsulate your relational queries into a Model layer” will in the end just produce your own ORM or lots of duplicated and hard to maintain code.

              we currently use the Ruby Sequel Library. I like it because it provides different levels. You can easily drop to customised SQL and speed up things if necessary.

              1. 7

                Along these lines, another benefit of the ORM is composability. I find that when working directly with SQL it is much harder to compose queries from smaller, singly focused components. That being said, there are abstractions that operate at a lower level than an active-record-style ORM and map much more closely to SQL (such as Ruby’s Sequel mentioned above or Clojure’s Korma) offering much of the composability benefits that you’d get from an ORM.

                1. 5

                  No one has mentioned another big benefit of ORMs: security.

                  ORMs are supposed to sanitise all input, and prevent sql injection attacks. In theory, programmers could do this themselves. In practice, the dozens of hacks at big name firms (can’t remember any examples now) show that most programmers, even at reputable firms, weren’t doing basic sanitization of input.

                  1. 6

                    A really thin typing layer can do this and do it more completely, safely.

              2. 4

                What I’m curious to see is how migrations are performed outside of an ORM context. ORMs tend to always have a good method for table migrations—does the need for those just vanish when not using an ORM?

                1. 3

                  Migrations are still useful, and the good thing is that they are actually independent of the ORM concept. Migrations provide a repeatable and automated way of incrementally altering the database in step with the source. ORMs provide a mapping from SQL to objects. The ORM libraries often happen to implement migrations as well, that’s all.

                  For example, in my Node.js application I write my queries in SQL, without an ORM, but I still use node-db-migrate and run migrations.

                  Incidentally, I ended up passing plain SQL to node-db-migrate instead of using its functions. The reasons are similar to why I ditched ORM - the mapping from SQL to JavaScript is awkward and incomplete, so it’s more trouble than it’s worth.

                2. 4

                  As steveklabnik mentioned ActiveRecord has made it easier to execute trivial SQL queries and immediately have them returned as objects. I agree an ORM can get a little frustrating when you want to start performing complex SQL queries but for the most part an ORM is highlight beneficial.

                  I find ORMs still overly useful in basic web requests, if I need to show a single User or multiple Posts the gains over a simple User.find(params[:id]) and Post.top_ten outweigh the supposed burden of an ORM. I didn’t have to worry about constructing a maintainable SQL query and converting the DB response into a maintainable struct; all I had to worry about was moving the model down as a view model into the view.

                  There should not be such hate around ORMs, instead there should be better education of when an ORM is awesome and when it’s time to take matters into your own hands.

                  1. 3

                    While I understand why people describe ORMs a failed abstraction, I strongly agree with you. ActiveRecord provides a usable solution out of the box, that fits common requirements for web apps.

                    ActiveRecord and probably most of others ORMs are clearly leaky abstractions. SQL concepts are everywhere, just hidden from plain sight, waiting to bite. Imho the majority of serious Rails developpers are at least able to read and write plain SQL from time to time. If it gets complex, most of those I’ve met and myself included just dive in the docs, trying to refresh memories about what is needed.

                    This leads me to think that AR was designed with that fact in mind, to use it efficiently people still need SQL notions as AR don’t even try to hide that stuff. It just provides a convenient way to handle common queries. I can’t speak for the others big ORMs like hibernate since I haven’t used them in a long time but I doubt they were designed in that way.

                    AR in that case, still is a leaky abstraction, but it works wonderfully in most situations. From what I’ve seen, the biggest issue is usually the developer not knowing AR’s limits and keeps using in a case it brings more harm than good.

                  2. 3

                    Let’s not forget why we have ORM: we invented ORM because we wanted to have object oriented database functionality in relational databases.

                    So, we had to create a mapping to provide a kind of ‘limited’ OODB functionality in relational databases. Personally I think we knew where we were getting into when the first ORM appeared, so there’s no need to talk about anti-pattern. Relational databases provided many benefits to the enterprise world that we did want to use (standard data access, remote data access, standard reporting, easy integration…) but no OODB provided.

                    So ORM was a compromise, not an anti-pattern. Something similar happened with directory services: they are great for their core function, but no so good for non core functions.