1. 9

  1. 10

    As thorough as this article is, it seems to gloss right over what I’d consider a five alarm fire.

    its average server response time is 300ms

    Maybe spend some time looking at that number? The same argument that you don’t need to go Twitter scale would seemingly apply to picking the right heroku stack. If you can’t get at least 10 req/s out of webrick, don’t waste time picking a new server layer. No?

    1. 3

      I see what you’re saying, but if you check out the linked “Scaling Twitter” presentation, Twitter was averaging 250-300ms server response times in 2007. And serving 600 requests per second. Stupid? Maybe. But sometimes you’re stuck with that and you need to start pulling other levers.

      It’s also worth noting that an IO-heavy application can increase throughput for free by switching to a multithreaded app server like Puma. Yes, your average response times won’t improve, but the number of requests you can serve per second will, because Ruby can run other threads while some are blocked waiting on IO.
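
      A minimal config/puma.rb sketch to illustrate (the worker and thread counts here are assumptions to tune per app, not numbers from the article):

      # config/puma.rb
      # Each worker is a separate process; each worker can serve up to
      # max_threads requests concurrently. Threads blocked on IO
      # (database calls, external APIs) release the GVL, so other
      # threads can run in the meantime.
      workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))

      max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
      threads max_threads, max_threads

      preload_app!
      port ENV.fetch("PORT", 3000)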

      Average response times are just one part of the scaling equation - there are other levers to pull. This post is about those other levers. The rest of my blog has plenty of resources on how to decrease response times.

      1. 3

        I liked Secrets to Speedy Ruby Apps On Heroku (though “On Heroku” is probably overly specific; scares away the rest of us).

        I think the current post would be improved by more references to posts on caching, etc. And some guidance on when to do which. Like instead of saying this isn’t about memcache, suggest memcache first and then come back here?
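
        For example, even a small dose of low-level caching usually pays off before any server-layer tuning (a hypothetical Rails sketch; the method, model, and key names are made up):

        # Assumes config.cache_store = :mem_cache_store is configured.
        # Expensive work is computed once, then served from memcached:
        def dashboard_stats
          Rails.cache.fetch("dashboard/stats", expires_in: 5.minutes) do
            Stats.compute_expensive_aggregates # slow query or computation
          end
        end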

        1. 1

          Thanks for the suggestion! All of these topics are so inter-related that I’m struggling to split them up into posts.

      2. 1

        Yep. If this were a Django app I’d be suggesting that you install Django Debug Toolbar and inspect the SQL queries required to construct one of these pages. Can the queries be reduced, or cached, or indexed better, or run concurrently?

        There’s presumably a Rails equivalent, or you can check out the query log or something. 300ms is a hell of a long time.
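
        (In Rails, rack-mini-profiler plays roughly the role Django Debug Toolbar does. A sketch of the classic N+1 pattern this kind of tooling surfaces; the models are invented for illustration:)

        # Log SQL to stdout so repeated queries are easy to spot:
        ActiveRecord::Base.logger = Logger.new($stdout)

        # N+1: one query for the posts, then one more per post for its author
        Post.limit(20).each { |post| puts post.author.name }

        # Eager loading collapses that to two queries total
        Post.includes(:author).limit(20).each { |post| puts post.author.name }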

      3. 8

        Perhaps I’m just out of touch, but I was surprised that scaling to 1000 r/m (or ~16 r/s) required any thought at all. I would have figured the default settings on any web framework would handle that with ease, assuming the database can.

        Is this Ruby being slow? Does an out-of-the-box Go app doing the same thing need a scaling document? Or Java?

        1. 4

          I think it’s mainly a ruby/rails thing still. Go (Gin), Java (Dropwizard), Elixir (Phoenix), Clojure (Ring), and Haskell (Yesod) should all get that many (and more) out of the box. In fact there is a benchmark for a few of them: https://github.com/mroth/phoenix-showdown and they all land around a few thousand r/s (though not on heroku).

          I put a Phoenix hello-world app on a free heroku dyno and ran benchmarks against it using ab. A similar test with Clojure that @peter ran (packaged as a fat jar and deployed to a free dyno) got about 750 r/s. The point is that it’s certainly easy to achieve.

          ab -c 250 -n 4000 http://quiet-falls-9626.herokuapp.com/
          
          Server Software:        Cowboy
          Server Hostname:        quiet-falls-9626.herokuapp.com
          Server Port:            80
          
          Document Path:          /
          Document Length:        2140 bytes
          
          Concurrency Level:      250
          Time taken for tests:   3.141 seconds
          Complete requests:      4000
          Failed requests:        0
          Total transferred:      9652000 bytes
          HTML transferred:       8560000 bytes
          Requests per second:    1273.50 [#/sec] (mean)
          Time per request:       196.309 [ms] (mean)
          Time per request:       0.785 [ms] (mean, across all concurrent requests)
          Transfer rate:          3000.94 [Kbytes/sec] received
          

          This endpoint renders a template with a passed-in variable:

          ab -c 250 -n 4000 http://quiet-falls-9626.herokuapp.com/hello/tesla
          
          Document Path:          /hello/tesla
          Document Length:        1046 bytes
          
          Concurrency Level:      250
          Time taken for tests:   4.576 seconds
          Complete requests:      4000
          Failed requests:        0
          Total transferred:      5276000 bytes
          HTML transferred:       4184000 bytes
          Requests per second:    874.14 [#/sec] (mean)
          Time per request:       285.996 [ms] (mean)
          Time per request:       1.144 [ms] (mean, across all concurrent requests)
          Transfer rate:          1125.96 [Kbytes/sec] received
          

          And a benchmark against a Phoenix app serving data (keep in mind SSL shtuffs):

          Server Software: nginx/1.6.3
          Server Port: 443
          SSL/TLS Protocol: TLSv1,DHE-RSA-AES256-SHA,4096,256
          
          Document Path: /api/v0/messages
          Document Length: 1886 bytes
          
          Concurrency Level: 200
          Time taken for tests: 11.581 seconds
          Complete requests: 10000
          Failed requests: 0
          Keep-Alive requests: 10000
          Total transferred: 21880000 bytes
          HTML transferred: 18860000 bytes
          Requests per second: 863.47 [#/sec] (mean)
          Time per request: 231.624 [ms] (mean)
          Time per request: 1.158 [ms] (mean, across all concurrent requests)
          Transfer rate: 1844.99 [Kbytes/sec] received
          
          1. 2

            You can just “crank dynos”, but I wanted to write an article about how to get to 1000 r/m efficiently. Any Rails application can get to 1000 r/m by scaling horizontally, yes, but doing it efficiently, with the fewest servers possible, is another matter entirely.

            I’ve just seen so many overscaled Ruby apps that I knew this article had to be written.

            1. 4

              I understand, I’m not talking about cranking dynos either. I’m asking if/why Ruby/Rails is just this slow out of the box. I would expect 1000 r/m to be possible with a default configuration on any modern web stack.

              In other words, it seems like not using Ruby is the most efficient way to scale to 1000 r/m.

              1. 1

                Oh, it’s certainly capable. Consider that Basecamp claims a ~25ms median response time, and Shopify has variously claimed ~45-100ms. And both of those companies are at massive scale.

                1. 2

                  I don’t understand your response. I am talking about requests per minute, not response time.

          2. 3

            The heroku router is not nginx.

            1. 1

              I swear it used to be. I’ll ask @schneems.

              EDIT: You were right. It’s a custom erlang app: https://twitter.com/schneems/status/626500188970946560

              1. 6

                Long long ago it was. I work at Heroku on the routers :)

                1. 2

                  Is the guy who wrote Learn You Some Erlang still on the team (think Fred Herb)? That was amazing. There’s also a book by the routing team (or ebook) though I can’t remember the name. Both are awesome reads.

                  Found it: Erlang in Anger.

                  1. 1

                    Yup, Fred Hebert is still here.

            2. 2

              The description of Unicorn is wrong.

              while downloading the request off the socket, Unicorn blocks all of your other workers from accepting any new connections, and your host becomes unavailable.

              Sockets don’t work like this, at all. Using curl it’s easy to show this doesn’t happen:

              [terminal 1] $ curl --limit-rate 1 localhost:8080

              [terminal 2] $ curl localhost:8080

              The second will return immediately, even while the first connection is still slowly trickling the request.

              @nateberkopec conflates a listening socket with an accepted socket. An arbitrary number of connections can be accepted from a listener, and this has no effect on the listener, or any of the other accepted connections.
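
              A minimal Ruby sketch of that distinction (illustrative only; the threads here just handle the accepted sockets, which is not how Unicorn itself works):

              require "socket"

              listener = TCPServer.new(8080) # ONE listening socket

              loop do
                conn = listener.accept # each accept returns a NEW socket
                Thread.new(conn) do |c|
                  # However slowly this client trickles its request, only
                  # this accepted socket is affected. The listener keeps
                  # accepting other connections in the loop above.
                  c.readpartial(4096)
                  c.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
                  c.close
                end
              end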

              Otherwise, good intro on network concerns in web apps. :)

              1. 1

                Hm, this may be related to Ruby’s implementation of sockets. It was my understanding that this behavior (socket blocking) was the reason Unicorn cannot serve slow clients effectively. I’ll have to do more research. Thanks!

                1. 2

                  You’ll be able to serve worker_processes slow connections at a time, but none beyond that. Typically Ruby and Python app servers are run with a low number of workers, so you can still only handle a limited number of long-lasting requests. Something that buffers the requests thus still provides a huge advantage.
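
                  For instance (a hypothetical config/unicorn.rb sketch):

                  # config/unicorn.rb
                  # With 4 single-threaded workers, at most 4 requests are in
                  # flight at once; 4 slow clients trickling their request
                  # bodies would tie up every worker. A buffering reverse
                  # proxy (e.g. nginx) reads each request in full before
                  # handing it to a worker, so workers only ever see fast,
                  # local clients.
                  worker_processes 4
                  timeout 30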

                  1. 2

                    That makes so much sense. I’m not sure why I thought it was more complicated than that.