1. 7

Suppose you hypothetically cannot use the cloud. The reasons could be financial, political, ethical, technological, or something else. What approach do you take? What tools do you use? How do you solve the problem of serving your users beyond one instance/IP, etc.?

I’m not looking for concrete advice here, I just want a discussion/survey of what you came up with.

    1. 11

      This is what we do, per policy.

      It depends on what you mean by scale.

      The easiest way to scale, provided you have the $$, is to just throw more hardware at the problem (RAM, CPU, etc.). You can scale x86 hardware up quite far, and can probably handle most workloads just fine while still sitting on a single machine.

      If you can’t scale on hardware, for whatever reason, then you have to get more creative and work a little harder. The next easiest step is to figure out what the pain points are for scaling app X, and then work on the code/configuration to lower the resources required to do the same amount of work. For custom apps, this could be something simple like changing how you store X in memory, or it could be moving that work into a more performant language (say, a Cython module if the code was originally in Python), etc. If it’s a Java app, it might just be reconfiguring the JVM a little bit.

      Next is scaling by increasing the number of running copies of said application. This is where it can get hard, and it is definitely more specific to each application. For instance, scaling nginx across multiple copies is really easy, since it’s pretty much stateless and has no DB or anything else you have to deal with. Scaling Postgres across multiple copies gets a lot harder, since it’s a DB and is very stateful.

      For us, most web stuff is mostly stateless, so we scale it by just starting more instances of the application. For stuff like Postgres, we scale hardware rather than doing something like Citus or offloading reads to a hot standby. It gets complicated quickly doing that stuff.
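
      A minimal sketch of the “just start more instances” idea for a stateless app, assuming something in front hands requests out in rotation. The `RoundRobinPool` class and the backend addresses are hypothetical, made up for illustration:

```python
import itertools


class RoundRobinPool:
    """Hand out identical, stateless backends in rotation (hypothetical helper)."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(list(backends))

    def next_backend(self):
        # Any copy can serve any request because no state lives in the app itself.
        return next(self._cycle)


# Hypothetical addresses of three identical copies of one web app.
pool = RoundRobinPool(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
assignments = [pool.next_backend() for _ in range(6)]
```

      Adding capacity then amounts to adding one more address to the list, which is exactly why this does not carry over to a stateful service like a database.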

      Generally the easy answer is to just throw more physical resources at the problem; that almost always works to make things faster. You do have to know which resource to increase, though (CPU speed, memory, I/O bandwidth, storage, etc.). Luckily, every OS out there gives you the tools you need to figure that out pretty easily.

      1. 2

        Excellent comment. I agree with all of it. I’ll add that one can scale the databases by starting with, or migrating to, a database that scales horizontally. There are quite a few of them. The same is true for some other stateful services. One might also use load balancers and/or sharding to keep too much traffic from landing on one machine. That gets a bit more complex, but there are still tools and specialists that can help.
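
        To make the sharding part concrete, here is a rough sketch of hash-based shard selection. The host names in `SHARDS` and the helper names are assumptions for illustration, not any particular database’s API:

```python
import hashlib

# Hypothetical database hosts, one per shard.
SHARDS = ["db-a", "db-b", "db-c", "db-d"]


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per
    process for strings and so would not give the same answer everywhere.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def host_for(user_id: str) -> str:
    """Return the host that owns all rows for this user."""
    return SHARDS[shard_for(user_id, len(SHARDS))]
```

        One known drawback of plain modulo sharding is that changing the number of shards remaps most keys, which is part of why this gets more complex in practice.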

        1. 2

          I agree there are stateful databases that scale multi-node better out of the box than Postgres (PG) does. I specifically picked PG as the example here because it doesn’t scale multi-instance/multi-machine very well out of the box.

          Once you get to multi-machine stateful applications, there are a lot of different trade-offs you have to handle. I’m not sure there is any one stateful DB that is multi-machine out of the box and will work for basically every workload the way Postgres does. That is, PG is useful for basically any DB workload out of the box, provided it can fit on a single physical machine. I’d love examples of general-purpose DBs like PG that are multi-node out of the box with basically no downsides.

          But basically my advice is: once you have a stateful thing you have to go multi-node with, you need either a good, well-paid consultant or good in-house technical staff, as it’s not an easy problem and it isn’t well solved for all use cases. Or to put it another way, avoid multi-node stateful things until absolutely forced to do so, and then go there with eyes wide open and lots of technical knowledge at hand. Luckily, if you do get to that requirement, you probably have lots of $$$ to throw at the problem, which helps immensely.

          1. 1

            Well said again. Yeah, I don’t know whether any of those advanced DBs cover every kind of workload, or with what drawbacks. I’d want to see experimental comparisons on realistic data.

      2. 1

        You sound like you know a lot about this topic. Hypothetically, if it’s even possible, what would you do if the load balancer that you put in front of your workers can’t handle the incoming load? How do you load balance the load balancer?

        1. 4

          It’s definitely possible. You have some options, depending on the kind and amounts of traffic we are talking about.

          It depends somewhat on whether the load balancer (LB) is hardware or software based. Most people are using software-based ones these days.

          Roughly in order of preference, but it’s a debatable order:

          • Ensure you are using a high-throughput LB (HAProxy comes to mind as a good software-based one).
          • Simplify the load balancer configs to the bare minimum, i.e. get them doing the least amount of work possible. The less work you have to do, the more you can do given X resources.
          • Find the bottleneck(s). For most LB workloads the problem is network I/O, not CPU, memory, or disk. So ensure your hardware is built for peak I/O (make sure your NICs are configured for IP offloading, your kernel is tuned for I/O, etc.).
          • Scale out the LB with multiple instances. This gets interesting, as suddenly you need your traffic to hit more than one machine. You can do that in a variety of ways, depending on the actual traffic we are talking about. The easiest is probably just lazy DNS round robin (RR), i.e. have your name return multiple A/AAAA records for the host you are load balancing, where each IP is an LB.

          Rinse and repeat the above, until you have room to breathe again.
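
          As a toy model of the DNS RR option above: the `RoundRobinZone` class below is a made-up stand-in for a DNS server that rotates its A records on every query, and the name and addresses are documentation placeholders. It is only a sketch of the idea, not how any real DNS server is implemented:

```python
from collections import Counter, deque


class RoundRobinZone:
    """Toy stand-in for a DNS server that rotates A records per query."""

    def __init__(self, records):
        self._records = {name: deque(ips) for name, ips in records.items()}

    def resolve(self, name):
        ips = self._records[name]
        answer = list(ips)
        ips.rotate(-1)  # the next query sees a different first record
        return answer


# Hypothetical name with one A record per load balancer.
zone = RoundRobinZone(
    {"www.example.com": ["192.0.2.10", "192.0.2.11", "192.0.2.12"]}
)

# Many clients simply connect to the first address they are handed.
hits = Counter(zone.resolve("www.example.com")[0] for _ in range(9))
```

          In this idealized model each LB ends up with an equal share of clients. In real life, resolver caches and client behavior make the split only approximate, and a dead LB keeps receiving traffic until its record is removed and the TTL expires.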

          There are more complicated ways to do this, depending on traffic load. Once you need to scale past simple DNS RR, you probably want a network expert, as it depends on the kinds of traffic (is it all inbound, is it mostly outbound, etc.). For example, there are some neat things you can do where the LBs only handle the inbound traffic, and all outbound/return traffic comes directly from the machine(s) doing the work without going back through the LB. This requires special configuration and is not simple to do.

          But it all basically comes down to the general case above. Load balancers are generally very stateless, so the general “just run a bunch of copies” plan usually works fine.

          If you need your LBs to do a lot of CPU-based work, you can use a layered fan-out model, with several layers of LBs each doing a small piece of the overall work, provided it all fits within your RTT budget.

          Also, if you go to extremes, you can do user-space or unikernel designs where the entire software stack is taken over to do only LB duties, so you can really tune it for your specific LB workload.

          For anything beyond this, or specific to your workloads, I’d be happy to consult, but it won’t be free ;)

        2. 2

          This is how you do it [1], no cloud required. From what I understand, the cloud is essentially composed of what’s in the linked article along with a sophisticated configuration. People’s apps run on this kind of setup in the cloud if you go deep enough, but it’s being handled by people who worry about the plumbing for you. I really must say that the article I’ve linked is excellent and comes with source code. You should check out his other projects as well.

          [1] https://vincent.bernat.ch/en/blog/2018-multi-tier-loadbalancer

        3. 1

          The related question to this is, of course, what do you do if the load balancer dies?

    2. 5

      Write a desktop app?

    3. 3

      One way of understanding the ‘cloud’ is that someone has built the capacity you’re going to use. It’s sitting there idle waiting to be used. You pay your transaction or operations fee and the CapEx that capacity represents is amortized for you. If you don’t want to pay transaction costs to use other people’s capital, use your own capital.

      Alternatively, the ‘cloud’ represents an abstraction over a number of service providers. You can scale by selecting these services and building your computing infrastructure from parts instead of using a single vendor. Presumably you understand or could discover what service you need and talk to vendors that can help with your now less abstract problem.

    4. 2

      This varies a lot depending on the application and the contract you have with your customers. Below I’ve listed what I commonly do for most web apps I’m responsible for.


      1. Look for monotonicity: Plot the following: requests, CPU, memory, response time, GC free, GC used. If you see sections of the graph that just keep going up with no stops, or sudden harsh drops, you may have a resource leak of some kind. While restarting the app occasionally can get you a speed boost in some cases, long term the fix needs to be at the app level. Your ability to find the leak will vary based on the language you are using. Most languages have some kind of leak detector, and there is always Valgrind.
      2. Slow queries: A lack of indexes or improperly written queries can be a major slowdown. Depending on your database, EXPLAIN will be able to guide you in the right direction.
      3. Cache things: If there are slow or expensive operations in your app, see if you can cache them. Making an expensive operation happen once an hour or once a day, versus on demand, goes a long way toward keeping the app feeling responsive.
      4. Deal with bad users: Do you track which users or IPs are making requests that always fail? Can you block them via something like fail2ban? At a bare minimum, this will get noise out of your app logs and help you find real issues. If their requests invoke expensive or time-consuming operations and will always fail, blocking them gives you resources back to use on legitimate requests.
      5. Fail early: Check access, validate parameters, and ensure the request is valid before doing anything expensive.
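
      For point 1, here is a rough sketch of what “keeps going up with no stops” could look like as a check over sampled memory readings. The function names and the `min_len` threshold are my own assumptions, and a real check would need tuning against your own graphs:

```python
def longest_climb(samples):
    """Return (length, growth) of the longest run that never goes down."""
    best_len, best_growth = 0, 0
    start = 0
    for i in range(len(samples)):
        if i > 0 and samples[i] < samples[i - 1]:
            start = i  # a drop ends the current run
        length = i - start + 1
        if length > best_len:
            best_len, best_growth = length, samples[i] - samples[start]
    return best_len, best_growth


def looks_like_leak(samples, min_len=20):
    """Flag a series whose longest non-decreasing run is long and actually grows.

    min_len is an arbitrary illustrative threshold, not a recommended value.
    """
    length, growth = longest_climb(samples)
    return length >= min_len and growth > 0


# Made-up readings in MB: a healthy sawtooth (memory is reclaimed regularly)
# versus a steady climb with no drops.
healthy = [100 + (i % 5) * 10 for i in range(60)]
leaking = [100 + 3 * i for i in range(60)]
```

      With these made-up series, `looks_like_leak(healthy)` is false and `looks_like_leak(leaking)` is true. A flat series is not flagged either, because it never grows.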
    5. 1

      I haven’t fully jumped into elastic load balancing or scaling myself yet, but I have been playing with Docker. I assume you’d use Kubernetes or a similar tool to scale horizontally.

      A simpler first step would be to set up Nginx or another load balancer with backup hosts designated. If your project’s traffic is low enough not to need the extra resources, you’re OK in general; let the backups kick in under heavy load. In fact, NGINX Plus supports more features for dynamically adding backend servers …
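
      A small model of the backup-host idea, as I understand it: in nginx, a server marked `backup` in an upstream block only receives requests when the primary servers are unavailable. The `Server` class, the `eligible` function, and the addresses below are hypothetical, and this is a sketch of the selection rule rather than nginx’s actual implementation:

```python
from dataclasses import dataclass


@dataclass
class Server:
    address: str
    backup: bool = False
    healthy: bool = True


def eligible(servers):
    """Prefer healthy primaries; fall back to healthy backups only if none remain."""
    primaries = [s for s in servers if s.healthy and not s.backup]
    if primaries:
        return primaries
    return [s for s in servers if s.healthy and s.backup]


# Hypothetical pool: two primaries and one backup host.
servers = [
    Server("10.0.0.1:80"),
    Server("10.0.0.2:80"),
    Server("10.0.0.9:80", backup=True),
]
normal = [s.address for s in eligible(servers)]

# Simulate both primaries failing their health checks.
servers[0].healthy = False
servers[1].healthy = False
degraded = [s.address for s in eligible(servers)]
```

      Here `normal` contains only the two primaries and `degraded` contains only the backup, so the backup sits idle until the primaries drop out.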