1. 52

  2. 15

    I cringe to think of the gigawatt-hours and pollution spent on logging

    1. 6

      I used to work on a pretty big website. Logging was 80% of the storage cost, because some devs do not understand that stack traces do not belong in production. Many of the logs could have been a metric in the monitoring system. Logging was somehow a sacred cow, and the mindset was that it is better to have more logs than we need than to miss something. Half of the logs were the same giant string repeated over and over, meaningless to anybody. Fun times.

    2. 2

      Are there any reputable, reliable VPS providers that use servers with a single socket, uniform memory, and local NVMe SSDs? I believe Linode, DigitalOcean, and Vultr all have the local SSD part covered, but I don’t know about the other parts.

      Yes, I’m attacking this from the opposite direction: bringing servers down to the scale of our laptops. I believe that’s viable for a great many applications that don’t need really big servers.

      1. 12

        The problem with trying to take this approach is that it’s far more power efficient to have one server with a bunch of VMs than to have a bunch of servers. All those fixed costs get folded into one.

        I guess you can add the workload to one of those set-top boxes, like a DD-WRT router, which you probably use anyway. But those things are a lot slower than your laptop is, so it’s not really comparable.

        1. 1

          If we cared about power efficiency, we would charge for power, not bandwidth, in hosting facilities.

          1. 9

            The fact that hosting facilities don’t (necessarily) charge by the watt does not mean that the price isn’t accounting for power use.

            There is a limited amount of power that one server can actually use before it overheats. This is inevitable; a standard rack unit has a fixed surface area through which you can dissipate heat, and heat is basically the only place that the energy can go.

            If you need more, then you have to use more than 1U. In this way, charging for space (which hosting facilities definitely do) is equivalent to charging for power.

            1. 5

              FWIW the last time I used a colo facility for real work(*), they charged for bandwidth, space and power separately.

              (* I’d tell you which because they were rather good, but unfortunately they later got bought out by a competitor and the service has, according to the grapevine, fallen off a cliff.)

              1. 3

                Not my department exactly, but I’m 80% sure we charge for power too. (Servercentral)

                1. 2

                  I am also a SC customer! But I don’t deal directly with the hosting stuff at this job. Last I knew, though, the power-related charges were for the size of circuit, plus another charge for redundant power. But there’s no charge for /usage/, which would be more relevant because of the BTUs the facility has to remove. That cost is far more than our transit.

            2. 3

              As @notriddle points out, most VPS providers are going to use many-core servers so they can pack many VMs on one host. So it can be hard to find the higher clock speeds associated with “laptop” hardware.

              However, my understanding is that most providers will at least try to give you VMs within a single NUMA zone if the selected VM size will fit.

              So, for example, my 4-core Digital Ocean VPS is running on a server with a Xeon Gold 6140 CPU, and it’s very likely that the server in question has multiple sockets. But because it’s only 4 cores and that CPU has 18 physical cores (36 hyperthreads), my little VPS reports that it only has one NUMA zone.

              (This is something you can check really easily using lscpu, which will report not just the CPU model and features, but how many NUMA zones are present!)
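              For the curious, here’s a minimal C sketch of what lscpu is reading: the kernel exposes one node<N> directory per NUMA zone under sysfs. (This is Linux-specific, and numa_node_count is just an illustrative name, not a real API.)

              ```c
              /* Sketch: count NUMA zones the same way lscpu does, by listing the
               * node<N> entries under /sys/devices/system/node. */
              #include <ctype.h>
              #include <dirent.h>
              #include <stdio.h>
              #include <string.h>

              int numa_node_count(void) {
                  DIR *d = opendir("/sys/devices/system/node");
                  if (!d)
                      return 1; /* no sysfs available: treat as a single node */
                  int n = 0;
                  struct dirent *e;
                  while ((e = readdir(d)) != NULL)
                      /* match "node0", "node1", ... but not "online", "has_cpu", etc. */
                      if (strncmp(e->d_name, "node", 4) == 0 &&
                          isdigit((unsigned char)e->d_name[4]))
                          n++;
                  closedir(d);
                  return n > 0 ? n : 1;
              }

              int main(void) {
                  printf("NUMA nodes: %d\n", numa_node_count());
                  return 0;
              }
              ```

              Note that, like lscpu, this only sees the topology the guest is shown.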

              There do exist some providers that use single-socket consumer hardware for some machines. OVH, for example, provides some dedicated machines with Intel i7 processors. (“Reputable and reliable” are in the eye of the beholder; I’ve personally had poor experiences with vendors that run consumer hardware in datacenters, but that doesn’t mean they’re necessarily a bad choice…)

              1. 1

                (This is something you can check really easily using lscpu, which will report not just the CPU model and features, but how many NUMA zones are present!)

                Not necessarily. It will report the number of NUMA zones that the CPUID instruction reports, but the CPUID instruction can be trapped and emulated by the hypervisor. Most hypervisors will do this to allow migration (advertising the minimum features, so software doesn’t depend on features that won’t be present on the host that you migrate to). It’s entirely possible that you’re in one NUMA zone, two NUMA zones, or either one or two NUMA zones depending on the current pseudo-physical to physical memory mapping, but CPUID is trapped and emulated to report two. You can try allocating a load of memory scattered over your pseudo-physical address space and measure latency, but if your VCPUs are being migrated between the two sockets then you’ll need to do a lot of samples to get accurate timing.
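                The measurement idea can be sketched as a pointer chase in C: build a random cyclic chain over a big buffer so every load depends on the previous one (out-of-order execution can’t hide the latency), then time the walk. Repeating this over many allocations and many runs is the sampling the comment describes. All names here are illustrative, not from any real benchmark suite.

                ```c
                #define _POSIX_C_SOURCE 199309L
                #include <stdio.h>
                #include <stdlib.h>
                #include <time.h>

                /* Build a single random cycle: chain[i] = next index to visit. */
                static void build_chain(size_t *chain, size_t n) {
                    size_t *order = malloc(n * sizeof *order);
                    for (size_t i = 0; i < n; i++) order[i] = i;
                    for (size_t i = n - 1; i > 0; i--) {   /* Fisher-Yates shuffle */
                        size_t j = (size_t)rand() % (i + 1);
                        size_t t = order[i]; order[i] = order[j]; order[j] = t;
                    }
                    for (size_t i = 0; i < n; i++)
                        chain[order[i]] = order[(i + 1) % n];
                    free(order);
                }

                /* Walk the chain; returns nanoseconds per dependent load. */
                static double chase_ns(const size_t *chain, size_t steps) {
                    struct timespec t0, t1;
                    size_t i = 0;
                    clock_gettime(CLOCK_MONOTONIC, &t0);
                    for (size_t s = 0; s < steps; s++)
                        i = chain[i];
                    clock_gettime(CLOCK_MONOTONIC, &t1);
                    volatile size_t sink = i;   /* keep the loop from being optimized away */
                    (void)sink;
                    return ((t1.tv_sec - t0.tv_sec) * 1e9 +
                            (t1.tv_nsec - t0.tv_nsec)) / (double)steps;
                }

                int main(void) {
                    size_t n = (size_t)1 << 22;          /* 32 MiB of size_t: larger than LLC */
                    size_t *chain = malloc(n * sizeof *chain);
                    build_chain(chain, n);
                    printf("%.1f ns per dependent load\n", chase_ns(chain, 10000000));
                    free(chain);
                    return 0;
                }
                ```

                On a VM whose VCPUs migrate between sockets you would take many samples of this number and look at the distribution, not a single run.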

            3. 2

              The logging framework may not be a bottleneck, and other lies your laptop may tell you

              Fixed that for you. Our main software absolutely and positively runs a lot faster without debug logs, and this isn’t some poor choice of logging; it’s simply not meant to be run in debug mode. And tweaking logging so it is not a bottleneck can be quite important, and also hard, depending on the language/framework/threading model you use.

              Most consumer machines have a single socket with RAM DIMMs located around it. Accessing any part of RAM has, roughly, uniform latency.

              Again, I won’t debate “most” overall, but most developer laptops I know have 2 sticks (1 internal). Also isn’t it really about the bus and not the stick?

              Maybe the headline is just not fitting the text in the best way?

              1. 11

                but most developer laptops I know have 2 sticks (1 internal). Also isn’t it really about the bus and not the stick?

                I think(?) we are talking about two different things because I’m using the term “socket” in a vague way.

                What I mean is that most servers will have multiple physical CPU sockets. Any given RAM stick will be connected to only one socket; threads that run on that socket will have fast access to that memory. Threads that run on other sockets will need to go “via” the “owner” socket to access those RAM sticks, not via the regular memory bus. On consumer hardware there’s generally just one physical CPU socket - even if you’ve got 16-32 cores on that socket.

                On the ThinkStation I bought, there are 12 RAM sticks; 6 of them are connected to one CPU socket, 6 to the other.

                Accessing “remote” memory roughly doubles the access times, give or take. Hence, a program that allocates memory in one thread and uses it in another will appear fine on a laptop, but may have catastrophic performance problems on server hardware, particularly for long sequences of dependent random access, like pointer chasing an object graph in an OO language.
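                As a rough sketch of that failure mode (illustrative names; assumes Linux “first touch” placement, where a page lands on the NUMA node of the CPU that first writes it): one thread allocates and touches a buffer, so its pages land on that thread’s node, and a second thread then reads it all.

                ```c
                /* Allocate-here, use-there: if the producer and consumer threads run
                 * on different sockets, every read in the consumer is a remote access. */
                #include <pthread.h>
                #include <stdio.h>
                #include <stdlib.h>

                #define N ((size_t)1 << 20)

                static long *buf;

                static void *producer(void *arg) {
                    (void)arg;
                    buf = malloc(N * sizeof *buf);
                    for (size_t i = 0; i < N; i++)   /* first touch: pages go to THIS node */
                        buf[i] = (long)(i & 0xFF);
                    return NULL;
                }

                static void *consumer(void *arg) {
                    long sum = 0;
                    for (size_t i = 0; i < N; i++)   /* each load may now be remote */
                        sum += buf[i];
                    *(long *)arg = sum;
                    return NULL;
                }

                int main(void) {
                    pthread_t p, c;
                    long sum = 0;
                    pthread_create(&p, NULL, producer, NULL);
                    pthread_join(p, NULL);   /* buffer fully touched on producer's node */
                    pthread_create(&c, NULL, consumer, &sum);
                    pthread_join(c, NULL);
                    printf("sum = %ld\n", sum);
                    free(buf);
                    return 0;
                }
                ```

                On a real two-socket box you would pin the two threads to different nodes (numactl, taskset, or pthread_setaffinity_np) and compare timings; unpinned, the scheduler may mask the effect.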

              2. 1

                I’m very out of the loop on server hardware, so I didn’t realize NUMA is apparently so common these days. Interesting.

                1. 5

                  NUMA is becoming increasingly common. Since the hyperscalers are primarily limited by real-estate footprint, packing as many CPUs as thermals will allow becomes the most effective way to grow capacity at scale. With AMD’s Zen parts there are even single-socket (1P) NUMA systems now, as AMD makes the most of economies of scale by manufacturing high-core-count parts as multi-chip modules.

                  1. 4

                    One thing I’ve seen in the past (not sure if it has since changed) with low-end rackmount boxes is that when I wanted to pack a lot of RAM into a single server, there were ranges where it was cheaper to put a second CPU in the server (even if you expected it to have zero utilisation) just to be able to use more DIMM slots, because the less densely stacked DIMMs were much cheaper.

                    AIUI if you have more than one CPU socket filled, you have NUMA.

                  2. 1

                    IMO, the paragraph on Turbo Boost is misleading. Usually, low-end/laptop CPUs are able to enable Turbo Boost on one core only, while Xeons, depending on the range, may be able to use Turbo Boost on all cores at the same time. The more cores are boosted, the lower the Turbo Boost frequency, but even when Turbo Boost is enabled on all cores, it is still higher than the base frequency. For an example, look at https://en.wikichip.org/wiki/intel/xeon_gold/5120: base frequency is 2.2 GHz, but all-core Turbo Boost is 2.6 GHz. And with a proper cooling system, there is no reason not to reach this frequency (unlike a laptop, servers are usually correctly cooled).

                    Also, VMs (since you mention Google Cloud) usually do not have access to the appropriate MSRs (APERF/MPERF) to see the current frequency of the CPU, so they will only show the base frequency.

                    Also, for memory, Xeon processors usually have more memory channels available. For example, the one mentioned above has 6 channels (more recent ones may have 12). A laptop rarely has more than 2 sticks of RAM, so you get roughly a third of the bandwidth on large transfers. All the more so since laptops may be stuck with LPDDR3 while Xeon servers use DDR4.

                    1. 2

                      I think the argument was that if you’re already using all cores for a long period of time, it’s too hot to enable turbo boost.

                      On a related note, it’s worth disabling turbo boost on your local machine when benchmarking, because it introduces a lot of variance that can confuse results. For my laptop I use:

                      # Disable Turbo Boost (Intel P-state driver; needs root)
                      echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
                      # Pin the governor and minimum clock so the frequency stays flat
                      cpupower frequency-set -g performance
                      cpupower frequency-set --min 2.80GHz

                      1. 1

                        Turbo boost bumps the clock frequency when the core is both busy and cool. It is a marketing name for thermal throttling. A busy core rapidly stops being cool. High-density cloud hosting often aims to pack loads together, so the probability of all of your cores being hot is much higher and so you won’t get any turbo boost.

                      2. 1

                        This is actually a huge reason why I do so much of my development in Kubernetes as I go, instead of hosting everything locally. While developing, not only do you get to deal with deployment issues in real time and save the time of maintaining a fragile dev environment (esp. if your deployment relies on multiple servers working together), you also get to see realistic performance as you go. Very underrated, and you never have to wonder how things will actually perform in prod.