Threads for Xorlev

  1. 27

    Getting rightfully shredded as closed-source spyware over at HN: https://news.ycombinator.com/item?id=30921231

    1. 7

      Also being prodded for using the name “Warp” (the name of a popular crate) and for trading on Rust’s name for marketing.

      1. 4

        Yea they are roasting the CEO alive and rightfully so.

      1. 2

        I really appreciate the error handling. It looks like Rust, but pretty cleanly maps into Erlang’s model of returning an error tuple.

        Between Gleam and Elixir, BEAM has a pretty good shot at more mainstream adoption over the next few years.

        1. 2

          The Google Slicer paper (PDF) is a good read. I believe that many applications benefit greatly from an above-database stateful layer, especially at scale where hot rows and hot entity groups become a real concern, or when you find yourself doing things like polling a database for completion status.

          Stateful services aren’t right for every use, but when used well they greatly simplify your architecture and/or unlock really compelling use-cases.

          1. 1

            Oooh this looks great, adding it to my paper reading list.

          1. 4

            I’ve seen similar articles to this, but what I haven’t seen is a compelling list of reasons why you’d want to do this. Yeah the world is tight on public IPv4 addresses, but NAT is a thing and it doesn’t seem as dire as everyone said it was ~25 years ago.

            1. 15

              NAT means extra call latency. It means paying for extra IPs if you want to have separate data and management endpoints. It means getting rate limited and captcha’d because someone using the same ISP as you was misbehaving.

              1. 8

                As @viraptor said, NAT is bad for latency due to circuitous routing. It also makes direct connections on the net really difficult which makes it hard for any P2P protocol to take root. The limited IPv4 range also makes it really hard to send email or do anything else where IP reputation matters since there’s a high likelihood that a bad actor had an IP at some given point in time.

                1. 2

                  I agree on the latency, but you can’t expect P2P connectivity to return thanks to IPv6, because everybody will still be running a stateful firewall that drops all unsolicited incoming packets.

                  There are some UPnP-like mechanisms for IPv6 to punch holes through firewalls, but they are much less common than their IPv4 counterparts, and even if they were, at most you’d get connectivity that’s just as good, hardly better.

                  1. 2

                    The limited IPv4 range also makes it really hard to send email or do anything else where IP reputation matters since there’s a high likelihood that a bad actor had an IP at some given point in time.

                    On the flip side, won’t this make it very difficult to block bad actors?

                    1. 10

                      Relying on IP reputation has always been a terrible way to do security. There’s much better ways to do security.

                      1. 2

                        Moving away from IP-based reputation seems like a decent way to get back to a world where running your own mail server is possible again.

                        1. 1

                          Or it could have the opposite effect, because Google etc. decide to only allow a whitelisted group of IPs from “good” mail providers.

                      2. 5

                        I’d rather have a hard time blocking bad actors than accidentally block good ones

                    2. 5

                      Ironically, private IPv4 ranges and NAT make it much easier to actually have a home network where all gear has its own fixed address and you can connect to it.

                      Most providers that bother to provide IPv6 on consumer connections at all use DHCP-PD in the worst possible way—the prefix they give you actually changes from time to time. That way you never know what exact address a device will get, and need a service discovery mechanism.

                      With NAT, even if the ISP gives me a different WAN IPv4 address every time, that doesn’t affect any hosts inside the network.

                      1. 7

                        The big thing in IPv6 is “multiple addresses all the things”. Yeah, the public address for your device will change a lot, both due to prefix changes and due to privacy extensions. If you want a stable local address at home, don’t use the public one, use a ULA prefix.

                        1. 2

                          Giving things names is a lot nicer to work with than remembering IP addresses, though. mDNS+DNS-SD is good tech.

                          1. 3

                            mDNS is problematic for security because, well, there isn’t any. Any device on your network can claim any name. No one issues TLS certificates in the .local TLD that mDNS uses and so you also can’t rely on TLS for identity unless you’re willing to run a private CA for your network (and manage deploying the trusted root cert to every client device that might want to connect, which will probably trigger any malware detection things you have installed because installing a new trusted root is a massive security hole).

                            1. 1

                              It’s only for the local network, and I trust my network. It gets trickier if you don’t, of course.

                              1. 2

                                It’s not about trusting your network, it’s about trusting every single entity on the network. Any phone that someone brings to your house and connects to the WiFi can trivially claim any mDNS name and replace the device that you think is there. This is mostly fine for things like SSH, where key pinning gives you an extra layer of checks, but it isn’t for most protocols.

                                1. 2

                                  I meant to write that I trust the devices on my network, but to access my wifi they’ll need my password - which they don’t get if I don’t trust them 🤷‍♂️

                                  Given the convenience and the lack of anything reasonable to fear, it’s a net win for me, at least.

                                  1. 2

                                    Do you ever hand out the password to people that visit your house? Do you allow any IoT devices that don’t get security updates anymore? Do you run any commodity operating systems that might be susceptible to malware? If you answer ‘yes’ to any of these, then mDNS provides a trivial mechanism for an attacker who compromises any of these devices to impersonate other devices on your network.

                                    1. 2

                                      Don’t use the same network for all those? :)

                                      I have a separate subnet (with no outbound internet access other than to an NTP server) for “Internet LAN of Things” devices, another one for guest Wi-Fi, and another one for my personal devices that I can actually trust.

                                      1. 1

                                        I use a separate vlan for guests. Problem solved.

                          2. 4

                            Fundamentally, the number of devices behind NAT is limited by the number of open TCP connections the firewall can track for internal clients… the shortage is still relevant, but we have kicked the can down the proverbial road; I’d wager another 10 years before enough connected devices seriously clog the available IPv4 space.

                            1. 6

                              Centralisation has also played a big part. 15 years ago, we expected to have a load of Internet connected devices in houses that you’d want to be able to reach from any location in the world. We now have that but they all make a single outbound connection to a cloud IoT service and that’s the thing that you connect to from anywhere in the world. You need a single routable IPv4 address for the entire service, not one per lightbulb. That might not be great for reliability (it introduces a single point of failure) but it seems to be the direction that has succeeded in the market.

                              1. 7

                                I think technology (lack of IPv4 addresses) and business needs (having your customers create an account and letting you see how they use your products is incredibly valuable) have converged to the “cloud service” model.

                                Although, even if every lightbulb in your home has its own IPv6 address, services to help manage them would spring up quite quickly, and the natural way to solve the problem would be a semi-centralized service gathering them all under one “account”.

                          1. 7

                            There’s a few places where you say something is Haskell’s strong point but don’t explain what it is, like higher-kinded types and defunctionalization.

                            1. 2

                                I agree. Personally, I think the article could have used some punctuating examples and less exposition.

                              1. 1

                                  “Higher-kinded types” refers to the ability to talk about a list, even though a list isn’t by itself a type. A “list of integers” or a “list of strings” is a type; list, by itself, is sort of like a type-level function, but you can still use it alone in many contexts. For instance, list has a Functor instance without mentioning its parameter, and you can write a function that’s polymorphic over the container and pass list as the container at a use site. It goes much deeper than that, of course: it gives rise to abstractions like Monad, data-modelling tools like rank2classes, and programming styles like functor-oriented programming, and it generally improves the ergonomics of writing code that avoids depending on the structure of its input that’s irrelevant to what it does (by leaving the functor (container) polymorphic and constraining it with interfaces).

                                  The bit about defunctionalization surprised me too. Defunctionalization is a very useful technique, but it’s independent of Haskell. It means that, instead of passing functions around, you pass around data (first-order, serialisable, i.e. good old data) that describes those functions. Since these descriptions are just data, they can be stored or sent over the wire. I guess the author means that ADTs and the data-modelling techniques afforded by the type system (such as rank2classes) make it more ergonomic to defunctionalize programs. Because when you start doing interesting things with defunctionalization, the data types that describe these functions start to look more and more like the AST of a DSL, and Haskell is the king of DSLs :)
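
                                  The idea isn’t tied to Haskell, either. Here’s a tiny, made-up sketch of it in Rust (all names hypothetical): the “functions” you’d otherwise pass around become a plain enum plus a small interpreter, so they can be logged, stored, or shipped across the wire.

```rust
// Instead of passing closures around, describe the operations as plain data...
enum StringOp {
    ToUpper,
    Append(String),
    Repeat(usize),
}

// ...and give them meaning in exactly one place with an interpreter.
fn apply(op: &StringOp, input: &str) -> String {
    match op {
        StringOp::ToUpper => input.to_uppercase(),
        StringOp::Append(suffix) => format!("{input}{suffix}"),
        StringOp::Repeat(n) => input.repeat(*n),
    }
}

fn main() {
    // Because the ops are first-order data, they can be persisted or sent
    // over the wire and interpreted later.
    let pipeline = vec![
        StringOp::Append("!".into()),
        StringOp::ToUpper,
        StringOp::Repeat(2),
    ];
    let result = pipeline
        .iter()
        .fold("hello".to_string(), |acc, op| apply(op, &acc));
    println!("{result}"); // HELLO!HELLO!
}
```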

                              1. 1

                                This is cursed, but I love it anyways. I greatly enjoyed how many hoops the author went through to get this working.

                                1. 28

                                  I’ve watched friends try Go and immediately uninstall the compiler when they see that the resulting no-op demo program is larger than 2 MiB.

                                  Something about the wording on this kind of bothered me. If said friends were evaluating a language solely on the size of the output binary, perhaps they weren’t really evaluating Go for what it’s good for?

                                      As it turns out, a statically-linked Go binary with libraries and the runtime costs a few megabytes [1]. That’s a lot of functionality for a few megabytes. As mentioned in [1], a similar statically-linked hello-world C binary is also pushing a megabyte (admittedly, I haven’t tried it, but I have no reason to doubt it). There’s a lot of things Go is good for, but golfing binary size isn’t one of them. Though, you can find plenty of examples [2] [3] of ways to shrink Go binaries (and account for binary size) if that’s what you’re really looking for.

                                  I’m not really a Go advocate, I spend my professional time in Java/C++ and my personal time in Rust. I even like the look of Zig (I’m firmly in the “anything but C” camp), but this felt like a bit of an invented example.

                                  1. 5

                                        The large binaries make it impractical to use Go WASM in the browser. You can work around it by using the TinyGo compiler, but that’s not fully compatible with regular Go. OTOH, WASM is still a sort of niche use case, so in practice it’s not a big deal.

                                    1. 4

                                      I remember compiling a Motif application in the 90’s on some platform where the Motif libraries weren’t particularly well broken up and shared libraries weren’t well supported either and a “hello Motif world!” application was >1MB…and this was back when 1MB was a significant portion of your disk quota.

                                      1. 2

                                        It depends a lot on the use case. If you wanted to write a small command-line tool in the language then a fixed 2 MiB overhead per binary would be huge. Even something like BusyBox would likely see a large increase. FreeBSD ships with a Busybox-like single statically linked binary that includes all of the core utilities in /rescue and it’s 13 MiB - an extra 2 MiB there would be quite noticeable. That said, in FreeBSD’s libc, jemalloc is around 1MiB and that’s needed for any non-trivial statically linked binary (snmalloc is smaller).

                                        Even that’s not a great comparison though, because most programs do something. The really important question is how rapidly that grows. If that 2 MiB includes a load of functionality that you’d bring yourself then it may be that a 2MiB fixed overhead ends up being better than a small multiplier on binary size as you increase complexity. C++ templates, for example, can be used well with inlining to give tiny incredibly specialised code (the fast path for snmalloc’s malloc function is split across several templated functions in the source code and compiles down to around a dozen instructions) but it can also cause a rapid growth in code size if you use templates too aggressively and don’t explicitly factor out common code into a non-templated superclass for multiply instantiated templates (the Windows linker will discard identical template functions by default, but that’s technically a standards violation as the address of these functions should compare not equal).

                                        1. 3

                                          If you wanted to write a small command-line tool in the language then a fixed 2 MiB overhead per binary would be huge.

                                              If you wanted to write it for a constrained environment, yes. If it’s something that will run on users’ desktops, like some internal tool for developers, it’s nothing. The statically linked nature of Go binaries, and its consequent ease of deployment in uncontrolled environments, is a MUCH bigger advantage (believe me, I wrote internal CLI tools in Python).

                                          1. 1

                                            Yup, though ‘constrained environment’ covers a lot. For example, I think the base container image for Alpine is around 4 MiB. A 2 MiB overhead in a single tool that runs there is not a big problem but a 2 MiB overhead in each of 100 tools that you run in there will have a noticeable impact on deployment times. Typically, you don’t have 100 tools in a single container (if you do, there’s a good chance that you’ve missed the point of containers).

                                            I probably wouldn’t want an extra 2 MiB on the 13 MiB binaries in /rescue. I definitely wouldn’t want it in the stand-alone versions in /sbin and /bin and so on, because that would roughly double the size of a base VM image, which would increase costs noticeably for cloud deployments. I would be completely fine with it in something like containerd or git. For anything running on a developer desktop, 2 MiB is completely in the noise.

                                            1. 1

                                                  Well, at my last job I was building and deploying images sometimes well over 1 GB, so I guess my perspective is biased there.

                                              Was it kind of a pain in the ass sometimes? Sure, but it worked, we run our shit. So, if that’s doable, I’m not gonna prioritize 2MB overhead unless I REALLY need to.

                                          2. 1

                                            technically a standards violation as the address of these functions should compare not equal).

                                                Could be done conservatively. But that seems like a bug in the standard to me. See e.g. Lisp permitting coalescing of literals.

                                            (There are formal definitions of equality, which can be applied to functions. It is impossible to determine all cases when two functions are equivalent; nevertheless such a definition is appropriate for a language standard. It is obvious that when two functions comprise exactly the same code they are equivalent.)

                                            1. 1

                                              Busybox-like single statically linked binary (…) and it’s 13 MiB - an extra 2 MiB there would be quite noticeable.

                                                  I think that comparison misses an important thing - it wouldn’t be 2 MB of extra code. That space contains useful runtime stuff which larger programs have to implement one way or another. A lot of that runtime already exists in BusyBox, so rewriting it in Go could just as well not change the size.

                                          1. 22

                                            Help me. I am not a Ruby person and I probably never will be. I just cannot figure out what Hotwire is. I have read this post, I have read the Hotwire homepage, I have googled it, I cannot for the life of me figure out what it actually is.

                                            I keep reading “HTML over the Wire” but that is how normal websites work. What is different?

                                            1. 45

                                              you know how HTML is usually transferred over HTTP? well, Hotwire just transfers that same HTML over a different protocol named WebSockets.

                                                that’s it. that’s the difference.

                                              1. 9

                                                Thanks. Your explanation saves me countless hours.

                                                1. 9

                                                  what on earth

                                                  1. 28

                                                      For others who may be confused: This is dynamic HTML over web sockets. The idea is that the client and server are parts of a single application and cooperate. Clients request chunks of HTML and use a small amount of (usually framework-provided) JS to swap them into the DOM, instead of requesting JSON data and re-rendering templates client-side. From the perspective of Rails, the advantage of this is that you don’t need to define an API – you can simply use Ruby/ActiveRecord to fetch and render data directly as you’d do for a non-interactive page. The disadvantage is that it’s less natural to express optimistic or client-side-only behaviors.

                                                    1. 1

                                                      Ah. That … sort of makes sense, honestly?

                                                      1. 7

                                                          it’s not really a new idea, they stole it from Phoenix LiveView. https://hexdocs.pm/phoenix_live_view/Phoenix.LiveView.html

                                                        1. 6

                                                          … which I guess in turn is the spiritual successor to TurboLinks

                                                          1. 5

                                                            It’s worth noting that idea isn’t new with Phoenix. Smalltalk’s Seaside web framework had this back in 2005 or so (powered by Scriptaculous), and WebObjects was heading down that path before Apple killed it.

                                                            Phoenix LiveView looks great, and is likely the most polished version of the concept I’ve seen, don’t get me wrong. But I don’t think DHH is “stealing” it from them, either.

                                                            1. 5

                                                              There’s a lot of implementations of it, here’s a good list: https://github.com/dbohdan/liveviews

                                                              1. 4

                                                                  Not surprising, given that Phoenix is designed by a prolific Rails contributor. There’s a healthy exchange.

                                                                https://contributors.rubyonrails.org/contributors/jose-valim/commits

                                                                1. 5

                                                                  Moreover, DHH has been experimenting with these techniques since roughly the same time that Elixir (not even Phoenix) first appeared: https://signalvnoise.com/posts/3697-server-generated-javascript-responses

                                                              2. 1

                                                                    In addition to brandonbloom’s excellent points, I personally liken it to the Apple/Oxide push for hardware and software being designed together. This type of tech makes it much easier to keep frontend and backend designs coherent. It is technically possible to do with SPAs and the APIs they rely on, but the SPA tech makes it too easy (at an organizational level) to lose track of the value of joint design. This tech lowers the cost of that joint design and adds friction to letting business processes throw dev teams in different directions.

                                                          2. 4

                                                        Right. These days, a lot of normal websites transfer JSON over WebSockets and piece the HTML together on the client side with JavaScript, and Hotwire is a reaction against that, bringing it back to transferring HTML.

                                                            1. 3

                                                              Really? How is that the answer to all of life’s problems, the way DHH is carrying on?

                                                              1. 10

                                                                because now your (data) -> html transformations all exist in one place, so you don’t have a server-side templating language and a client-side templating language that you then have to unify. also the performance of client-side rendering differs more substantially between devices, but whether that’s a concern depends on your project and its audience. just different strategies with different tradeoffs.

                                                                1. 4

                                                                  Yes, these are good points; another is that you have basically no client-side state to keep track of, which seems to be the thing people have the most trouble with in SPAs.

                                                                  1. 1

                                                                    Great for mail clients and web stores, the worst for browser games.

                                                                    1. 1

                                                                  depends on the game. A game that runs entirely in the browser makes no sense as a Hotwire candidate. But for a game that stores its state in the server’s memory it’s probably fine. Multiplayer games can’t trust the state in the client anyway.

                                                                      1. 1

                                                                    Unless you genuinely need offline behavior, or are actually building a browser-based application (e.g., a game, or a photo editor, etc.), something like Hotwire/LiveView makes a great deal of sense.

                                                                        At least until you get to a certain scale, at which point you probably don’t want to maintain a websocket if you can help it, and if you are, it’s probably specialized to notifications. By that time, you can also afford the headcount to maintain all of that. :)

                                                            1. 3

                                                              “Don’t have long-running transactions” is surely database ops commandment No. 1?

                                                              I’m kind of surprised they still don’t know where the long running transactions are coming from & have spent a ton of developer time eliminating sub-transactions from their code all over the place instead of tracking them down. Is it really impossible to instrument Postgres to warn on long running transactions with a dump of the SQL query that triggered that transaction?

                                                              (I haven’t done any serious database work in >> decade, so maybe this is actually hard?)

                                                              1. 4

                                                                Not all long-running transactions are user initiated. Example from their post:

                                                                There was a long-running transaction, usually relating to PostgreSQL’s autovacuuming, during the time. The stalls stopped quickly after the transaction ended.

                                                                […]

                                                                In our database, it wasn’t practical to eliminate all long-running transactions because we think many of them happened via database autovacuuming, but we’re not able to reproduce this yet.

                                                                They’ve done an excellent job engaging the right people to solve the problem, and frankly I agree with their approach. Running a patched version of Postgres isn’t guaranteed to solve their problem and may well be an operational headache.

                                                                1. 4

                                                                  Sure, but can the database not identify the source of long-running transactions? I thought this was one of the first things you’re supposed to look at when tracking down database performance issues, so surely it ought to be a well-trodden path?

                                                                  At the moment they’re simply speculating that the transactions come from vacuums - they don’t actually know! (Unless I misread the blog post?)

                                                                  Is PostgreSQL not capable of tracking transactions like this?

                                                              1. 1

                                                                It’s gorgeous. Thank you!

                                                                1. 2

                                                                    Quadratic algorithms pop up everywhere, and they aren’t always obvious, especially in software with a great deal of abstraction. Sometimes you’re calling into a method which does far more work than you expect. A system I’ve worked on caches certain structures to make the common case fast, but evicts the cache on mutation and forces a rebuild on the next read (don’t ask me why it’s not maintained incrementally). If you’re doing read-modify-read in a loop, now you’re quadratic. Yay.
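
                                                                    A contrived sketch of that shape (made-up structure, not the actual system I’m describing):

```rust
// A structure that caches a derived view, evicts it on every mutation,
// and rebuilds it lazily on the next read.
struct Catalog {
    items: Vec<u64>,
    sorted_cache: Option<Vec<u64>>,
}

impl Catalog {
    fn new() -> Self {
        Catalog { items: Vec::new(), sorted_cache: None }
    }

    fn add(&mut self, item: u64) {
        self.items.push(item);
        self.sorted_cache = None; // mutation evicts the cache
    }

    fn max(&mut self) -> Option<u64> {
        if self.sorted_cache.is_none() {
            // A read after a mutation pays for a full rebuild.
            let mut sorted = self.items.clone();
            sorted.sort();
            self.sorted_cache = Some(sorted);
        }
        self.sorted_cache.as_ref().and_then(|v| v.last().copied())
    }
}

fn main() {
    let mut catalog = Catalog::new();
    // Read-modify-read in a loop: every iteration rebuilds the cache, so n
    // iterations cost on the order of n^2 (times a log factor) in total,
    // instead of a single rebuild at the end.
    for i in 0..10_000u64 {
        catalog.add(i);
        let _ = catalog.max();
    }
    println!("{:?}", catalog.max());
}
```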

                                                                    Worse, you don’t always have a good way to preempt it. On the JVM, cancellation is cooperative: if the running code never checks for interruption, it just keeps going. So those algorithms that occasionally take hours to run? Bad news, you just lost a thread from your executor (and probably a core, or at least part of one) for the next few hours. The request you just received probably timed out and retried, so you lose another thread. If you’re lucky, you have enough capacity to handle it, or notice early enough to kick some tasks while you rush a fix in.

                                                                  Running services is hard.

                                                                  1. 6

                                                                    this looks cool, and kudos for using LMDB. The 35MB binary size leapt out at me, though — what contributes to that? (LMDB itself is only ~100KB.) Lots of stemming tables and stop-word lists?

                                                                    1. 4

                                                                      Rust isn’t entirely svelte either. It doesn’t take too many transitive dependencies before you’re at 10MB.

                                                                      1. 3

                                                                        How do you call yourself minimalist when you’re pulling in that many dependencies?

                                                                        1. 11

                                                                          To be clear, I’m not the author. But if I were, this would come off more as a personal dig than a real question. Be kind. :)

                                                                          1. -2

                                                                            Holy passive aggressiveness, batman :)

                                                                            Remind me to avoid rhetorical questions in the future.

                                                                          2. 7

                                                                            It depends what you compare it to. An Elasticsearch x86-64 gzipped tarball is at > 340MB https://www.elastic.co/downloads/elasticsearch

                                                                        2. 4

                                                                          If anyone wants to figure this out, two tools to use are:

                                                                          I am 0.3 certain that at least one significant component is serialization code: rust serialization is fast, but is rumored to inflate binaries quite a bit. I haven’t measured that directly, but I did observe compile time hits due to serialization.

                                                                          1. 3

                                                                            My guess is assets for the web UI are packed into the binary

                                                                          1. 8

                                                                            I wish that we had a better way to refer to this than as “nines”. I agree with all of your points; I wish that folks understood that going from X nines to X + 1 nines is always going to cost the same amount of resources.

                                                                            Here’s a further trick that service reliability engineers should know. If we compose two services which have availabilities of X nines and Y nines respectively into a third service, then the new service’s availability can be estimated within a ballpark of a nine by a semiring-like rule. If we depend on both services in tandem, then the estimate is around minimum(X, Y) nines, but should be rounded down to minimum(X, Y) - 1 for rules of thumb. If we depend on either service in parallel, then the estimate is maximum(X, Y) nines.

                                                                            As a technicality, we need to put a lower threshold on belief. I use the magic number 7/8 because 3-SAT instances are randomly satisfiable 7/8 of the time; this corresponds to about 0.90309 nines, just below 1 nine. So, if we design a service which simultaneously depends on two services with availabilities of 1 nine and 2 nines respectively, then its availability is bounded below 1 nine, resulting in a service that is flaky by design.

                                                                            1. 4

                                                                              If we depend on either service in parallel, then the estimate is maximum(X, Y) nines.

                                                                              Shouldn’t that be X + Y? If you have a service that can use either A or B, both of which are working 90% of the time, and there is no correlation between A working and B working, then at least one of A or B should work 99% of the time.

                                                                              It’s possible that I misunderstand you, because I don’t understand the last paragraph at all.

                                                                              1. 2

                                                                                If you have two services with a 10% failure rate (90% uptime), the odds of both failing are .1 x .1 = 1% (99% uptime).
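
                                                                                To generalize that arithmetic, still assuming independent failures: for services with $X$ and $Y$ nines, the parallel case (either one working) has unavailability $10^{-X} \cdot 10^{-Y} = 10^{-(X+Y)}$, i.e. $X + Y$ nines, while the series case (both required) has availability $(1 - 10^{-X})(1 - 10^{-Y}) \approx 1 - 10^{-X} - 10^{-Y}$, i.e. roughly $\min(X, Y)$ nines. With $X = Y = 1$, the parallel case is the 99% above.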

                                                                                1. 2

                                                                                  I had a hidden assumption! Well done for finding it, and thank you. I assumed that it was quite possible for the services to form a hidden diamond dependency, in which case there would be a heavy correlation between dependencies being unavailable. When we assume that they are independent, then your arithmetic will yield a better estimate than mine and waste fewer resources.

                                                                                2. 2

                                                                                  Article says:

                                                                                  Adding an extra “9” might be linear in duration but is exponential in cost.

                                                                                  You say:

                                                                                  I wish that folks understood that going from X nines to X + 1 nines is always going to cost the same amount of resources.

                                                                                  I’m not sure what the article means by “linear in duration” or what you mean by “the same amount”. That said, your comment seems to conflict with my understanding: because 9s are a logarithmic scale, going from X to X + 1 9s should be expected to take an order of magnitude more resources than going from X - 1 to X 9s, and that’s an important fact about 9s. How am I misunderstanding your comment?

                                                                                  1. 1

                                                                                    Let me try specific examples first. Let’s start at 90% (1 nine) and go to 99% (2 nines). This has a cost, and got us a fixed amount of improvement worth 9% of our total goal. If we do that again, going from 99% to 99.9% (3 nines), then we get another fixed amount of improvement, 0.9%. My claim is that the cost of incrementing the nines is constant, which means that we get only roughly a tenth of the improvement for each additional nine. The author’s claim is that the overall cost of obtaining a fixed amount of our total goal is exponential; we get diminishing returns as we increase our overall availability. They’re two ways of looking at the same logarithmic-exponential relation.

                                                                                    I don’t know what the author is thinking when they say “linear in duration”. I can understand their optimism, but time is only one of the costs that must be considered.

                                                                                    1. 3

                                                                                      Tbh, I’m still confused by the explanation. I’d make a simpler example claim - adding an extra nine does not have the same cost. Going from 90% to 99% is close to free. Going from 99.999% to 99.9999% is likely measured in millions of $. (with an exponential growth for every 9 in between) (we may agree here, I’m not sure :) )

                                                                                      1. 1

                                                                                        This hasn’t been my experience. I have seen services go from best-effort support (around 24% availability for USA work schedules) to a basic 90% or 95% SLA, and it takes about two years. A lot of basic development has to go into a service in order to make it reliable enough for people to start using it.

                                                                                      2. 3

                                                                                        I’m also rather confused by your claim. Are you saying that going from 99.9 -> 99.99 “costs” the same amount as going from 99->99.9, but you get 10x less benefit for it? I think that’s a rather confusing way to look at it, since from a service operator’s perspective, you’re looking for “how much effort do I need to expend to add a 9 to our reliability?” I also disagree that the cost for a 9 (the benefit aside) is at all linear.

                                                                                    Going from 90->99 might be the difference between rsyncing binaries and running them under screen, versus building binaries in CI and running them under systemd. Going from 99.9->99.99 is very clearly understanding your fault domains and baking redundancy into multiple layers, geographic redundancy, canaries, good configuration change practices. 99.999 is where you need to start thinking about multiple providers (not just geographic redundancy), automated recovery, partial failure domains (i.e., fault-focused sharding), much longer canaries, isolation between regions.

                                                                                        The effort (and cost) required to achieve greater reliability increases by at least an order of magnitude for each nine, and to your point, it’s also worth less.

                                                                                        1. 1

                                                                                          I appreciate your focus on operational concerns, but code quality also matters. Consider this anecdote about a standard Web service. The anecdote says that the service restarted around 400 times per day, for an average uptime of 216 seconds. Let’s suppose that the service takes one second to restart and fully resume handling operational load; by design, the service can’t possibly exceed around 99.5% availability on a single instance.

                                                                                          In some sense, the tools which you espouse are not just standard ways to do things well, but also powerful levers which can compensate for the poor underlying code in a badly-designed service. While we might be able to eventually achieve high reliability by building compositions on top of this bad service, we should really consider improving the service’s code directly too.

                                                                                          1. 1

                                                                                            I think we’re generally on the same page here: I’m not saying you don’t need to improve your service’s code. Quite the opposite. “Baking redundancy into multiple layers” and “understanding your fault domains” fall into this category.

                                                                                        There’s also just general bugfixing and request profiling. A pretty typical way of measuring availability is by summing the requests that failed (for non-client reasons) and dividing that by the total number of requests. Investigating the failed requests often leads to improvements to service behavior.

                                                                                        That being said, there will still be unknowable problems: a cache leaks data and eventually takes down the task. You need multiple tasks to keep servicing requests while you solve the problem. A client query of death starts hitting your tasks: if you’re lucky, you have enough tasks to not notice, but perhaps they’re making requests fast enough that your entire fleet is downed. Perhaps they should have been consistently directed to a smaller pool of tasks to limit their blast radius.

                                                                                            You need both a well-written service and systemic reliability. The effort is greatly increased with every 9.

                                                                                  1. 3

                                                                                      Good read. I agree with most of it, but found the proposed solution a bit weird. Maybe I misunderstood, but isn’t long polling a problem for setups like Django, for example, where you have a limited number of worker processes?

                                                                                    1. 5

                                                                                      They’re hard to maintain in general. A lot of web technology assumes you have short-lived requests. Not to say it isn’t possible or even increasingly common, but it’s a moderate investment.

                                                                                      I agree with the thrust of the article (webhooks are insufficient) but disagree with the conclusion that only polling or only listening to webhooks is the way. Poll, but also trigger a sync when you receive a webhook. It’s easy to reason about and doesn’t require much in the way of special infrastructure.

                                                                                      1. 4

                                                                                        it’s easy to reason about and doesn’t require much in the way of special infrastructure.

                                                                                        Unless you’re a firewalled client. Then it requires special infrastructure, DNS, etc.

                                                                                        I think the cost of long-polling is overestimated by a lot of people. I have systems with over a million open connections each: each handle costs only about 2kb. Maybe long-polling is hard in some frameworks, so perhaps it’s worth adding some middleware?

                                                                                        1. 1

                                                                                          Long polling, SSE, and web sockets all have one problem in common: the endpoint a client polls needs to know about new events. With plain polling the endpoints simply read from the database when they get a request.

                                                                                          In your long polling setup, how do your client facing endpoints get new events? Do they themselves poll, or have events pushed to them, or something else?

                                                                                          1. 2

                                                                                            the endpoints simply read from the database when they get a request.

                                                                                            This just kicks the can: How did the database know about the change? Someone did an INSERT statement or an UPDATE statement. If you’re using Postgres they could have additionally done a NOTIFY (or a TRIGGER could have been created to do this as well automatically).

                                                                                            A decade ago, when I used MySQL, I had a process read the replication log and distribute interesting events to fifos that a PHP client would be consuming.

                                                                                            In your long polling setup, how do your client facing endpoints get new events? Do they themselves poll, or have events pushed to them, or something else?

                                                                                            They have events pushed to them: On client-connection, a subscription is made to receive updates, and on disconnection the endpoint unsubscribes. If you’re using Postgres this is just LISTEN. If you’re using erlang, you just use rpc+disk_log (or whatever). If you’re using q, you -11! the client log and hopen the realtime. In PHP, I’ve had a bank of fifos in /clients that we just read the events from. And so on.
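
                                                                                              Not tied to any particular stack, but here’s the subscribe-on-connect shape as a self-contained sketch in Rust, with tokio’s broadcast channel standing in for LISTEN/NOTIFY or the fifos (names made up; assumes the tokio crate with its full feature set):

```rust
use std::time::Duration;
use tokio::sync::broadcast;
use tokio::time::timeout;

#[derive(Clone, Debug)]
struct Event {
    payload: String,
}

// One long-poll cycle: subscribe on connect, wait for the next event,
// or give up after 30 seconds so the client simply polls again.
async fn long_poll(events: &broadcast::Sender<Event>) -> Option<Event> {
    let mut rx = events.subscribe();
    match timeout(Duration::from_secs(30), rx.recv()).await {
        Ok(Ok(event)) => Some(event),
        _ => None, // timed out or channel closed; dropping rx unsubscribes
    }
}

#[tokio::main]
async fn main() {
    let (tx, _) = broadcast::channel::<Event>(64);

    // The write path publishes right after it commits the change.
    let publisher = tx.clone();
    tokio::spawn(async move {
        tokio::time::sleep(Duration::from_secs(1)).await;
        let _ = publisher.send(Event { payload: "row 42 changed".into() });
    });

    // One "request handler" doing a single long-poll cycle.
    match long_poll(&tx).await {
        Some(event) => println!("push to client: {:?}", event),
        None => println!("no events; client should poll again"),
    }
}
```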

                                                                                      2. 2

                                                                                          isn’t long polling a problem for setups like Django, for example

                                                                                        You have X customers, so you simply need to handle X connections. If you allow less than that you have queueing which may be fine for a while, but really you need to be able to handle that anyway: otherwise your customers get errors and one customer can (trivially) deny another.

                                                                                        If X is larger than fits on one machine, you might want multiple machines anyway. Your load-balancer can route long-polls by customer and disconnect old pollers. Regular polling is a bit harder to capacity-plan.

                                                                                        1. 1

                                                                                          Long-polling also shifts cost from the client to the provider. The provider has to have sockets open, RAM allocated, and potentially a thread per request.

                                                                                            There are ways to implement it that don’t consume a thread per connection. These usually involve async I/O, which is tricky in vanilla Java, C#, and the like. Actor-based frameworks and languages make it easier, but the RAM and socket costs are still there.

                                                                                          1. 10

                                                                                              Really nice article, @JeremyMorgan! There’s just one thing that’s bugging me, and that is the use of the Fahrenheit scale, which is only used in the US, in Liberia, and in the Cayman Islands, while the rest of the world uses Celsius.

                                                                                              This article is your project and I’m not in the position to order you to do anything, but if you intended this article to be for an international audience, this is a critical error. If you only intended this article for US-American readers (and those few from the Cayman Islands and Liberia), I can understand the choice, but please keep this in mind in the future in case your audience is different.

                                                                                            1. 17

                                                                                                While I agree that Celsius is the superior unit, the article is really about building the device and measurement system, rather than the takeaway about how hot the car is. It really isn’t the author’s job to translate into units they’re not comfortable with – it’s not a scientific paper.

                                                                                              Calling use of Fahrenheit a “critical error” on a hobbyist blog post (even one viewed internationally) seems overly dramatic, don’t you think? It’s not as if it’s written in Klingon (though, he’d be welcome to that too on a blog, albeit with a vastly smaller audience.)

                                                                                              1. 1

                                                                                                Arguably since the audience is primarily those in the east and west coasts of the United States, Fahrenheit is the superior scale.

                                                                                                1. 1

                                                                                                    I did convert it to Fahrenheit during storage, but I can convert it all to Celsius in another database! Thank you for the idea to enhance the project!

                                                                                                  1. 1

                                                                                                    I hope I didn’t come across too negatively, which wasn’t my intention. Thanks for your feedback!

                                                                                                    1. 1

                                                                                                      Not at all thanks. Over the weekend I did another stream and added celsius columns!

                                                                                                      https://github.com/JeremyMorgan/HotCar

                                                                                                  2. 1

                                                                                                    Celsius is inferior for this purpose. The degrees are lower precision, and the scale doesn’t connect to human health, which is the focus of the article. 100F is approximately the temperature of the human body, and consequently, the point at which air temperatures become dangerous to people. Saying a car is at 130F clearly conveys that it is an extremely dangerous environment for humans.

                                                                                                    For cooking and chemistry, Celsius is fine, but for weather, it’s gotta be Fahrenheit.

                                                                                                  1. 32

                                                                                                    When an error in your code base can take down millions of users who depend upon it for vital work you should

                                                                                                    1. Have good CI
                                                                                                    2. Have extensive tests
                                                                                                    3. Make small changes at a time
                                                                                                    4. Have at least one set of extra eyes looking at your changes
                                                                                                    1. 15
                                                                                                      1. Make use of language features that push you towards correctness, for example static typing.
                                                                                                      1. 8

                                                                                                        I find it shocking how many people love “dynamic languages”

                                                                                                        1. 7

                                                                                                          I don’t. There’s a lot of neat tricks you can do at runtime in these systems that would require 10x more work to do at build time, because our build tools are awful and far too difficult to work with. Problem is that we only have the build-time understanding of things while we’re actually programming.

                                                                                                          Don’t get me wrong, I disagree with taking this side of the trade-off and I don’t think it’s worth it. But I also realise this is basically a value judgement. I have a lot of experience and would expect people to give my opinions weight, but I can’t prove it, and other rational people who are definitely no dumber than me feel the opposite, and I have to give their opinions weight too.

                                                                                                          If our tooling was better (including the languages themselves), a lot of the frustrations that lead people to build wacky stuff that only really works in loose languages would go away.

                                                                                                          1. 7

                                                                                                            I don’t, because I used to be one of those people. Strong type systems are great if the type system can express the properties that I want to enforce. They’re an impediment otherwise. Most of the popular statically typed languages only let me express fairly trivial properties. To give a simple example: how many mainstream languages let me express, in the type system, the idea that I give a function a pointer to an object and it may not mutate any object that it reaches at an arbitrary depth of indirection from that pointer, but it can mutate other objects?

                                                                                                            Static dispatch also often makes some optimisations and even features difficult. For example, in Cocoa there is an idiom called Key-Value Coding, which provides a uniform way of accessing properties of object trees, independent of how they are stored. The generic code in NSObject can use reflection to allow these to read and write instance variables or call methods. More interestingly, this is coupled with a pattern called Key-Value Observing, where you can register for notifications of changes before and after they take place on a given object. NSObject can implement this by method swizzling, which is possible only because of dynamic dispatch.

                                                                                                            If your language has a rich structural and algebraic type system then you can do a lot of these things and still get the benefits of static type checking.

                                                                                                            1. 2

                                                                                                              Regarding your example, honestly I am not 100% sure that I grasp what you are saying.

                                                                                                              In something like C++ you can define a constant object and then explicitly mark parts of it as mutable. But I don’t think that quite covers it.

                                                                                                              I enjoyed using Haskell a few years back and was able to grasp at least some of it, but it gets complicated very fast.

                                                                                                              But usually I am using languages such as C# and TypeScript. The former is getting a lot of nice features, and the latter has managed to model a lot of JavaScript behaviour.

                                                                                                              I have no problem admitting that type systems are limited in what they can express, but usually I can work within those limits without too many issues. I would love to see the features of Haskell, Idris, and others become widely available, but the mainstream languages don’t seem interested in adopting them.

                                                                                                              1. 3

                                                                                                                Regarding your example, honestly I am not 100% sure that I grasp what you are saying.

                                                                                                                In something like C++ you can define a constant object and then explicitly mark parts of it as mutable. But I don’t think that quite covers it.

                                                                                                                I don’t want an immutable object, I want an immutable view of an object graph. In C++ (ignoring the fact that you can cast it away) a const pointer or reference to an object can give you an immutable view of a single object, but if I give you a const std::vector<Foo*>&, then you are protected from modifying the elements by the fact that the object provides const overloads of operator[] and friends that return const references, but the programmer of std::vector had to do that. If I create a struct Foo { Bar *b; ... } and pass you a const Foo* then you can mutate the Bar that you can reach via the b field. I don’t have anything in the type system that lets me exclude interior mutability.

                                                                                                                This is something that languages like Pony and Verona support via viewpoint adaptation: if you have a capability that does not allow mutation then any capability that you load via it will also lack mutation ability.
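
                                                                                                                For what it’s worth, TypeScript’s Readonly<T> has the same shallowness, and a recursive mapped type is one rough approximation of the read-only view of an object graph described above. A hypothetical sketch (not Pony/Verona’s viewpoint adaptation, and not the C++ case from the comment):

                                                                                                                ```typescript
                                                                                                                // Readonly<T> only protects the top level, while a recursive mapped
                                                                                                                // type gives a read-only view of the whole object graph, which is
                                                                                                                // roughly the property being asked for.

                                                                                                                interface Bar { value: number; }
                                                                                                                interface Foo { b: Bar; }

                                                                                                                type DeepReadonly<T> = {
                                                                                                                  readonly [K in keyof T]: T[K] extends object ? DeepReadonly<T[K]> : T[K];
                                                                                                                };

                                                                                                                function shallowView(foo: Readonly<Foo>): void {
                                                                                                                  // foo.b = { value: 1 };  // compile error: 'b' is read-only
                                                                                                                  foo.b.value = 1;          // allowed: interior mutation is not excluded
                                                                                                                }

                                                                                                                function deepView(foo: DeepReadonly<Foo>): void {
                                                                                                                  // foo.b.value = 1;       // compile error: 'value' is read-only at any depth
                                                                                                                  console.log(foo.b.value); // reading is still fine
                                                                                                                }

                                                                                                                const foo: Foo = { b: { value: 0 } };
                                                                                                                shallowView(foo);
                                                                                                                deepView(foo); // a mutable Foo can be passed wherever a read-only view is expected
                                                                                                                ```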

                                                                                                                But usually I am using languages such as C# and TypeScript. The former is getting a lot of nice features, and the latter has managed to model a lot of JavaScript behaviour.

                                                                                                                TypeScript is a dynamic language with optional gradual typing, but it tries really hard to pretend to be a statically typed language with type inference and an algebraic and structural type system. If more static languages were like that then I think there would be far fewer fans of dynamic languages. For what it’s worth, we’re aiming to make the programmer experience for Verona very close to TypeScript (though with AoT compilation and with a static type system that does enough of the nice things that TypeScript does that it feels like a dynamically typed language).
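
                                                                                                                A small, hypothetical sketch of the structural and algebraic typing referred to here: the shapes are plain object literals and the checker narrows the union in each branch, so it reads much like dynamically typed JavaScript while staying fully checked.

                                                                                                                ```typescript
                                                                                                                // Algebraic (tagged union) + structural typing in TypeScript.
                                                                                                                type Shape =
                                                                                                                  | { kind: "circle"; radius: number }
                                                                                                                  | { kind: "rect"; width: number; height: number };

                                                                                                                function area(s: Shape): number {
                                                                                                                  switch (s.kind) {
                                                                                                                    case "circle":
                                                                                                                      return Math.PI * s.radius ** 2; // narrowed to the circle variant
                                                                                                                    case "rect":
                                                                                                                      return s.width * s.height;      // narrowed to the rect variant
                                                                                                                  }
                                                                                                                }

                                                                                                                // Structural typing: any object with the right shape is accepted,
                                                                                                                // with no class declarations or explicit interface implementations.
                                                                                                                console.log(area({ kind: "circle", radius: 2 }));
                                                                                                                ```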

                                                                                                                1. 1

                                                                                                                  I really like the sound of Verona.

                                                                                                              2. 1

                                                                                                                Strong type systems are great if the type system can express the properties that I want to enforce. They’re an impediment otherwise.

                                                                                                                It’s not all-or-nothing. Type systems prevent certain classes of errors. Tests can help manage other classes of errors. There’s no magic bullet that catches all errors. That doesn’t mean we shouldn’t use these easily-accessible, industry-proven techniques.

                                                                                                                Now, static typing itself has many benefits beyond correctness: documentation, tooling, runtime efficiency, and clearer contracts between modules, to name a few. And yes, it does actually reduce bugs. This is proven.

                                                                                                              3. 4

                                                                                                                 We have somewhat believable evidence that CI, testing, small increments, and review help with defect reduction (sure, that’s not the same thing as defect consequence reduction, but maybe a good enough proxy?)

                                                                                                                I have yet to see believable evidence that static languages do the same. Real evidence, not just “I feel my defects go down” – because I feel that too, but I know I’m a bad judge of such things.

                                                                                                                1. 1

                                                                                                                  There are a few articles to this effect about migrations from JavaScript to TypeScript. If memory serves they’re tracking the number of runtime errors in production, or bugs discovered, or something else tangible.

                                                                                                                  1. 1

                                                                                                                    That sounds like the sort of setup that’d be plagued by confounders, and perhaps in particular selection bias. That said, I’d be happy to follow any more explicit references you have to that type of article. It used to be an issue close to my heart!

                                                                                                                    1. 1

                                                                                                                      I remember this one popping up on Reddit once or twice.

                                                                                                                      Airbnb claimed that 38% of their postmortem-analysed bugs would have been avoidable with TypeScript/static typing.
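
                                                                                                                      For a flavour of the bug class such studies count (a made-up example, not one of Airbnb’s actual postmortems): misspelled properties and possibly-undefined values only blow up at runtime in JavaScript, but are compile errors in TypeScript.

                                                                                                                      ```typescript
                                                                                                                      // Representative (hypothetical) example of a bug class that static
                                                                                                                      // typing catches before it ships.

                                                                                                                      interface User {
                                                                                                                        fullName: string;
                                                                                                                        signupDate?: string; // optional: not every user has one
                                                                                                                      }

                                                                                                                      function greeting(user: User): string {
                                                                                                                        // return "Hi " + user.fullname;         // compile error: did you mean 'fullName'?
                                                                                                                        // return user.signupDate.toUpperCase(); // compile error: possibly 'undefined'
                                                                                                                        return "Hi " + user.fullName;            // forced to handle the data we actually have
                                                                                                                      }

                                                                                                                      console.log(greeting({ fullName: "Ada Lovelace" }));
                                                                                                                      ```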

                                                                                                                  2. 1

                                                                                                                    Shrug, so don’t use them. They’re not for everyone or every use case. Nobody’s got a gun to your head. I find it baffling how many people like liquorice.

                                                                                                                    1. 1

                                                                                                                      Don’t worry, I don’t. I can still dislike the thing.

                                                                                                                2. 6

                                                                                                                      And if you have millions of users, you also have millions of users’ data. Allowing unilateral code changes isn’t being a good steward of that data, either from a reliability or a security perspective.

                                                                                                                1. 10

                                                                                                                        Sometimes there are acronym requirements that mandate review on every code change.

                                                                                                                  1. 5

                                                                                                                          +1. If you work at a larger organization, you are likely bound by “acronym requirements” which seek to prevent unilateral access by employees.

                                                                                                                    Another set of eyes is a partial deterrent to insider attacks. That assumes you closed all the obvious holes first, however.

                                                                                                                  1. 4

                                                                                                                          How mature is the Nix tooling story? I’d like to migrate from my current solution for side projects (some bespoke shell tooling); however, I’m a little afraid that if I invest the time to do it right, it’ll be obsolete in months or require lots of maintenance.

                                                                                                                    1. 2

                                                                                                                            There are still some growing pains; the biggest upcoming changes are Nix Flakes and the new nix 2.0 CLI. Flakes is currently in beta (I think?); it’s not the default yet, but it’s easy to enable. The new CLI is there now, but it’s not fully fleshed out yet, so you sometimes have to fall back to the old nix-* commands instead of the unified nix ones.

                                                                                                                      I’m still fairly new, but so far it all seems stable enough.

                                                                                                                    1. 20

                                                                                                                      Because of this, I will now have to ban all future contributions from your University and rip out your previous contributions, as they were obviously submitted in bad-faith with the intent to cause problems.

                                                                                                                            That’s anyone at the University of Minnesota banned from making Linux contributions, then.

                                                                                                                      1. 23

                                                                                                                              It’s perhaps ineffective at stopping these particular authors, who can simply switch email addresses, but it’s an excellent message to the university: this will end up in the media, at which point university administrators will be concerned about the bad press.

                                                                                                                              The patches coming from a group at a university probably lent them some minimal initial credibility; it’s not uncommon for CS research to build new tools and apply them to the Linux kernel. It’s unfortunate that future submissions will have to be treated with heightened suspicion.

                                                                                                                      1. 6

                                                                                                                        The part about HTTP protocols seems interesting enough, but everything after that is basically “yes we know our data and what we extracted from it is deeply flawed but here it is anyway”.

                                                                                                                              The detection of used libraries: you might think it will undercount consistently (in a way that doesn’t introduce too much skew), but I’d expect jQuery to be much more likely to show up in globals than other libraries, particularly React, which, judging from gut feeling, is more likely to be found on a site that uses webpack. The fact that none of the popular component frameworks are in the list suggests this mostly crawled news sites and stuff like that.

                                                                                                                        This linear regression thing: What does a negative regression coefficient mean then? That my site becomes faster when I add Zendesk? They put up a disclaimer saying “correlation does not equal causation”, then go on to suggest causation anyway by saying jQuery makes everything slower.

                                                                                                                        I commend the effort but I don’t think the results here tell me anything except “linear regression can be used to tie two random numbers together to make a graph that looks like it says something”.

                                                                                                                        1. 4

                                                                                                                          This linear regression thing: What does a negative regression coefficient mean then? That my site becomes faster when I add Zendesk?

                                                                                                                                What that’s saying is that pages with the Zendesk JS are also likely to be faster than average, which could very well be the case if a “support” page is fairly lightweight and the Zendesk JS is smart and loads asynchronously (I’m not sure whether that’s true or not).

                                                                                                                                Likewise re: jQuery, you could probably say that folks who care about render times are also less likely to use jQuery. Not that jQuery itself is necessarily bad, though you can certainly build some Lovecraftian horrors with it.
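
                                                                                                                                To make the negative-coefficient point concrete, here is a toy calculation with invented numbers (not the article’s data): with a single 0/1 “uses the library” regressor, the OLS slope is just the difference between the two group means, so a negative value only says that the with-library pages in this sample were faster on average.

                                                                                                                                ```typescript
                                                                                                                                // Toy illustration with invented numbers: regress page load time on a
                                                                                                                                // 0/1 "uses Zendesk" indicator. The fitted slope is simply
                                                                                                                                // mean(load | has library) minus mean(load | no library),
                                                                                                                                // which says nothing about what adding the library would cause.

                                                                                                                                const pages = [
                                                                                                                                  { hasZendesk: 0, loadMs: 3200 },
                                                                                                                                  { hasZendesk: 0, loadMs: 4100 },
                                                                                                                                  { hasZendesk: 1, loadMs: 1800 }, // e.g. lightweight support pages
                                                                                                                                  { hasZendesk: 1, loadMs: 2100 },
                                                                                                                                ];

                                                                                                                                function olsSlope(xs: number[], ys: number[]): number {
                                                                                                                                  const meanX = xs.reduce((a, b) => a + b, 0) / xs.length;
                                                                                                                                  const meanY = ys.reduce((a, b) => a + b, 0) / ys.length;
                                                                                                                                  let num = 0;
                                                                                                                                  let den = 0;
                                                                                                                                  for (let i = 0; i < xs.length; i++) {
                                                                                                                                    num += (xs[i] - meanX) * (ys[i] - meanY);
                                                                                                                                    den += (xs[i] - meanX) ** 2;
                                                                                                                                  }
                                                                                                                                  return num / den;
                                                                                                                                }

                                                                                                                                const slope = olsSlope(pages.map(p => p.hasZendesk), pages.map(p => p.loadMs));
                                                                                                                                console.log(slope); // -1700: the Zendesk pages were faster *in this sample*
                                                                                                                                ```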

                                                                                                                          1. 3

                                                                                                                                  Just to be clear: I understand what the data actually says; I’m criticizing their choice to frame it as a useful guide for removing dependencies, which they do right at the end of the blog post.