Wonder where there is a missing mutex or waitgroup.
The deadlock ended up happening when the full http1 and http2 request were all included in the first Write call. With the 1ms sleep it is chunked up into multiple writes and then responses flow as expected. Tried a few things to better understand why it was locking, but eventually gave up :(
It would be cool to see a version of this that is about container technologies instead of container providers.
In the cloud there are no technologies, only products.
It looks cool, I think I would actually have a use to it, but it seems to be 100% tied to their SaaS. You can’t get this extension and install it in your own postgres.
It looks like this might be an open source option: https://github.com/duckdb/pg_duckdb
Thanks! I think I could use that. I have a process now where I use a python API to query the DB, stream the result as Arrow over HTTP, then write to Parquet on the receiving end. It looks like I could probably cut a lot of steps out.
The author’s argument relies on the assumption that split-brain scenarios don’t occur in cloud services, which doesn’t match the theory or my personal experience.
That’s not what I’m saying, no.
What I’m saying is that well-designed and well-implemented systems can avoid split-brain scenarios by ensuring that writes only happen on the majority/quorum side of any split. This ensures correctness, while still practically offering high availability to clients in a large number of cases. The cases where this can’t happen - when the network is too broken - are rare and not a particularly large contributor to real-world availability.
Alternatively, systems can choose to allow split brain to happen, allow monotonic writes and weak read guarantees, and still have well-defined properties in the case of network partitions. This is less useful in cloud architectures, where establishing a quorum is generally a better architecture, but more useful in IoT, mobile, space, and similar applications.
So, you traded off consistency.
CAP Theory remains.
As I say in the post, I’m not saying the CAP theorem is wrong, merely that it’s one of the least interesting things one can say about distributed systems tradeoffs (and that it’s irrelevant to thinking through the real tradeoffs in large classes of cloud architectures).
CAP is the trivial observation that if you have a database implemented by two servers, and the connection between the two servers is lost, then if you write to server 1 and then read from server 2, you won’t see the newly written value. I think at least some of the opposition to CAP comes from naming this “The CAP theorem” or here, “CAP Theory” (capital T).
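To make the triviality concrete, here is a toy sketch (purely illustrative, not a real database) of that two-server scenario in Go:

```go
// Toy illustration of the two-server observation above: replication stops
// while the "link" is down, so a read from server 2 after a write to
// server 1 returns the stale value.
package main

import "fmt"

type server struct{ data map[string]string }

func main() {
	s1 := server{data: map[string]string{}}
	s2 := server{data: map[string]string{}}
	linkUp := false // the connection between the two servers is lost

	// Write to server 1; replicate only if the link is up.
	s1.data["x"] = "new"
	if linkUp {
		s2.data["x"] = s1.data["x"]
	}

	// Read from server 2: either it answers with a stale/missing value
	// (giving up C) or it refuses to answer until the link heals (giving up A).
	fmt.Printf("server 2 sees %q\n", s2.data["x"]) // prints ""
}
```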
Right. The “CAP Theorem” is either trivial (as you say), or just wrong (if we try to use the “common sense” definition of A).
The best version I could state is probably “there exist network partitions that force a trade-off to be made between consistency and availability”, and even that’s more clunky than the flow-chart I borrowed from Bernstein and Das.
The way you wrote that made me realize that the CAP theorem is a deepity in the sense of Dennett: “A deepity involves saying something with two meanings—one trivially true, the other profound sounding but false or nonsensical.”
It may be trivial, but it’s still a theorem, since it’s not an axiom. Even easy theorems deserve proof and recognition when they have profound consequences.
I would disagree here. The pretentiousness around “The CAP Theorem” has had negative value, by obfuscating a trivial concept. The success of CAP in the memetic sense relies on NOT clearly explaining what it really means. If its purveyors clearly explained what it means and then called it “The CAP Theorem” afterward, that would be ok, but it also would look silly, which is why that doesn’t happen.
I don’t think this is a fair assessment of CAP.
CAP says that if you have a distributed system over N nodes, then, in the presence of a partition (P) that system can be either correct (C) or available (A) — from the perspective of nodes/clients — but not both.
It is unclear what those words mean, and when others have tried to pin them down they have ended up with exactly that. See the paper that “proved” the CAP theorem (previously “Brewer’s conjecture”): https://users.ece.cmu.edu/~adrian/731-sp04/readings/GL-cap.pdf If you read the proof of Theorem 1 you’ll see lots of fancy words, but those words say nothing more than the sentence I wrote.
Personally I’ve never found the theorem or its terminology particularly confusing, but, to each their own, I guess! Mostly, I like CAP as a way to push back on overly optimistic engineering managers who want to insist on having both C and A in the data system(s) they commission from their engineers ;)
edit: my bad — s/correct/consistent/g
I’m not saying it’s confusing. I’m saying that if you pin the meaning down then you end up with the triviality that I wrote above. See the paper I referenced above, or the blog post mentioned by Corbin below, if you do not believe me.
I don’t think that “a system can be either consistent or available but not both” is a triviality. But, different strokes for different folks, I suppose.
If you prefer pictures, there is an illustrated proof using diagrams. Out of curiosity, which language would you use if you wanted a formal proof of CAP? I might be able to oblige.
Right, that is a pictorial way of saying “if you have a database implemented by two servers, and the connection between the two servers is lost, then if you write to server 1 and then read from server 2, you won’t see the newly written value”. If you think it is valuable to formalize this, Lean seems to be popular these days.
well-designed and well-implemented systems can avoid split-brain scenarios by ensuring that writes only happen on the majority/quorum side of any split
How would you convince someone this is well-designed without first explaining that it’s a CP system, and what that means?
I don’t think, from the client’s perspective, it’s particularly interesting to talk about partitions (rather than talking about the larger set of failures of infrastructure, software, and operations).
So what do we talk about? There are three options I find more interesting than CAP:
Talk specifically about client guarantees (e.g. “strict serializability”, “bounded staleness”, “linearizability”) offered to clients. This needs to consider the whole world of failure modes.
Talk about specific distributed systems problems and their safety and liveness definitions (e.g. “consensus”, “atomic commitment”). You can talk about these problems in the context of a particular failure model (e.g. fail-stop, byzantine, etc) and what the achievable system properties are.
Talk about lower-level results with more specific and useful predictions and properties. Even among the well-known ones (e.g. FLP, CALM, etc), CAP is the least interesting.
The problems with CAP are the goofy definition of “availability”, the overly broad assumptions people make based on that definition, and the over-focus on partitions at the cost of more interesting failure models.
What would be a non-goofy definition of availability?
I agree that “partition” is, in practice, often just a synecdoche for a much broader class of failure modes. My experience has been that the specific details of those failure modes are rarely interesting to users. All that users really care about is that such failure modes exist; or, more precisely, that those failure modes are unavoidable, and the downstream consequences of that realization.
While I agree with you that such breakages are rare, they affect millions of people, as in an outage earlier this year that disconnected millions from the Internet. The affected countries are known for their commerce, and financial-service providers there would have had to fall back to eventual consistency rather than pause transactions.
How does that make them irrelevant for describing the semantics of the system?
I imagine it’s similar to describing a boolean value without covering the fact that radiation can cause bitflips. In actual hardware, for most systems, these are so rare that the flips can be detected and corrected by ECC memory, and you probably don’t care about double flips unless you’re in space. So is it really relevant for us to talk about radiation bitflips? Does it impact the semantics of our program?
The article points out that in some networked services we already have a mitigation (routing to the majority) and scenarios where this isn’t viable are rare. The argument is that it’s so rare that you don’t even need to think about it.
As it says, it’s not that it’s wrong, it just isn’t interesting. Similarly, I could use booleans that use redundant bits to avoid radiation, but I don’t, because it’s not really important to me even if I understand that there’s a real scenario where I could really end up in trouble.
What is doing the detection/coordination here? A load balancer? Can you describe maybe a little bit more about the specifics of what cloud products would allow this kind of transactional awareness of network failures?
DNS, multi-cast, or some other mechanism directs them towards a healthy load balancer on the healthy side of the partition
These are not transactional mechanisms in my experience, so wouldn’t they imply a loss of A?
That depends a lot on what you mean by A. But yes, the mechanisms (e.g. LB health checks) are not transactional, and depend on the fact that the database replicas use consensus for replication (in my particular example). This is a common pattern, and is the same fundamental idea as DynamoDB, Aurora, S3, and wide array of other cloud systems.
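To make “use consensus for replication” a bit more concrete, here is a bare-bones sketch of the counting rule those systems enforce (my own illustration; a real system would run Raft or Paxos rather than this naive counting):

```go
// Bare-bones illustration of majority-acknowledged writes (not Raft/Paxos,
// just the rule they enforce): a write commits only if more than half of the
// replicas can acknowledge it, so a minority partition can never accept a
// conflicting write.
package main

import (
	"errors"
	"fmt"
)

type replica struct {
	reachable bool
	value     string
}

func write(replicas []*replica, v string) error {
	var reachable []*replica
	for _, r := range replicas {
		if r.reachable {
			reachable = append(reachable, r)
		}
	}
	if len(reachable) <= len(replicas)/2 {
		return errors.New("no quorum: refusing the write (choosing C over A)")
	}
	for _, r := range reachable {
		r.value = v
	}
	return nil
}

func main() {
	rs := []*replica{{reachable: true}, {reachable: true}, {reachable: false}}
	fmt.Println(write(rs, "v1")) // 2 of 3 reachable: quorum, write commits (<nil>)

	rs[1].reachable = false
	fmt.Println(write(rs, "v2")) // 1 of 3 reachable: no quorum, write refused
}
```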
What I’m saying is that well-designed and well-implemented systems can avoid split-brain scenarios by ensuring that writes only happen on the majority/quorum side of any split
One property of CAP is that the only way that a writer can know if it’s writing into a majority/quorum or minority/non-quorum partition of a distributed system, is if that system is CP.
ensuring that writes only happen on the majority/quorum side of any split. This ensures correctness, while still practically offering high availability to clients in a large number of cases.
The only way to ensure that writes only happen on the majority/quorum side of a split is to use some kind of consensus protocol (Paxos, Raft, etc.), which necessarily makes the overall system a CP system, in the CAP sense.
This ensures correctness,
Yes.
while still practically offering high availability to clients in a large number of cases
You’re weakening the definition of availability, here, beyond what CAP requires.
Alternatively, systems can choose to allow split brain to happen…
Split brain is always a possibility, the only question is how a system behaves when it occurs.
…allow monotonic writes and weak read guarantees, and still have well-defined properties in the case of network partitions.
Yes! And CRDTs are basically the only way to do it!
If you’re interested in this space there is also a collaborative effort going on to write a SQLite compatible DB in Rust here: https://github.com/penberg/limbo
Recent update: https://x.com/penberg/status/181131073696396515
The linked issue from 2017 is an interesting one: https://github.com/nodejs/node/issues/14917
The reporter had a node.js process with a 14 GB heap (RSS)
And it takes ~300 ms to fork(), which blocks the main loop for 300 ms, which means your performance is limited to like 3 requests/second
One explanation is:
The problem is down to http://docs.libuv.org/en/v1.x/process.html#c.uv_spawn being run in the main event loop thread, and not in the thread pool. Forking a new process causes the RSS of the parent process to be copied.
As RSS increases, spawn time increases.
I’m aware that fork() is very hard to implement, but I don’t think it should be O(N), where N is RSS
Does this have something to do with the fact that every node.js process basically has an enormous GCC-like compiler in it at runtime? (v8 is like 1M - 2M lines of code, with multiple compilers and interpreters)
Are there tons of JIT data structures that are somehow invalidated when you fork?
I don’t think “normal” C processes should have this big a perf hit, but I could be wrong …
Oh actually I wonder if the v8 GC is fork() friendly?
In Oils we made sure that the GC is fork() friendly, since Ruby changed their GC in 2012 because of this “dirty the whole heap” performance bug - https://www.brightbox.com/blog/2012/12/13/ruby-garbage-collector-cow-performance/
Also CPython has the same issue on large web server workloads, e.g. at Instagram
I would have thought v8 is designed for this since Chrome is multi-process ….
Forking is sort of OS-specific, so Chrome can’t rely on it. On macOS, fork and Mach ports are incompatible, so you can’t successfully fork a process that’s done anything but really basic stuff. And apparently Windows doesn’t support forking at all.
This is not true … at least historically, Chrome has a ton of OS-specific code deep in its bowels, relating to processes, and it uses fork() heavily. It’s different on Windows, Linux, and OS X.
https://neugierig.org/software/chromium/notes/2011/08/zygote.html
at startup, before we spawn any threads, we fork off a helper process. This process opens every file we might use and then waits for commands from the main process. When it’s time to make a new subprocess we ask the helper, which forks itself again as the new child.
Forking is also used as an optimization:
https://chromium.googlesource.com/chromium/src/+/HEAD/docs/linux/zygote.md
A zygote process is one that listens for spawn requests from a main process and forks itself in response. Generally they are used because forking a process after some expensive setup has been performed can save time and share extra memory pages.
Multiprocess is the gold standard for security! https://lobste.rs/s/dmgwip/sshd_8_split_into_multiple_binaries#c_wnaflq
So back to the original question, I really do wonder if v8 is causing any problems with node.js and forking … I would have thought that is optimized, because of v8’s heritage in a multi-process program
Although I guess it’s possible that I misunderstand where the forking happens … i.e. the rendering could be forked off, but the v8 parts aren’t forked, etc.
When I first started working on this blog post I had just returned from Systems Distributed (https://systemsdistributed.com/) in New York.
I was looking around for any information online about Node spawn performance and found this issue. The author of the issue is Joran Dirk Greef, CEO of Tigerbeetle. I imagine he filed it when he was working on the NodeJS prototype of Tigerbeetle. Tigerbeetle and Joran ran Systems Distributed.
It was a surreal moment.
Kind of. When you call fork(), the OS has to alias many memory pages, so it’s in part O(N) where N is the number of allocated memory pages, and RSS and the number of pages are correlated.
I would have thought you could do page ranges with some kind of tree structure, so it would be more like O(log N) to update that metadata
https://en.wikipedia.org/wiki/Page_table#Multilevel_page_tables
But I actually don’t know what’s common
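If I wanted to eyeball it, I’d measure rather than guess. A rough sketch in Go (my own approximation, not the Node script from the issue; Go may use vfork-style spawning on Linux, so the effect could be much smaller than in Node, and it assumes /bin/true exists):

```go
// Rough measurement sketch: grow the heap, touching every page so it counts
// toward RSS, then time how long spawning a trivial child process takes.
package main

import (
	"fmt"
	"os/exec"
	"time"
)

func main() {
	var ballast [][]byte // ~1 MB per entry, written to so it is resident
	for _, mb := range []int{0, 256, 1024, 4096} {
		for len(ballast) < mb {
			page := make([]byte, 1<<20)
			for i := range page {
				page[i] = 1
			}
			ballast = append(ballast, page)
		}
		start := time.Now()
		if err := exec.Command("/bin/true").Run(); err != nil {
			panic(err)
		}
		fmt.Printf("heap ~%4d MB: spawn took %v\n", mb, time.Since(start))
	}
	_ = ballast // keep the ballast reachable until the end
}
```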
I read deeper in the thread, and it looks like there was indeed a 2018 v8 fix motivated by this 2017 issue:
https://issues.chromium.org/issues/42210615
https://chromium-review.googlesource.com/c/v8/v8/+/4602858
Add a new build flag v8_enable_private_mapping_fork_optimization which marks all pages allocated by OS::Allocate as MADV_DONTFORK. This improves the performance of Node.js’s fork/execve combination by 10x on a 600 MB heap.
yeah, in my testing with that fix merged I can no longer reproduce the spawn times using the example script. it’s still slow, but not as slow as what’s outlined in the issue.
This is sweet, but why do the squares have the wrong color? A1 should be a dark square and H1 a light square. It’s like a programming language where 0 is true and nonzero is false
Welcome to shell 😈
ah! I fixed this at some point but it seems I reversed it again before publishing. fixed now, thanks
Cannot click on pieces on my iPhone to register a move. Maybe accept input in chess notation to make a move?
hmm, I think I improved the iphone thing a little by removing the :hover state on mobile, but it still seems a little buggy
Missing chess feature: allow choosing what to promote pawns to (after this state) rather than always promoting them to queens. Without this feature, not all valid chess states can be represented.
Missing UI feature: Detect when checkmate has happened, as in this state. (I presume stalemate is not detected either.)
If you wanted to support UI that shows when a draw may be claimed due to threefold repetition or the fifty-move rule, the game state, and thus the URL, would have to be much longer.
Edit: I am happy to see that the game already tracks history enough to detect when players can and cannot castle and when they can or cannot capture en passant.
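For a sense of how much longer: here is a rough sketch (purely illustrative, not how this page actually encodes its URLs, and the URL is a placeholder) of the extra state that draw claims would need:

```go
// Illustrative encoding only. To support draw claims you need the halfmove
// clock (fifty-move rule) plus every position seen since the last
// irreversible move (threefold repetition), which is what makes the URL grow.
package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"
)

type GameState struct {
	Board         [64]byte // piece codes, 0 for an empty square
	HalfmoveClock uint8    // moves since the last capture or pawn push
	SeenPositions []uint64 // position hashes since the last irreversible move
}

func (g GameState) EncodeURL() string {
	buf := make([]byte, 0, len(g.Board)+1+8*len(g.SeenPositions))
	buf = append(buf, g.Board[:]...)
	buf = append(buf, g.HalfmoveClock)
	for _, h := range g.SeenPositions {
		buf = binary.LittleEndian.AppendUint64(buf, h)
	}
	return "https://example.invalid/chess#" + base64.RawURLEncoding.EncodeToString(buf)
}

func main() {
	g := GameState{HalfmoveClock: 12, SeenPositions: make([]uint64, 12)}
	// Every tracked position adds 8 bytes (~11 URL characters) on top of the board.
	fmt.Println(len(g.EncodeURL()), "characters")
}
```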
nice! thanks for finding these states, I didn’t think about promotion at all.
yeah, I’d really like to have a gameplay mode of some kind that stores the actual game history.
I did this four years ago. Fun project! Cool!
https://github.com/DanielBMarkham/OnePageChess
I never added the rules of piece movement, though, as the point was to create a static page and url you could pass around (say through email) and it would keep the board status.
Sidebar: after I finished, all the coders I talked to had a laundry list of things they wanted: add rules for chess, put in chat, add algebraic notation, and so on.
No thanks. Coding isn’t the hardest part of solving problems using PL, scoping is.
I like what you’ve done here. Keep up the good work!
ah nice, this version is so elegant
Um, wild. Does this mean I could (some day, as an example) wire this into my Go code and use it to run integration tests against postgres on an in-memory database? That would be very cool.
If we finally get an embeddable Postgres but it requires a WASM runtime… I don’t know how to feel about that.
Learn to love it?
I’ve used pg_tmp for ephemeral in-memory postgresql instances as testing fixtures in the past. It worked well and didn’t require anything particularly magical or special to get going, and it also allowed me to quickly do integration matrices against multiple different versions of postgres.
This was long enough ago that containerization wasn’t really ready for prime time for this type of thing, but I’d imagine that more people are using containers for DB fixtures these days. I think there is definitely some upside to not needing a container runtime as part of a test suite.
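A hedged sketch of what that kind of fixture can look like in Go (assuming pg_tmp is on PATH and the lib/pq driver; details will vary with your setup):

```go
// Sketch of an ephemeral-Postgres test fixture (lives in a _test.go file).
// Assumes `pg_tmp` is installed; `pg_tmp -t` starts a temporary server
// listening on TCP and prints its connection URI.
package myapp

import (
	"database/sql"
	"os/exec"
	"strings"
	"testing"

	_ "github.com/lib/pq" // or any other Postgres driver
)

// ephemeralDB starts a throwaway server via pg_tmp and returns a ready *sql.DB.
func ephemeralDB(t *testing.T) *sql.DB {
	t.Helper()
	out, err := exec.Command("pg_tmp", "-t").Output()
	if err != nil {
		t.Fatalf("starting pg_tmp: %v", err)
	}
	db, err := sql.Open("postgres", strings.TrimSpace(string(out)))
	if err != nil {
		t.Fatalf("opening connection: %v", err)
	}
	t.Cleanup(func() { db.Close() })
	return db
}

func TestSchema(t *testing.T) {
	db := ephemeralDB(t)
	if _, err := db.Exec(`CREATE TABLE widgets (id serial PRIMARY KEY)`); err != nil {
		t.Fatal(err)
	}
}
```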
I think you can already do in-memory tables in client-server postgres.
I believe they mean in go’s process for testing, not in-memory on the PG server
Okay, that makes sense, thanks. I can see that that might offer speed or isolation benefits when testing.
Ooh, this might make running the Miniflux RSS reader on Windows quite a bit simpler.
Disorganization will always be the biggest problem. Organization is a place for everything, everything in its place, and not too many similar things in the same place. That is, if you have something new and you don’t know where to put it, then it is disorganized. It must go somewhere; if that is obvious, then you are okay.
Would someone mind attempting to elaborate on this? I think I kind of get it, but I am a little confused about what the last sentence is trying to convey.
My interpretation….
Say you need to write a new utility function, or look one up. Do you know which file to put it in, or where to find it? Or is there a single 10k line file with every utility function for the entire application? Or a single flat folder with 200 files, one for each function? It’s not a trivial problem, because you have to balance over-categorization with “big messy pile” – both are unfriendly to work with. You actually have to face the problem and come up with (and evolve) systems that make sense but aren’t too complex, and work for everyone on the team, so they can be adhered to without too much discipline/overhead.
This means when you find an “exception” to the rule, you should actually think it through, and decide if it means the system should be updated, and how. But in practice, people mostly shrug and say, “Fuck it, just throw it in X” even though that doesn’t make sense. As those “fuck it” decisions compound, the system becomes unusable. And you need to balance that against becoming overly anal and bike shedding everything.
And this same principle applies at every level, whether you are creating a new function, or new API endpoint, or new service, etc.
Some very good lessons in here, but I doubt it will land with people who haven’t already learned the lesson themselves.
What if someone has already learned both the lesson and the opposite lesson?
Have any examples?
I think it’s generally a wise list and you can see where it’s coming from. But any such list will fall into overgeneralization. That doesn’t make it bad, though!
Doing the screens first and persistence last […] is a very bad mistake. […] Persist first, then gradually move it up until it gets into the screens.
Then you can end up with the opposite problem: a data model with a high impedance mismatch with the UI. Neither is the “right” approach; you should take both factors into account.
But, to be honest, I’ve fought more systems in which the data was “correctly” modeled in an idealized world and then struggled to be made useful for its mission than the other way around. So I’d say erring on the side of modeling the data after its particular mission is a better default than thinking of persistence or abstract relationships first.
Always clean up right after a release. Everyone is tired, and cleanup work is boring. If you do not clean up then, you will never clean up and the mess will get worse, far worse.
Reduce friction, don’t tolerate it. Spending the time to mitigate or minimize it always pays off. Putting up with it always slides one downhill.
Sure, hard to disagree with any of that! But I don’t think most people who don’t always uphold this do so out of ignorance or unwillingness. You have stakeholders, a backlog, competitors, competing opinions, then you have more deadlines, etc. Prioritizing is hard.
Do not let people add in onion architectures. If they are trying to avoid the main code, and just do “their thing” around the outside, that work is usually very harmful. Push them to do the work properly.
I’ve certainly suffered my fair share of onion architectures. I think those efforts are generally misguided and prone to second-system effects. But sometimes it can be the difference between shipping in two weeks, two quarters, or two years. Or, extending the original system may not be feasible for one reason or another: vendor lock-in, outdated tech or practices, the guy who knows all about it has left, etc.
I’ll just link one of my favorite software engineering essays: Write code that is easy to delete, not easy to extend.
A weird and ugly interface is a strong disincentive against usage. Useful code has a long life. The point of writing professional code is to maximize its lifespan.
The most ancient code I’ve worked with wasn’t ancient because it was so good or had such a nice interface. Quite the opposite: the lack of proper interfaces made it so entrenched that it’s never cost-effective to remove it or replace it.
Thanks, great response!
I’m having trouble understanding the table. How meaningful are the compression ratios shown in these results in the context of this competition? http://prize.hutter1.net/
edit: Ok, enwik8 with rwkv_430M, 100,000,000 bytes, 0.948 bits per byte, 94,800,000 bits. Hutter shows 15’284’944 “size” as best for enwik8; if that’s bytes then it’s 122,279,552 bits vs 94,800,000 for total size?
The Hutter competition takes the compressor’s size into account. Ratios in OP’s tables don’t. As the smallest model used in the table alone is 135MB, it means Bellard doesn’t get the prize.
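Just to sanity-check that arithmetic with the numbers quoted above (my own back-of-the-envelope, nothing beyond what’s already in this thread):

```go
// Back-of-the-envelope comparison using the figures quoted above. Once the
// ~135 MB model is counted as part of the "compressor", the headline ratio
// is no longer comparable to the Hutter prize numbers.
package main

import "fmt"

func main() {
	const enwik8Bytes = 100_000_000
	const bitsPerByte = 0.948               // rwkv_430M result from the table
	const hutterBestBytes = 15_284_944      // best enwik8 size quoted above
	const smallestModelBytes = 135_000_000  // ~135 MB of model weights

	llmBits := bitsPerByte * enwik8Bytes
	fmt.Printf("LLM output alone:   %.0f bits (%.1f MB)\n", llmBits, llmBits/8/1e6)
	fmt.Printf("Hutter best:        %d bits (%.1f MB)\n", hutterBestBytes*8, float64(hutterBestBytes)/1e6)
	fmt.Printf("LLM output + model: %.1f MB\n", llmBits/8/1e6+smallestModelBytes/1e6)
}
```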
As far as I understand, the main reason why Bellard didn’t take the prize is this: his compressor is not FOSS
Ah! thank you
Is this bad practice? If you use this to freely pin multiple versions of different pieces of software and string them together won’t you have to download large dependency trees for every version you pin?
I guess this is better than not being able to access the version at all.
It’s a non-starter for any serious project for all packages to be only available in lockstep with each other.
Doing these version pins is pretty annoying, so hopefully people would only use them if they really need them.
I think this is one of the fundamental tradeoffs of Nix: you can potentially have a different dependency tree for every package you install, and yeah, that would use a lot of disk space. But in exchange, you’re assured that once you install a package, it and its entire dependency tree are immutable. (Of course, you can update packages’ dependency trees later if you want to, so that some or all of your packages use the same dependency tree, but the point of Nix is that this only happens if you explicitly request it.)
It’s more that you can if you need to. Trying to do this with basically any other package manager would be an absolute nightmare, usually resulting in having to maintain your own packaging scripts and learning a bunch of arcana irrelevant to your actual project.
Stdlib only version (with no colors, no term width and no options), just for fun
It’s a great little problem to golf on. My entry.
edit: technically this solution is broken, because the Reset method on a time.Timer is basically broken
Defining the writer on the func is cool. Stdout and stderr to avoid synchronization, v nice.
Yeah, that time.Timer API is so tricky to get right. Do you have any suggestions for what to use instead?
I think a correct timer implementation would look something like this, which is highly verbose and annoying.
This code helped me shorten it. Thank you.
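For anyone following along, the pattern I’d reach for is the documented Stop/drain/Reset dance (a minimal sketch, not the linked solution; the drain is what the older time.Timer docs recommend, and newer Go releases changed timer channel semantics so it matters less):

```go
// Minimal sketch of safely reusing a time.Timer: stop it and drain any
// buffered tick before Reset, so the next receive only sees the new expiry.
package main

import (
	"fmt"
	"time"
)

func resetTimer(t *time.Timer, d time.Duration) {
	if !t.Stop() {
		// The timer already fired; drain the channel if the value is still
		// buffered (non-blocking, in case it was already received).
		select {
		case <-t.C:
		default:
		}
	}
	t.Reset(d)
}

func main() {
	t := time.NewTimer(50 * time.Millisecond)
	defer t.Stop()
	for i := 0; i < 3; i++ {
		<-t.C
		fmt.Println("tick", i)
		resetTimer(t, 50*time.Millisecond)
	}
}
```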
Love to see “wasm is the target” tooling getting worked on. Would love to see coverage-aware fuzzing and deterministic simulation in wasm pop up sometime as well. Seems like you could build something like rr for wasm: https://rr-project.org/
I have become one of those boring people whose first thought is “why not just use Nix” as a response to half of the technical blog posts I see. The existence of all those other projects, package managers, runtime managers, container tooling, tools for sharable reproducible development environment, much of Docker, and much more, when taken together all point to the need for Nix (and the need for Nix to reach a critical point of ease of adoption).
Well, maybe there’s a reason why nix hasn’t seen significant adoption?
The Nix community has been aware of the DX pitfalls that have prevented developers from being happy with the tooling.
I’ve made https://devenv.sh to address these and make it easy for newcomers, let me know if you hit any issues.
+1 for devenv. It’s boss. The only thing I think it’s truly “missing” at the moment is package versioning (correct me if I’m wrong).
Love it! (As in: I haven’t had a reason to try it yet, but this is definitely the way to go!)
it doesn’t appear to support using different versions of runtimes—which is the entire point of asdf/rtx in the first place. I’m not sure why I would use devenv over homebrew if I didn’t care about versions.
I think the idea is a devenv per project, not globally, like a .tool-versions file; as you say, it’d be a bit of a non sequitur otherwise
Devenv, just like Nix, supports that OOTB. You simply define a different shell per project.
No, it’s the children who are wrong
Primarily the bad taste the lacking UX and documentation leaves in people’s mouths. Python development is especially crap with Nix, even if you’re using dream2nix or mach-nix or poetry2nix or whatever2nix. Technically, Nix is awesome and this is the kind of thing the Nix package manager excels at.
I’ve found mach-nix [1] very usable! I’m not primarily working with Python though.
[1] https://github.com/DavHau/mach-nix
Yes, it’s way too hard to learn!
because the documentation is horrible, the UX is bad, and it doesn’t like it when you try to do something outside of its bounds. It also solves different problems from containers (well, there’s some overlap, but a networking model is not part of Nix).
I’ll adopt Nix the moment that the cure hurts less than the disease. If someone gave Nix the same UX as Rtx or Asdf, people would flock to it. Instead it has the UX of a tire fire (but with more copy-paste from people’s blogs) and a street team that mostly alienates 3/4 of the nerds who encounter it.
Curious did you try https://denvenv.sh yet?
https://devenv.sh for those clicking…
No, thanks for the link! This looks like a real usability improvement. I don’t know if I am in the target audience, but I could see this being very useful for reproducing env in QA.
It’s like using kubernetes. Apparently it’s great if you can figure out how to use it.
I’ve given up twice trying to use nix personally. I think it’s just for people smarter than me.
Heh, that’s a good counterpoint. I would say, unlike with k8s I get very immediate benefits from even superficial nix use. (I do use k8s too, but only because I work with people who know it very well.) I assure you (honest) I’m not very smart. I just focus on using nix in the simplest way possible that gives me daily value, and add a little something every few months or so. I still have a long way to go!
The How it works section of the rtx README sounds very much like nix + direnv! (And of course, I’m not saying there’s no place for tools like rtx, looks like a great project!)
Nix is another solution that treats the symptoms but not the disease. I used asdf (and now rtx) mainly for Python because somehow Python devs find it acceptable to break backwards compatibility between minor versions. Therefore, some libraries define min and max supported interpreter versions.
Still, I’d rather use rtx than nix. Better documentation and UX than anything the Nix community has created since 2003.
It’s clearly out of scope for Nix (or asdf, rtx…) to fix the practices of the Python community?
Sure. It’s good that a better alternative for asdf exists, although it would be better that such a program wasn’t needed at all.
Isn’t it somewhat difficult to pin collections of different versions of software for different directories with Nix?
Yes it is difficult. Nix is great at “give me Rust” but not as great at “give me Rust 1.64.0”. That said for Rust itself there aren’t third party repos that provide such capability.
I think you meant s/aren’t/are :)
Correct. Bad typo. :)
I think you are pointing out that nixpkgs tends to only ship a single version of the Rust compiler. While nixpkgs is a big component of the Nix ecosystem, Nix itself has no limitation preventing you from using it to install multiple versions of Rust.
Obviously nix itself has no limitation; as I mentioned, there are other projects that enable this capability. While you are correct that I was referring to nixpkgs, for all intents and purposes nixpkgs is part of the nix ecosystem. Without nixpkgs, very few people would be using or talking about nix.
I thought that was the point of Nix, that different packages could use their own versions of dependencies. Was I misunderstanding?
What Adam means here is that depending on what revision of Nixpkgs you pull in, you will only be able to choose one version of rustc. (We only package one version of rustc, the latest stable, at any given time.)
Of course, that doesn’t stop you from mixing and matching packages from different Nixpkgs versions, they’re just… not the easiest thing to find if you want to be given a specific package version.
(Though for Rust specifically, as Adam mentioned, there are two projects that are able to do this easier: rust-overlay and Fenix.)
This is a great tool to find a revision of Nixpkgs that has a specific version of some package that you need: https://lazamar.co.uk/nix-versions/
That said, it’s too hard, and flakes provides much nicer DX.
The original https://github.com/mozilla/nixpkgs-mozilla still works too, as far as I know. I use it, including to have multiple versions of Rust.
Alright, thanks!
No I wouldn’t say so, especially using flakes. (It gets trickier if you want to use nix to pin all the libs used by a project. It’s not complicated in theory, but there are different and often multiple solutions per language.)
Any pointers on how I can accomplish the same functionality of asdf in Nix?
Docs https://nixos.org/guides/declarative-and-reproducible-developer-environments.html or use https://devenv.sh/
These are some quick ways to get started:
Without flakes: https://nix.dev/tutorials/ad-hoc-developer-environments#ad-hoc-envs
With flakes: https://zero-to-nix.com/start/nix-develop
And add direnv to get automatic activation of environment-per-directory: https://determinate.systems/posts/nix-direnv
Or try devenv: https://devenv.sh/
(Pros: much easier to get started. Cons: very new, and it doesn’t yet let you pick an arbitrary old version of a language, for example.)
I have become one of those boring people who just downloads an installer and double clicks it.