The other threads are talking about serializing closures, but I want to talk about the capability for a service to start a new sub-service!
FLAME is Fly demo’ing the power of their machine API; it has start, stop, wait and everything you’d expect from an operating system’s process control API. (n.b. fly_process_group) The conspicuous absence is resource limits— which I suppose Fly is OK with as they have your credit card.
But imagine if each container (group!) had a defined reservation of CPU, memory, (ephemeral) storage, bandwidth, … which it could apportion out to any started sub-services. Basically: posix_spawn but for services on a mesh network; instead of file descriptors, we have IPv6 addresses.
@andyc said Kubernetes is Our Generation’s Multics. @catern advocates single program distributed systems. @sluongng points at Bazel’s Remote Build Execution (RBE) API … but AFAICT builds aren’t supposed to start sub-builds.
Why can’t I start a service?
I read this and feel like I’m missing something important. The idea is that you capture the state of the program, launch a copy of your code and then restore the state and carry out the task. Is the only difference between this and traditional background job frameworks the lack of a queue? The linked Elixir code just seems to take advantage of OTP’s clustering, with nodes coming and going… so, “queueless,” but if you “autoscale” Rails workers, and eliminate / constrain the queue, isn’t that effectively the same thing?
I’m scratching my head here looking for something innovative and I’m not really finding it. Fly makes the “you don’t have to configure anything because you can just do flyctl run ...” pitch, or whatever the equivalent is, but … yeah, that’s part of why Heroku was so successful, too. Turns out that when you use a platform that makes it trivial to do traditionally hard-to-set-up things, you get some advantages.
Am I missing something?
I think the interesting parts are:
You can “just” wrap your function, like any other, without needing to do the extra job stuff. By contrast, using the ecosystem-standard Oban or Verk (yuck), you’d need to implement a module and it would be a Whole Thing (and for good reason, and it would handle other things, but still…). (See the sketch after this list.)
The actual “making this work” part is conceptually very clean
In the process of wrapping that function, you can close over variables and the runtime will “just work” getting them shuffled out to the runner
For testing, it seems to just spawn in the normal supervision tree so you don’t have to lug around a whole DB or whatever in order to check that things work
And as for this: if you combine it with something like Zigler or Rustler to make it super-easy to drop in native code where needed, things get really interesting.
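To make the first point above (“just wrap your function”) concrete, here is a minimal sketch of the contrast. The pool name, worker module, and helper function are illustrative, not from the article; it assumes a FLAME.Pool named MyApp.ThumbnailRunner is started in the supervision tree.

defmodule MyApp.Media do
  # FLAME version: wrap the work in a plain closure and it runs on a remote machine.
  def generate_thumbnails(video_path) do
    FLAME.call(MyApp.ThumbnailRunner, fn ->
      # `video_path` is closed over and shipped to the runner automatically
      do_generate_thumbnails(video_path)
    end)
  end

  # hypothetical helper standing in for the real work
  def do_generate_thumbnails(video_path), do: {:ok, video_path}
end

# Oban version: the same work becomes a dedicated worker module plus an enqueue
# call, with arguments flattened into a JSON-friendly map.
defmodule MyApp.ThumbnailWorker do
  use Oban.Worker, queue: :media

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"video_path" => video_path}}) do
    {:ok, _} = MyApp.Media.do_generate_thumbnails(video_path)
    :ok
  end
end

# enqueueing the Oban job somewhere in the app:
#   %{video_path: path} |> MyApp.ThumbnailWorker.new() |> Oban.insert()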
I’m scratching my head here looking for something innovative and I’m not really finding it.
So, that’s the thing…99% of the stuff we do in Elixir and Erlang isn’t “innovative” so much as just solid, boring engineering with sane primitives.
The things that make the core of all this work are:
A boring HTTP library for talking to internal Fly APIs and spinning up instances (Req, though I’m still not using it for my own stuff)
The Firecracker VM stuff that makes this all possible
Whatever bullshit software-defined networking magic Fly does to let squishy BEAM cluster networks not get fiddled with
The newish (as of Elixir 1.14) Node.spawn_monitor function
The BEAM’s message passing and closure behavior
And that’s about it, honestly. That’s a feature, not a bug.
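For the last two primitives in that list, here is a minimal sketch of what they give you in isolation. It assumes a node named :"runner@127.0.0.1" is already connected to the cluster and running the same code; the node name and payload are placeholders.

parent = self()
payload = %{user_id: 42}   # closed-over data travels with the closure

{pid, ref} =
  Node.spawn_monitor(:"runner@127.0.0.1", fn ->
    # this anonymous function, plus `parent` and `payload`, is serialized and
    # executed on the remote node; the result comes back as a plain message
    send(parent, {:result, map_size(payload)})
  end)

receive do
  {:result, value} -> {:ok, value}
  {:DOWN, ^ref, :process, ^pid, reason} -> {:error, reason}
after
  30_000 -> {:error, :timeout}
end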
So, that’s the thing…99% of the stuff we do in Elixir and Erlang isn’t “innovative” so much as just solid, boring engineering with sane primitives.
That was kind of my fear, to be honest. BEAM makes it trivial to solve some of these things, and so saying “We’re rethinking Serverless” is sort of an… “OK, but not everyone can serialize closures, what about the rest of us?”
“OK, but not everyone can serialize closures, what about the rest of us?”
At some level, as an industry we’ve gotta realize when our tools kinda suck and stop using them. One day doubtless we’ll say the same thing about the BEAM stuff (and some of us have, for certain use cases) when a genuinely better solution comes along, but for now, for webshit and for certain other problems, the Elixir and Erlang ecosystem is pretty much as good as it gets (in my opinion, of course).
I’ve been using Elixir for nearly a decade now. I have absolutely zero desire to switch stacks because of all the permanent clownishness elsewhere. I’ve done Python and Ruby and JavaScript and Java and C and many others, and of them only SQL has really reliably come through for me in my darkest hour every time.
Like, it’s okay to just stop hurting ourselves and paying the sanity damage cost of worse tooling. It’s time to stop.
As someone that has also been on the BEAM for a decade or more, this is not really realistic. The vast majority of people in their careers are unable to affect the language choice of the applications they build. It is generally up to a small handful of people within a given organization. So simply saying that people should move on is unfair to these people.
Of course, people could choose to learn the language on their own time, but even that does not necessarily guarantee that you will get a job in it if you actually like it.
I mean, sure. But we all know that’s not going to just happen magically.
Yeah. :(
I haven’t used BEAM, but I am starting to wonder if serializing closures is actually the big missing piece in structured languages.
All those messages waiting on work queues in a distributed system really are just an example of “defunctionalizing the continuation” [0]. Instead of converting your future intent into a data structure manually, finding and managing a place to store that state, and running services to pull those messages, decode them, and execute those tasks… why not just get the compiler to do all of that for me? I mean, that last sentence pretty much describes JavaScript Promises and the task execution engine (except single-threaded and in memory, not distributed with fault tolerance, etc.). JavaScript works well for concurrent programming but not parallelism or distributed computing due to mutability everywhere - you can bring that to full parallelism with immutability as done here with the BEAM (or else with managed/structured mutability like Rust/Pony/mutable value semantics/transactional semantics/etc.).
So, not everyone can serialize closures, but maybe we should fix that first and the rest will follow easily?
[0] https://www.pathsensitive.com/2019/07/the-best-refactoring-youve-never-heard.html
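To spell out the contrast being drawn, here is a small Elixir sketch of the two styles; the job shape and the work are made up for illustration.

# Defunctionalized: future work is reified as plain data plus an interpreter
# that turns it back into behavior (what a queue-backed job system has you
# write by hand).
defmodule Jobs do
  def run(%{"op" => "add", "a" => a, "b" => b}), do: a + b
  def run(%{"op" => "upcase", "s" => s}), do: String.upcase(s)
end

Jobs.run(%{"op" => "add", "a" => 100, "b" => 20})

# Closure style: the future work stays a first-class function; the runtime can
# serialize it (and its captured environment) and run it elsewhere, so no
# hand-written job format or dispatcher is needed.
a = 100
b = 20
work = fn -> a + b end
Task.await(Task.async(work))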
Well actually …
You are going back to CORBA and RPC :) And it does not work well. There are reasons why, but fundamentally, things get pretty funky. The BEAM can do it because you are not expected to do anything “collaborative” here, and so it works well to ship a self-contained piece of memory with some runtime behavior (aka code). But things get really funky when you want to do distributed and collaborative, or even worse, stateful work.
And keep in mind that there is more state than you realize. A TCP connection, or even a db connection, is state.
At this point, it is usually better to move only data instead of functionalizing again, so that the part that holds the state can interpret the data “as it locally fits”. When you ship the whole closure, it means you are shipping your localized context too, both in terms of space and time, and that context has a really short and not-easily-defined lifespan.
Basically, the behavior you told your runtime to have in the closed-over function is now out of sync with the reality of what is needed.
And I say that as someone who defends the BEAM and has been full-on Elixir for years. It definitely has its space. But it really does not solve the problems you are looking at here. Usually the BEAM prefers to ship data and messages instead of closures for that reason.
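A small sketch of the kind of thing being pointed at here, assuming a hypothetical connected node named :"runner@127.0.0.1": a closure can happily capture a socket, but the socket (an Erlang port) is only usable on the node that opened it.

# The closure captures `socket`, and the closure itself ships fine…
{:ok, socket} = :gen_tcp.connect(~c"example.com", 80, [:binary, active: false])

Node.spawn(:"runner@127.0.0.1", fn ->
  # …but ports are node-local: driving this socket from another node fails,
  # because the underlying file descriptor lives on the original machine.
  :gen_tcp.send(socket, "GET / HTTP/1.0\r\n\r\n")
end)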
All fair points, thanks.
I do wonder if the strictness of some newer languages like Austral (linear types - e.g. a TCP connection must die in this execution context) or Roc (where the language itself is functional and total, and you explicitly make “platform” calls for effects) could help mitigate these problems?
Not really? I mean you cannot pack the connection into the closure, only a remote reference to it. Keeping this reference alive properly is… not fun across the network.
Among others… What if I restart the app or crash it?
Types across the network have seen some interesting developments in session types, but last I checked we are still reaaaaaally far from practically usable stuff.
Maybe I misunderstand completely, but possibly we agree?
I was suggesting that if the connection were represented by a linear type that had to be released synchronously, it would be impossible to keep the connection for later (in a closure passed to the async/distributed runtime). E.g. the compiler might force you to write your API in a way that used short-lived connections only.
Am I barking up the wrong tree?
Well, the problem is that then you cannot serialise it unless you shut down your connection. Which means you cannot serialise everything, only things where the state can be closed over.
It is… surprisingly inefficient to do this, usually.
So, not everyone can serialize closures, but maybe we should fix that first and the rest will follow easily?
I wonder how easy it will be to accidentally include way too much in those closures. Like if you access a list or map of some sort and then in a different context it turns out to have a lot more in it than you expected, how easy will that be to avoid and / or debug. The BEAM has pretty good introspection tooling, but I wonder if that will still work in production on Fly?
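One low-tech way to check what a closure drags along (my own suggestion, not something FLAME provides): serialize it with :erlang.term_to_binary/1 and look at the size, since the external term format for a fun includes its captured environment.

big_map = Map.new(1..100_000, fn i -> {i, i * i} end)

small_fun = fn x -> x + 1 end                  # captures nothing
big_fun = fn x -> x + map_size(big_map) end    # silently captures big_map

# the second number will be orders of magnitude larger, because the whole map
# is part of the closure's environment and would go over the wire with it
IO.inspect(byte_size(:erlang.term_to_binary(small_fun)))
IO.inspect(byte_size(:erlang.term_to_binary(big_fun)))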
I mean, you can’t generalize serialization of database connections, file handles, or really, any side-effecting stuff.
Oh you absolutely can. The 3rd codeblock in the parent article uses this to great effect.
Every time a file is opened, Elixir spawns a new process. Writing to a file is equivalent to sending messages to the process that writes to the file descriptor.
This means files can be passed between nodes, and message passing guarantees they can write to the same file across the network. You can also, e.g., use the stream as stdin to a command.
Ecto Databases are a bit different in that function calls don’t take an opaque value such as a reference, PID, or port, but rather use an atom (the module name) to identify the database. Function calls merely go to the given database’s connection pool for the current node.
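To illustrate the file-handle point above: on the BEAM an opened file (without :raw) is an IO device backed by a process, so the handle you get back is a pid that can be messaged from any connected node. A minimal sketch, again assuming a hypothetical connected node :"runner@127.0.0.1":

{:ok, device} = File.open("/tmp/shared.log", [:write, :utf8])
IO.puts(device, "written locally")

Node.spawn(:"runner@127.0.0.1", fn ->
  # `device` is just a pid; IO.puts sends it an io-protocol message, so this
  # write is performed by the owning process back on the original node
  IO.puts(device, "written from the remote node")
end)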
Wait, did you mean generalize as in for other languages/runtimes?
You can, here, but you can’t generally, which is what I was responding to.
Yeah, good point - that’s a real problem with having it compile fully automatically. You’re right that you’d want some tooling - but I wonder if the tooling could be compile-time: syntax to make captures explicit, or inferred static types for the closures that you could inspect easily with IntelliSense, etc.
The article actually links to a JS PoC. The call just JSON serializes the args and return values. No support for closures or globals, but you’ll have to tell me if that’s a deal breaker or not.
index.mjs:

import runMath from './runMath.mjs'

async function main() {
  console.log(await runMath(100, 20))
}

main()

runMath.mjs:

import runOnAnotherMachine from "./runOnAnotherMachine.mjs"

export default runOnAnotherMachine(function runMath(a, b) {
  return a + b
}, {
  meta: import.meta,
  guest: {
    cpu_kind: "shared",
    cpus: 2,
    memory_mb: 1024
  }
})
I’ve “serialized” “closures” in the Java 1.6 days by doing something similar. Copy the closed over data to a data structure that can be serialized and capture a reference to which class has the code. It works, but is very manual and you can’t, say, serialize a database connection, or even a file handle if you’re planning to invoke it somewhere else.
In my case, I put them on an AMQP bus and they got “invoked” in 2 different data centers… this was a good example of low impact for failures. 🤣