1. 7

I had some thoughts about what I would be curious to see in an interpreted language and want to get some feedback. Do you think this set of features could work in a programming language? Is it useless? Would it be too slow?

If you are referencing a specific point, please cite its number. Feel free to suggest your own ideas, but try not to hijack the topic too far from the features in the gist.

  2. 6

    This sounds interesting, and seems close to some ideas I brainstormed for Oil shell. I wanted something like this, but I’m not sure if Oil will get there in the next 2 years, because I started at the beginning with bash :)

    But yes I think what you are describing is interesting and you should try it.

    In particular I had thrown around this idea of threads + channels being isomorphic to processes + pipes – i.e. somehow you could use the same shell-like syntax for both. That would naturally lead to the “mutability within an actor / immutable across actor boundaries” rule you propose – that’s exactly how Unix processes work.

    (I’m not claiming I invented this idea – e.g. there’s a chapter in Programming in Lua that talks about using multiple Lua interpreters assigned to threads, and message passing requires a copy. There’s a paper on it as well.)

    I wanted to have cheaper message passing than serializing over a pipe. That could involve persistent data structures (Clojure-style), or simply transferring ownership like you say. This would mean enforcing a “delete” when you send over a channel.
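
    To make the ownership idea concrete, here is a minimal Go sketch of the convention (hypothetical Msg type, not from any real project): the send is the transfer point, and the sender drops its reference immediately afterwards.

        package main

        // Msg is a hypothetical mutable message; only its current owner may touch it.
        type Msg struct {
            Payload []byte
        }

        func producer(out chan<- *Msg) {
            m := &Msg{Payload: []byte("hello")}
            out <- m // ownership moves to the receiver with the send
            m = nil  // the "delete": the sender must not touch m afterwards
        }

        func consumer(in <-chan *Msg) {
            m := <-in // the receiver is now the sole owner and may mutate freely
            m.Payload = append(m.Payload, '!')
        }

        func main() {
            ch := make(chan *Msg, 1)
            producer(ch)
            consumer(ch)
        }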

    A few years ago I wrote and deployed a highly concurrent C++ program that used this convention, and I liked it very much. I based it on the Go model, and although I don’t use Go, the concurrency model is pretty much my “default” now, but with plain threads rather than goroutines. (I don’t use Go mainly because I don’t like that it has its own OS, and because I prefer metaprogramming in dynamic languages rather than weak-ish static typing.)


    The idea of using the Unix process security model (rather than inventing your own) is also very related to the shell. The “Bernstein chaining” idea of daemontools is basically how you would express this in a shell-like language:

    http://www.oilshell.org/blog/2017/01/13.html

    I wanted (and still want) to make Oil a good language for containers.

    Ironically, Go turned out to be the language of containers, with Docker and Kubernetes and the like. But I feel it is fairly horrible for this use case because it has M:N threading, and syscalls in Linux refer to threads. For example, stuff like this:

    https://github.com/golang/go/issues/1435

    (I had saved that issue like 3 years ago, and it’s still being updated?)
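
    To show what I mean about syscalls referring to threads, here is a rough sketch (my own, not from that issue) of the usual Go workaround: pin the goroutine to one OS thread before making a thread-affine call. It assumes golang.org/x/sys/unix, and the namespace path is made up.

        import (
            "runtime"

            "golang.org/x/sys/unix"
        )

        // runInNetNS runs work() inside the network namespace at path (e.g. a
        // hypothetical /var/run/netns/foo). setns() only affects the calling OS
        // thread on Linux, so the goroutine has to stay pinned for the duration.
        func runInNetNS(path string, work func() error) error {
            runtime.LockOSThread()
            // Deliberately no UnlockOSThread: the thread now carries namespace
            // state, and exiting the goroutine while locked makes the Go runtime
            // discard the thread instead of reusing it.

            fd, err := unix.Open(path, unix.O_RDONLY, 0)
            if err != nil {
                return err
            }
            defer unix.Close(fd)

            if err := unix.Setns(fd, unix.CLONE_NEWNET); err != nil {
                return err
            }
            return work()
        }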


    I was also influenced by the Capsicum model and Chrome sandboxing. As far as I know, Chrome was the first major browser to do this, with Firefox and Microsoft (and maybe Apple) all following, sometimes lagging years behind.

    https://www.cl.cam.ac.uk/research/security/capsicum/

    Anyway I think that there needs to be a shell-like language to express processes, data, and capabilities. But I also think it would be useful to have a more tightly coupled model, with actors running in threads in the same VM, which sounds like what you are proposing. I may have projected my own ideas onto what you’re saying, but it sounds pretty similar.

    Other points:

    • An obvious thing about Unix processes is that if one crashes, the operating system doesn’t crash! It would be nice to have these fault-tolerant properties for threads.

    • About remote or local actors: I think it is wise that Go left this out. It would have been very tempting to add this, given that one of the main environments for Go is Google’s data centers. I think we still don’t know enough about how to write distributed systems to really bake this in at a deep language level. There still need to be multiple choices, libraries, frameworks, etc. (And I say that after working at Google for over a decade.)

    • Unix obviously has resource limits for processes; I think you can apply this to threads too. Threads can also be assigned to a NUMA region on Linux (see the sketch after this list). (There is a good paper on this: “Your computer is already a distributed system; why isn’t your OS?” [1])
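
    As a rough illustration of that last bullet, here is a small made-up Go sketch of pinning a goroutine’s OS thread to a set of CPUs, e.g. the CPUs belonging to one NUMA node (again assuming golang.org/x/sys/unix):

        import (
            "runtime"

            "golang.org/x/sys/unix"
        )

        // pinToCPUs pins the calling goroutine's OS thread to the given CPUs.
        func pinToCPUs(cpus ...int) error {
            runtime.LockOSThread() // keep the goroutine on the thread we pin

            var set unix.CPUSet
            set.Zero()
            for _, c := range cpus {
                set.Set(c)
            }
            // pid 0 means "the calling thread".
            return unix.SchedSetaffinity(0, &set)
        }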

    I do agree that there might be too much similarity to either Erlang or Go. What is wrong with those two languages in your opinion? (I gave my opinions on Go; I have less experience with Erlang, but I think it has some limitations too.)

    However, I don’t think either Erlang or Go has any concept of Unix processes, and that is one thing that would be different about my approach if I were to go down this road (which I won’t have time for any time soon).

    Anyway, if you want to kick around more ideas like this I’m interested :)

    [1] https://scholar.google.com/scholar?cluster=9454209336002200388&hl=en&as_sdt=0,5&sciodt=0,5


    Addendum: I might not have emphasized that the benefit of the threads + channels model over processes + pipes is that channels can carry structured data, and they could even be typed like they are in Go. In other words, sometimes it’s a pain to serialize a whole data structure to a file. (It’s the right thing more often than people think, though.)
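
    For example, a tiny sketch with a hypothetical Job type: the channel just moves a typed value, while the pipe version has to encode it to bytes and parse it back on the other side.

        import (
            "encoding/json"
            "io"
        )

        type Job struct {
            ID   int
            Path string
        }

        func channelVersion() {
            jobs := make(chan Job, 1)
            jobs <- Job{ID: 1, Path: "/tmp/input"} // hypothetical values
            j := <-jobs                            // arrives as a typed value, no parsing
            _ = j
        }

        func pipeVersion(w io.Writer) error {
            // Over a pipe the same data has to be flattened to bytes first.
            return json.NewEncoder(w).Encode(Job{ID: 1, Path: "/tmp/input"})
        }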

    Also I was sticking to OS threads because I want the kernel to do the work; not sure if you were suggesting user-space threading.

    1. 5

      Go is a nice language, and I think Go 2 will be great now that they understand the kinks. Currently it is just a workhorse language with nothing special and lots of dumb things. It doesn’t make debugging or fault/security isolation easy, which is laughable for a server language in my opinion.

      Erlang seems really nice, and though I am very much a novice, I can see why it is so powerful. My biggest complaint about Erlang, though, is that connected nodes form one big trusted computing environment. I want to be able to have secure separation between parts of the software while still having great ease of use, e.g. my web server code should be able to create Stripe payments, but not read the secret key on the other end of the request.
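
      Roughly what I picture, as a made-up Go sketch (all the names are invented): the handler holds a narrow capability interface and never sees the key.

          // PaymentCreator is the only capability the web handler gets to hold:
          // it can create charges, but the secret key is not reachable through it.
          type PaymentCreator interface {
              CreateCharge(amountCents int, customer string) error
          }

          // The concrete client (which knows the key) lives elsewhere, ideally in
          // a separate actor or process; only the interface crosses over.
          type stripeClient struct {
              secretKey string
          }

          func (s *stripeClient) CreateCharge(amountCents int, customer string) error {
              // ... call the payment API using s.secretKey ...
              return nil
          }

          func handleCheckout(payments PaymentCreator) error {
              return payments.CreateCharge(1999, "cust_123") // hypothetical values
          }

      Within one address space this is only a convention (unsafe or reflective code could still dig the key out), which is where kernel-level isolation would have to come in.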

      Designing secure and reliable systems seems difficult for no real good reason. OpenBSD uses a lot of privsep to build a secure, fault-isolating environment, but writing code that does this is currently far too tedious. Erlang is the only language I know that has made remote/local interaction seamless; it just wasn’t designed for security, only fault isolation.

      With regard to reference capabilities, I’d like to be able to protect my system from third-party libraries. As stupid as it sounds, they won’t be able to delete files they don’t have a reference to! :) Currently I use the Google Cloud SDK a lot, and it is thousands of lines of code with arbitrary access to my system; it’s disgusting and I can’t avoid it. It’s too late for us, but if we make better programming models, the people of the future won’t have to suffer so much.

      1. 4

        Absolutely, I think a Go- or Erlang-type language with security primitives is a great idea. It’s a distinct niche.

        However if you want to use the kernel for security (which I think is a good idea), then it seems you’re limited to processes? Most security properties don’t apply to OS threads, let alone user-space threads.

        Then it starts to look more like a shell, which I like :) I think of the shell as a language for specifying the architecture of distributed systems. You are starting processes and connecting them with ports. And the processes have heterogeneous security properties (user IDs, etc.), not homogeneous like a typical threaded application.

        (And I consider a collection of processes on the same machine a distributed system. Some of your processes might be dead, you might have communication failures, etc. Threaded programs generally don’t consider these failure modes – it’s all or nothing.)

        Protecting both your program and OS from third-party libraries is absolutely not a weird requirement. The security team at Google would always (and still does I’m sure) make application teams using third-party libraries like PDF decoders put them in a separate process. Either as a separate service, accessible via RPC, running with different privileges, or inside a ptrace sandbox. (The ptrace sandbox requires a separate process too.)

        When handling user data, you’re not allowed to just link whatever code you want into the same process without thinking about security. There are also plenty of server programs that are gigabytes of statically-linked executable code with hundreds-of-gigabyte heaps and hundreds/thousands of threads. Some internal isolation starts to sound like a really good idea in that case!

        I also think programs like VLC, with a lot of decoders, could benefit from some Chrome-like process isolation. DJB has an old article about MP3 decoders and privilege separation in “classical” Unix (which I wasn’t able to find last time I looked).
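
        For what it’s worth, the “separate process” version is fairly simple to sketch in Go (made-up helper binary and UIDs; Linux-only, and dropping UIDs requires the parent to have the privilege to do so):

            import (
                "bytes"
                "os/exec"
                "syscall"
            )

            // decodeUntrusted runs a hypothetical decoder binary in its own process,
            // as an unprivileged user, and talks to it only over stdin/stdout.
            func decodeUntrusted(input []byte) ([]byte, error) {
                cmd := exec.Command("/usr/local/bin/decode-pdf") // made-up helper binary
                cmd.Stdin = bytes.NewReader(input)
                var out bytes.Buffer
                cmd.Stdout = &out
                cmd.SysProcAttr = &syscall.SysProcAttr{
                    // Drop to an unprivileged UID/GID (e.g. "nobody"); Linux-specific.
                    Credential: &syscall.Credential{Uid: 65534, Gid: 65534},
                }
                if err := cmd.Run(); err != nil {
                    // A crash or kill in the decoder cannot take down the caller.
                    return nil, err
                }
                return out.Bytes(), nil
            }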

        Some other points:

        • I think remote/local is a bit of a false dichotomy. I think at the least you want WAN-latency / LAN-latency / local. What WAN and LAN share is that you have CAP-style problems. But local latency and LAN latency are closer together than WAN latency, at least in modern data centers (e.g. microseconds vs. tens/hundreds of milliseconds for WAN).
        • As mentioned I think trying to make remote/local “invisible” is a bad idea. I think “similar” or “analogous” is what I would like to see.
        • I mentioned Capsicum because I think it would be interesting to integrate the ideas of file-descriptors-as-capabilities and objects-as-capabilities in a language. I think one issue with reading programs and reasoning about security is that there are all these obscure system calls and /cgroup type namespaces, and everyone fits them into their program differently. But I think there should be some way to express OS-level capabilities in the language consistently.

        I’ve been idly thinking about a special syntax for function parameters that are capabilities/file descriptors… not sure I came up with anything great.
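
        The closest thing I have is no special syntax at all: just make the parameter an already-open descriptor instead of a path, something like this made-up example.

            import (
                "bufio"
                "io"
                "os"
            )

            // The caller (the trusted side) decides what gets opened; the callee only
            // receives open descriptors and has no way to name anything else on disk.
            func copyFirstLine(src, dst *os.File) error {
                line, err := bufio.NewReader(src).ReadString('\n')
                if err != nil && err != io.EOF {
                    return err
                }
                _, err = dst.WriteString(line)
                return err
            }

        A dedicated syntax would mostly make that distinction visible in the signature, and maybe let the language enforce that no bare paths sneak in.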

        I think one reason it’s so hard to make secure systems is that the kernel is in the best position to enforce security policies (by definition, since it runs in a different privilege mode). But kernels export different interfaces (Windows/Mac/Linux/BSD), and applications need to be portable. So Chrome has a sandbox for each operating system. It even has multiple sandboxes for different versions of Linux, as far as I know.

        So if there is a language with such security features, then it will necessarily have to contend with this problem. But it will save applications from having to do it, hopefully.

    2. 4

      I would recommend spending some time around Racket’s threads, channels and events.

      The threads there can be killed at any moment, and you need to design your message exchange protocols accordingly. To lock, you need to create an actor that holds the lock table. What happens if it crashes? Who knows. It’s an environment all right, but in my opinion not a very comfortable one.
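
      The pattern you end up hand-rolling looks roughly like this (a made-up Go-ish sketch, since that’s the language elsewhere in the thread):

          // One goroutine owns the lock table; everything else talks to it over a
          // channel. If this goroutine dies, every lock dies with it.
          type lockReq struct {
              key     string
              acquire bool
              reply   chan bool
          }

          func lockTable(reqs <-chan lockReq) {
              held := make(map[string]bool)
              for r := range reqs {
                  if r.acquire {
                      ok := !held[r.key]
                      if ok {
                          held[r.key] = true
                      }
                      r.reply <- ok
                  } else {
                      delete(held, r.key)
                      r.reply <- true
                  }
              }
          }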

      Transparent parallelization / distribution that can offload pure but expensive computations to a different machine would be much cooler in my opinion.

      1. 1

        Thanks for the link; I had no idea Racket had garbage-collectible threads. Really interesting.

      2. 3

        It does sound interesting. I would suggest reviewing https://github.com/monte-language/monte since it is already trying to do about 85% of the same stuff (though it focuses on object capabilities at its core).

        Most notably, I don’t think they intend to make the Unix security model first class.

        1. 1

          Thanks for pointing this out, finding monte is a great result of this post. I had no idea it existed.

        2. 3

          This is a neat combination of features.

          Point 3 seems like a missed opportunity to me. The Unix security model places an amount of trust in programs that’s no longer appropriate for the most popular computing environments. Android and iOS have a more fine-grained permissions model because most programs people want to use shouldn’t be granted read & write access to all of a user’s files (especially browser cookies, which grant access to banks, email, etc.).

          Of course, if this is for compsci practice or intended solely for sysadmin scripting or something, this would be overkill or inappropriate. Could you talk some about your motivations for starting to design a language?

          1. 1

            It could be true that there is no use case; it’s really just my mind wandering. I’ve been writing a lot of Go services, and I really just want to be able to deploy something where I don’t have to worry, but at the same time not have super slow development because I spend days setting up chroots and other things manually.

          2. 3

            I have also been experimenting with some similar concepts in the VM I’ve been working on. With respect to the ideas on remote vs. local execution, I completely agree with @andyc when he mentions:

            As mentioned I think trying to make remote/local “invisible” is a bad idea. I think “similar” or “analogous” is what I would like to see.

            This just reminds me of things like RPC frameworks that give you an object-oriented interface to work with or even something like network filesystems where you never know what method invocation or which sync call is going to stall out.

            That being said, there is an argument for making remote and local operations identical at the language level, primarily for ergonomics. For my use case, I’ve basically ended up with a kind of multiple-value-continuation construct that at least allows a programmer to handle failures of this sort.

            One last thing to think more about is which resources must be bound to local execution. One example you briefly mentioned was interactions with files and mutations of objects. But perhaps there are other reasons to pin some computation to a certain actor, e.g. network locality, memory locality, hard CPU requirements, etc. These things tend to be a little more “fluffy” to express through syntax. The only strategy I’ve entertained for this has been introducing a way for the programmer to indicate that a section of code has some direct attachment to its defining scope (in my VM, it’s called “gravity”). This lets the VM better understand how to schedule the delimited section of code to run more optimally. This may or may not be a major concern for your use case, but at the least I think it’s good to think about.

            1. 1

              I think heartbeats and an Erlang-style model of recovering from failures that can occur at any line might deal with remote processes disappearing, but I’m not totally certain.

            2. 3

              Reminds me of languages that existed when I studied agent-oriented programming. Telescript and Obliq had some of these properties. E is worth looking into for capability stuff.

              https://en.wikipedia.org/wiki/Telescript_(programming_language)

              https://en.wikipedia.org/wiki/Obliq_programming_language

              https://en.wikipedia.org/wiki/E_(programming_language)

              Just throwing out some stuff for you to look at in case you find anything useful in there. Also, on resource controls, one thread I read somewhere had people using Lua for some of these reasons, with some integration with the OS’s features for resource monitoring/control. I guess they sort of made a spec, created a Lua image out of it plus configuration for the OS mechanisms, and ran those together. I can’t remember the link or whether they built it, though.

              1. 2

                Thank you, I will check them out. The E language looks especially close to what I had in mind!

                edit: and monte looks very interesting.

              2. 2

                Also, one thing to consider might be prototyping the semantics / VM in Lua? The Lua VM is re-entrant, has no “capabilities” unless you explicitly add them, and doesn’t have a lot of baggage like a mess of #ifdefs.

                Although I don’t like programming in Lua itself, I think the idea of prototyping a language in a dynamic language worked pretty well for me (even considering my problems with parser speed).

                Although one thing is that Lua already has coroutines, which might not be what you want and could conflict with your own concurrency primitives. (Or they might be exactly what you want, which could be good.)

                There might be some small Lisps that could help prototype/bootstrap it. I tried this with femtolisp and the shell, but it wasn’t all that successful. femtolisp is smaller than Lua, but it is a bit particular and not re-entrant as far as I remember, which is bad for concurrency.

                1. 1

                  I’ve decided to prototype the runtime as a golang library, then potentially build an interpreter that is essentially a DSL on top of the library. It’s just a hobby, so no promises on whether it will ever do anything useful. I’m keen to try some sort of tree routing and heartbeats between processes.