This is a good overview, but I wish it talked about potential solutions or directions too. Let me scribble some quick notes on that:
Virtual memory: With 64 bits all processes can share the same address space, so there’s really no need to remap addresses between processes; only permissions need to change.
Memory protection: less of an issue with “managed” runtimes (JS, Java…) that allow only safe accesses, and with compiled languages like Go and Rust that statically ensure safety (aside from “unsafe” calls, of course).
Filesystems: Nonvolatile memory is going to take over some of the filesystem’s use cases, like saving state, and maybe some databases. And I’m a big fan of replacing the filesystem abstraction with something more like a “soup” of persistent objects, or with systems that rely more on tagging than on directories.
Concurrency, IPC, networking: Lots of momentum these days for CSP- and Actor-like mechanisms, i.e. lightweight threads that communicate by message passing, not shared memory. And capabilities are a really powerful way to add authorization to this type of messaging.
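A toy sketch of that style in C, with two processes and a POSIX pipe standing in for lightweight threads and channels (the communicate-by-messages, share-no-memory shape is the same):

    /* CSP-style messaging in miniature: two processes share no memory
     * and communicate only through a pipe. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        int fds[2];                     /* fds[0] = read end, fds[1] = write end */
        if (pipe(fds) == -1) return 1;

        if (fork() == 0) {              /* child plays the "sender" role */
            close(fds[0]);
            const char *msg = "hello over the channel";
            write(fds[1], msg, strlen(msg) + 1);
            close(fds[1]);
            _exit(0);
        }

        close(fds[1]);                  /* parent plays the "receiver" role */
        char buf[64];
        if (read(fds[0], buf, sizeof buf) > 0)
            printf("received: %s\n", buf);
        close(fds[0]);
        wait(NULL);
        return 0;
    }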
Another area the authors didn’t address is the difficulty of the POSIX APIs, especially in the area of networking. Writing a solid, performant TCP-based client or server on those APIs is quite hard, with all sorts of gotchas; just look at doorstop books like Unix Network Programming. I’ve done it but I never want to do it again! This is a side effect of how the APIs grew over time and as more capabilities were added — for example, signals predate threads, so the two don’t work well together.
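One small taste of the gotchas: even a plain write() can be interrupted by a signal or transfer only part of the buffer, so robust code ends up wrapping every send in a loop like this (a sketch, error handling abbreviated):

    /* write() can return early (partial write) or fail with EINTR when
     * a signal arrives, so a robust sender has to loop until done. */
    #include <errno.h>
    #include <unistd.h>

    ssize_t write_all(int fd, const void *buf, size_t len) {
        const char *p = buf;
        size_t left = len;
        while (left > 0) {
            ssize_t n = write(fd, p, left);
            if (n < 0) {
                if (errno == EINTR) continue;  /* interrupted: just retry */
                return -1;                     /* a real error */
            }
            p += n;                            /* partial write: advance, retry */
            left -= (size_t)n;
        }
        return (ssize_t)len;
    }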
Nonvolatile memory is going to take over some of the filesystem’s use cases, like saving state, and maybe some databases. And I’m a big fan of replacing the filesystem abstraction with something more like a “soup” of persistent objects, or with systems that rely more on tagging than on directories.
Early 90s Apple employee detected - who else would call single-level storage/persistent memory a “soup”?
Guilty as charged, but there’s been some definite interest recently in the Newton architecture — what I know of it comes mostly from reading old Apple manuals that were linked here in the last year or two.
Much has been lost.
Virtual memory: With 64 bits all processes can share the same address space, so there’s really no need to remap addresses between processes; only permissions need to change.
Memory protection: less of an issue with “managed” runtimes (JS, Java…) that allow only safe accesses, and with compiled languages like Go and Rust that statically ensure safety (aside from “unsafe” calls, of course).
One of the current CHERI projects involves flat address spaces with POSIX. This augments vfork with a coexecve system call, which is like execve but launches the new process in the same address space as the caller (but with a disjoint set of capabilities).
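I haven’t written against it, so treat this as a guess at the shape rather than the real interface; the coexecve declaration below is my assumption, modelled on execve:

    /* Hypothetical sketch: coexecve is a research system call from the
     * CHERI colocation work; this declaration is my guess, modelled on
     * execve, not the actual interface. */
    #include <unistd.h>

    extern int coexecve(const char *path, char *const argv[], char *const envp[]);

    void spawn_colocated(char *const argv[], char *const envp[]) {
        pid_t pid = vfork();
        if (pid == 0) {
            /* Child: replace the program image but stay in the caller's
             * address space, starting with a disjoint capability set. */
            coexecve(argv[0], argv, envp);
            _exit(127);                /* reached only if coexecve fails */
        }
        /* Parent continues: the new process shares our address space,
         * but without capabilities to our data it cannot touch it. */
    }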
Filesystems: Nonvolatile memory is going to take over some of the filesystem’s use cases, like saving state, and maybe some databases. And I’m a big fan of replacing the filesystem abstraction with something more like a “soup” of persistent objects, or with systems that rely more on tagging than on directories.
One of the HP engineers on The Machine put this best: the problem with persistent single-level stores is that your memory-safety errors last forever. CHERI helps here; now only your type-safety bugs last forever.
There’s a much bigger problem with using NVM as memory though: you have two kinds of consistency and both matter. You have consistency that arises from cache coherency (things like the C++11 memory model), which defines the order in which other cores can observe writes to shared memory. You also have the consistency that arises from writes to persistent memory, which defines the order in which stores become visible after a crash. Providing a model that is consistent across both is incredibly hard (well, it’s easy, unless you want performance to be better than totally awful).
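To make the two kinds of consistency concrete, here’s roughly what each guarantee costs on x86 (a sketch; _mm_clwb needs CPU and compiler support for the CLWB instruction):

    /* Two different meanings of "the store is done" (x86 sketch):
     * visible to other cores vs. durable across power loss. */
    #include <immintrin.h>    /* _mm_clwb, _mm_sfence; build with -mclwb */
    #include <stdatomic.h>

    /* Coherency-side consistency: when other cores may observe the
     * store. A release store costs at most tens of cycles. */
    void publish_visible(_Atomic long *p, long v) {
        atomic_store_explicit(p, v, memory_order_release);
    }

    /* Persistence-side consistency: when the store survives a crash.
     * The cache line must actually be written back to the NVM, which
     * is orders of magnitude more expensive. */
    void publish_durable(long *p, long v) {
        *p = v;
        _mm_clwb(p);          /* request writeback of the dirty line */
        _mm_sfence();         /* order the writeback before later stores */
    }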
There’s also the locality issue with filesystems. Increasingly, applications store data in cloud services with local caches. All of the problems that Coda tried to solve are still there but, for security, the shared-namespace aspects of a filesystem are less important.
This is also interesting for server-side applications, where the storage is often provided by a different VM or other service and so the ‘local’ OS needs to expose only a low-latency network interface.
Completely agreed.
One of the HP engineers on The Machine put this best: the problem with persistent single-level stores is that your memory-safety errors last forever. CHERI helps here; now only your type-safety bugs last forever.
I don’t envision NVM being used just like a regular heap, only it lasts forever. I see it more like a memory-mapped database without the paging. That is, ideally you can use the persistent data as regular runtime data structures (including pointers!) but there’s more ceremony involved in writes to ensure ACID-type guarantees. That way, anything that might affect memory/type safety is going through some code that’s hopefully as well-debugged as SQLite.
That might mean it isn’t single-level storage, but that’s OK; it’s not too hard to make it look that way to the application, and it will probably be lighter-weight than an ORM.
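Concretely, I’d picture the write ceremony looking something like this; every name in the sketch is invented for illustration, not a real library:

    /* Hypothetical sketch of the write "ceremony": reads are ordinary
     * loads on mapped persistent data; mutation goes through a small
     * transactional API whose undo log lets a crash roll back cleanly.
     * None of these functions exist; they only illustrate the shape. */
    #include <stddef.h>

    typedef struct pheap pheap;                        /* persistent heap handle */
    pheap *pheap_open(const char *path);               /* map the persistent file */
    void   ptx_begin(pheap *h);                        /* start a transaction */
    void   ptx_log(pheap *h, void *addr, size_t len);  /* save an undo image */
    void   ptx_commit(pheap *h);                       /* flush, then mark durable */

    typedef struct node { struct node *next; long value; } node;

    /* Push onto a persistent linked list: plain pointers, plain stores,
     * but nothing counts as durable until ptx_commit returns. */
    void persistent_push(pheap *h, node **head, node *n) {
        ptx_begin(h);
        ptx_log(h, head, sizeof *head);  /* undo-log the word we'll mutate */
        n->next = *head;
        *head = n;
        ptx_commit(h);
    }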
That still requires you to hide behind fairly complex proxy objects because direct mutation is unsafe. You need explicit cache writebacks to guarantee that data is committed to persistent memory, but you need barriers to ensure that it is visible to other threads. It’s hard to unify these because barriers typically cost tens of cycles or less, while writebacks cost thousands. Concurrent mutation of data structures in NVM is an active research area and is hard.
Virtual memory: With 64 bits all processes can share the same address space, so there’s really no need to remap addresses between processes; only permissions need to change.
Security, security, security.
Memory protection: less of an issue with “managed” runtimes (JS, Java…) that allow only safe accesses, and with compiled languages like Go and Rust that statically ensure safety (aside from “unsafe” calls, of course).
And more security. Most runtimes that “allow only safe accesses” run in user space, which means they protect the machine from bad code handed to the runtime. But they do nothing to protect against bad actors and bugs in these runtimes.
There are OSes that handle a single address space and static safety with managed languages just fine - IBM i is the most commercially successful example.
With 64 bits all processes can share the same address space, so there’s really no need to remap addresses between processes; only permissions need to change.
What does this buy you though, except for perhaps saving a little bit of memory from page tables? You cannot know how much memory each process will need a priori, so how much of the address space do you partition for each process? I suspect you would quickly run into fragmentation problems, which is precisely one of the issues page-based virtual memory was intended to solve.
I am not a kernel expert, but I hear it’s a performance win not having to remap memory during system calls and context switches.
I don’t see why fragmentation would be a problem. The space available is 2^64 bytes, on the order of 10 billion gigabytes. To fragment that, a pretty sizeable fraction of it would need to be allocated, and processes would have to be making some huge individual allocation requests.
However, contemporary applications rarely run on a single machine. They increasingly use remote procedure calls (RPC), HTTP and REST APIs, distributed key-value stores, and databases…
I’m seeing an increasing trend of pushback against this norm. An early example was David Crawshaw’s one-process programming notes. Running the database in the same process as the application server, using SQLite, is getting more popular with the rise of Litestream. Earlier this year, I found the post “One machine can go pretty far if you build things properly” quite refreshing.