I agree with the main premise of this article, but I think they’re not communicating it very well. For one thing, their linguistic distinction is misplaced: neither S3 nor POSIX is a filesystem; they are APIs. Filesystems are formats for storing directory structures on disk. Examples of filesystems are:
EXT4
Btrfs
HDFS
It’s funny how they build this around a linguistic distinction and then get it wrong. They also get wrong the claim that in a “real file system you can use databases”; this is totally off the mark. There are plenty of POSIX-compatible filesystems where you cannot use databases, for example a mounted CD or tape drive.
The main gripe I have with this article, though, is their use of the word “first”: “it is not a surprise the S3 was the first popular cloud API”. Why are technologists so obsessed with calling things the first? The first popular cloud API was probably telnet. There were telnet systems that had all the modern aspects of the cloud, including pay-by-the-hour/minute use. Why do we continuously pretend that history does not exist beyond the backs of our ears?
Old man yells at the Cloud.
I’m 33 years old. Back in my day you weren’t old until you were at least 40!
I’m 42 in a couple months, so I’ll yell for you.
This definition feels weirdly restrictive too. I’m assuming I can replace “disk” with “block device.”
What about virtual file systems? What about network file systems? What about cluster file systems? What about streaming file systems?
Yeah, I concede I’m not doing an excellent job WRT the term “filesystem” vs “file API” vs whatever in the article. It is difficult, and the sense of “filesystem” I am considering here is the one from the sentence “the filesystem of this machine”, e.g. the stuff mounted under /.
Though if I could put in my own little bit: please could you avoid quote marks when you’re paraphrasing.
Sorry about the quote marks, you’re right.
Kinda curious about any system you can give as an example that was accessed via telnet, paid by usage, and had scalable capacity and elasticity which were programmatically controlled and on demand.
“08/29 415-332-6106 well Sausalito CA 3/12 24 VAX 750 - BSD 4.2, multiple lines, Telenet access, Picospan bbs fees: $8/month, $3/hour direct, Telenet $20/$4 hour (peak/off peak)”
Before my time…
https://retrocomputing.stackexchange.com/questions/14879/what-were-the-major-public-access-unix-systems-available-in-the-1980s-90s
Telenet wasn’t telnet. And these systems were all direct dial: you used a modem to connect to them over your landline.
God, I am old. 😭
Oddly enough, the one he quoted is the WELL (Whole Earth ’Lectronic Link) BBS, which you actually could telnet to pretty early on (in the 2000s?). But it was still just a VAX in somebody’s basement that you rented shell time on. And Telenet is still not telnet.
We’re both old. :-)
So, no scalable capacity, no elasticity, no actual API to control the resources. I think calling this “early cloud” is a stretch.
Cloud might be a somewhat loosely defined concept but that doesn’t mean you can boil it down to “paying for remote access”.
IBM AS/400 - distributed, scalable, on-demand (pay for MIPS) API controlled application server, 1988
IBM Db2 - distributed, scalable, API controlled, on-demand (pay for MIPS) SQL database, 1983
IBM IMS - transactional, scalable, distributed, API controlled hierarchical database, 1966
And pertaining to file systems, there’s also zFS, circa 1995. Same properties as all of the above. I hate IBM products (especially the above, and mainframes), but one has to say they invented most of the distributed, scalable, API-controlled, on-demand things before they were called “cloud”.
Practically infants, if you want to quibble over terms. Look up ‘IBM Service Bureau’. They’d been renting ‘calculation’ time since before there were digital computers (mechanical tabulators). From a customer perspective: on-demand, elastic, distributed (multi-city) & scalable. Maybe not API-driven per the current fashion until recently, but certainly by the time JCL, etc. came around in the OS/360 days there was arguably an orchestration facility. And IBM wasn’t the only one… CDC, for example, had a service bureau business and eventually bought IBM’s.
Cloud is an evolution of things that have been a thing for a very long time.
Ok, those are better examples
“Get off my lawn”
What about S3 FUSE, which provides a POSIX-compatible-ish API on top of S3?
It’s a good attempt, but rather crude and incomplete in the end. I’d recommend S3Backer over it if you use the “bucket” just as a file system on one server at a time.
In the cloud computing graduate course at CMU, one of the assignments is to make a real filesystem on top of S3. This class project led to several different commercial “cloud gateway” projects.
“Filesystem software, especially databases, can’t be ported to Amazon S3”
This seems mistaken. Porting databases that run on local disk to S3 seems like a good way to get a lashing from https://aphyr.com/
Can any databases do it correctly?
If so, I doubt they work with the model of partial overwrites. They probably have to do something very custom, and either sacrifice a lot of tail latency, or their uptime is capped by the uptime of a single AWS availability zone. Doesn’t seem like a great design.
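To make the partial-overwrite point concrete, here’s a toy Python sketch (the classes are made-up stand-ins, not any real filesystem or S3 API): a POSIX-style file lets you patch a few bytes in place, while an S3-style object only supports whole-object replacement, so the same tiny logical change forces a rewrite of the entire object.

```python
# Toy stand-ins: not a real filesystem and not boto3 -- just the two write models.

class PosixStyleFile:
    """Supports partial in-place overwrites, like pwrite(2)."""
    def __init__(self, data: bytes):
        self.data = bytearray(data)

    def pwrite(self, offset: int, buf: bytes) -> int:
        self.data[offset:offset + len(buf)] = buf
        return len(buf)  # bytes of I/O: just the patch

class S3StyleObject:
    """Only whole-object PUT: any change replaces the entire object."""
    def __init__(self, data: bytes):
        self.data = bytes(data)

    def put(self, buf: bytes) -> int:
        self.data = bytes(buf)
        return len(buf)  # bytes of I/O: the whole object

page = bytes(8192)  # one 8 KiB database page

f = PosixStyleFile(page)
posix_cost = f.pwrite(100, b"xx")  # patch 2 bytes in place -> 2 bytes written

obj = S3StyleObject(page)
patched = page[:100] + b"xx" + page[102:]
s3_cost = obj.put(patched)  # same logical change -> all 8192 bytes rewritten

print(posix_cost, s3_cost)  # 2 8192
```

That write amplification is why a page-oriented database can’t simply point its storage layer at an object store without a custom layout.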
litestream comes to mind. It replicates the WAL of a SQLite database to S3.
There’s also WarpStream that is a Kafka API compatible system on top of S3.
I know this isn’t exactly “porting,” but it shows it’s possible, probably.
litestream is a good example because they document the fact that data can be lost! If you don’t make false claims about durability, then you don’t get a lashing from Aphyr. From their docs: “By default, Litestream will replicate new changes to an S3 replica every second. During this time where data has not yet been replicated, a catastrophic crash on your server will result in the loss of data in that time window.”
I asked about that a couple years ago - https://news.ycombinator.com/item?id=30884290
https://litestream.io/tips/
Though I guess you can say this is about replication. You could just run MySQL connected to S3 with no replication? Sounds like a bad idea to me.
I really fail to see the use case of porting file system software like databases to S3. It seems like a very naive design.
Though probably people do it and get use out of it, e.g. for metrics/logging/monitoring, where you don’t care much about durability or uptime.
As far as I’ve seen, litestream doesn’t claim any consistency model beyond “it’s asynchronous and has the potential for data loss in these situations”, which is already enough to not have Aphyr on your back. What generally motivates him is someone claiming their system meets some consistency standard, and then proving it doesn’t.
The other trend I’ve seen in S3 is that using it by itself can replace a lot of things that would otherwise need a local filesystem + clustering. Just upload / download a file to / from s3 and be done with it.
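For what that pattern looks like in practice, here’s a minimal sketch (the bucket and key names are made up; it assumes an S3-compatible client exposing `put_object`/`get_object`, such as `boto3.client("s3")`): one PUT stores the whole file, one GET fetches it back, and there is no clustering or shared filesystem involved.

```python
# Sketch of the "just upload/download a whole file" pattern against S3.
# Assumes a client with put_object/get_object, e.g. boto3.client("s3");
# bucket and key names below are hypothetical.

def put_whole(s3, bucket: str, key: str, data: bytes) -> None:
    # One PUT replaces the whole object -- there are no partial writes.
    s3.put_object(Bucket=bucket, Key=key, Body=data)

def get_whole(s3, bucket: str, key: str) -> bytes:
    # One GET fetches the whole object back.
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

# With real AWS credentials this would look like:
#   import boto3
#   s3 = boto3.client("s3")
#   put_whole(s3, "my-bucket", "report.csv", b"a,b\n1,2\n")
#   data = get_whole(s3, "my-bucket", "report.csv")
```

Since every object version is complete and immutable, S3’s own durability and replication do the work that a clustered filesystem would otherwise have to.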
You can’t “just” shim the file system with object storage using the same data layout and be happy. You can’t use it as primary storage media, but you can easily spill cold data there. It slots pretty nicely into a temporal DB like Datomic; see https://docs.datomic.com/cloud/whatis/architecture.html
Neon is a Postgres fork that moves cold data to S3.
With something like Litestream/LiteFS or AWS RDS, you can view the “async replication to S3” as a backup system where you tolerate some loss, but it’s still a DBMS integrating S3 for its known properties.
Yeah, that’s a good distinction, but the original article seems to be suggesting something very naive that doesn’t work:
“Databases of all kinds need a place to put their data. Generally, that place has ended up being various files on the filesystem. Postgres maintains two or three files per table, plus loads of others for bookkeeping. SQLite famously stores everything in a single file. MySQL, MongoDB, Elasticsearch - whatever - they all store data in files.”
When you say “cold data”, I still read that as “the warm data could be lost under certain circumstances” …
Really, what I would look for is (1) explicit claims about durability, and (2) a third party (e.g. Aphyr) actually testing those claims.
If there’s no claim, then it’s impossible to test :)