I found this article to be of poor quality.
Why does the Stateless Approach section not bring up the CAP theorem? It has the same issues as the distributed file system.
Why doesn’t the author suggest putting your data in a database? Using something like an RDBMS gets you off your single server at the cost of availability.
Why doesn’t the author talk about how to get around the CAP theorem? For example, if you craft your data intelligently you can be eventually consistent.
This is just weak sauce advice.
Ah, I see the source of your confusion.
This article doesn’t mean to talk about traditional database state at all. That’s a completely separate topic. All we’re talking about here is file system state. Which is why in the Stateless Approach section, there’s no need to discuss CAP, because nothing is being distributed. The file system is either read only, or completely temporary.
Similarly, the An Alternative Approach section addresses the idea that you can get around the CAP theorem for file systems. The summary is: you can’t, if your app is expecting POSIX semantics.
Perhaps I’m misunderstanding this sentence:
It might be as simple as switching to uploading files to S3 instead of the file system, though this will also mean converting your code to use an S3 library for manipulating files
This implies to me that when writing to a file you push to S3 rather than the local filesystem.
Regardless, I think this post leaves a lot to be desired and is more-or-less a fluff piece.
Your understanding of that sentence is correct. What I’m suggesting is that for any file you need to persist, you write it directly to a service like S3. For anything temporary, write it to the filesystem as normal.
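To make that split concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not code from the article: `ObjectStore`, `InMemoryStore`, `save_upload`, and `scratch_size` are hypothetical names, and the in-memory class only stands in for a real service like S3, which you would reach through that service's own client library.

```python
import tempfile
from pathlib import Path
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal interface: whole-object put and get only."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for a remote object store, used here for illustration only."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def save_upload(store: ObjectStore, key: str, data: bytes) -> None:
    """Anything that must persist goes to the object store, never to local disk."""
    store.put(key, data)


def scratch_size(data: bytes) -> int:
    """Temporary work uses the local filesystem as normal and is then discarded."""
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp) / "work.bin"
        scratch.write_bytes(data)
        return scratch.stat().st_size
```

Because the persistent path only depends on `put` and `get`, any instance can serve a request, and losing an instance loses nothing but scratch files.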
This post is the fourth in a series:
The series as a whole is attempting to address the question:
“I have a legacy off-the-shelf app like WordPress that expects to be able to write files to the local filesystem. How do I deploy this app in the cloud while taking advantage of everything the cloud has to offer?”
A lot of people approach this problem by deploying WordPress (or whatever) onto a single server, and just scaling that up. But this is no different from the traditional VPS model of hosting! You may as well not use a PaaS. You’re certainly not getting your money’s worth if you’re doing this. (And you’re going to get a nasty shock when AWS “degrades” your single instance…)
But if you want to distribute something like WordPress across multiple instances, then some changes need to be made. And this series attempts to explain what those changes are, and why they are necessary.
Your understanding of that sentence is correct. What I’m suggesting is that for any file you need to persist, you write it directly to a service like S3.
Which puts you right back into CAP land.
Ah, no. Well, it’s different. Off-the-shelf apps like WordPress assume a POSIX interface for writing out files. That’s where the problem lies. You can’t distribute that without changing your assumptions. So if you change your code so that any file that must persist is written to something like S3, you are changing those assumptions.
There’s no need to mitigate it any more, because you’re embracing it.
Is S3 as flexible as local files? Can you open an S3 file in append mode?
An issue I’ve encountered when building systems on top of networked file stores is that operations like truncate, append, and inotify are not available. This stymies developers who don’t know the design patterns for working on data systems without those features.
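One such pattern, sketched here in Python as an example, is to emulate append by writing each chunk as its own object under a shared prefix and concatenating the chunks on read. This assumes a store that offers only whole-object put, get, and list by prefix. `InMemoryStore` and `SegmentedLog` are hypothetical names, and the in-memory class stands in for a real networked store. The sketch also assumes a single writer; concurrent writers would race on the sequence number and need some form of coordination.

```python
from typing import Dict, List


class InMemoryStore:
    """Stand-in for a networked object store with no append or truncate."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list(self, prefix: str) -> List[str]:
        # Keys are returned in lexicographic order.
        return sorted(k for k in self._objects if k.startswith(prefix))


class SegmentedLog:
    """Emulates an append-only file as a series of immutable segment objects."""

    def __init__(self, store: InMemoryStore, prefix: str) -> None:
        self._store = store
        self._prefix = prefix.rstrip("/") + "/"

    def append(self, chunk: bytes) -> str:
        # Single-writer assumption: the next sequence number is the segment count.
        seq = len(self._store.list(self._prefix))
        # Zero-padding keeps lexicographic order equal to numeric order.
        key = f"{self._prefix}{seq:012d}"
        self._store.put(key, chunk)
        return key

    def read_all(self) -> bytes:
        # Reassemble the logical file by concatenating segments in key order.
        keys = self._store.list(self._prefix)
        return b"".join(self._store.get(k) for k in keys)
```

The trade-off is that reads cost one request per segment, so a real system would probably compact old segments into larger objects from time to time.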
The article states:
Consistency, availability, and partition tolerance. Pick two.
It’s been pointed out that this formulation of the CAP theorem doesn’t make sense. You can’t pick CA.
Oh. I saw this a while back, but didn’t get a chance to read it. Thanks for reminding me!
I think that the author left out an important class of sharable POSIX filesystems: Lustre, Gluster, NFS, MapR, and the like.
These filesystems accept varying degrees of write load, but all scale to hundreds, or even thousands, of machines, if operated properly (and not used in a general purpose way). Applying these correctly, a business can avoid a costly refactor of a filesystem-based app, but still scale out effectively.