1. 17

  2. 21

    A long time ago I worked on Transactional NTFS, which was an attempt to allow user-controlled transactions in the filesystem. This was a huge undertaking - it took around 8 years and shipped in Vista. The ultimate goal was to unify transactions between the file system and SQL Server, so you could have a single transaction spanning structured and unstructured data. You can see the vestiges of that effort on docs.microsoft.com, although if you click that link, you’ll be greeted with a big warning suggesting that you shouldn’t use the feature.

    One of the use cases mentioned early in development was atomic updates to websites. In hindsight, I’m embarrassed not to have reflexively called “BS” then and there. Even if we could have had perfectly transactional updates to a web server, there’s no atomicity with web clients, who still have pages in their browser with links that are expected to work, or are even actively downloading HTML that will tell them to access a resource in the future. If the client’s link still works, it implies a different type of thinking, where resources are available long after there are no server-side links to them, which is why clouds provide content-addressable blob storage as the underpinnings for web resources. Stale resources are effectively garbage collected in a very non-transactional way. Once you have GC deleting stale objects, you don’t need atomic commit of new objects either.

    The majority of uses we hoped to achieve didn’t really pan out. There’s one main usage that’s still there, which is updates: transactions allow the system to stage a new version of all of your system binaries while the system is running from the old binaries. All of the new changes are hidden from applications. Then, with a bit of pixie dust and a reboot, your system is running the new binaries and the old ones are gone. There’s no chance of files being in use, because nothing can discover the new files being laid down until commit. I really thought I was the last person alive still trying to make this work when, in 2015, I was writing filter drivers that understand and re-implement the transactional state machine, so the filter can operate on system binaries while the system can still update itself.
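
    For the curious, this is roughly the shape of staging a change through the (now-deprecated) TxF API - a minimal sketch with error handling trimmed, and the staged path and payload invented for illustration:

      // Link against KtmW32.lib; CreateTransaction, CreateFileTransactedW
      // and CommitTransaction are the real Win32 KTM/TxF calls.
      #include <windows.h>
      #include <ktmw32.h>

      int main(void)
      {
          // Create a kernel transaction via the Kernel Transaction Manager.
          HANDLE tx = CreateTransaction(NULL, NULL, 0, 0, 0, 0, NULL);
          if (tx == INVALID_HANDLE_VALUE) return 1;

          // Files written under the transaction stay invisible to
          // non-transacted readers until the transaction commits.
          HANDLE file = CreateFileTransactedW(
              L"C:\\staging\\new-binary.dll",   // hypothetical path
              GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
              FILE_ATTRIBUTE_NORMAL, NULL, tx, NULL, NULL);
          if (file == INVALID_HANDLE_VALUE) { CloseHandle(tx); return 1; }

          DWORD written;
          WriteFile(file, "new build", 9, &written, NULL);
          CloseHandle(file);

          // The pixie dust: the staged state becomes visible atomically.
          if (!CommitTransaction(tx)) { CloseHandle(tx); return 1; }
          CloseHandle(tx);
          return 0;
      }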

    Somebody - much older and more experienced in file systems - remarked when we were finishing TxF that file system and database hybrids emerge every few years because there’s a clear superficial appeal to them, but they don’t last long. At least in our case, he was right, and I got to delete lots of code when putting together the ReFS front end.

    1. 2

      This was a super interesting read, thanks for sharing it!

      Even if we could have had perfectly transactional updates to a web server, there’s no atomicity with web clients

      This seems to become more of an issue when clients run code. When there is no client-side code, it seems to be a non-issue to me. (Say, all assets can be pushed via HTTP/2 to make sure the version is right.)

      If there is client-side code, one could force a reload when the server-side codebase has changed.

      That aside, I’m not talking about transactions for application changes, I’m talking about transactions for user data changes. That is currently unsolved, unless one stores all user-uploaded images in the DB.

      remarked when we were finishing TxF that file system and database hybrids emerge every few years because there’s a clear superficial appeal to them, but they don’t last long

      Haha, interesting! I guess only time can tell. :)

      1. 1

        This seems to become more of an issue when clients run code.

        That’s half of Fielding’s thesis on REST right there ;)

        It’s a bit unfortunate that the need (and I agree it is a need) for encryption/confidentiality/privacy led to the current state of HTTP/2/TLS - where a lot of the caching disappeared, leaving only client cache and server/provider cache (no more man-in-the-middle LAN caches) - which makes REST less interesting. Even for applications/websites where the architecture is a great fit.

        Recommended reading (still) for those that have not read it (just remember modern web apps/SPAs are not REST - they’re more like applets or Word files with macros):

        https://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm

      2. 1

        What do you see as the problem when implementing transactions on a file system? It seems doable (as ZFS’s snapshots have partially shown), but why is it not more prominent? Are there trade-offs that make it that bitter?

        1. 4

          I don’t think the problem was implementation. It was more a case of a solution looking for a problem.

          In hindsight, among other things, a file system is an API that has a large body of existing software written to it. Being able to group a whole pile of changes together and commit them atomically is cute, but it doesn’t help if the program reading the changes sees changes from before and after the commit (torn state). Although Transactional NTFS could support transactional readers (at the file level), in hindsight it probably needed volume-wide transactional read isolation, but even if it had that, it implies that both the writer and reader are modified to use a new API and semantics. But if you can modify both the producer and consumer code, there’s not much point trying to impersonate a file system API - there are a lot more options.

          1. 1

            I’d say the big issue is that classic OSs are not transactional, so adding a transactional FS to one just doesn’t make sense. What if a process starts a transaction, keeps it open for months, then crashes?

            To make a transactional FS you also need to build a transactional OS around it.

        2. 4

          Author here. Shalabh suggested the post might be interesting for this community and invited me here. Thanks Shalabh! :)

          The goal of Boomla is to create a radically simpler & more powerful application platform. Web development is the “killer use case” to build a useful product and then grow from there. This post explains one of the unusual solutions of the platform. I’d be really curious to hear what you think.

          1. 1

            Hi there!

            What are your thoughts on databases which offer large-object support, or filesystems with support for transactions (as mentioned in a sibling comment here)?

            1. 1

              Hi! Note that the context of the article is web development. The problem with storing large objects in a DB is in the performance department. To serve a static image stored in the DB, one has to load the image into memory, serialize it, send it over a TCP connection, deserialize it, and keep it in memory while serving it to the visitor, even if the client is on an extremely slow connection. So the issue is not really with databases themselves but with the way the application server would connect to the DB and serve the image.

              As for filesystems with transactions, that’s not enough in itself; one would need the ability to store structured data as well. Do you know any?

              1. 1

                Directories and files? /database/tables/users/1/{name,phonenumber}?

                Edit: I guess it depends a lot on what is meant by structured. Some files can be merged/joined, and results and views can be made with symlinks - but I don’t think just the filesystem works well as a relational store.

                1. 1

                  Directories and files? /database/tables/users/1/{name,phonenumber}?

                  Can’t follow you. Can you elaborate?

                  By storing structured data, I mean data that can be directly accessed without deserializing it first. So, a JSON string would not count, as it is not stored in a structured way. On the other hand, DB rows typically store structured data. Each column of each row contains one value.

                  but I don’t think just the filesystem works well as a relational store.

                  The crucial point here is that there is no “the filesystem”. Each filesystem has its own API and capabilities. Forget for a moment that it’s called a “file system” and think of it as an object system. Obviously, saying that an object system could not work well for storing relational data would be weird.

                  The hardest part of “getting Boomla” is always that people think they know what a file is and how it behaves. One has to unlearn that first to get it. At best, a file is just an API. And you can design a completely new, crazy API if that brings huge benefits. Like, Boomla files can store other files, as in image.jpg/other-image.jpg. Boomla has a few of these weirdo things. Like, every file is also like a DB row at the same time. The fields are just called file attributes. And because it is “also a DB”, it sure can do the same things a DB can do.

                  1. 2

                    For that definition of structured data, the filesystem works sort of OK: in my example above, you have atoms in files:

                    Insert a new user:

                    mkdir users/1
                    echo "John Connor" > users/1/name
                    echo "555-CHEESE" > users/1/phone_number
                    

                    Now, if you know you want to look up user 1’s name, you can read it directly. You could store binary data in the files too.
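
                    A minimal C sketch of that direct lookup, assuming the layout above (only the paths come from the example; nothing here is standardized):

                      // Read one "column" of one "row": users/<id>/<field> is just a file.
                      #include <stdio.h>

                      int main(void)
                      {
                          char name[64];
                          FILE *f = fopen("users/1/name", "r");
                          if (!f) return 1;
                          if (fgets(name, sizeof name, f))
                              printf("user 1 name: %s", name);
                          fclose(f);
                          return 0;
                      }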

                    This is somewhat similar to how maildirs work (but they are more of a “document store”, storing whole emails).

                    I’m not sure I understand your point re: blobs in a DB - if it’s an image, you can just stream it from the DB to the client, like you stream it from the filesystem to the client?

                    1. 1

                      Yeah, this sort of works, but I’d say there are a couple of problems with it.

                      • The performance may be heavily impacted by the increased number of files. I’d say this results in an average of 10x more of them.
                      • You can store strings without serialization but not other data types. You still end up having to serialize/deserialize numbers, for example.
                      • If each file has lots of system metadata (creation time, modification time, created-by, etc.) you will need way more storage that way. It’s like adding, say, 50-100 bytes of data to every field in every row of a DB table. That will easily 10x your storage requirements for the kind of data that normally goes in a DB.

                      You could stream the image from the DB, but it would still need to go through your application logic if that’s where access control is implemented. That would mean way more DB requests are required. That said, yes, this may be an improvement over loading the image in one go and keeping it in memory. Can’t tell, would require a benchmark.

                      Even if that worked out well enough, I still had other requirements that would have forced me to go the filesystem route; I just didn’t want to make the blog post even longer. As this was originally a CMS that turned into an OS, the original use case was building websites in a sane way. Users constantly mess things up and need undo/redo to save the day. That means a copy-on-write filesystem was required, with the ability to quickly restore old snapshots. In fact, the first versions of the platform were prototyped on top of MySQL. It worked well enough for a while, but undo/redo was completely impossible.

                      Another problem was speed. If the DB is a separate piece of software, communicating with it has higher latency than an in-process filesystem. Boomla makes heavy use of file access, and this proved to be a problem as the system was growing.

                      That said, your streaming solution would be an interesting approach to moving all data to the DB in a classic setup.

                      1. 1

                        Note, I’m not really commenting in the context of the article, as much as what is generally possible.

                        You can store strings without serialization but not other data types.

                        Not true. You would need to store your schema somewhere/somehow - but nothing stops you from storing binary data in a file - if anything you’d need less serialization than with a typical database driver.
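
                        For instance, a minimal sketch in C - the record layout and file name here are made up, and the point is just that raw bytes round-trip with no text parsing:

                          #include <inttypes.h>
                          #include <stdio.h>

                          /* Hypothetical fixed-width record - the schema lives in this struct. */
                          struct user_record {
                              uint32_t id;
                              char     name[32];
                              char     phone[16];
                          };

                          int main(void)
                          {
                              struct user_record out = { 1, "John Connor", "555-CHEESE" };

                              /* Write the raw bytes - no textual serialization step. */
                              FILE *f = fopen("users.rec", "wb");
                              if (!f) return 1;
                              fwrite(&out, sizeof out, 1, f);
                              fclose(f);

                              /* Read it straight back into the same struct. */
                              struct user_record in;
                              f = fopen("users.rec", "rb");
                              if (!f) return 1;
                              if (fread(&in, sizeof in, 1, f) != 1) { fclose(f); return 1; }
                              fclose(f);
                              printf("id=%" PRIu32 " name=%s\n", in.id, in.name);
                              return 0;
                          }

                        (Struct padding and byte order do tie this on-disk layout to one compiler/platform, so a stable format needs care.)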

                        As for rollback/undo, I did consider a CMS at one point with data in XML files on a nilfs2 filesystem.

                        Obviously such a “document database” via the file system would need to do quite a lot of serialization - but if the data fit the application, you might get away with a small number of files per request.

                        1. 1

                          nothing stops you from storing binary data in a file - if anything you’d need less serialization than with a typical database driver

                          Ehm, yeah, you could say that, and that would be “legally correct”. Yes, that’s less serialization. Yet from a programming perspective, in the end, that’s still a serialization layer and the rest is just implementation detail. You could also say that you do a memory dump for any value/object and that way you are not serializing. In a way, at least legally, that may be right too, yet from a different point of view, you are using the built-in serialization of the language, which may not be cross-platform, may not be documented, and may even change over time. When using a different language, you would need to reimplement the same de/serialization, so even that counts as a serialization layer in my world.

                          I’m not really commenting in the context of the article, as much as what is generally possible.

                          Got it. I agree, that would work for certain use cases. Existing filesystems would probably make it slow, but if you give yourself a blank slate, I’m sure the underlying idea could be made to work reasonably well - building it from the ground up would need a purpose-built FS, though. But then I’m also sure you would end up optimizing parts of it as you learn more about the performance characteristics of what you have built and how it is used in the real world.

                          1. 1

                            you are using the built-in serialization of the language that may not be cross platform, may not be documented and may even change over time.

                            Maybe. The point is that the database driver certainly does serialization, and might even change byte order to match the network standard - you can potentially do less serialization with files.

                            Or you could use a low-overhead format like Cap’n Proto to read/write.

                            1. 1

                              true

                      2. 1

                        Relationships:

                        mkdir users/2
                        echo "Sarah Connor" > users/2/name
                        ln -s users/2 users/1/mother
                        
            2. 3

              The premise of this article seems to be “filesystems vs. database is not the right way to frame technologies because the requirements they solve for do not conflict”. That is, providing support for database-like operations (e.g. transactions) does not intrinsically preclude providing support for filesystem-like operations (e.g. storing large objects).

              I’d argue that this premise is incorrect, because it considers requirements only from the perspective of capabilities when in reality there are also requirements from the perspectives of performance and cost. Databases and filesystems have dramatically different characteristics in terms of how much it costs to store some amount of data (remember, databases need to build indexes) and how quickly I can query and search for data. The reason I don’t mind working with two sets of technologies (at least, for now) is because I have intrinsically different requirements for the kind of data I put in a database vs. the kind of data I put in a filesystem, and it would be cost-prohibitive for me to use a database/filesystem hybrid abstraction.

              I don’t find “the correct framing is controlled vs. uncontrolled changes” to be a compelling argument for using a hybrid system - what I would find to be extraordinarily compelling for a hybrid system is “here is an explanation of how we managed to build database-like capabilities at filesystem-like cost”.

              1. 3

                I’d argue that this premise is incorrect, because it considers requirements only from the perspective of capabilities when in reality there are also requirements from the perspectives of performance and cost

                Maybe if the title (and therefore the implied premise) of the article were changed to “Data lifecycle, the application way” or something like that, it would reflect the intended capabilities better.

                There is certainly a need for technologies that can reflect the ‘life-cycle’ of an application object, and not just the structure of that object.

                Document-oriented databases tried to (perhaps not always successfully) reflect the ‘structure’ of an application object, but not its life-cycle. Instead, application developers have to write code to accommodate the life-cycle.

                In my reading, Boomla recognizes that life-cycle gap, and seems to want to go beyond what we have today.

                We look today at ‘append-only’ and ‘authenticated database’ approaches as means to ‘simplify’ the life-cycle gap I noted above, but those are simplifications - more like ‘atomic’ building blocks of something bigger, in my view.

                When I architect a system that deals with ‘critical data’, I want to think of the data lifecycle as a whole: from the time it is created, through being read, transacted with, archived, backed up, restored, reviewed for compliance, and referenced (this last one is a hard problem, and probably was not solved by MS’s CreateFileTransactedA that @malxau mentioned).

                Today, I have to ‘custom design’ an ecosystem (assuming a large enterprise) around the above. But there is more that can be done in that space from a basic-technologies perspective.

                I agree with @liftM that trying to project the idea into a known ‘light’ formalism (like databases or filesystems) may not carry the message, the intent, in the best way.

                1. 1

                  That’s an interesting way of looking at it. I have no clue about that space. Could you rephrase the gap you see?

                  One aspect I understand is auditability. Boomla can store every single change and currently does so. But I hardly see this as being unique to Boomla; every copy-on-write filesystem does that. Backups, archives, and restoration all work, but again, I don’t see the uniqueness here.

                  I don’t quite follow what you mean by “it is transacted with”, “it is reviewed for compliance”, “it is referenced”. Maybe by transacting with, you mean the file is accessed? I see how that could be used for auditing purposes, but again, that would not be unique.

                  I’d love to understand this.

                  I actually see the biggest value of Boomla in the entire integrated platform. It simplifies writing applications and eliminates problem spaces. Looking at any one of the eliminated problems, one can say, nah, that’s not a big issue. Yet as they add up, one ends up having a platform that’s 10x simpler to work with. That’s what I’m really after.

                  Thanks for your comment!

                2. 1

                  If I understand correctly, the argument is that databases have in-memory indexes which require lots of memory while filesystems don’t do that and as a result need much less memory.

                  I don’t see why filesystems couldn’t index structured file attributes the same way databases index their data. Boomla doesn’t currently do that, thus its memory footprint is super low. In the future, apps will define their own schema, similar to database tables, and define any indexes. At that point, one will have the ability to create in-memory indexes.

                  Again, I do not see this as a conflict. Did I miss something?

                  1. 1

                    The conflict is that I don’t need indexing on structured file attributes and I don’t want to pay for that extra capability. Cost is just as much of a requirement for me as capability.

                    I think that filesystems and databases genuinely are intrinsically different abstractions (as opposed to the article’s premise that filesystems and databases are not intrinsically different, and that the real differentiation for data storage abstractions is along the “controlled vs. uncontrolled changes” axis) because they have different cost profiles, optimized for different workloads.

                    databases have in-memory indexes which require lots of memory while filesystems don’t do that and as a result need much less memory

                    [nit] Database indexes also live on disk - indexing on a field necessarily requires more space (and therefore costs more) than only storing the field.

                    1. 1

                      I think we are both right in different contexts.

                      If your main concern is cost, sure, that’s an optimization. In that case you should do whatever you can to reduce storage requirements.

                      The context I was exploring is a general-purpose storage system for web development. In this context, storage space is not the key thing to optimize for; it is the developer’s brain capacity and working efficiency.