1. 55
  1. 12

    If memory serves, BeOS’s file system had tags and very fast search, and one of its core developers dumped all his files in the root directory and depended on those two things entirely.

    1. 17

      Directories and tags were the same thing on BeFS. BeFS had two kinds of metadata:

      • Large metadata (e.g. icons, preview images), which were stored in separate inodes and pointed to by other files’ inodes.
      • Small metadata (e.g. artist, creator, file type), which were embedded in the inodes.

      A directory in BeFS was simply a collection of inodes that all matched some metadata query. When you updated metadata, the filesystem driver had to see if there were any saved searches for that kind of metadata and, potentially, update any relevant ones. This could get quite slow on large filesystems.
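
      To make the cost concrete, here’s a toy sketch of query-backed directories (the query names and predicates are invented; this shows the shape of the problem, not BeFS’s actual structures):

      ```python
      # Every saved search is a predicate over file metadata; a "directory"
      # is the current result set of one predicate. (Illustrative only.)
      saved_queries = {
          "recent-mail": lambda meta: meta.get("MAIL:when", 0) > 1_000_000,
          "beatles":     lambda meta: meta.get("Audio:Artist") == "The Beatles",
      }
      query_results = {name: set() for name in saved_queries}

      def on_metadata_update(inode: int, meta: dict) -> None:
          # O(number of saved queries) work on every attribute write;
          # this is the part that gets slow on a large, busy filesystem.
          for name, predicate in saved_queries.items():
              if predicate(meta):
                  query_results[name].add(inode)
              else:
                  query_results[name].discard(inode)
      ```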

      Spotlight, on macOS (designed by the same person), runs the indexing in userspace and provides a firehose model, where the indexer has to consume update events as fast as they’re created. If it fails to keep up, it falls back to scanning every file modified since the last update it successfully indexed. This avoids degrading non-search system performance when there are a lot of file updates and allows search to return slightly stale results, degrading gracefully and catching up when it can.
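
      The firehose-plus-fallback shape is roughly this (a sketch of the pattern as described above; the bounded queue and function names are mine, not Spotlight’s):

      ```python
      import queue
      import time

      events = queue.Queue(maxsize=10_000)  # bounded, so producers never block
      dropped = False                       # set when the indexer falls behind
      last_indexed = 0.0

      def index_one(path: str) -> None: ...             # update the search index
      def rescan_modified_since(t: float) -> None: ...  # walk files with mtime > t

      def emit(path: str) -> None:
          """Called on every file update; must never slow the filesystem down."""
          global dropped
          try:
              events.put_nowait(path)
          except queue.Full:
              dropped = True  # drop the event, but remember that we did

      def indexer_loop() -> None:
          global dropped, last_indexed
          while True:
              if dropped:
                  # Fallback: rescan everything modified since the last point
                  # the index was known complete. Search goes slightly stale
                  # instead of the whole system slowing down.
                  dropped = False
                  rescan_modified_since(last_indexed)
              else:
                  index_one(events.get())
              last_indexed = time.time()
      ```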

      This system uses SQLite for metadata, which is likely to be a performance bottleneck (and to cause consistency problems in the event of a crash, unless the author is very careful about how updates to the SQLite database are ordered with respect to updates to other files), but it’s great to see people revisiting points in this design space. It would be interesting to hook something like this up to a proper RDBMS such as PostgreSQL and use it for storing the file contents as well as their metadata.
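
      On the crash-consistency point, one careful ordering is to make the file durable before committing its metadata row, so a crash can leave an unindexed file but never a metadata row pointing at missing data. A minimal sketch with an invented schema (this says nothing about how Supertag actually orders its writes):

      ```python
      import os
      import sqlite3

      db = sqlite3.connect("metadata.db")
      db.execute("PRAGMA journal_mode=WAL")  # atomic, crash-safe commits
      db.execute("CREATE TABLE IF NOT EXISTS meta"
                 " (path TEXT PRIMARY KEY, mtime REAL, tags TEXT)")

      def write_file_then_metadata(path: str, data: bytes, tags: str) -> None:
          fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
          try:
              os.write(fd, data)
              os.fsync(fd)  # contents are durable before we index them
          finally:
              os.close(fd)
          with db:  # one transaction: the row lands atomically or not at all
              db.execute("INSERT OR REPLACE INTO meta VALUES (?, ?, ?)",
                         (path, os.stat(path).st_mtime, tags))
      ```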

      1. 6

        One of the things we were trying to do with WinFS at Microsoft (which was based on a “proper” RDBMS, SQL Server) was to extend this idea from simple tags to attribute values. For example, treat the year/month/day of the creation date as if they were nested folders, or have a “client” attribute so each of your clients automatically has a “folder”.

        This mirrors the way a lot of people use folders already, but in a more flexible way, as you don’t have to decide a single nesting order for the attributes. Thus I could get to the November report for client XYZ as /reports/XYZ/2021/Nov or /reports/2021/Nov/XYZ depending on my need at the time.

        Or with your music example, you wouldn’t need a special application to show your music library as nested folders of artist/album/track, the folder structure would just be derived from the metadata.

        This is an oversimplification, and of course this would have come with a revamped file explorer that knew all about it rather than playing tricks with the existing one, but that was the basic idea.
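
        As a toy illustration (WinFS did this on top of SQL Server; plain Python and a made-up document record here, just to show how every nesting order falls out of the same attributes):

        ```python
        from itertools import permutations

        doc = {"kind": "reports", "client": "XYZ", "year": "2021", "month": "Nov"}

        def virtual_paths(meta: dict, attrs: list[str]) -> list[str]:
            """Every attribute nesting order yields a valid 'folder' path."""
            return ["/" + "/".join([meta["kind"], *(meta[a] for a in order)])
                    for order in permutations(attrs)]

        for p in virtual_paths(doc, ["client", "year", "month"]):
            print(p)
        # /reports/XYZ/2021/Nov
        # /reports/XYZ/Nov/2021
        # /reports/2021/XYZ/Nov
        # ... all six orderings work; no single nesting is stored on disk
        ```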

        1. 4

          A lot of this was also true on BeOS. There were plugins that would do things like pull ID3 tags into FS metadata, but once they were there you didn’t need any kind of special viewer. The Tracker (the BeOS file manager) could give you the album or genre displays for music. The BeOS equivalent of an address book was just a directory containing a load of files that used FS metadata for all of the normal address-book fields, and so could be edited in the Tracker and searched or filtered by any of this metadata. The same sorts of paths that you describe could be constructed.

        2. 2

          Thank you! My memory was not quite serving.

          1. 1

            There were Live Queries in BeFS that worked the way you describe, but did all directories really work that way? Wasn’t there a simpler static POSIX hierarchy as well?

            1. 1

              As I recall, ‘normal’ directories were just collections indexed by the ‘parent directory’ metadata node. The same on-disk structure was used for both.

        3. 5

          This is a fascinating way of thinking about files that I had not considered before. Reading it, my initial thought was “Why would I ever need this?” However, the more I read, the more I wanted to try it out. Seriously worth a read, and the documentation appears excellent.

          1. 4

            Has anybody used tags with great effectiveness? I’ve never heard of them turning out to be ultimately super useful… I’m very curious to hear if anyone has used them as their primary organizational tool!

            1. 2

              As my primary organizational tool? Not quite yet. I think I might like to do that some day. As an organizational tool in the toolbox? Sure. For instance, I use file tagging to organize a collection of etexts and track their read/unread status. This requires some discipline on my part, but it’s worth it.

              Right now I’m using tmsu as my tagging tool of choice. One thing that interests me about Supertag is how it treats a logical path as an intersection of tags.
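
              My mental model of “a logical path is an intersection of tags”, with made-up tag data:

              ```python
              tags = {
                  "ebook":  {"dune.epub", "hhgttg.epub", "lotr.epub"},
                  "read":   {"hhgttg.epub", "lotr.epub", "tax-2021.pdf"},
                  "sci-fi": {"dune.epub", "hhgttg.epub"},
              }

              def resolve(path: str) -> set[str]:
                  """Each path component is a tag; a 'directory' intersects them."""
                  return set.intersection(*(tags[p] for p in path.split("/") if p))

              # Component order doesn't matter, unlike in a real hierarchy:
              assert (resolve("/ebook/read") == resolve("/read/ebook")
                      == {"hhgttg.epub", "lotr.epub"})
              ```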

              1. 2

                For balance: I have looked at tags via FUSE for a loooong time. Typical tagging solutions looked too weak. I used RelFS and found it too limited. I wrote my own, and ended up with a lot of weird but personally convenient setups based on indexing stuff into SQL databases, and multiple versions of file tagging… and guess what: I tried to use each of my tagging things, gave up, and just use hierarchical categorisation. More classically structured SQL-based tools see daily use, including for reading Lobste.rs.

                «read/unread» tracking, though? Sure, I have a column for that in the SQL table I use for grabbing web content streams.

                1. 2

                  Not sure if it counts as great effectiveness, but I use tags for all scanned correspondence. Things are easier when my most recent scan is tagged “bank”, “mortgage”, “rate change”, “(address)”. I’ve never had an issue with too many tags, so I slap anything useful on them.

                  1. 2

                    macOS lets you tag files with colours, which may optionally be named. I use that to keep track of whether I’ve watched ⚪️ downloaded films, and whether they are keepers for being good 🔵 or bad 🔴.

                    At an earlier job, we used colours to track the stages of preparation for documents.

                    1. 2

                      I have a Thunderbird tag called “reply” and a filter which every ten minutes marks “reply” emails as unread. Works pretty well!

                    2. 4

                      I’ve been blogging for over 20 years, and every blog post has multiple tags associated with it [1]. I have 5,327 posts and 10,245 unique tags, and I still have problems finding posts via tags. There are times when the tags I thought would be useful at the time of writing aren’t useful when it comes time to search. I should also mention that I sometimes lose files in the Unix hierarchical file system (the last time I counted, about half a decade ago, I had over 500,000 files).

                      [1] Internally. I haven’t made them available to the general audience due to laziness, as I wrote my own blogging engine.

                      1. 5

                        Ontology drift is a problem with tagging systems. I first learned about this via the Cyc project, which spent a decade or so tagging large quantities of data to train an AI system, only to discover that new items were being tagged inconsistently with respect to existing entries. Expecting humans to be consistent in how they apply subjective metadata over the long term generally doesn’t work.

                        1. 2

                          I think the drift is manageable if you realize upfront that adopting tags means taking on a maintenance task. Is it less work than, or sufficiently better than, hierarchical organization and its accompanying maintenance? ¯\_(ツ)_/¯

                          I spend a few hours every few months grooming tags in my pinboard bookmarks to collapse/migrate tags that need it.

                      2. 3

                        This is something I was looking at developing with FUSE for a long time; glad someone else has gotten around to it.

                        1. 2

                          Cool :)

                          I became enamored with the tagged filesystem idea in maybe 2010 or so? I was sending a lot of poetry submissions out and wanted a metadata/organizational system close to the files that made it easy to tell what was available to send out, avoid re-sending to the same publications, etc. I was using Windows file metadata for this at the time, but it was pretty clumsy.

                          I was ultimately too leery to jump in. It’s hard to remember for sure, but I feel like the Windows options (now? at the time?) entailed access through a client rather than directly through native tools. At root, I was leery of investing a lot of work in tagging that might be mooted if a given toolchain dead-ended.

                          I just have macOS and NixOS these days, so maybe I should give it a fresh look. I have a bunch of stuff in a git-annex that I’m being anxious-avoidant about.

                          1. 2

                            This is a neat project and looks like a useful file system for single topics. Really cool work.

                            I wouldn’t want to use it for my whole disk. If you give up hierarchical organization, you give up progressive disclosure of the file system. My projects folder, for instance, organizes information differently in each project subfolder, and it goes back decades. If I used a tagging-only file system without scope-dependent subfolders, I think I’d have an overwhelming set of tags at different levels of specificity, tag names that don’t make much sense as a set, and usually few files in each tag. In this case, hierarchical folders help each project make sense by itself, and there’s no need for consistency across them.

                            For a long time, HFS+ and APFS have offered global tagging without giving up hierarchical folders as the default. If I had to use one file system for everything, I’d prefer this arrangement. I use the default “Green” tag to mark active projects and I keep a smart folder of tag search results in a Dock stack. But I don’t try to add more tags with custom names. That way lies madness, is how it feels. I think macOS bought into the “just search for it” mindset many years ago. However, this setup is not easy to use from the command line.

                            Now, if I could mount a filesystem like this as a project folder, that might be about the right scope for tags to be great. It might also make sense to sacrifice hierarchical subfolders within that scope.

                            Can I mount a volume of another file system within some subfolder (tag set) of a Supertag file system? How about nested Supertag; does that result in multiple levels of tag scoping?

                            How do you type on each supported OS?

                            I think there is potential here for something really elegant, depending on whether you want to solve these problems for all users, just power users, just the hacking crowd, etc.

                            1. 2

                              I think you never truly sacrifice hierarchical structure, as you can always just store symlinks to outside files or even directories.

                            2. 2

                              For what it’s worth, rumor has it that Microsoft’s attempt at doing this was one of the main factors in sinking Longhorn.

                              1. 1

                                This is something I’ve been meaning to look at for a long time, but I have procrastinated on the accompanying media cleanup. I imagine it must be possible to back it with a zvol, and (less confidently) perhaps with a Btrfs subvolume?

                                1. 3

                                  From the docs it looks like you’re only storing the links on that filesystem. The actual data still lives where you would normally store it. The tags are backed by a SQLite database.

                                  So in practice you could double-fuse and store the files on S3 and tag them through this system. Or triple-fuse and store that database on S3 transparently as well.
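
                                  Presumably something of this shape (an invented minimal schema, not Supertag’s actual one; note the stored paths can point anywhere, including into an S3 FUSE mount):

                                  ```python
                                  import sqlite3

                                  db = sqlite3.connect(":memory:")
                                  db.executescript("""
                                      CREATE TABLE files (id INTEGER PRIMARY KEY, real_path TEXT);
                                      CREATE TABLE tags  (file_id INTEGER, tag TEXT);
                                  """)
                                  db.executemany("INSERT INTO files VALUES (?, ?)",
                                                 [(1, "/mnt/s3/scans/a.pdf"), (2, "/home/me/b.pdf")])
                                  db.executemany("INSERT INTO tags VALUES (?, ?)",
                                                 [(1, "bank"), (1, "mortgage"), (2, "bank")])

                                  # Listing the tag path /bank/mortgage = files carrying *both* tags.
                                  rows = db.execute("""
                                      SELECT f.real_path FROM files f
                                      JOIN tags t ON t.file_id = f.id
                                      WHERE t.tag IN ('bank', 'mortgage')
                                      GROUP BY f.id HAVING COUNT(DISTINCT t.tag) = 2
                                  """).fetchall()
                                  print(rows)  # [('/mnt/s3/scans/a.pdf',)]
                                  ```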

                                2. 1

                                  Why does it need Rosetta even when I use Brew to compile from source?