1. 21

Tag-based file management seems like a great idea, and would address a lot of the shortcomings of hierarchical directory structures. However, the tools that I have found for tag-based file management on Linux — TMSU, Tagsistant, and TagSpaces — have some severe limitations.

Are there any good options that I have missed? What tool do you use, if any?

Thanks!

    1. 7

      Did you see Supertag? https://amoffat.github.io/supertag/

      1. 1

        Thanks! I think I ran across it, but never got around to giving it a proper look. My first impression is that it’s the slickest and most actively maintained of the contenders.

        One thing that seems like almost a must-have for tag-based file management is for tags to remain attached to files even when the files are moved or renamed. I didn’t find information in the supertag docs one way or the other; do you happen to know if it supports this? As far as I know TMSU doesn’t.

        1. 3

          As someone who tried various stuff and settled on planning hierarchical structures for storage (but some stuff is handled through a generic SQL-queries-as-directories approach), I think it is a good idea to determine how it could work so that you know what you are looking for.

          The issue with renaming is this: it is straightforward to track when it is done through your FUSE tool, and I guess some kind of hashing/indexing could solve it for immutable files, but if you want movable, mutable, massive files, there has to be something you are willing to compromise on.

          One option is to take a performance hit of accessing large files through FUSE, I guess. Not sure how complicated things become with mmap here (which you want for some software).

          Another is to say that the real files are stored in some incomprehensible way on your filesystem, and you only ever work with symlinks to them, and even hierarchical representation is just obtained via special hierarchical tags. There is a risk that some creative pattern of create-and-move for atomic whole-file update will be able to break the illusion, though…

          Yet another might be to say that between indexing runs (possibly inotify-triggered?) each file is either moved or modified but not both, I guess?

          Which one sounds closer to what you want?

          (Oh, and BTW if someone has seen a quasi-DVCS for file moves/updates that keeps track which version of a file is newer than which and what was moved how but only duplicates metadata and relies on at least one side of a sync to have the data we need — it would probably also solve your problem as it could be used as a source of data on moves, and also, I want it too)
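
          A rough sketch of that third option (between indexing runs, each file is either moved or modified, not both). It assumes a plain dict of tags keyed by path and a full rescan with SHA-256 on every run; the names `scan` and `reconcile` are made up for illustration, and an inotify-triggered version would look different:

```python
import hashlib
import os


def file_hash(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root):
    """Map every regular file under root to its content hash."""
    index = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            index[path] = file_hash(path)
    return index


def reconcile(old_index, new_index, tags):
    """Carry tags from the previous indexing run over to the current one.

    old_index / new_index: {path: hash}; tags: {path: set of tags}.
    Same path still present -> keep its tags (content may have changed).
    Path gone, but its old hash appears at exactly one new path -> a move.
    Anything else is reported as lost (hypothetical policy, adjust to taste).
    """
    new_by_hash = {}
    for path, digest in new_index.items():
        if path not in old_index:
            new_by_hash.setdefault(digest, []).append(path)

    new_tags, lost = {}, []
    for path, file_tags in tags.items():
        if path in new_index:
            new_tags[path] = file_tags
            continue
        candidates = new_by_hash.get(old_index.get(path), [])
        if len(candidates) == 1:
            new_tags[candidates[0]] = file_tags
        else:
            lost.append(path)
    return new_tags, lost
```

          A file that is both moved and modified between two runs ends up in `lost`, which is exactly the compromise this option accepts.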

          1. 1

            Any half-functional system for keeping symlinks up to date would be better than nothing :) Maybe it would work to have a file manager plugin (and an alias of mv) handle the bulk of moves and renames, and an indexer take care of the rest (tagged files that are moved or renamed by other programs)?

            1. 2

              Hm. BTW, as you might have looked around for tools in this space, do you know any filesystem indexer that you like, that indexes to an SQL database?

              I guess you could have your files on a real FS, and then have a slow-ish and restricted (no mmap etc.) clone-FS that proxies and handles identities during the move.

              (I am somewhat wondering now, if it would be a noticeable effort to actually add all that to the SQL backend support and tagging support that I do have in my virtual FS setup; if there is a good indexer, maybe not)

              1. 1

                Unfortunately I don’t.

                I ran across Watcher (https://github.com/e-dant/watcher) the other day, which might be a useful piece for making one.

        2. 2

          From what I noticed after a small test Supertag uses symbolic links to assign tags, so moving or renaming the file causes the symbolic link to become invalid. I believe that’s not what you want.
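
          The general failure mode is easy to reproduce with plain symlinks. This is only a sketch of ordinary symlink behaviour on a POSIX system, not a reconstruction of what Supertag does internally; the file names are made up:

```python
import os
import tempfile


def link_state_after_rename():
    """Create a file and a symlink to it, rename the file, report the link state."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "report.txt")
        link = os.path.join(root, "tag-link")
        with open(target, "w") as f:
            f.write("hello")
        os.symlink(target, link)
        os.rename(target, os.path.join(root, "renamed.txt"))
        # islink: the link entry itself is still there
        # exists: follows the link, so it is False once the target is gone
        return os.path.islink(link), os.path.exists(link)
```

          The link survives as a directory entry, but it no longer resolves to anything once the target has been renamed.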

    2. 5

      macOS Finder does this using extended attributes (HFS/APFS).

    3. 4

      I think Supertag looks great, and recall trying to set it up on a ZFS backed ZVOL, maybe FUSE-on-EXT-on-ZFS. I don’t remember what stopped me; maybe I really wanted to compile it from source, or I just put it down one night and never came back to it. Might try that again some day!

      A similar tool is found within the git-annex tooling, which I’ll refrain from trying to explain in my own words because their docs are just great:

      https://git-annex.branchable.com/tips/metadata_driven_views/

      Didn’t go through with this one either, because I tried to git annex add my entire drive, including old backup directories that had been rsynced over. Git (annex) isn’t made to handle that many files, but it’s mostly on me: there are some config settings that would have e.g. avoided unnecessary disk reads, and the real solution is to zip/tar up any folder that I don’t expect to be adding individual files to in the future. Kinda wish there was a FUSE FS that would let me browse or even modify such archives while letting git-annex see just the broad structure (like iso archives and loopbacks), or a modification of git-annex (and presumably git) to that effect.

      1. 3

        Interesting. Motivates me a bit more to someday look into making a git-annex client for gnome a la ‘git annex turtle’ https://github.com/andrewringler/git-annex-turtle

    4. 4

      The thing is, you need arbitrary metadata (text or binary) in your filesystem. I wish this was possible by default. I could right click on a pdf file of the book “Elements”, go to Metadata, add a new metadata entry that says this pdf is a copy of Euclid’s Elements, and then all the tags the abstract “Elements” entity had (book, math, etc.) get inherited by the pdf.

      I tried to do something with sqlite and a fuse filesystem as a toy once: https://epilys.github.io/bibliothecula/ Everything interesting is in the front page. And this silly article

    5. 3

      The problem is humans.

      Humans are inconsistent. Given the same file, different humans will come up with different tags. Given the same file with sufficient elapsed time, the same human will come up with different tags.

      If there is a limited list of tags, some of them will get overused and some will be underused. If the list is expandable, lots of tags will be overused and many more will be underused. Synonyms will appear without notice.

      If every file has a very small set of tags, a filesystem will do the same job.

      The solution to tagging is efficient integrated full-text-and-metadata search, where text is loosely defined as “any human-readable words in the represented document”.

      I’m not sure there is an efficient integrated full-text-and-metadata search engine, but it seems plausible.

      1. 3

        The tags limitations you mention are only a problem when you don’t have the right tools to manage the tags, or ‘tag’ them (going into tag meta here ^^).

        I’m personally envisioning a tag system (for my notes, and will probably expand it to file tagging one day), where I have tools to easily refactor my tags, rename, merge tags together, make more specific tags implicitly add more global tags (kind of inheritance: tagging #lang:rust will implicitly add #tech), mark tags as close/similar (so they both show up in a search for either tag), etc…

        Tags can be really powerful, unfortunately most tools that allow tags are VERY limited.. :/

        #roll-your-own!
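
        A minimal in-memory sketch of the refactoring ideas above: implied tags, rename, and merge (a rename onto an existing tag). The class and method names are hypothetical, and the “close/similar tags” part is left out:

```python
class TagStore:
    """Toy in-memory tag store with implied (broader) tags."""

    def __init__(self):
        self.items = {}    # item name -> set of explicitly assigned tags
        self.implies = {}  # tag -> set of broader tags it implies

    def tag(self, item, *tags):
        self.items.setdefault(item, set()).update(tags)

    def imply(self, tag, broader):
        self.implies.setdefault(tag, set()).add(broader)

    def expand(self, tags):
        """Return the given tags plus everything they imply, transitively."""
        seen, stack = set(), list(tags)
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(self.implies.get(t, ()))
        return seen

    def search(self, tag):
        """Return items carrying the tag explicitly or through implication."""
        return sorted(i for i, ts in self.items.items() if tag in self.expand(ts))

    def rename(self, old, new):
        """Rename a tag everywhere; renaming onto an existing tag merges the two."""
        for ts in self.items.values():
            if old in ts:
                ts.discard(old)
                ts.add(new)
        if old in self.implies:
            self.implies.setdefault(new, set()).update(self.implies.pop(old))
        for broader in self.implies.values():
            if old in broader:
                broader.discard(old)
                broader.add(new)
```

        With `imply("lang:rust", "tech")`, anything tagged `lang:rust` shows up under `search("tech")` without `tech` ever being stored on the item, so later changing the hierarchy does not require retagging.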

      2. 2

        Consistency between different people is a valid problem. However, your own consistency is something that tools can help with a lot. I’m using a document storage system which suggests tags based on the previously used groups. That means I can safely over-tag and then with the next new file, I’ll add one specific tag and get good suggestions for the rest. Just select all that apply.
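
        One simple way such suggestions could work is plain co-occurrence counting. This is a guess for illustration, not a description of how that system actually does it, and `suggest_tags` is a made-up name:

```python
from collections import Counter


def suggest_tags(tagged, chosen, limit=5):
    """Rank tags that co-occurred with the chosen tags on earlier items.

    tagged: iterable of tag sets from previously tagged files.
    chosen: tags already picked for the new file.
    """
    chosen = set(chosen)
    counts = Counter()
    for tags in tagged:
        tags = set(tags)
        overlap = len(tags & chosen)
        if overlap:
            for t in tags - chosen:
                counts[t] += overlap
    # Highest score first; ties broken alphabetically for stable output
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:limit]]
```

        Over-tagging helps here: the more tags earlier files carry, the more candidates one specific tag on a new file can pull in.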

    6. 2

      Do you have a specific use case? Media files or general solution?

      1. 2

        General solution. I want something to use for everything, with an emphasis on documents I have created.

        1. 3

          Okay, I built a protocol to do just that. It’s called immutag. I haven’t disclosed it yet as it’s a WIP. However, your inquiry intrigued me. It’s specifically suited for documents: it layers a tagging system on top of a content-addressable system. There are no file ‘names’, just global content addresses. That way you can find your stuff easily by searching tags that were made or autogenerated. It will also allow you to update a file and it’s still discoverable because the files are globally addressed. However, git and other tools look for hierarchical files in a directory, so those files must exist in a directory. The two can work side by side for now, flat and hierarchical where it makes sense. I haven’t worked on immutag for a while as I got busy with other projects, but I plan to bring it back and complete it. It’ll need a GUI to make it more user friendly.

    7. 2

      I’ve been looking for the same kind of app for days, but nothing is as good as I hoped. I find that the tag-based approach suits me well after a year of using a hierarchical structure to manage my second brain. Now I use logseq. It’s great for text.

      But I can’t find any equivalent app for files.

      In the end, I use TMSU with the Nautilus extension, but I am still migrating my second brain to logseq, so I haven’t had time to really use TMSU yet.

    8. 2

      Paths are just ordered tags.

      On Windows, I mostly stopped using File Explorer and use this mindset with Voidtools Everything. Similar for the terminal, I only use CD when it autocompletes. I don’t navigate, I search and go.
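
      To make the “ordered tags” idea concrete, here is a small sketch that treats directory components as an unordered tag set and searches on that. It assumes POSIX-style paths, and the function names are made up:

```python
from pathlib import PurePosixPath


def path_tags(path):
    """Treat every component of the file's directory path as a tag."""
    return {part for part in PurePosixPath(path).parent.parts if part != "/"}


def find(paths, *wanted):
    """Return paths whose directory components include all wanted tags, in any order."""
    wanted = set(wanted)
    return [p for p in paths if wanted <= path_tags(p)]
```

      Searching for `work` and `2023` then matches both `work/2023/...` and `2023/work/...`, which is the part a strict hierarchy makes you decide up front.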

    9. 1

      I reached that same conclusion for my files, though I didn’t look all that much beyond existing software like Danbooru and Hydrus and started making my own single-SQLite-file-based thing instead that felt more right for my ways of organizing things. It’s not finished and probably won’t ever be for full-on public consumption, but thought I’d chime in with what I use (at a relatively small scale at the moment) regardless: https://github.com/lun-4/awtfdb