I’ve tagged all the entries on my blog for 23 years. I’m a bit loose with the tagging, such that I have about twice the number of tags (~10,400) as posts (~5,400), partly due to the “horse/horses” issue, but also due to misspellings and different capitalization, like:
Always Look On The Bright Side Of Life
Always Look On the Bright Side of Life
Always look on the bright side of life
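Variants like these can at least be surfaced automatically. Below is a minimal sketch, assuming a hypothetical rule set: fold case, collapse whitespace, and naively strip a trailing "s" from the last word. It only groups candidates for a human to merge; it is not a real stemmer.

```python
from collections import defaultdict


def normalize_tag(tag: str) -> str:
    """Fold a raw tag into a comparison key (hypothetical rules).

    casefold() merges capitalization variants, split()/join() collapses
    stray whitespace, and a naive rule drops a trailing "s" from the last
    word so "horses" and "horse" collide. The plural rule is deliberately
    crude: it would also turn "news" into "new".
    """
    words = tag.casefold().split()
    if not words:
        return ""
    last = words[-1]
    if len(last) > 3 and last.endswith("s") and not last.endswith("ss"):
        words[-1] = last[:-1]
    return " ".join(words)


def group_variants(tags: list[str]) -> dict[str, list[str]]:
    """Group raw tags by normalized key, keeping the original spellings."""
    groups: dict[str, list[str]] = defaultdict(list)
    for tag in tags:
        groups[normalize_tag(tag)].append(tag)
    return dict(groups)


if __name__ == "__main__":
    tags = [
        "Always Look On The Bright Side Of Life",
        "Always Look On the Bright Side of Life",
        "Always look on the bright side of life",
        "horse",
        "horses",
    ]
    for key, variants in group_variants(tags).items():
        print(f"{key!r}: {variants}")
```

Keeping the original spellings in each group means you can pick which one becomes canonical instead of letting the normalizer decide.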
And even with all the tags, I still have issues finding posts. Yesterday I was looking for a particular post where I knew I talked about the keyboard on the Color Computer (it even included a schematic of the keyboard), yet I couldn’t find it via my tagging system: the tags I used back then (17 years ago) made sense at the time, but they weren’t the tags I searched for today. That’s a hard problem.
Also, the tagging system is internal to my blog—I don’t expose tags at all. At first, it was because I didn’t want to implement searches (I wrote my own blogging engine), and it’s still that way because I don’t want to implement searches because of the issues with tagging. Yet I still tag all my posts. Go figure.
A lot of this comes down to ontology drift. To my knowledge, the Cyc project was the first to encounter this. They tried creating a huge corpus of knowledge tagged with metadata for training AI systems. They found that, over time, people would tag the same thing differently, and so they ended up with a lot of inconsistencies as the framework in which the knowledge was being ordered slowly changed. I don’t know how they addressed this, and I don’t think anyone has ever come up with a good solution. It’s part of the reason that a lot of the modern AI focus has been on avoiding the need to tag the input data set.
About the best I’ve gotten as far as a “solution” to this is tag autocomplete in Obsidian, as well as the tag view and graph view features to find related or orphan tags. Can be pretty tedious as it’s fairly manual though.
My father has always listened to National Public Radio (NPR) news. I was lying on the floor in his office one day when I was about 8, reading a magazine while we listened to the radio.
Then, suddenly, the radio started repeating the same thing it had said an hour before. Word for word. I knew exactly what they’d start talking about next.
I was too scared to tell my father. I thought maybe I was living in a time loop like on Star Trek.
I finally mustered the courage to tell him. Turns out our local NPR station would play the evening news at 4 to fill a time slot and again at 5 to catch the evening commuters.
A couple of years back I fell in love with the (now mostly defunct) PSH project at work. The idea was to classify all Czech theses and other gray literature, associate the PSH keywords with Wikipedia articles, and then automatically inject links to PSH below Wikipedia articles wherever a sufficiently large body of freely available gray literature existed.
Apart from some weird politics that eventually killed it, the problem was that it was just a tree, not a DAG. It was pretty weird that Biochemistry was only under Chemistry, but not also under Biology.
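To make the tree-versus-DAG point concrete, here is a minimal sketch that stores a hypothetical subject hierarchy as child-to-parents sets (the "Science" root and all function names are made up for illustration, not taken from PSH). Letting a node have several parents is all it takes for Biochemistry to show up when browsing under Biology.

```python
# Hypothetical subject hierarchy stored as child -> set of parents.
# A strict tree allows at most one parent per node; a DAG lets
# "Biochemistry" sit under both "Chemistry" and "Biology".
PARENTS: dict[str, set[str]] = {
    "Chemistry": {"Science"},
    "Biology": {"Science"},
    "Biochemistry": {"Chemistry", "Biology"},
}


def is_tree(parents: dict[str, set[str]]) -> bool:
    """True if no node has more than one parent."""
    return all(len(ps) <= 1 for ps in parents.values())


def ancestors(node: str, parents: dict[str, set[str]]) -> set[str]:
    """Collect every ancestor of node with an iterative depth-first walk.

    The seen set stops the walk from revisiting nodes, so shared ancestors
    (like "Science" here) are only expanded once.
    """
    seen: set[str] = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


if __name__ == "__main__":
    print(is_tree(PARENTS))
    print(sorted(ancestors("Biochemistry", PARENTS)))
```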
Not sure if it’s a subset or variant of The Instagram problem, but The TikTok problem is also annoying: tags influence the algorithm, so people tag completely, absolutely unrelated content with whatever meme or event is going on today. I’m talking a cute cat video tagged #RedbullChallenge #TheAvengersInfinityWar.
Perhaps it’s all just bad actors, and there’s not much you can do if a basic requirement is users tagging their own content.
Second, [when using hashtags] there is no place for additional metadata. You can’t do tag aliases or subtags or anything.
Can’t you? I don’t know of any hashtag-style systems that do this, but I feel like once you’ve parsed out the tags themselves, relationally you’d deal with them in the same way as in any system where users are the ones tagging things. It’s just as “out of band” there but it’s possible, and I could even see it working well if your use case requires balancing the simplicity of a single text box with some level of curation.
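A minimal sketch of that idea, assuming a simple `#word` hashtag syntax and a hypothetical alias table kept outside the post text (the table contents and function name are illustrative, not from any real platform): parse first, then resolve aliases like any other tag store.

```python
import re

# Assumed hashtag syntax: '#' followed by letters, digits or underscores.
HASHTAG = re.compile(r"#(\w+)")

# Hypothetical out-of-band alias table: variant -> canonical tag.
# It lives next to the posts, not inside them, so the single text box stays simple.
ALIASES = {
    "coco": "colorcomputer",
    "colourcomputer": "colorcomputer",
}


def canonical_tags(text: str, aliases: dict[str, str]) -> list[str]:
    """Parse hashtags out of text, then resolve aliases.

    Tags are lowercased, mapped through the alias table, and de-duplicated
    while keeping first-seen order.
    """
    result: list[str] = []
    for raw in HASHTAG.findall(text):
        tag = aliases.get(raw.lower(), raw.lower())
        if tag not in result:
            result.append(tag)
    return result


if __name__ == "__main__":
    post = "Fixed the #CoCo keyboard, see #ColorComputer and #schematic"
    print(canonical_tags(post, ALIASES))
```

Subtags could be handled the same way, with a second table mapping each canonical tag to its parents.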
I thiiiiink that Hillel is specifically thinking of samizdat tags that arise in systems where
There is no explicit tag infrastructure (so by definition there is no point to parsing out tags)
But there is full-text search
So users hashtag some #keywords, and now if they search for #keyword they will find explicit tags, and not my incidental mention that I had a mustard and keyword sandwich for lunch.
…and hopla: there is tagging, even though it wasn’t built into the system.
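Here is a toy version of that trick, assuming the search tokenizer keeps a leading `#` attached to the word (many real tokenizers strip punctuation, in which case this falls apart). The function and sample posts are hypothetical.

```python
import re

# Tokenizer assumption: a leading '#' stays attached to the word.
# Many real search tokenizers strip punctuation, which would defeat the trick.
TOKEN = re.compile(r"#?\w+")


def search(posts: list[str], query: str) -> list[str]:
    """Naive full-text search: a post matches if it contains the query as a whole token."""
    q = query.lower()
    return [post for post in posts if q in TOKEN.findall(post.lower())]


if __name__ == "__main__":
    posts = [
        "Thoughts on tagging #keyword",
        "I had a mustard and keyword sandwich for lunch",
    ]
    print(search(posts, "#keyword"))
    print(search(posts, "keyword"))
```

Nothing here knows about tags; the `#` is just part of the token, which is exactly why the convention works without any tag infrastructure.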
At some point Twitter started making hashtags clickable to search them, and then aliased variant/misspelled hashtags to a canonical hashtag, but before then hashtags were a popular phenomenon that piggy-backed on Twitter’s full-text search.
I don’t think so because right before that he says:
Hashtags have two limitations over other tagging systems. First, they have to be parsed as part of the content. This means Twitter cannot have the hashtag #this tag due to the parser interpreting the space as the end of the tag.
The Adobe Lightroom application has a sophisticated system where you can organize tags into categories and have shortcuts that expand into bigger tags. It’s quite useful.
Tag systems for content retrieval are dying, just like web directories got replaced by search engines. The only use of tags left will be in social networks, for short-lived, sensational content.
I love that your first reaction to “here’s a smart tagging system” is “what if I encode a paradox in it?”
Did I read this before? I’m almost sure I’ve read this before, but the publication date is today.
I’m not complaining, it’s worth re-reading, but if @hwayne is trying to low-key gaslight me I wanna know =P
I did a tweetstorm back in June and some of that was reused in the newsletter, but there’s at least a thousand new words too! This isn’t the first Twitter thread I’ve expanded for the newsletter; I’m doing more now that I’m off Twitter.
Good, I’m not remembering the future, that would complicate physics XD
I was slightly disappointed to discover that this piece was not about tag systems, the abstract machines. I recommend them, too; they’re really fun to think about and make a fun project when learning a new language. https://en.wikipedia.org/wiki/Tag_system
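As a taste of that learning project, here is a minimal Python sketch of an m-tag system. The example rules (a → bc, b → a, c → aaa) are the 2-tag system commonly cited for Collatz sequences; the function name and the `max_steps` safety cap are my own choices, not from any library.

```python
def run_tag_system(word: str, rules: dict[str, str], m: int = 2,
                   max_steps: int = 10_000) -> list[str]:
    """Run an m-tag system and return every word it passes through.

    One step reads the first symbol, appends that symbol's production to
    the end of the word, and deletes the first m symbols. The system halts
    when the word is shorter than m; max_steps is only a safety cap for
    systems that never halt.
    """
    history = [word]
    for _ in range(max_steps):
        if len(word) < m:
            break
        # Appending then deleting m symbols equals slicing then appending,
        # because len(word) >= m here.
        word = word[m:] + rules[word[0]]
        history.append(word)
    return history


# 2-tag system for the Collatz sequence: a -> bc, b -> a, c -> aaa.
COLLATZ = {"a": "bc", "b": "a", "c": "aaa"}

if __name__ == "__main__":
    history = run_tag_system("aaa", COLLATZ)
    # Words made only of 'a' encode numbers n as a^n.
    print([len(w) for w in history if set(w) == {"a"}])
```

Starting from `aaa`, the all-`a` words have lengths 3, 5, 8, 4, 2, 1: the Collatz sequence from 3, with the odd step shortened to (3n+1)/2.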
Fun fact, here’s a proof that vim is Turing complete: