
    heh, my idea for a better browser history a few years ago was to extract semantic markup (schema.org etc., which attempts to unify common vocabularies) and keep a history of not just pages, but also of e.g. organizations, people, and events mentioned across all those pages. Not sure if that would actually be useful though :D
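A minimal sketch of the semantic-markup idea, assuming pages embed their schema.org data as JSON-LD (`<script type="application/ld+json">`); Microdata and RDFa are ignored here. The helper names (`JsonLdExtractor`, `iter_entities`, `record_visit`) are hypothetical, and keying entities by (`@type`, `name`) is a naive choice that will merge unrelated things that happen to share a name.

```python
import json
from collections import defaultdict
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)  # script content may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buf))


def iter_entities(node):
    """Recursively yield every JSON-LD object that has both an @type and a name.

    Nested objects (e.g. an Event's organizer) and @graph lists are walked too.
    """
    if isinstance(node, list):
        for item in node:
            yield from iter_entities(item)
    elif isinstance(node, dict):
        if "@type" in node and "name" in node:
            yield node
        for value in node.values():
            yield from iter_entities(value)


def record_visit(history, url, html):
    """Add every schema.org entity found in `html` to `history` (entity key -> set of URLs)."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed markup is common in the wild; skip it
        for entity in iter_entities(data):
            etype = entity["@type"]
            if isinstance(etype, list):  # @type may be a list of types
                etype = ",".join(etype)
            history[(etype, str(entity["name"]))].add(url)
    return history


# Usage: two visited pages that both mention the same organization.
history = defaultdict(set)
page_a = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Event", "name": "PyCon",
 "organizer": {"@type": "Organization", "name": "PSF"}}
</script></head><body>...</body></html>"""
page_b = '<script type="application/ld+json">{"@type": "Organization", "name": "PSF"}</script>'
record_visit(history, "https://example.com/a", page_a)
record_visit(history, "https://example.com/b", page_b)
for (etype, name), urls in sorted(history.items()):
    print(etype, name, sorted(urls))
# Event PyCon ['https://example.com/a']
# Organization PSF ['https://example.com/a', 'https://example.com/b']
```

The "entity history" is then just an inverted index from entities to the pages where you saw them; a real version would want better entity identity (e.g. the `@id` or `sameAs` URLs) than a bare name.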


      That would be really cool, like a proper Memex! But I'd also imagine it's very hard.


        this sort of entity extraction and classification is actually pretty straightforward with more or less modern NLP stacks like NLTK; I did almost exactly this for work a few years ago, pulling people and companies out of news articles.

        you’re going to want to search for:

        • Named Entity Recognition (NER)
        • Stanford NER (which NLTK can call into)

        check out this random tutorial I found (unaffiliated; I just hate being given search targets without contextual examples)


      Extracting data from Google, Instagram and such sounds like a chore, but the browser extension part is interesting.


        This seems pretty cool; maybe it should be built on top of something like Perkeep? The indexing/storage/integration parts seem to overlap with a lot of Perkeep's goals.