heh, my idea for a better browser history a few years ago was extracting semantic markup (schema.org etc., attempting to unify the common vocabularies) and keeping a history not just of pages, but also of e.g. the organizations, people, events etc. mentioned on all those pages. Not sure if that would actually be useful though :D
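A minimal sketch of that entity history, using only the Python stdlib. It assumes pages publish schema.org data as JSON-LD (`<script type="application/ld+json">`) and ignores Microdata/RDFa. Keying entities by (`@type`, `name`) is a hypothetical simplification, not real entity resolution. The names `JsonLdExtractor`, `iter_entities` and `record_page` are made up for illustration.

```python
import json
from collections import defaultdict
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)  # script content may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buf))


def iter_entities(node):
    """Yield every dict carrying an @type, descending into lists, @graph and nested properties."""
    if isinstance(node, list):
        for item in node:
            yield from iter_entities(item)
    elif isinstance(node, dict):
        if "@type" in node:
            yield node
        for value in node.values():
            yield from iter_entities(value)


def record_page(history, url, html):
    """Add every named, typed entity found on a page to a (type, name) -> {urls} history."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # broken markup is common in the wild; skip it
        for entity in iter_entities(data):
            name = entity.get("name")
            if not isinstance(name, str):
                continue  # assumption: only plainly named entities are worth tracking
            types = entity["@type"]
            for t in types if isinstance(types, list) else [types]:
                history[(str(t), name)].add(url)


history = defaultdict(set)
page = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Event", "name": "PyCon",
 "organizer": {"@type": "Organization", "name": "Python Software Foundation"}}
</script></head><body>...</body></html>"""
record_page(history, "https://example.com/pycon", page)
for (etype, name), urls in sorted(history.items()):
    print(etype, name, sorted(urls))
```

Walking nested properties means the organizer of an event shows up as its own entry, which is roughly the "history of organizations, people, events" rather than of pages.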
this sort of entity extraction and classification is actually pretty straightforward with more or less modern NLP stacks. I did almost exactly this for work a few years ago, pulling people and companies out of news articles.
you’re going to want to search for:
Named Entity Recognition (NER)
Stanford NER (Stanford CoreNLP) and NLTK
check out this random tutorial I found (unaffiliated; I just hate being given search targets without contextual examples)
This seems pretty cool. Maybe it should be built upon something like Perkeep? The indexing/storage/integration parts seem similar to a lot of its goals.
That would be really cool, like a proper Memex! But I'd also imagine it's very hard.
Extracting data from Google, Instagram and such sounds like a chore, but the browser extension part is interesting.