For those who are not in the know and might otherwise skip this over due to a too-quick reading of the title, here’s some additional context (included in the link itself, but summarized here):
- This is not about Wikipedia being funded
- Abstract Wikipedia is a new project
- It’s an ambitious one, with highly technical requirements
- These technical requirements are risky, per a report
I didn’t even know about Abstract Wikipedia, and I appreciate its vision. Not sure if it’s really that useful, though. And the very real technical concerns suggest it’s, at best, ahead of its time. At worst, an unfortunate example of the complexity of representing the sum of human knowledge and experience in code.
As somebody who created a bunch of interlingual pages for English WP, I would have loved to have some sort of abstraction for managing them. I seem to recall working on a bunch of pages for individual Japanese kana, for example, and you can see for yourself that those pages have many similar parts that could be factored out by some abstraction mechanism.
Sounds like what they actually want is some “libwiki” collection of Lua snippets, hosted by the Wikimedia Foundation, with well-specified APIs so that they can be run sandboxed anywhere. All the Abstract Wikipedia stuff could then be a DSL on top of that or something.
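To make that concrete, here is a minimal Python sketch (the comment imagines Lua, but the shape would be similar) of what a pinned, declaratively specified snippet API could look like. Everything in it, from the registry to the field names, is invented for illustration.

```python
# Purely illustrative guess at a "well-specified API" for shared,
# sandboxable wiki snippets. All names here are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class WikiFunction:
    name: str                    # stable identifier
    version: str                 # callers pin an exact version
    input_schema: dict           # declared inputs, so hosts can validate/sandbox
    run: Callable[[dict], str]   # pure function: structured data in, wikitext out

REGISTRY: Dict[str, WikiFunction] = {}

def register(fn: WikiFunction) -> None:
    REGISTRY[f"{fn.name}@{fn.version}"] = fn

# A hypothetical snippet that any wiki could call with its own data:
register(WikiFunction(
    name="infobox_person",
    version="1.0.0",
    input_schema={"name": "string", "born": "string", "died": "string"},
    run=lambda d: f"{{{{Infobox person|name={d['name']}|born={d['born']}|died={d['died']}}}}}",
))

print(REGISTRY["infobox_person@1.0.0"].run(
    {"name": "Douglas Adams", "born": "1952", "died": "2001"}))
```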
In my opinion, Abstract Wikipedia is not a replacement for Wikipedia, and it’s quite interesting on its own terms. The name undersells the technology.
The goal of the Wikimedia Foundation (which runs Wikipedia along with other projects) is to unlock the world’s knowledge, to enable every single human being to freely share in the sum of all knowledge. Wikipedia alone doesn’t do this.
Wikimedia also runs Wikibooks, Wiktionary, the Wiktionary thesaurus, Wikidata, and Wikifunctions, among others.
Wikidata is a database of the declarative facts contained in Wikipedia. It’s just facts in a database, not prose, so you aren’t going to get the experience of well-written technical prose when looking at the results of a Wikidata query. But because it’s a database, you can construct complex queries, and you can run joins across Wikidata and other similar databases. Applications can perform these queries internally and use the results in all sorts of ways. This is cool.
Wikidata is an instance of the Wikibase database engine. Another example I looked at is ArtBase, a database of digital art. You can look here for a description of how you can query ArtBase and how the underlying technology works: https://artbase.rhizome.org/wiki/Query. You can click on sample queries and see the results visualized as graphs, trees, grids, bar charts, etc. The page discusses the possibility of queries that pull information from both Wikidata and ArtBase.
Wikifunctions is another database, containing pure mathematical functions. It’s probably another Wikibase instance, but I didn’t look closely.
@Corbin already linked to the Douglas Adams Wikidata entry: https://www.wikidata.org/wiki/Q42. Understand that this web page is showing the results of a simple database query. Database queries don’t have to be this simple; they can be quite sophisticated, pulling together data from multiple sources and using logical inference to derive new facts from existing facts and rules. At present, the output of the query isn’t easy to read.
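To make “the results of a database query” concrete, here is a small sketch that runs one such query against the public Wikidata SPARQL endpoint, asking for Douglas Adams’s (Q42) notable works (property P800). The Python around it is just glue and assumes the requests library is available.

```python
import requests

# Ask Wikidata for Douglas Adams's (Q42) notable works (property P800).
# Real queries can be far more sophisticated, joining many entities.
QUERY = """
SELECT ?work ?workLabel WHERE {
  wd:Q42 wdt:P800 ?work .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "wikidata-query-example/0.1"},  # the endpoint expects a UA
)
resp.raise_for_status()

for row in resp.json()["results"]["bindings"]:
    print(row["work"]["value"], "-", row["workLabel"]["value"])
```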
Abstract Wikipedia will be a way to render the results of such queries as prose, in English and in other languages, so that the output is easier to read. This is a general facility. The queries don’t have to correspond to Wikipedia pages. They could be complex Wikidata queries pulling data from multiple sources; they could be ArtBase queries.
The scope of this project is profound. It goes way beyond delivering existing Wikipedia pages translated into new languages. At the same time, the automatically generated prose is going to be stilted and mechanical compared to prose written by a human being, so in that way it will also be worse than Wikipedia. That’s okay, though.
This just reads like the author disagrees with a specific technical decision that the project has made. Now, I agree in the sense that Abstract Wiki looks like a very pointless idea (you don’t need a programming language to express precise relations between concepts! Every human language ever made is already that language!), but this sort of article comes across as an attempt to politicize a bikeshed.
I think you’re mixing up Abstract WP with Wikifunctions. Not coincidentally, one of the technical recommendations was to focus on Abstract WP and ignore Wikifunctions.
The goal of Abstract WP, to give an example, was to take Wikidata pages like this famous example page and use NLG to generate Wikipedia-like text:
Douglas Adams (1952-2001) was an author whose notable works included The Hitchhiker’s Guide to the Galaxy pentalogy, the Dirk Gently series, and The Private Life of Genghis Khan.
This is just a synthetic example; I read the Wikidata page and copied data from a few fields into a template sentence (a snowclone!), and if I used a template sentence from another language, then I would get a basic version of WP in that language. That’s what “Abstract” means in “Abstract Wikipedia”.
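Here is a toy sketch of that template-filling step, with the facts hand-copied rather than queried and the template strings invented for illustration (real NLG needs grammatical agreement, morphology, and much more):

```python
# Toy illustration of the "template sentence" idea: the same structured
# facts (hand-copied from Q42) rendered through per-language templates.
facts = {
    "name": "Douglas Adams",
    "born": 1952,
    "died": 2001,
    "occupation": {"en": "author", "de": "Autor"},
    "notable_work": {"en": "The Hitchhiker's Guide to the Galaxy",
                     "de": "Per Anhalter durch die Galaxis"},
}

templates = {
    "en": "{name} ({born}-{died}) was an {occupation} whose notable works included {notable_work}.",
    "de": "{name} ({born}-{died}) war ein {occupation}, zu dessen bekanntesten Werken {notable_work} gehört.",
}

def render(lang: str) -> str:
    # Pick the right language variant for any multilingual fields.
    f = {k: (v[lang] if isinstance(v, dict) else v) for k, v in facts.items()}
    return templates[lang].format(**f)

for lang in templates:
    print(render(lang))
```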
Yes, and I think the lesson of GPT-3 is that this is a mistaken endeavour that will not ever be usable. We don’t need a programming language to encode language; that’s what language already is.
The promise of NLP has always been that language can be transformed into an unambiguously identified graph form that can then be operated on in lieu of the messy, confusing, ambiguous raw deal. This research program has, in my opinion, never borne any fruit worth using; GPT’s “just operate on natural language directly” approach has squarely relegated it to the annals of history - or if not, then it certainly should have.
Which will generate better results: translating Abstract Wikipedia to {every language} with {toolkit}, or translating English Wikipedia to every other Wikipedia with few-shot GPT-3? I strongly presume the latter, as GPT can take advantage of natural language’s rich set of subtle context cues that will be stripped out by reduction to an abstract programmatic form.
Far from aiding in translation, Abstract Wikipedia will remove the exact information that makes the task viable for humans.
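(For reference, “few-shot” here just means prepending a handful of worked example translations to the prompt. A minimal sketch of building such a prompt, with invented example pairs and no actual API call:)

```python
# Sketch of a few-shot translation prompt: a handful of worked examples
# followed by the sentence we actually want translated. The example
# pairs are invented; a real prompt would use vetted sentence pairs.
examples = [
    ("The cat sat on the mat.", "Le chat était assis sur le tapis."),
    ("Wikipedia is a free encyclopedia.", "Wikipédia est une encyclopédie libre."),
]

def few_shot_prompt(source_sentence: str) -> str:
    shots = "\n".join(f"English: {en}\nFrench: {fr}\n" for en, fr in examples)
    return f"{shots}English: {source_sentence}\nFrench:"

print(few_shot_prompt(
    "Douglas Adams (1952-2001) was an author whose notable works "
    "included The Hitchhiker's Guide to the Galaxy."
))
```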
Which will generate better results: translating Abstract Wikipedia to {every language} with {toolkit}, or translating English Wikipedia to every other Wikipedia with few-shot GPT-3?
Define “better”. I’m sure it’ll be easier (GPT-3 already exists!), but I’m not sure we’ll ever be able to be certain of the accuracy of its translations without having a second human translator confirm each sentence of its output. The slew of recent articles and videos showing how ChatGPT gets very simple things wrong and straight-up invents illogical “facts” demonstrates this, in my opinion.
I suspect this will run into the same context problems as strong AI.