
    It’s good to see that they’re on track with releasing these bigger models. I’m grabbing the 774M one now, & I’ll see whether there’s a noticeable improvement in coherence.

    I still suspect that some of the threat is overblown (in no small part because getting a human being to write BS is currently cheaper, more effective, and easier to target than automated text generation), but I’m glad to see that the staged release wasn’t a pretext for keeping the model secret (like with Racter and… well, a long history of text-generator projects that people claimed were ‘too dangerous for mere mortals’).

    As Gwern Branwen noted, GPT-2 output is really only convincing if you’re skimming, & passing for a drunk and/or insane human is still some ways in the future. But a lot of us skim more often than we ought to, so that’s no great comfort. And we all know people who would share an ‘article’ whose title they agreed with even if its contents were literally lorem ipsum.


      I guess optimizing for visualization is not the same as optimizing for comprehension…but we’ll get there…!

      Really amazing how they are handling it all.


        I guess optimizing for visualization is not the same as optimizing for comprehension

        I think it actually is (at least, as far as world-model-free or implied-world-model systems go).

        Like word2vec, GPT-2 takes a structuralist view of semiotics: the meaning of words stems not from correspondence with things in the world, but from association with other words as they are used, and these patterns of association are biased by their utility in playing language games, some of which bear some relationship to the real world. For instance, by studying architectural terminology it’s possible to describe a building so convincingly that even architects are fooled; but when someone draws a blueprint from that description, or builds a model from that blueprint, inconsistencies arise that make it clear the description could never fit a real building. The language games of architecture are constrained in practice by space and gravity, which a system playing those games needs to model accurately, & reading the writing of architects is not an effective way to learn physics.
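        To make the structuralist picture concrete, here’s a minimal sketch of distributional ‘meaning’ – just a hypothetical mini-corpus and plain co-occurrence counts; word2vec & GPT-2 learn far richer versions of the same signal:

        ```python
        # Toy illustration of the structuralist claim: a word's "meaning" is
        # its pattern of co-occurrence with other words. The corpus here is
        # invented for illustration.
        from collections import Counter, defaultdict
        from math import sqrt

        corpus = [
            "architect draws blueprint",
            "builder reads blueprint",
            "architect surveys building",
            "builder raises building",
            "poet writes sonnet",
            "poet recites sonnet",
        ]

        cooc = defaultdict(Counter)  # word -> counts of words seen nearby
        for sentence in corpus:
            words = sentence.split()
            for i, w in enumerate(words):
                for j, other in enumerate(words):
                    if j != i:
                        cooc[w][other] += 1

        def cosine(a, b):
            """Cosine similarity between two sparse count vectors."""
            dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
            norm = lambda v: sqrt(sum(x * x for x in v.values()))
            return dot / (norm(a) * norm(b))

        # 'architect' & 'builder' come out similar, 'poet' dissimilar, purely
        # from shared word-company -- no world model anywhere.
        print(cosine(cooc["architect"], cooc["builder"]))  # 0.5
        print(cosine(cooc["architect"], cooc["poet"]))     # 0.0
        ```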

        The alternative is a model-first text generation system – i.e., a system that has an internal model of the world & procedures for translating between that model and natural language. These systems are usually not based on neural nets (though there are a handful of neural exceptions that don’t work very well). Good examples of model-first systems are SHRDLU (which is basically a physics model), TALE-SPIN (which models a small universe of fairy-tale characters who have internal state & drives & can communicate with each other), the lore generator in Dwarf Fortress, and most interactive fiction. A toy sketch of the pattern follows below.
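        Here’s what I mean, in miniature – the names & rules are invented for illustration, and a real TALE-SPIN descendant is vastly richer, but the shape is the same: simulate first, verbalize last:

        ```python
        # A toy model-first generator in the TALE-SPIN spirit: simulate a
        # tiny world, then render the event log as prose.
        from dataclasses import dataclass
        from typing import Optional

        @dataclass
        class Character:
            name: str
            location: str
            knows_food_at: Optional[str] = None
            hunger: int = 0

        def simulate(world, steps=4):
            """Advance the world model; return a log of semantic events."""
            events = []
            for _ in range(steps):
                for c in world:
                    c.hunger += 1
                    if c.hunger > 2 and c.knows_food_at and c.location != c.knows_food_at:
                        c.location = c.knows_food_at
                        events.append(("go", c, c.location))
                    elif c.hunger > 2 and c.location == c.knows_food_at:
                        c.hunger = 0
                        events.append(("eat", c))
            return events

        def render(events):
            """Translate model events into natural language -- the only place
            words appear; everything upstream is world state."""
            templates = {
                "go":  lambda e: f"{e[1].name} was hungry, so she went to the {e[2]}.",
                "eat": lambda e: f"There, {e[1].name} ate until she was full.",
            }
            return " ".join(templates[e[0]](e) for e in events)

        print(render(simulate([Character("Wilma", "cave", knows_food_at="river")])))
        # -> Wilma was hungry, so she went to the river. There, Wilma ate until she was full.
        ```

        The output can never describe an impossible world, because the words are derived from the model rather than from other words.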

        (Grammar-based text generation is somewhere in between: it has a model of how text is supposed to look stylistically, but doesn’t generally model how the semantics of that text are supposed to cohere. Advanced structuralist models like GPT-2 do really well at mimicking style – often better than grammars, which necessarily have limited novelty.)
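        The in-between position is easy to see in code – this hand-written CFG (invented for illustration) pins down the style completely, while enforcing nothing about sense:

        ```python
        # Minimal grammar-based generator: the grammar controls form,
        # but nothing makes successive choices cohere semantically.
        import random

        grammar = {
            "S":   [["NP", "VP"]],
            "NP":  [["the", "N"], ["a", "ADJ", "N"]],
            "VP":  [["V", "NP"]],
            "N":   [["architect"], ["blueprint"], ["tower"]],
            "ADJ": [["tall"], ["famous"]],
            "V":   [["designs"], ["admires"]],
        }

        def expand(symbol):
            """Recursively expand a symbol until only terminal words remain."""
            if symbol not in grammar:
                return [symbol]
            production = random.choice(grammar[symbol])
            return [word for part in production for word in expand(part)]

        print(" ".join(expand("S")))
        # e.g. "a famous blueprint admires the tower" -- stylistically fine,
        # semantically incoherent, which is exactly the limitation.
        ```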

        GPT-2’s attention window is longer than that of most structuralist text-generation systems. When producing a numbered list, it can sometimes count to 15 before losing track of where it was. Fifteen points is more than a skimming human can keep track of, but fewer than an attentive human can. Getting better coherence will probably involve storing more information about associations internally for each token, rather than merely stuffing in more data.
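        You can poke at the numbered-list behaviour yourself with the Hugging Face port of GPT-2 – the checkpoint name & sampling settings below are just plausible defaults, and since sampling is stochastic, where it loses count varies from run to run:

        ```python
        # Probe how far GPT-2 can count in a numbered list before it
        # repeats or skips a number.
        from transformers import pipeline, set_seed

        set_seed(42)
        generator = pipeline("text-generation", model="gpt2")

        prompt = "My 15 favourite foods:\n1. apples\n2. bread\n3."
        out = generator(prompt, max_new_tokens=120, do_sample=True, top_k=40)
        print(out[0]["generated_text"])
        ```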