Betteridge’s law of headlines: no.
A Markov model is effectively stateless beyond its fixed-order history, whereas RNNs can (in principle) learn to percolate information over arbitrary distances. This difference shows up in other tasks, such as source code generation, where RNNs produce syntactically (nearly) correct code with properly closed braces, comments, etc. Yoav Goldberg has made a nice Markov-model counterpart to Karpathy's famous blog post, which clearly shows that Markov models fail in such cases:
http://nbviewer.jupyter.org/gist/yoavg/d76121dfde2618422139
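For concreteness, here's a minimal sketch (my own, not Goldberg's notebook code) of the kind of unsmoothed character-level Markov model being compared against. Anything the model generates is conditioned only on the last `order` characters, which is exactly why it can't close a brace opened fifty characters ago:

```python
import random
from collections import defaultdict, Counter

def train_char_lm(text, order=4):
    """Count which characters follow each length-`order` history."""
    lm = defaultdict(Counter)
    padded = "~" * order + text  # pad so every position has a full history
    for i in range(len(text)):
        history, ch = padded[i:i + order], padded[i + order]
        lm[history][ch] += 1
    return lm

def generate(lm, order, n=200, seed=0):
    """Sample characters, each conditioned only on the previous `order` chars."""
    random.seed(seed)
    out = "~" * order
    for _ in range(n):
        counts = lm.get(out[-order:])
        if not counts:
            break
        chars, weights = zip(*counts.items())
        out += random.choices(chars, weights=weights)[0]
    return out[order:]
```

Raising `order` buys a longer memory, but only at an exponential cost in table size; the RNN instead learns a compressed hidden state that can, in principle, carry information indefinitely.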
Percolating longer-distance information is also necessary in freer word order languages. There are many examples, but consider, for example, separable verb particles in Dutch.
'De spanning [loopt] na de uitschakeling van Duitsland op het WK in Rusland behoorlijk [op] deze week.'
The excitement [increases_verb] after the elimination of Germany at the World Cup in Russia quite [increases_particle] this week.
Here, the particle op of the verb oplopen is separated from the finite verb by V2 (verb-second) movement. Such cases are notoriously hard for simpler models, due to the long-distance dependency between the verb parts and, in this case, the possible misinterpretation of [op deze week] as a prepositional phrase. RNNs are generally better at modeling such cases (especially bidirectional RNNs).
No, it’s vector multiplication in disguise as a markov chain.
That seems like a category mistake to me, whereas the title of the article doesn't make one.
A Markov chain may be a specific pattern of vector multiplications, but that pattern makes all the difference. Markov chains and vector multiplications sit at different levels of description. 'Deep learning' and 'Markov chain', on the other hand, are terms for alternative patterns of vector multiplications, one a lot more involved than the other.
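To make the "pattern of vector multiplications" point concrete: one step of a Markov chain is literally a row-vector times transition-matrix product, p' = pP, and iterating it converges to the stationary distribution. A toy two-state example (pure Python, with made-up illustrative probabilities):

```python
# Two-state weather chain: state 0 = sunny, state 1 = rainy.
# P[i][j] is the probability of moving from state i to state j.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def step(p, P):
    """One Markov step: p' = p P, written out as an explicit vector-matrix product."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

p = [1.0, 0.0]        # start certainly sunny
for _ in range(50):   # repeated multiplication converges to the stationary distribution
    p = step(p, P)
```

That's the whole model: one fixed linear map applied to a probability vector. A deep network interleaves many such linear maps with nonlinearities and learns them all, which is the "lot more involved" pattern.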
There’s a video on YouTube somewhere of a talk by a physicist (IIRC) on why deep learning is so ridiculously effective. It pretty much boils down to the same reason that mathematics is so unreasonably effective at describing physical systems in general: (handwaving extremely wildly from memory) physical systems tend to be simple functions of their inputs (albeit with many, many inputs!) in which causality is preserved. This is what makes it possible for RNNs and the like to approximate physical systems in various ways: the nature of those systems is exactly what allows approximations of their information content to be at least partially valid instead of a total loss.
(I tried to find the video, but there are too many terrible ones on the same topic these days. I’ll have another look later.)
No, it’s a monoid in the category of endofunctors.
I wasn’t able to find an answer to which samples were deep-learning generated and which were Markov-chain generated. It did seem that two of the samples were “better” in the sense of being more grammatical than the other two, and it would be nice to know whether my impressions correspond to the different methods or not.
In general on other sites, though, I have noticed that deep learning results are not more realistic than what you’d come up with from M-x dissociated-press.