Is speech recognition really that close to being perfected? I haven’t used anything like Dragon, but I know Siri and Windows Speech Recognition, two of what should be the highest-quality pieces of consumer speech recognition, still struggle with me constantly. I would love to dictate my notes or documentation, but trying to do any formatting, let alone speaking uncommon words, ends up making a mess of things. Even things like “text my wife ‘I just got on I-90’” are tricky, because Siri doesn’t recognize that I’m talking about the road. I would think 3D printing is much further along, unless we’re calling speech recognition solved well before it can understand what we’re saying.
On their hype cycle graph, Gartner puts speech recognition at the “plateau of productivity” stage, natural-language question answering at the “peak of inflated expectations” stage, and virtual personal assistants at the “innovation trigger” stage. I guess “innovation trigger” refers to a well-received movie that came out recently?
It does seem that, by speech recognition, Gartner only means translating sounds received through a microphone into specific words in a text format.
As far as understanding those words goes, Siri, for example, won’t parse anything longer than a sentence. Short, iOS-specific commands work well enough, but still, as your example shows, Siri lacks the context to translate flawlessly.