See also Voice Driven Development.
A good accompaniment to this is Shervin Emami’s presentation at Linux.conf.au this year.
Desktop Linux, without a keyboard, mouse or desk
Interesting talk, but the presenter used “um” as a period…
Very solid talk, and definitely a great accompaniment to this article. Thanks for sharing!
I was really excited when I started the article (this being a thought that had never occurred to me, it’s interesting to see what options are out there). But then I saw how far you have to depart from standard dictation to perform even some simple typing. Why do I need to say ‘parent’ or similar instead of switching to a “command-input-mode” and simply dictating ‘dot-dot-slash’ like one does in their head? Why do I need to say “snake” before a section instead of just ‘underscore’? It seemed more of a faff than I’m ready for.
Also lots of the settings and commands do seem to require a surprising amount of typing (configs and commands) for tools meant for people who cannot type.
This is probably an article I should write myself at some point, but a lot of the custom words are for speed and accuracy.
E.g. many letters of the Latin alphabet, at least in English, have extremely similar-sounding names. B, C, D, E, G, P, T, V, and Z (when pronounced “zee”) are all one-syllable words ending in “ee”, and speech detection would likely lose accuracy when using those directly. One voice-coding program, Talon Voice, has a pre-defined (but customizable) alphabet. Each letter’s word has a reasonably distinct sound as far as speech recognition goes, and is (or at least can be pronounced with) one syllable.
As far as speed goes, W is an interesting letter, since it’s the only letter whose English name has three syllables. Imagine trying to type out “www.wikipedia.org” simply with letters. “Double you double you double you dot double you” takes a lot longer to say than, hypothetically, “wag wag wag dot wag”, just for the first five characters. Same with “underscore” vs “snake”: three syllables vs one. For some words, there’s no easy spoken representation, and it depends on context. Homophones like “their”, “they’re”, and “there” are tricky to distinguish when simply spoken aloud.
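To make the speed argument concrete, here is a rough Python sketch. It is not how Talon Voice or any other tool actually works; the word table only contains the hypothetical words mentioned above (“wag”, “dot”, “snake”), and the syllable counts are hand-entered assumptions rather than anything measured.

```python
# Hypothetical spoken alphabet: illustrative words only, not any tool's real defaults.
SPOKEN_TO_CHAR = {
    "wag": "w",
    "dot": ".",
    "snake": "_",
}

# Hand-counted syllables for the conventional English names of the same characters.
CONVENTIONAL_SYLLABLES = {
    "w": 3,  # "double you"
    ".": 1,  # "dot"
    "_": 3,  # "underscore"
}


def decode(utterance: str) -> str:
    """Turn a space-separated utterance into the characters it stands for."""
    return "".join(SPOKEN_TO_CHAR[word] for word in utterance.split())


def conventional_syllables(text: str) -> int:
    """Count the syllables needed to spell `text` using the conventional names."""
    return sum(CONVENTIONAL_SYLLABLES[ch] for ch in text)


utterance = "wag wag wag dot wag"
typed = decode(utterance)
print(typed)                          # www.w
print(len(utterance.split()))         # 5 syllables, one per custom word
print(conventional_syllables(typed))  # 13 syllables with the usual letter names
```

So for those first five characters, the one-syllable words need 5 syllables where the usual names need 13. A real setup would also have to cope with misrecognized words, which this sketch ignores.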
As for configuration and commands, all the voice coding libraries I’ve looked into in the couple of days since I found this article have an out-of-the-box setup that can at least get you started, so you can use voice commands to customize things to your liking.
Voice coding is something that typically trades a steeper learning curve for greater long-term efficiency.