Talk Is Cheap - ViaVoice Enhanced Edition

Classic science fiction, by and large, has proven both myopic and optimistic when it comes to computers. Increased brain power was an obvious prediction, but few foresaw that computers would also become small, cheap, and ubiquitous, with all the tremendous attendant sociological implications. On the other hand, by all accounts we should long ago have been talking to our computers. Where is HAL 9000? The QWERTY keyboard is a clumsy dinosaur; of course you'd eventually like your computer to read your thoughts, but in the meantime, why can't you just tell it what to do? Well, to a large extent, you can; you wouldn't want to hand over control of a mission-critical task to a voice-driven computer just yet, but your computer need no longer be as deaf as a post either.

Wreck a Nice Beach - You've probably heard of ARPA, the advanced research wing of the U.S. Department of Defense during the Cold War; you're certainly familiar with one of its creations, the Internet. Another ARPA project was to have computers know what people were saying - called "speech recognition". In the early 1970s, ARPA threw massive amounts of funding at the problem.
The major obstacle was the acoustic model, which may be imagined as phonemic analysis. (I once proposed the term "autoglossomerolysis," but somehow it didn't catch on.) How can the computer work out whether a vowel is "ah" or "ee", whether a consonant is "p" or "t", or even where the phoneme boundaries are? Most researchers expected that computers would find the features of speech, corresponding to how the mouth produced the sounds: "this is a voiced guttural stop, that is a rounded front vowel". What the ARPA-funded research demonstrated, though, was that you could make more significant practical progress by doing something much more crude. First, characterize the raw sound by a minimal set of numbers; then, match those numbers against a template - e.g., this sound is a "p" because numerically it looks like a prerecorded "p".

The trick here lies in the notion "looks like." James Baker, then a graduate student at Carnegie-Mellon University, applied to speech recognition pattern-matching a probabilistic mathematical device called a "hidden Markov model" (HMM). The results proved so superior in that first ARPA funding round that all modern speech recognition uses HMM - a fact which is astounding for two reasons. First, HMM is fundamentally not only crude but almost certainly wrong - however our ears and brains hear and analyze speech, HMM is surely not it. Second, it's amazing that we've been doing speech recognition the same way for so long. To be sure, modern HMM is vastly more sophisticated than in those days, and one should not underestimate the importance of software optimization, a direction pioneered, again, by James Baker, who went on to found Dragon Systems.
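To make the idea concrete, here is a minimal sketch of HMM-based template matching. Everything in it - the two-state "burst"/"release" models, the coarse "lo"/"hi" spectral labels, and all the probabilities - is invented for illustration; a real recognizer uses many states, genuine acoustic measurements, and models trained on recorded speech.

    # Toy HMM phoneme matching: score how well each phoneme's model
    # "explains" an observed sound, quantized into coarse spectral labels.
    # All states, labels, and probabilities here are invented.

    def forward_likelihood(obs, start, trans, emit):
        """P(obs | model), computed with the forward algorithm."""
        states = list(start)
        alpha = {s: start[s] * emit[s].get(obs[0], 0.0) for s in states}
        for o in obs[1:]:
            alpha = {s: sum(alpha[r] * trans[r].get(s, 0.0) for r in states)
                        * emit[s].get(o, 0.0)
                     for s in states}
        return sum(alpha.values())

    # Hypothetical two-state models for "p" and "t": a burst followed by a
    # release, each emitting "lo" or "hi" spectral energy with some probability.
    models = {
        "p": dict(start={"burst": 1.0, "release": 0.0},
                  trans={"burst": {"burst": 0.4, "release": 0.6},
                         "release": {"release": 1.0}},
                  emit={"burst": {"lo": 0.9, "hi": 0.1},
                        "release": {"lo": 0.3, "hi": 0.7}}),
        "t": dict(start={"burst": 1.0, "release": 0.0},
                  trans={"burst": {"burst": 0.4, "release": 0.6},
                         "release": {"release": 1.0}},
                  emit={"burst": {"lo": 0.2, "hi": 0.8},
                        "release": {"lo": 0.1, "hi": 0.9}}),
    }

    observed = ["lo", "lo", "hi"]  # the incoming sound, reduced to a few numbers
    best = max(models, key=lambda ph: forward_likelihood(observed, **models[ph]))
    print(best)  # -> p: this sound numerically "looks like" a "p"

The point is not the particular numbers but the shape of the computation: the model never asks how the mouth made the sound, only which template was statistically most likely to have produced it.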
But the really important development has been in hardware. Computers are now about a thousand times faster, a thousand times larger in resources, and a thousand times smaller in size and cost than in those early days, so they have at last begun to meet speech recognition's mathematical demands. In the early 1990s, Apple created its own system-level speech recognition component, PlainTalk. PlainTalk's genius lies in its compromises: it doesn't need training for a particular user, but it does only discrete speech recognition, matching a short phrase to a finite list of predefined possibilities, as in the sketch below.
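Discrete recognition of this kind is far easier than transcription, because the recognizer only has to pick the closest entry in a small list - or reject the utterance entirely. Here is a deliberately simplified sketch of that idea; it substitutes string similarity (difflib) for acoustic similarity, and the command list and threshold are invented, so it illustrates the logic rather than PlainTalk's actual implementation.

    # Discrete speech recognition, schematically: match an utterance against
    # a finite list of predefined phrases, rejecting anything too dissimilar.
    # String similarity stands in here for real acoustic scoring.
    import difflib

    COMMANDS = ["open the file", "close the window", "empty the trash"]

    def recognize(heard, threshold=0.6):
        """Return the best-matching command, or None if nothing is close."""
        score, best = max(
            (difflib.SequenceMatcher(None, heard, cmd).ratio(), cmd)
            for cmd in COMMANDS
        )
        return best if score >= threshold else None

    print(recognize("empty the trish"))  # -> empty the trash
    print(recognize("format the disk"))  # -> None: not one of the known phrases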
The holy grail is continuous speech recognition (CSR) - basically, you talk and the computer types.