Speech Recognition has a long history of being one of the difficult
problems in Artificial Intelligence and Computer Science. As one goes
from problem-solving tasks such as puzzles and chess to perceptual tasks
such as speech and vision, the problem characteristics change
dramatically: knowledge-poor to knowledge-rich; low data rates to high
data rates; slow response time (minutes to hours) to instantaneous
response time. These characteristics taken together increase the
computational complexity of the problem by several orders of magnitude.
Further, speech provides a challenging task domain which embodies many
of the requirements of intelligent behavior: operate in real time;
exploit vast amounts of knowledge; tolerate errorful, unexpected, unknown
input; use symbols and abstractions; communicate in natural language; and
learn from the environment. Voice input to computers offers a number of
advantages. It provides a natural, fast, hands-free, eyes-free,
location-free input medium. However, there are many as-yet-unsolved
problems that
prevent routine use of speech as an input device by non-experts. These
include cost, real-time response, speaker independence, robustness to
variations in noise, microphone, speech rate, and loudness, and the
ability to handle non-grammatical speech. Satisfactory solutions to each
of these problems can be expected within the next decade. Recognition of
unrestricted spontaneous continuous speech appears unsolvable at
present. However, by the addition of simple constraints, such as
clarification dialog to resolve ambiguity, we believe it will be
possible to develop systems capable of accepting very large vocabulary
continuous speech dictation.