An accessible explanation of the technologies that enable such popular
voice-interactive applications as Alexa, Siri, and Google Assistant.

Have you talked to a machine lately? Asked Alexa to play a song, asked
Siri to call a friend, asked Google Assistant to make a shopping list?
This volume in the MIT Press Essential Knowledge series offers a
nontechnical and accessible explanation of the technologies that enable
these popular devices. Roberto Pieraccini, drawing on more than thirty
years of experience at companies including Bell Labs, IBM, and Google,
describes the developments in such fields as artificial intelligence,
machine learning, speech recognition, and natural language understanding
that allow us to outsource tasks to our ubiquitous virtual assistants.
Pieraccini describes the software components that enable spoken
communication between humans and computers, and explains why it's so
difficult to build machines that understand humans. He covers speech
recognition technology; problems in extracting meaning from utterances
in order to execute a request; language and speech generation; the
dialog manager module; and interactions with social assistants and
robots. Finally, he considers the next big challenge in the development
of virtual assistants: building in more intelligence, enabling them to
do more than communicate in natural language and endowing them with the
capacity to know us better, predict our needs more accurately, and
perform complex tasks with ease.