[Elsnet-list] ACL 2008 Tutorial Proposal: Building Practical Spoken Dialog Systems

Antoine Raux antoine at cs.cmu.edu
Wed Mar 12 16:00:43 CET 2008


Title: Building Practical Spoken Dialog Systems


Abstract:
This tutorial will give a practical description of the free software 
Carnegie Mellon Olympus 2 Spoken Dialog Architecture. Building real 
working dialog systems that are robust enough for the general public to 
use is difficult. Most frequently, the functionality of the 
conversations is severely limited - down to simple question-answer 
pairs. While off-the-shelf toolkits help the development of such simple 
systems, they do not support more advanced, natural dialogs nor do they 
offer the transparency and flexibility required by computational 
linguistic researchers. Olympus 2, by contrast, offers a complete dialog
system with automatic speech recognition (Sphinx) and synthesis (SAPI,
Festival). Together with previous versions of Olympus, it has been used
for teaching and research at Carnegie Mellon and elsewhere for some five
years. Overall, a dozen dialog systems have been built using various
versions of Olympus, handling tasks ranging from providing bus schedule 
information to guidance through maintenance procedures for complex 
machinery, to personal calendar management. In addition to simplifying 
the development of dialog systems, Olympus provides a transparent 
platform for teaching and conducting research on all aspects of dialog 
systems, including speech recognition and synthesis, natural language 
understanding and generation, and dialog and interaction management.

The tutorial will give a brief introduction to spoken dialog systems 
before going into detail about how to create your own dialog system 
within Olympus 2, using the Let's Go bus information system as an 
example. Further, we will provide guidelines on how to use an actual 
deployed spoken dialog system such as Let's Go to validate research 
results in the real world. As a possible testbed for such research, we 
will describe Let's Go Lab, which provides access to both the Let's Go 
system and its genuine user population for research experiments.

Attendees will receive a CD with the latest version of the Olympus 2 
architecture, along with several tutorials and example systems.


Outline:
* Introduction
* Overview of current spoken dialog system architectures
* Description of the Olympus 2 dialog architecture
* How to build an Olympus 2 dialog system (text I/O)
-break-
* Expanding an Olympus 2 system to use speech - a true spoken dialog system
* Discussion of installation requirements and practical system-building 
issues, including:
   - telephony
   - system backend
   - ASR (re)training / (re)tuning
   - improving synthesis output
   - dialog strategies & parameters
   - monitoring / logging
* Using Olympus 2 for research and applications
   - Let's Go Lab: a test platform for dialog systems with real users
* Final summary


Presenter Bios:
Antoine Raux
Language Technologies Institute
Carnegie Mellon University
http://www.cs.cmu.edu/~antoine/
email: antoine at cs.cmu.edu

Antoine Raux is a PhD student at the Language Technologies Institute at 
Carnegie Mellon University. He has conducted research and
published more than 15 reviewed papers on several aspects of dialog
systems, including speech recognition, speech synthesis, dialog and 
interaction management, and system building. His teaching experience 
includes two teaching assistantships in natural language-related 
graduate courses, as well as the ongoing design of online tutorials for 
the Olympus architecture.


Brian Langner
Language Technologies Institute
Carnegie Mellon University
http://www.cs.cmu.edu/~blangner/
email: blangner at cs.cmu.edu

Brian Langner is a PhD student at the Language Technologies Institute at 
Carnegie Mellon University. He has conducted research and
published more than 12 reviewed papers on speech synthesis, natural
language generation, and spoken dialog systems. He has six semesters of 
experience as a teaching assistant for graduate and undergraduate 
computing- or natural language-related courses, including some course
design, in addition to continuing work on the Olympus architecture
tutorials.


Dr. Alan W Black
Language Technologies Institute
Carnegie Mellon University
http://www.cs.cmu.edu/~awb/
email: awb at cs.cmu.edu

Alan W Black is an Associate Research Professor in the Language 
Technologies Institute at Carnegie Mellon University. He previously
worked at the University of Edinburgh, and before that at ATR in Japan.
He received his PhD in Computational Linguistics from Edinburgh 
University in 1993. He is one of the principal authors of the Festival 
Speech Synthesis System. In addition to speech synthesis, he works
on two-way speech-to-speech translation systems and telephone-based
spoken dialog systems. He has also served on the IEEE Speech Technical
Committee (2003-2006), is on the editorial board of Speech
Communication, and is a board member of ISCA. He teaches a number of
graduate and undergraduate courses and has taught a number of short-term
tutorials on speech synthesis, speech technology, and rapid support
for new languages.


Dr. Maxine Eskenazi
Language Technologies Institute
Carnegie Mellon University
http://www.cs.cmu.edu/~max/
email: max at cs.cmu.edu

Maxine Eskenazi is on the faculty of the Language Technologies Institute 
at Carnegie Mellon University. She has a BA from Carnegie Mellon 
University in French and Education and a These de Troisieme Cycle from 
the Universite de Paris 11 in Computer Science. She has published
extensively on the use of automatic speech processing for spoken dialog
systems and on the use of language technologies for computer-assisted 
language learning. She is the Principal Investigator on the NSF Let's Go 
project.

