[Elsnet-list] CFP, Special Session at Interspeech 2007, Structure-Based and Template-Based ASR

Helmer W.Strik at let.ru.nl
Tue Feb 27 22:01:44 CET 2007


(apologies for cross-posting)

---------------------------
Call for Papers
Submission deadline: March 23, 2007

Special Session at INTERSPEECH 2007, Antwerp, Belgium:

Structure-Based and Template-Based Automatic Speech Recognition - 
Comparing parametric and non-parametric approaches

Although hidden Markov models (HMMs) are the dominant technology for 
acoustic modeling in automatic speech recognition today, many of their 
weaknesses are well known and have become the focus of intensive 
research. One prominent weakness of current HMMs is their limited 
ability to represent long-span temporal dependencies in the acoustic 
feature sequence of speech, even though such dependencies are an 
essential property of speech dynamics. The main cause of this limitation 
is the conditional IID (independent and identically distributed) 
assumption inherent in the HMM formalism. Furthermore, the standard HMM 
approach focuses on verbal information, whereas experiments have shown 
that non-verbal information also plays an important role in human speech 
recognition, a role the HMM framework has not attempted to address 
directly. Numerous approaches have been taken over the past dozen years 
to address these weaknesses of HMMs. They can be broadly classified into 
the following two categories.
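For readers less familiar with the conditional IID assumption, it can be stated in one line. In a standard HMM with hidden state sequence $q_1,\dots,q_T$, observations $o_1,\dots,o_T$, and per-state output distributions $b_j$, the observation likelihood factorizes frame by frame once the states are known:

```latex
p(o_1,\dots,o_T \mid q_1,\dots,q_T) \;=\; \prod_{t=1}^{T} p(o_t \mid q_t) \;=\; \prod_{t=1}^{T} b_{q_t}(o_t)
```

No factor links $o_t$ to $o_{t-1}$ directly, and frames assigned to the same state share one distribution, so, given the states, the likelihood is unchanged if the frames within one state are reordered. Any long-span trajectory structure therefore has to be approximated through the state sequence alone.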

The first category, the parametric, structure-based approach, builds 
mathematical models of the stochastic trajectories or segments of speech 
utterances using various parametric forms, including polynomials, linear 
dynamic systems, and nonlinear dynamic systems that embed the hidden 
structure of speech dynamics. Within this parametric framework, 
systematic speaker variation can also be handled satisfactorily. The 
essence of such a hidden-dynamic approach is that it exploits knowledge 
and mechanisms of human speech production to provide the structure of 
multi-tiered stochastic process models. A specific layer in this type of 
model represents long-range temporal dependency in parametric form.
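As an illustration of the simplest member of this family, a polynomial trajectory (segment) model, the following Python sketch fits one mean trajectory to a set of training segments and scores a new segment against it. It is a minimal sketch under stated assumptions, not the formulation of any particular system or submission: one-dimensional features, a quadratic in time normalized to [0, 1], a single shared Gaussian residual variance, and hypothetical function names (`fit_segment_model`, `segment_log_likelihood`). Because the mean changes with time inside the segment, the temporal shape of the trajectory is carried by the model parameters rather than by a constant per-state mean.

```python
import math


def normalized_times(n):
    """Map frame indices 0..n-1 onto [0, 1] so segments of any length share one time axis."""
    return [i / (n - 1) if n > 1 else 0.0 for i in range(n)]


def design_row(t, degree):
    """Powers of normalized time: [1, t, t^2, ..., t^degree]."""
    return [t ** k for k in range(degree + 1)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_segment_model(segments, degree=2):
    """Least-squares fit (via the normal equations) of one polynomial mean trajectory
    to all training segments, plus the residual variance around it (hypothetical helper)."""
    p = degree + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for seg in segments:
        for t, y in zip(normalized_times(len(seg)), seg):
            row = design_row(t, degree)
            for i in range(p):
                xty[i] += row[i] * y
                for j in range(p):
                    xtx[i][j] += row[i] * row[j]
    coeffs = solve(xtx, xty)
    residuals = [
        y - sum(c * x for c, x in zip(coeffs, design_row(t, degree)))
        for seg in segments
        for t, y in zip(normalized_times(len(seg)), seg)
    ]
    # Floor the variance so a perfect fit does not produce a zero variance.
    variance = max(sum(r * r for r in residuals) / len(residuals), 1e-6)
    return coeffs, variance


def segment_log_likelihood(seg, coeffs, variance):
    """Gaussian log-likelihood of a whole segment around the fitted mean trajectory;
    frames are independent given the trajectory, whose shape carries the time dependency."""
    degree = len(coeffs) - 1
    ll = 0.0
    for t, y in zip(normalized_times(len(seg)), seg):
        mean = sum(c * x for c, x in zip(coeffs, design_row(t, degree)))
        ll += -0.5 * (math.log(2 * math.pi * variance) + (y - mean) ** 2 / variance)
    return ll
```

Real segment models use multi-dimensional features, one model per phone or state, and more careful variance estimation; a quadratic is used here only to keep the normal equations small.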

The second category, the non-parametric, template-based approach, 
overcomes the HMM weaknesses by directly exploiting the speech feature 
trajectories (i.e., 'templates') in the training data, without any 
modeling assumptions. Owing to the dramatic growth of speech databases 
and of the computer storage available for training, as well as 
exponentially increased computational power, non-parametric methods 
based on the traditional pattern-recognition techniques of kNN 
(k-nearest-neighbor decision rule) and DTW (dynamic time warping) have 
recently received substantial attention. Such template-based methods are 
also referred to in the literature as exemplar-based or data-driven 
techniques.
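To make the template-based alternative concrete, here is a minimal Python sketch of DTW alignment combined with a kNN decision rule. The assumptions are illustrative, not taken from any specific system: frames are lists of floats compared with Euclidean distance, the classic unconstrained DTW recursion (match, insertion, deletion) is used without slope constraints or path-length normalization, ties in the kNN vote go to the label whose template is closest, and the names `dtw_distance` and `knn_classify` are hypothetical.

```python
import math
from collections import Counter


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (one frame each)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query, template):
    """Classic DTW: minimal cumulative frame distance over monotonic alignments
    that allow match, insertion, and deletion steps."""
    n, m = len(query), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def knn_classify(query, templates, k=3):
    """Label the query by majority vote among its k nearest templates under DTW.
    templates: list of (label, sequence of feature vectors)."""
    scored = sorted((dtw_distance(query, seq), label) for label, seq in templates)
    nearest = scored[:k]
    votes = Counter(label for _, label in nearest)
    best = max(votes.values())
    # Break ties in favour of the label whose template is closest to the query.
    for _, label in nearest:
        if votes[label] == best:
            return label
```

Every query is compared with every stored template, so the cost grows with the size of the training set; this is the storage and computation trade-off that the paragraph above ties to larger databases and faster machines.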

The purpose of this special session is to bring together researchers 
with a particular interest in novel techniques aimed at overcoming the 
weaknesses of HMMs for acoustic modeling in speech recognition. In 
particular, we plan to address issues related to the representation and 
exploitation of long-range temporal dependencies in speech feature 
sequences, the incorporation of fine phonetic detail into speech 
recognition algorithms and systems, the relative pros and cons of the 
parametric and non-parametric approaches, and the computational resource 
requirements of the two approaches.

The special session will open with an oral presentation that introduces 
the topic and gives a short overview of the issues involved, the 
directions already taken, and possible new approaches. The contributed 
papers will then be presented, and the session will close with a panel 
discussion.

Submission:
Researchers who are interested in contributing to this special session 
are invited to submit a paper according to the regular submission 
procedure of INTERSPEECH 2007, and to select 'Structure-Based and 
Template-Based Automatic Speech Recognition' in the special session 
field of the paper submission form. The paper submission deadline is 
March 23, 2007.

Session organizers:
Li Deng <deng at microsoft.com>
Helmer Strik <strik at let.ru.nl>

Information about this special session can also be found at the 
following websites:
http://www.interspeech2007.org/Technical/structure_template_based_asr.php
http://lands.let.ru.nl/~strik/IS2007-Special_Session-STB_ASR.html


