[Elsnet-list] Two new Ph.D. positions at the University of Edinburgh

Simon King Simon.King at ed.ac.uk
Mon Jun 4 13:49:19 CEST 2007

Two funded Ph.D. positions

Centre for Speech Technology Research,
University of Edinburgh, UK

CSTR has two new Ph.D. positions available, starting in September 2007
or January 2008.

For more information about studying at CSTR and how to apply, start at
http://www.cstr.ed.ac.uk/opportunities. To discuss these
opportunities with us, email Simon.King at ed.ac.uk

1) Study of Source Features and Source Modelling for Speech Synthesis
    and Speaker Recognition

This position includes spending time at a top Indian engineering
institution - further details below. It is open to applicants from the
UK (who would be fully funded with fees and a generous stipend) or
from the EU (where the funding would cover fees-only whilst in the UK,
and full costs whilst in India).

2) Any other topic in speech processing

This position is open to applicants from the UK (who would be fully
funded) or from the EU (where the funding would cover fees only). A
list of topics of interest can be found at
http://www.cstr.ed.ac.uk/opportunities/phd_topics.html but we also
encourage applicants to discuss their own proposals with us.

Further details of position 1

As part of an exciting new UK-India collaboration project, we are
seeking a UK student to work on novel approaches to speech processing
based on statistical models of speech articulation and of the voice
source, with applications in speech synthesis and/or speaker
recognition.

This position will involve spending 6-12 months at one or both of the
Indian partner institutions: the International Institute of
Information Technology (IIIT) in Hyderabad and the Indian Institute of
Technology (IIT) in Guwahati. These are elite engineering schools with
world-class research programmes, and are very prestigious places to
study.

Speech Synthesis

CSTR is a world leader in text-to-speech, exemplified by the open
source "Festival" TTS system which has been downloaded by tens of
thousands of users. In this project we plan to focus on trajectory
hidden Markov model (HMM) speech synthesis, which has attracted a
great deal of attention since being ranked very highly in the 2005
and 2006 "Blizzard" speech synthesis evaluations. The method combines
a trajectory HMM estimated from data with a voice source model. The
generated speech is rated as highly intelligible,
but there is room for improvement in naturalness. We plan to address
this problem via developments in trajectory modelling and improved
source modelling. The work in source modelling will build on the
analyses of excitation source characteristics, previously carried out
at IITG and IIITH, utilising the signal processing methods and
nonlinear models developed by these groups. The work on trajectory
modelling will build on work currently underway at CSTR concerned with
both the underlying statistical model and the development of
constraints related to speech articulation. An advantage of HMM-based
speech synthesis over conventional concatenative methods is that it is
easily trained on modest amounts of data and can thus be adapted to
new speakers using existing techniques, making it especially well
suited for use on a wide variety of languages, dialects and accents,
without the need to collect very large databases.
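To give a flavour of the trajectory idea, here is a toy, hand-rolled sketch of maximum-likelihood parameter generation, the step at the core of trajectory-HMM synthesis. It is purely illustrative (not CSTR's system, and all numbers are made up): given per-frame Gaussian means and variances for a static feature and its delta, it solves the normal equations W' S^-1 W c = W' S^-1 mu for the smoothest trajectory consistent with both.

```python
def mlpg(static_mean, static_var, delta_mean, delta_var):
    """Toy 1-D maximum-likelihood parameter generation.

    W stacks an identity block (static) on a central-difference
    block (delta: d_t = (c_{t+1} - c_{t-1}) / 2); we solve
    W' S^-1 W c = W' S^-1 mu by building the normal equations."""
    T = len(static_mean)
    A = [[0.0] * T for _ in range(T)]
    b = [0.0] * T
    # Static part: identity rows weighted by inverse variance.
    for t in range(T):
        w = 1.0 / static_var[t]
        A[t][t] += w
        b[t] += w * static_mean[t]
    # Delta part: row t touches columns t-1 and t+1 with -0.5 / +0.5.
    for t in range(1, T - 1):
        w = 1.0 / delta_var[t]
        pairs = [(t - 1, -0.5), (t + 1, 0.5)]
        for i, wi in pairs:
            b[i] += wi * w * delta_mean[t]
            for j, wj in pairs:
                A[i][j] += wi * w * wj
    return solve(A, b)

def solve(A, b):
    """Gaussian elimination with partial pivoting (no external libraries)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x

# Illustrative input: the static means jump abruptly from 0 to 1, but
# the delta model strongly prefers small frame-to-frame change, so the
# generated trajectory is smoothed toward the middle.
static_mean = [0.0, 0.0, 1.0, 1.0]
static_var  = [1.0] * 4
delta_mean  = [0.0] * 4
delta_var   = [0.01] * 4   # strong preference for small deltas
traj = mlpg(static_mean, static_var, delta_mean, delta_var)
print([round(v, 3) for v in traj])   # values pulled toward 0.5
```

With a large delta variance instead, the solution reverts to the raw static means; the balance between the two streams is what produces smooth, natural-sounding parameter tracks.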

Speaker Recognition

The unique way in which an individual produces speech can be used to
recognise that person; applications include identity verification for
online services such as banking. Short-term spectral features such as
cepstral coefficients, which mainly characterise the vocal tract, have
been extensively used in developing speaker recognition systems. It is
well known that the voice excitation source
features contain significant speaker information, and the articulatory
movements of a speaker are also unique. In this project we plan to
combine the expertise of IITG and IIITH in speaker recognition using
voice source characteristics with CSTR's expertise in articulatory
modelling for recognition, and in inferring articulator movements from
the acoustic speech signal. We will combine evidence from the source
characteristics of a speaker with evidence about the movements of
their articulators (both inferred from the acoustic signal alone) to
improve the accuracy and reliability of speaker recognition.
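The combination step described above is often done at the score level. The sketch below is a deliberately simple, hypothetical illustration (the subsystem scores and weights are invented, not from the project): each subsystem emits a log-likelihood-ratio-style score per verification trial, and a weighted sum fuses them before the accept/reject decision.

```python
def fuse(source_score, artic_score, w=0.6):
    """Weighted linear fusion of two per-trial subsystem scores."""
    return w * source_score + (1.0 - w) * artic_score

def decide(score, threshold=0.0):
    """Accept the identity claim if the fused score clears the threshold."""
    return score > threshold

# Invented scores for four verification trials: the first two are
# genuine-speaker trials, the last two are impostor trials.
source = [ 1.2, -0.2, -0.8, -1.5]   # voice-source subsystem
artic  = [ 0.4,  1.1, -1.2, -0.2]   # articulatory subsystem
fused = [fuse(s, a) for s, a in zip(source, artic)]
decisions = [decide(f) for f in fused]
print(decisions)
```

Note that in the second trial the source subsystem alone would reject the genuine speaker (score -0.2), but the articulatory evidence rescues the decision after fusion; this complementarity is precisely the motivation for combining the two evidence streams.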

Dr. Simon King                               Simon.King at ed.ac.uk
Centre for Speech Technology Research          www.cstr.ed.ac.uk
For MSc/PhD info, visit  www.hcrc.ed.ac.uk/language-at-edinburgh
