[Elsnet-list] Read-out speech corpus needed

Eric Atwell csc6ea at leeds.ac.uk
Tue Apr 26 15:58:40 CEST 2011


Mike (and anyone else interested),

We can offer the ProPOSEC dataset ("ProPOSEC: A Prosody and PoS 
Spoken English Corpus" by Claire Brierley and Eric Atwell, LREC 2010).

This open source material has already been donated to the Aix-MARSEC
project led by Professor Daniel Hirst at the University of Aix, France.

ProPOSEC uses the following files from Section A of the Lancaster-IBM
Spoken English Corpus of BBC radio broadcast transcripts: A01,
A03, A04, A05, A06, A07, A08, A09, A10, A11. The order of fields for
each entry is as follows:

File ID
word
LOB
C5
SAMPA (transcription from Aix-MARSEC)
SAMPA (transcription from ProPOSEL)
Syllable count
Lexical stress pattern
Content/function word status
DISC phonetic transcription
DISC transcription mapped to syllable weightings
Prototype phone-TSM mapping from Aix-MARSEC
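
For anyone wanting to load the entries programmatically, the field order
above could be mapped onto a record type roughly as follows. This is only
a sketch: the "|" delimiter, the Python field names, and the helper name
parse_entry are assumptions made for illustration and are not taken from
the ProPOSEC distribution, so check them against the actual files. All
values are kept as strings because the exact encoding of each field is
not specified here.

```python
from dataclasses import dataclass, fields


@dataclass
class ProposecEntry:
    """One corpus entry, with fields in the order listed above.

    Field names are hypothetical; only their order comes from the post.
    """
    file_id: str                 # File ID, e.g. one of A01, A03 ... A11
    word: str                    # word
    lob_tag: str                 # LOB part-of-speech tag
    c5_tag: str                  # C5 part-of-speech tag
    sampa_aix_marsec: str        # SAMPA (transcription from Aix-MARSEC)
    sampa_proposel: str          # SAMPA (transcription from ProPOSEL)
    syllable_count: str          # Syllable count
    stress_pattern: str          # Lexical stress pattern
    content_function: str        # Content/function word status
    disc: str                    # DISC phonetic transcription
    disc_syllable_weights: str   # DISC mapped to syllable weightings
    phone_tsm: str               # Prototype phone-TSM mapping


def parse_entry(line: str, delimiter: str = "|") -> ProposecEntry:
    """Split one line into the twelve fields.

    The default delimiter is an assumption; pass the real one if it differs.
    """
    parts = line.rstrip("\n").split(delimiter)
    expected = len(fields(ProposecEntry))
    if len(parts) != expected:
        raise ValueError(
            f"expected {expected} fields, got {len(parts)}: {line!r}"
        )
    return ProposecEntry(*parts)
```

Raising an error on a wrong field count, rather than padding or
truncating, should make it obvious early on if the assumed delimiter
does not match the files.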


I hope this helps.

Eric Atwell (pp Claire Brierley), Leeds University


On Fri, 22 Apr 2011, Mike Rosner wrote:

> Dear All,
>
> I have a student who is doing some comparative work on formal vs
> informal speech and need to get my hands on an annotated read-out
> speech corpus at low or zero cost.
>
> If anyone can help with that please get in touch.
>
> Many thanks
>
> Mike
> --
> Michael Rosner
> Dept Intelligent Computer Systems
> University of Malta, MSD2080, MALTA
> mike.rosner at um.edu.mt, www.cs.um.edu.mt/~mros
> +356 2340 2519


Eric Atwell, Senior Lecturer, Language research group,
  I-AIBS Institute for Artificial Intelligence and Biological Systems
  School of Computing, Faculty of Engineering, UNIVERSITY OF LEEDS
  Leeds LS2 9JT, England.        TEL: 0113-3435430  FAX: 0113-3435468
  WWW: http://www.comp.leeds.ac.uk/arabic
       http://www.comp.leeds.ac.uk/nlp

