[Elsnet-list] Request: Arabic transcript marked with pauses

Eric Atwell E.S.Atwell at leeds.ac.uk
Mon Apr 8 22:37:35 CEST 2013


Researchers at the Universities of Leeds and Jordan are looking for a
small test corpus (5000+ words) of transcribed Modern Standard Arabic
(MSA) annotated with PHRASE BREAKS. These breaks should delineate
well-formed, meaningful chunks rather than mark disfluencies.
To illustrate the kind of thing we are looking for, here is
a single MSA sentence of 48 words:
http://www.comp.leeds.ac.uk/claireb/msaSentence.pdf

In this example, only two words are followed by punctuation, and we
have identified these as breaks. We have also tagged a few other
words as likely boundary locations. If you know of or have such a
resource, we would love to hear from you.

Thanks,

Claire Brierley C.Brierley at leeds.ac.uk 
Senior Research Fellow
School of Computing, University of Leeds, UK
