[Elsnet-list] Submission to Elsnet List - Special Session at Interspeech 2012

Florian Metze fmetze at cs.cmu.edu
Tue Mar 6 22:27:20 CET 2012

[Apologies for multiple postings]

We would like to draw your attention to the following special session organized at Interspeech 2012, next September, with submissions due on April 1, 2012. The goal of this special session is to bring together researchers from the audio, speech, and language fields working on consumer and semi-professional multimedia material. The special session will also serve as a launching event for the ISCA Speech and Language in Multimedia SIG.

We invite you to submit original research work to make this event happen and to share your work (and possibly more) with colleagues from the field! Contributions are expected on all aspects of speech and audio processing for multimedia content: research results, but also presentations of ongoing research projects or software, multimedia databases, benchmarking initiatives, etc.

Note that the session will be held at Interspeech only if enough contributions are accepted.

Looking forward to seeing you in Portland.

Guillaume Gravier and Florian Metze, General chairs.


                Special Session at Interspeech 2012

Speech and Audio Analysis of Consumer and Semi-Professional Multimedia



Consumer-grade and semi-professional multimedia material (video) is becoming abundant on the Internet and in other online archives. It is easier than ever to download material of any kind. With cell phones now featuring video recording capability along with broadband connectivity, multimedia material can be recorded and distributed across the world just as easily as text could only a couple of years ago. The easy availability of vast amounts of text gave a huge boost to the Natural Language Processing and Information Retrieval research communities; the multimedia material described above is set to do the same for multi-modal audio and video analysis and generation. We argue that the speech and language research community should embrace this trend, as it would profit vastly from the availability of this material, and it has significant know-how and experience of its own to contribute, which will help shape this field.

Consumer-created (as opposed to broadcast news, “professional style”) multimedia material offers a great opportunity for research on all aspects of human-to-human as well as human-machine interaction, which can be processed offline, but on a much larger scale than is possible in online, controlled experiments. Speech is naturally an important part of these interactions, and it can link visual objects, people, and other observations across modalities. Research results will inform future research and development directions in interactive settings, e.g. robotics and interactive agents, and give a significant boost to core (offline) analysis techniques such as robust audio and video processing, speech and language understanding, and multimodal fusion.

Large-scale multi-modal analysis of audio-visual material is beginning in a number of multi-site research projects across the world, driven by various communities, such as information retrieval, video search, copyright protection, etc. While each of these has slightly different targets, they face largely the same challenges: how to robustly and efficiently process large amounts of data, how to represent and then fuse information across modalities, how to train classifiers and segmenters on unlabeled data, how to include human feedback, etc. Speech, language, and audio researchers have considerable interest and experience in these areas, and should be at the core and forefront of this research. To make progress at a useful rate, researchers must be connected in a focused way and be aware of each other’s work, in order to discuss algorithmic approaches, ideas for evaluation and comparison across corpora and modalities, training methods with various degrees of supervision, available data sets, etc. Sharing software, databases, research results, and project descriptions is among the key elements of success at the core of the Speech and Language in Multimedia (SLIM) SIG's objectives.

The special session will serve these goals by bringing together researchers from different fields – speech, but also audio and multimedia – to share experience and resources and to foster new research directions and initiatives. Contributions are expected on all aspects of speech and audio processing for multimedia content: research results, but also presentations of ongoing research projects or software, multimedia databases, benchmarking initiatives, etc. A special session, as opposed to a regular session, offers unique opportunities to emphasize interaction between participants with the goal of strengthening and growing the SLIM community. The following format will be adopted: a few selected talks targeting a large audience (e.g., project or dataset descriptions, overviews) will open the session, followed by a panel and open discussion on how to develop our community, along with poster presentations.


  Florian Metze
  Assistant Research Professor
  Language Technologies Institute
  School of Computer Science
  Carnegie Mellon University
