[Elsnet-list] 1st CFP: Fourth LREC Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing

Paul Thompson Paul.Thompson at manchester.ac.uk
Wed Dec 18 10:40:02 CET 2013

DATE:  31st May 2014
organised in conjunction with LREC 2014 (26-31 May 2014, Reykjavik, Iceland)


Over recent years, there has been an exponential growth in the amount of biomedical and health information available in digital form. In addition to the 23 million references to biomedical literature currently available in PubMed, other sources of information are becoming more readily available. For example, digitisation efforts have resulted in the ready availability of large volumes of historical material, there is a wealth of information available in clinical records, whilst the growing popularity of social media channels has resulted in the creation of various specialised groups. Extensive information is also available in languages other than English, e.g. much medical literature is written in Chinese in particular, but to a certain extent also in Japanese, Korean and Russian.

With such a deluge of information at their fingertips, domain experts and health professionals have an ever-increasing need for tools that can help them to isolate relevant nuggets of information in a timely and efficient manner, regardless of both information source and mother tongue. However, this goal presents many new challenges in analysis and search. For example, given the highly multilingual nature of available information, it is important that language barriers do not result in vital information being missed. In addition, different information sources cover varying topics and contain differing styles of language, while varying terminology may be used by lay persons, academics and health professionals. There is also often little standardisation in the abbreviations that are used extensively in medical and health-related text.

Building upon the success of workshops on Building and Evaluating Resources for Biomedical Text Mining, held in conjunction with the previous three LREC conferences, the Fourth Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing (BioTxtM 2014) aims to bring together researchers who have designed, created, adapted or evaluated biomedical and health text resources, those who are making use of such resources in their tools and applications (text mining, multilingual search, machine translation, information extraction, question-answering, document authoring, etc.), and domain experts/health professionals who would benefit from the use of such resources and tools. The workshop will allow an assessment of the current state of the art of resources, and will provide a forum for the discussion of current problems, ideas, questions and open issues. This will help to identify both future directions for research and new potential collaborations between members of the community. We particularly welcome submissions that deal with resources for languages other than English, or which facilitate multilingual access to information.


Applications in the health and biomedical domain are reliant on high quality resources. These include databases and ontologies (e.g., Biothesaurus, UMLS Metathesaurus) and lexica (e.g., BioLexicon and UMLS SPECIALIST lexicon). Given the frequently changing and variable nature of biomedical terminology and abbreviations, combined with the requirement to take multilingual information into account, there is an urgent need to investigate new ways of creating and updating such resources, or adapting them to new languages. New techniques may include combining semi-automatic methods, machine translation techniques, crowdsourcing or other collaborative efforts.

Community shared tasks and challenges (e.g. BioCreative I-IV, ACL BioNLP Shared Tasks (2009, 2011, 2013), etc.) have resulted in an increase in the number of annotated corpora, covering an ever-expanding range of sub-domains and annotation types. Such corpora are helping to steer research efforts to focus on open research problems, as well as encouraging the development of increasingly adaptable and wider coverage text mining tools.

Interoperability and reuse are also vital considerations, as evidenced by efforts such as the BioCreative Interoperability Initiative (BioC) and the UIMA architecture. Several of the corpora introduced above are compliant with both BioC and UIMA, and are available within the U-Compare and Argo systems, which allow easy construction of NLP workflows and evaluation against gold standard corpora.

There is also a need to consider how resources and techniques can facilitate easier access to relevant information that is written in a variety of different languages. For example, can existing techniques and resources used for machine translation, multilingual search and question answering in other domains be adapted to simplify access to multilingual information in the biomedical and health domains?

Call for Papers

We invite papers reporting on resources that support the application of biomedical text mining to various text types/information sources, biomedical sub-domains and languages, and the process of designing, building, updating, delivering, using and evaluating such resources for various purposes. The workshop will focus both on the lexical and knowledge repositories themselves (e.g., terminologies, ontologies, controlled vocabularies, factual databases, annotated corpora, etc.) as well as on issues relating to their usability (e.g., design guidelines, standards for building resources, storage and exchange formats, interoperability issues, etc.) and on the different ways in which they are being employed by applications and tools to facilitate information access.

The workshop will act as a stimulus for the discussion of several ongoing research questions driving current and future research in the area of biomedical and clinical text mining, in order to support access to information from a range of sources and written in a variety of languages. These questions include the following:
*Among the available resources, which are the most used? What makes a good resource? How can we ensure that resources are maintained and updated?
*Which types of resources are still lacking and what is needed urgently? Are any resources planned or in development to address such gaps?
*Can existing resources sufficiently support text mining and synthesis of information from multiple text types/channels and biomedical subdomains? How can active learning and crowdsourcing improve the coverage of existing resources?
*Which resources are available that cover languages other than English? Can existing resources/techniques (e.g. machine translation) be used to bootstrap the development of resources for other languages? Are these resources sufficient to support multilingual access and search of relevant information?
*How easily can resources be employed for different purposes? What efforts have been made to make resources reusable or interoperable? To what extent have these efforts been successful?
*How can machine translation, multilingual search and question answering simplify access to multilingual information?
*Can automated processing of multilingual documents make the process of synthesizing information from multiple sources more efficient?
*How can we involve medical professionals and biologists to provide documentation and annotate text suitable for machine analysis?
*How well do current technologies for search, machine translation, question answering, etc. work in facilitating the efficient and effective location of information in biomedical and health-related text, from a number of different sources?

Topics of interest include but are not limited to:

*Building biomedical and health resources for various languages: controlled vocabularies, terminologies, ontologies, corpora, multi-lingual resources
*Guidelines, annotation schemas, annotation tools
*Reengineering existing biomedical or general language resources
*Semi-automatic and/or collaborative methods for the update, evolution, extension or enrichment of resources
*Adapting resources to new sub-domains, text types or languages
*Interoperability of resources and standards
*Lightly annotated and noisy resources
*Tools for the exploration of resources
*Data exchange formats
*Evaluation, comparison and critical assessment of resources; evaluation metrics
*Innovative employment of resources in tools and applications, for both monolingual and multilingual access to biomedical and health-related information from a variety of textual sources
*Evaluation of tools, applications and technologies making use of biomedical and health-related resources


Organisers

* Sophia Ananiadou, National Centre for Text Mining, University of Manchester, UK
* Khalid Choukri, ELDA, Paris, France
* Kevin Bretonnel Cohen, Computational Bioscience Program, University of Colorado School of Medicine, USA
* Dina Demner-Fushman, National Library of Medicine, USA
* Jan Hajic, Charles University Prague, Czech Republic
* Allan Hanbury, Technical University of Vienna, Austria
* Gareth Jones, Dublin City University, Ireland
* Henning Müller, HES-SO Valais, Sierre, Switzerland
* Pavel Pecina, Charles University Prague, Czech Republic
* Paul Thompson, National Centre for Text Mining, University of Manchester, UK

Important Dates

February 10th 2014    Paper submissions due
March 10th 2014       Paper notification of acceptance
April 1st 2014        Camera-ready papers due
May 31st 2014         Workshop

Papers must describe original, unpublished work, which may be either completed or in progress. Each submission will be reviewed by two program committee members.
Accepted papers will be given up to 8 pages in the workshop proceedings, and will be presented either as an oral presentation or as a poster.

Papers should be formatted according to the stylesheet, which will be provided on the LREC 2014 website in due course (http://lrec2014.lrec-conf.org/).

Paper review will be blind, so papers should not include authors' names and affiliations.
Accepted papers will be published in the workshop proceedings.

When submitting a paper from the START page (details to be announced on the workshop website in due course), authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of their research.
Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.), to enable their reuse and the replicability of experiments, including evaluation experiments.


Program Committee

Olivier Bodenreider, National Library of Medicine, USA
Wendy Chapman, University of Utah, USA
Hercules Dalianis, University of Stockholm, Sweden
Noémie Elhadad, Columbia University, USA
Graziela Gonzalez, Arizona University, USA
Jin-Dong Kim, DBCLS, Japan
Dimitris Kokkinakis, Gothenburg University, Sweden
Ioannis Korkontzelos, University of Manchester, UK
Hongfang Liu, Mayo Clinic, USA
Naoaki Okazaki, Tohoku University, Japan
Arzucan Özgür, Bogazici University, Turkey
Claire Nedellec, INRA, France
Sampo Pyysalo, University of Turku, Finland
Fabio Rinaldi, University of Zurich, Switzerland
Andrey Rzhetsky, University of Chicago, USA
Guergana Savova, Children's Hospital Boston and Harvard Medical School, USA
Hagit Shatkay, University of Delaware, USA
Rafal Rak, University of Manchester, UK
Lucy Vanderwende, Microsoft, USA
Karin Verspoor, NICTA, Australia
John Wilbur, NCBI, NLM, NIH, USA
Stephen Wu, Mayo Clinic, USA
Pierre Zweigenbaum, LIMSI, France


Paul Thompson
Research Associate
School of Computer Science
National Centre for Text Mining
Manchester Institute of Biotechnology
University of Manchester
131 Princess Street
M1 7DN
Tel: 0161 306 3091
