[Elsnet-list] NLP internship at Xerox Research Center Europe, Grenoble, France
vassilina.nikoulina at xrce.xerox.com
Fri Dec 5 09:23:36 CET 2014
Title: Identification of discontinuous variants of compound terms
Duration: 5-6 months
Start date: January-February 2015
The main theme of the internship is the identification of terms and
concepts in domain-specific texts, with a focus on medical texts in the
context of the EURECA project (http://eurecaproject.eu/). We have a
dictionary-based term identification system capable of identifying
occurrences of terms in free texts, including non-listed term variants
(e.g. inflected or misspelled terms). The intern will help extend the
range of variation types that the term identifier can handle. In
particular, the intern will work on identifying and normalizing
discontinuous compound terms that occur in specific syntactic structures
(e.g. coordination), using distant supervision from existing domain
terminologies. An example of a discontinuous compound term is
“abdominal distention” in the expression “abdominal bloating or
distention”.
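To make the target phenomenon concrete, here is a deliberately naive sketch in Java. It assumes a fixed "modifier head1 or/and head2" pattern and a toy terminology. The class and method names (CoordinationExpander, expand, matchTerms) are hypothetical. This is not the XRCE term identifier: handling real coordination is precisely what calls for the parsing and distant supervision described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical illustration only: expands a coordinated phrase of the
 * form "MODIFIER HEAD1 (or|and) HEAD2" into its two full readings and
 * keeps those found in a toy terminology.
 */
public class CoordinationExpander {

    // Expands "w1 w2 or w3" into ["w1 w2", "w1 w3"]; otherwise returns the trimmed input.
    static List<String> expand(String phrase) {
        String[] tokens = phrase.trim().split("\\s+");
        List<String> readings = new ArrayList<>();
        if (tokens.length == 4 && (tokens[2].equals("or") || tokens[2].equals("and"))) {
            readings.add(tokens[0] + " " + tokens[1]);
            readings.add(tokens[0] + " " + tokens[3]);
        } else {
            readings.add(phrase.trim());
        }
        return readings;
    }

    // Keeps only the readings that exactly match a terminology entry.
    static List<String> matchTerms(String phrase, Set<String> terminology) {
        List<String> matches = new ArrayList<>();
        for (String reading : expand(phrase)) {
            if (terminology.contains(reading)) {
                matches.add(reading);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Toy terminology standing in for a real medical vocabulary (assumption).
        Set<String> terminology = Set.of("abdominal bloating", "abdominal distention");
        // prints "[abdominal bloating, abdominal distention]"
        System.out.println(matchTerms("abdominal bloating or distention", terminology));
    }
}
```

A fixed four-token pattern like this breaks quickly on longer modifiers, nested coordination, or ellipsis on the other side ("left and right ventricle"), which is why the internship targets syntactic analysis rather than surface patterns.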
The ideal candidate is a student (MSc or PhD) in computational
linguistics, or in computer science with a good background in NLP.
S/he should have good knowledge of syntactic structures and parsing.
Good programming skills, preferably in Java, are also required. Prior
experience in NLP for the healthcare domain or with
terminologies/ontologies is a plus.
During the internship, the candidate will gain significant knowledge of,
and hands-on experience with, hybrid methods for term identification,
including distant supervision based on rich terminologies and
ontologies. S/he will also work closely with researchers and engineers
in an international research environment.
You can find more details about this offer at