[Elsnet-list] Traineeship position for Polish Text Mining and Evaluation at the JRC in Italy
ralf.steinberger at jrc.ec.europa.eu
Sun Feb 12 12:18:13 CET 2012
Readers on this list may be interested in the following traineeship position to work on adapting multilingual text mining tools to the Polish language. Feel free to pass on this message.
European Commission – Joint Research Centre (JRC)
Call Reference Number: 2012-IPSC-16 - ISPRA
Title: Multilingual Text Mining and Evaluation (Polish)
Duration: 6 months
Location: Joint Research Centre (JRC), Ispra, Italy
URL on rules and conditions: http://ec.europa.eu/dgs/jrc/downloads/jrc_trainee_rules_en.pdf
Application via staff recruitment application tool ESRA: http://recruitment.jrc.ec.europa.eu
The mission of the Joint Research Centre (JRC) is to provide customer-driven scientific and technical
support for the conception, development, implementation and monitoring of EU policies. Being a Directorate-
General of the European Commission, the JRC functions both as the in-house science service of the
Commission and as a reference centre for science and technology for the Union. With 7 Scientific Institutes,
3 Corporate Directorates and the DG/DDG Office, the JRC is located in 5 Member States (Belgium,
Germany, Italy, the Netherlands and Spain). Further information is available at: http://www.jrc.ec.europa.eu
The current vacancy is in the Institute for the Protection and Security of the Citizen (located in Ispra, Italy).
The Institute provides research results and supports EU policy-makers in their effort towards global security
and protection of European citizens from accidents, deliberate attacks, fraud and illegal actions against EU
policies. More details on IPSC can be found at: http://ipsc.jrc.ec.europa.eu.
The vacancy is within the Global Security and Crisis Management Unit (GlobeSec), in the OPTIMA Action
(Open Source Text Information Mining and Analysis). Research and development efforts in the OPTIMA
group produce novel and unique approaches and software that gather and analyse an average of 100,000
media reports per day from online news portals world-wide in 50 languages. The tools classify articles according to
subject domains, cluster related articles, summarise the news clusters, extract information from them,
aggregate the extracted information, track topics over time, issue breaking news alerts and produce visual
presentations of the information found. See http://emm.newsbrief.eu/overview.html to access the public
Europe Media Monitor (EMM) portals.
We propose a trainee position in Ispra, Italy.
We are looking for a person to help us analyse Polish language news and social media posts, and
specifically to help us adapt EMM’s multilingual suite of text mining tools to the Polish language. EMM’s tools
- currently developed for up to 20 languages - include the following functionality: Named Entity Recognition
and disambiguation (persons, organisations, locations, dates); co-reference resolution of definite descriptions;
quotation recognition; document clustering; document categorisation using Boolean search expressions;
multi-document summarisation; Statistical Machine Translation.
Trainee Project Sheet
The selected person will be a member of an international and highly motivated team of researchers and
developers. They will learn about the inner workings of some of the most highly multilingual text analysis
applications world-wide, and they are likely to become co-authors of scientific publications on the
applications they work on.
The successful candidate will be asked to contribute to the group effort by working on the following tasks:
· Creating lexical resources for Information Extraction, by using semi-automatic methods;
· Exploiting externally available dictionaries and corpora, which requires format conversion, data cleaning, consistency checking;
· Adapting the currently existing language-independent rule set to Polish, if necessary;
· Evaluating the output of the Polish text mining tools and helping to improve them;
· Possibly, producing gold-standard annotations for various information extraction tasks for evaluation purposes;
· Contributing to scientific publications (with co-authorship).
We are looking for a candidate who fits the following description:
· University degree in Computational Linguistics or a related field, either completed or near completion;
· Hands-on Java programming skills;
· Knowledge of Polish morphology;
· Ability to work in a predominantly English-speaking team;
· Willingness to contribute hands-on to produce working online applications.
One or more of the following skills would be an asset:
· Programming skills in a scripting language like Perl or Python;
· Knowledge of, and hands-on experience with, a variety of text mining tools;
· Hands-on experience with using databases;
· Hands-on experience with using Polish linguistic resources;
· Experience in morphology, lexicology and/or text annotation;
· Knowledge, even passive, of further natural languages;
· Experience with XML and with text data format conversion.
Mandatory language skills:
· For EU nationals: knowledge of at least 2 Community official languages, of which one should be English, French or German. Required 2nd language level is B2 according to the Common European Framework of Reference for Languages.
· For non-EU nationals: very good knowledge of English, French or German. Required level of the language is C2 according to the Common European Framework of Reference for Languages.
· Other requirements are according to the Rules Governing the Traineeship Scheme of the Joint Research Centre.
In order to apply, please follow the directions next to the published call.
Please note that only online applications via the ESRA tool will be considered.