SemEval 2010

Coreference Resolution in Multiple Languages

The task is concerned with intra-document coreference resolution for six different languages: Catalan, Dutch, English, German, Italian and Spanish. The core of the task is to identify which noun phrases (NPs) in a text refer to the same discourse entity.
Data is provided for both statistical training and evaluation. The coreference chains are extracted from manually annotated corpora: the AnCora corpora for Catalan and Spanish, the OntoNotes and ARRAU corpora for English, the TüBa-D/Z for German, the KNACK corpus for Dutch, and the LiveMemories corpus for Italian. The data is additionally enriched with morphological, syntactic and semantic information (such as gender, number, constituents, dependencies and predicates).
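To make the notion of a coreference chain concrete, the following is a minimal sketch of how pairwise coreference decisions between NPs can be grouped into chains. It is not the task's data format or scoring method: the function name build_chains, the (start, end, text) mention tuples, the example sentence fragments and the choice to drop singleton mentions are all assumptions made for illustration.

```python
from collections import defaultdict


def build_chains(mentions, links):
    """Group mentions into coreference chains from pairwise links (union-find).

    mentions: list of hashable mention identifiers (hypothetical format).
    links: list of (mention_a, mention_b) pairs judged to corefer.
    """
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent pointers to the representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    chains = defaultdict(list)
    for m in mentions:
        chains[find(m)].append(m)

    # Assumption of this sketch: mentions linked to nothing are dropped.
    return [chain for chain in chains.values() if len(chain) > 1]


# Hypothetical mentions as (start_token, end_token, text).
mentions = [
    (0, 1, "Maria Lopez"),
    (5, 5, "she"),
    (8, 9, "the report"),
    (12, 12, "it"),
    (15, 16, "the engineer"),
    (20, 21, "the office"),
]
links = [
    (mentions[0], mentions[1]),  # "Maria Lopez" <-> "she"
    (mentions[2], mentions[3]),  # "the report" <-> "it"
    (mentions[1], mentions[4]),  # "she" <-> "the engineer"
]

chains = build_chains(mentions, links)
for chain in chains:
    print([text for _, _, text in chain])
```

Because coreference is treated here as an equivalence relation, linking "she" to "the engineer" places "the engineer" in the same chain as "Maria Lopez" even though that pair was never linked directly.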

Webpage: http://stel3.ub.edu/semeval2010-coref/

Datasets download: task01.train_.v1.0_2.zip