Introduction

AnCora consists of a Catalan corpus and a Spanish corpus, each of 500,000 words. The corpora are annotated at different levels:

  1. Morphological categories
  2. Syntactic constituents and functions
  3. Argument structure and thematic roles
  4. Semantic classes of the verb
  5. Nouns related to WordNet synsets
  6. Named Entities

Two verbal lexicons are also available as a result of this annotation process. The Spanish verbal lexicon consists of 2,580 entries and the Catalan lexicon of 2,142. Each verb sense is described with the following information: semantic classes, syntactic subcategorization, argument structure, and thematic roles.

The AnCora corpus is based mainly on journalistic texts. For more information, see AnCora-corpus.

The annotators of AnCora are:

Joan Aparicio Mera, Oriol Borrega Cepa, Isabel Britz Hernández, Núria Bufí Cabrol, Maria Jesús Díaz Cabrera, Silvia Garcia Casaseca, Marina Lloberes Salvatella, Difda Monterde, Aina Peris Morant, Lourdes Puiggros Casals, Marta Recasens Potau, Alba Rodríguez, Rita Zaragoza Jove, Bàrbara Soriano Bautista

The annotators of CESS-ECE are:

Núria Bufí Cabrol, Montserrat Civit Torruella, Raquel Hernández Bitinas, Marina Lloberes Salvatella, Raquel Marcos, Borja Navarro, Bàrbara Soriano Bautista