Parc Científic de Barcelona  
Edifici Florensa c/ d'Adolf Florensa s/n  
08028 Barcelona  

AnCora

AnCora consists of a Catalan corpus and a Spanish corpus, each containing 500,000 words. The corpora are annotated at the following levels:

  1. Morphological categories
  2. Syntactic constituents and functions
  3. Argument structure and thematic roles
  4. Semantic classes of the verb  
  5. Nouns linked to WordNet synsets
  6. Named Entities
  7. Coreference

Two verbal lexicons were also produced as a result of this annotation process and are available. The Spanish verbal lexicon consists of 2,647 entries and the Catalan lexicon of 2,142. Each verb sense is described with the following information: semantic classes, syntactic subcategories, argument structure and thematic roles.

The AnCora corpus is based mainly on journalistic texts.

Access to the AnCora corpus.