IARG-AnCora aims to enrich the Spanish AnCora-ES and Catalan AnCora-CA corpora with annotations of the implicit arguments of nominal predicates derived from verbs, i.e. deverbal nominalizations (Taulé et al., 2012).
By implicit arguments we mean arguments that are not syntactically realized in the local context of the predicate (verb, noun or adjective) but whose semantic interpretation depends on the linguistic or extralinguistic context. This project focuses on annotating the core implicit arguments (arg0, arg1, arg2, arg3, arg4) whose semantic interpretation depends on the linguistic context and that may be related to a discourse entity (Recasens and Martí, 2010).
The task essentially consisted of identifying the implicit arguments of nominalizations and assigning each one an argument position (iarg0, iarg1, iarg2, iarg3 or iarg4) and a thematic role (agent, patient, cause, etc.).
We applied the same annotation scheme as for the explicit arguments of deverbal nominalizations (Peris and Taulé, 2012), which in turn is the one used for the annotation of verbal argument structure (Taulé et al., 2008).
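As a rough illustration of what one such annotation carries, the sketch below models an implicit argument as a small Python record: the nominalization, its argument position, its thematic role, and an optional link to a discourse entity. The class name, field names, the `label()` format and the example values are all assumptions made for illustration; they do not reproduce the corpus's actual file format or tag set.

```python
from dataclasses import dataclass
from typing import Optional

# Argument positions named in the text; the "i" prefix marks an implicit argument.
IARG_POSITIONS = ("iarg0", "iarg1", "iarg2", "iarg3", "iarg4")


@dataclass(frozen=True)
class ImplicitArgument:
    """Hypothetical record for one implicit argument of a nominalization."""

    predicate: str                   # the deverbal nominalization
    position: str                    # one of IARG_POSITIONS
    thematic_role: str               # e.g. "agent", "patient", "cause"
    entity_id: Optional[str] = None  # linked discourse entity, if any (assumed field)

    def __post_init__(self) -> None:
        # Reject positions outside the five listed in the text.
        if self.position not in IARG_POSITIONS:
            raise ValueError(f"unknown argument position: {self.position!r}")

    def label(self) -> str:
        # Illustrative display form only, not the corpus's own label syntax.
        return f"{self.position}-{self.thematic_role}"


# Invented example: an unrealized agent of the nominalization "construcción".
example = ImplicitArgument("construcción", "iarg0", "agent", entity_id="entity_12")
print(example.label())
```

Validating the position when the record is created keeps labels outside the five listed positions from entering the data unnoticed.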
The annotation of the corpus with implicit arguments was conducted in two stages:
- First, we developed a semantic role labelling model based on machine learning techniques, LIARC (Peris et al., 2013), with which the whole corpus was automatically labelled. This model was learned from a manually annotated training corpus, a selected sample of 469 nominal occurrences from the Spanish corpus.
- Second, we manually reviewed the output of the automatic process to ensure the quality of the final resource.
The number of nominalizations annotated is 19,275 in AnCora-ES and 10,043 in AnCora-CA.
The annotators of IARG-AnCora were:
Esther Arias, Oriol Borrega, Montserrat Nofre, Aina Peris and Rita Zaragoza
The IARG-AnCora (AnCora 3.0.0) corpus can be downloaded here.