Stance and Gender Detection in Tweets
on Catalan Independence @ IberEval 2017
Stance and Gender Detection in Tweets on Catalan Independence will be organised within IberEval 2017, the 2nd Workshop on Evaluation of Human Language Technologies for Iberian Languages, which will be held together with SEPLN 2017 at the Facultad de Letras in Murcia, Spain, on September 19th, 2017.
Introduction and motivation
The aim of this task is to detect the author's gender and stance with respect to the target "independence of Catalonia" in tweets written in Spanish and/or Catalan.
Classical sentiment analysis tasks carried out in recent years in evaluation campaigns for different languages have mostly involved the detection of the subjectivity and polarity of microblogs at the message level, i.e. determining whether a tweet is subjective or not, and, if subjective, determining its positive or negative semantic orientation. However, comments and opinions are usually directed towards a specific target or aspect of interest, and therefore give rise to finer-grained tasks such as stance detection, where the focus is on detecting what particular stance (in favor, against or neutral) a user takes with respect to a specific target.
Stance detection is related to sentiment analysis, but there are some significant differences, as stressed in Mohammad et al. (2016a) and Mohammad et al. (2016b):
- In sentiment analysis, systems detect whether the sentiment polarity of a text is positive, negative or neutral.
- In stance detection, systems detect whether the author is favorable or unfavorable to a given target, which may or may not be explicitly mentioned in the text.
Stance detection is particularly interesting for studying political debates in which the topic is controversial. Therefore, for this task we have chosen to focus on a specific political target: the independence of Catalonia (Bosco et al. 2016). Stance detection is also related to textual inference, since the tweeter's position is often expressed implicitly; in many cases the stance therefore has to be inferred. See, for instance, the following tweet:
Target: Catalan Independence
Tweet: Avui #27S2015 tot està per fer... Un nou país és possible ||*|| A les urnes... #27S http://t.co/ls2nkRWt2b
(‘Today #27S2015 the future is ours to make… A new country is possible ||*|| Get out and vote… #27S http://t.co/ls2nkRWt2b’, where ||*|| stands for the Catalan independence flag).
Stance detection and author profiling tasks on microblogging texts are currently carried out in several evaluation forums, including SemEval-2016 (Task-6) (Mohammad et al., 2016a) and PAN@CLEF (Rangel et al., 2016). However, these two tasks have never been performed together for Spanish and Catalan as part of one single task. The results obtained will be of interest not only for sentiment analysis but also for author profiling and for socio-political studies.
The task is open to everyone from academia and industry.
The aim of this task is to detect the author's stance and gender in Twitter messages written in Catalan and Spanish.
Stance Detection: Given a message, decide the stance taken towards the target "Catalan Independence".
The possible stance labels are: FAVOR, AGAINST and NONE:
- FAVOR: positive stance towards the independence of Catalonia. Example:
Tweet: "He ido a votar tan sobrado que cuando me han devuelto el DNI les he dicho que ya se lo podían quedar. #27S"
(‘When I went to vote I was so sure of the result that I told them that they could keep my (Spanish) ID card. #27S’)
- AGAINST: negative stance towards the independence of Catalonia. Example:
Tweet: "En el día de hoy #27S sólo me sale del alma gritar ¡¡VIVA ESPAÑA! ! http://t.co/w9Bmsf4TUK"
(‘Today #27S the only thing that my heart tells me to do is to shout ¡¡VIVA ESPAÑA!! http://t.co/w9Bmsf4TUK’)
- NONE: neutral stance towards the independence of Catalonia and cases in which the stance cannot be inferred. Example:
Tweet: "100% escrutat a Arbúcies #27s http://t.co/avMzng6iyV"
(‘100% of votes counted in Arbúcies #27s http://t.co/avMzng6iyV’)
Identification of gender: Given a message, determine its author's gender.
The possible gender labels are: FEMALE and MALE.
The following are examples of tweets labelled for both the author's stance and gender, in both languages:
Tweet: "15 diplomàtics internacional observen les plebiscitàries, serà que interessen a tothom menys a Espanya #27S"
(‘15 international diplomats are observing the plebiscite; perhaps it is of interest to everybody except Spain #27S’)
Tweet: "#27S Brutal! #JunstPelSi no cree que haya independencia. Solo busca forzar una negociación. Escúchalo antes de votar https://t.co/OBL1LaAm0S"
(‘#27S Incredible! #JunstPelSi doesn’t believe in the possibility of independence. They only want to get a better negotiating position. Listen before voting https://t.co/OBL1LaAm0S’)
Tweet: "#27S ¿cuál fue la diferencia en 2012 entre los resultados de la encuesta de TV3 y resultados finales? Nos serviría para hacernos una idea"
(‘In 2012, what was the difference between the results of the TV3 poll and the final results? That would give us an idea…’)
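To make the stance subtask concrete, here is a minimal keyword-cue baseline sketch. It is purely illustrative and not part of the task: the cue lists below are assumptions chosen to match the example tweets above, and a real system would learn such cues from the training data rather than hard-code them.

```python
# Toy stance baseline: look for hand-picked cue phrases in the lowercased
# tweet text. Cue lists are illustrative assumptions, not official resources.
FAVOR_CUES = {"independencia", "independència", "nou país", "a les urnes"}
AGAINST_CUES = {"viva españa"}

def stance_baseline(tweet: str) -> str:
    """Return FAVOR, AGAINST or NONE based on simple substring cues."""
    text = tweet.lower()
    if any(cue in text for cue in AGAINST_CUES):
        return "AGAINST"
    if any(cue in text for cue in FAVOR_CUES):
        return "FAVOR"
    return "NONE"

# Applied to the example tweets above:
# stance_baseline("Un nou país és possible")        -> "FAVOR"
# stance_baseline("¡¡VIVA ESPAÑA!!")                -> "AGAINST"
# stance_baseline("100% escrutat a Arbúcies #27s")  -> "NONE"
```

Note that such a baseline cannot handle the implicit stances discussed above, which is precisely what makes the task challenging.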
The dataset will include short documents taken from Twitter on the independence debate in Catalonia (Spain), collected during the regional elections of September 2015, which were interpreted by many political actors and citizens as a de facto referendum on the possible independence of Catalonia from Spain.
A detailed description of the data (annotation scheme, data format, etc.) will soon be available in the task guidelines. The development and test datasets will be released in compliance with Twitter policies.
Each participating team will initially have access only to the training data. Later, the unlabelled test data will be released (see the timeframe below). After the assessment, the labels for the test data will also be released.
The evaluation will be performed according to standard metrics. Details on the metrics applied to evaluate participant results will be published in the task guidelines.
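The official metrics will be defined in the guidelines; as a point of reference, related stance detection evaluations such as SemEval-2016 Task 6 (Mohammad et al., 2016a) scored systems by the macro-average of the F1 scores for FAVOR and AGAINST, with NONE excluded from the average. A sketch under that assumption:

```python
def f1(gold, pred, label):
    """F1 score for a single class, from parallel gold/predicted label lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def stance_score(gold, pred):
    """Macro-average of F1 over FAVOR and AGAINST (NONE is not averaged in,
    though NONE errors still affect the two class-wise F1 scores)."""
    return (f1(gold, pred, "FAVOR") + f1(gold, pred, "AGAINST")) / 2
```

For example, with gold labels ["FAVOR", "AGAINST", "NONE", "FAVOR"] and predictions ["FAVOR", "AGAINST", "FAVOR", "NONE"], the FAVOR F1 is 0.5 and the AGAINST F1 is 1.0, giving a score of 0.75.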
How to participate
Information about the submission of results and their format will be available in the Task guidelines.
We invite potential participants to subscribe to our mailing list in order to be kept up to date with the latest news related to the task. Please share comments and questions with the mailing list; the organizers will assist with any issues that arise.
Participants will be required to provide an abstract and a technical report including a brief description of their approach, an illustration of their experiments (in particular the techniques and resources used), and an analysis of their results, for publication in the task proceedings.
Papers must be submitted in PDF format (details will be published later). Submission of abstracts and technical reports is to be done electronically through the Easychair system.
- 20th March 2017: training data available to participants
- 24th April 2017: test data available to participants
- 8th May 2017: system results due to organizers
- 15th May 2017: assessment returned to participants
- 29th May 2017: working notes/papers submission
- 12th June 2017: reviews of working notes/papers returned (peer review)
- 26th June 2017: camera ready papers due to the organizers
- 19th September 2017: IBEREVAL@SEPLN 2017 Workshop
C. Bosco, M. Lai, V. Patti, F. Rangel, P. Rosso (2016) Tweeting in the Debate about Catalan Elections. In: Proc. LREC workshop on Emotion and Sentiment Analysis Workshop (ESA), LREC-2016, Portorož, Slovenia, May 23-28, pp. 67-70.
S. M. Mohammad, S. Kiritchenko, P. Sobhani, X. Zhu, C. Cherry (2016a) SemEval-2016 Task 6: Detecting Stance in Tweets. In: Proceedings of the International Workshop on Semantic Evaluation, SemEval-2016.
S. M. Mohammad, P. Sobhani, S. Kiritchenko (2016b) Stance and Sentiment in Tweets. CoRR abs/1605.01655
F. Rangel, P. Rosso, B. Verhoeven, W. Daelemans, M. Potthast, B. Stein (2016) Overview of the 4th Author Profiling Task at PAN 2016: Cross-Genre Evaluations. In: Balog K., Cappellato L., Ferro N., Macdonald C. (Eds.) CLEF 2016 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings. CEUR-WS.org, vol. 1609, pp. 750-784.