---
title: Data Sets
layout: widepage
excerpt: ""
tags: []
share: false
image:
  feature: background.jpg
  credit: #WeGraphics
  creditlink: #http://wegraphics.net/downloads/free-ultimate-blurred-background-pack/
---

Emotion Classification Corpora

Appraisal enISEAR: A reannotation of the enISEAR corpus with Cognitive Appraisal (2020)

We reannotate the enISEAR corpus with cognitive appraisal dimensions following the Smith/Ellsworth model. The corpus consists of 1001 English event descriptions, each annotated with the emotion for which the event was described and with the appraisal dimensions of pleasantness, insecurity, self- and situational control, attention, and effort.

Authors: Jan Hofmann, Enrica Troiano, Roman Klinger

Data source: Self reports
Annotation procedure: Postannotation
Paper
Download

deISEAR, enISEAR: Self-reports of events associated with given emotions (2019)

deISEAR and enISEAR are a German and an English corpus created in the spirit of the original ISEAR data set, but collected via crowdsourcing in a two-step procedure to ensure quality. Each corpus consists of 1001 event descriptions, each associated with a predefined emotion.

Authors: Enrica Troiano, Sebastian Pado, Roman Klinger

Data source: Self reports
Annotation procedure: Crowdsourcing
Paper
Download
Alternative location


Unified Emotions (2018)

Several emotion corpora exist nowadays, many in different file formats and with different label sets. We aggregated these corpora with an automatic download and conversion pipeline so that these resources are easier to use and compare.

Authors: Laura Bostan, Roman Klinger

Data source: Different existing corpora
Annotation procedure: diverse
Paper
Download
Alternative location
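
To illustrate what one conversion step in such a pipeline might involve, here is a minimal Python sketch that maps corpus-specific labels onto a shared label set. The corpus names, the label mappings, and the helpers `unify_label` and `convert` are hypothetical and are not taken from the actual pipeline.

```python
from typing import Optional

# Hypothetical mapping from corpus-specific labels to a shared label set.
# Neither the corpus names nor the labels come from the real pipeline.
LABEL_MAP = {
    "corpus_a": {"happy": "joy", "mad": "anger", "sad": "sadness"},
    "corpus_b": {"happiness": "joy", "fury": "anger", "grief": "sadness"},
}


def unify_label(corpus: str, label: str) -> Optional[str]:
    """Return the shared label for a corpus-specific one, or None if unmapped."""
    return LABEL_MAP.get(corpus, {}).get(label.strip().lower())


def convert(corpus: str, records: list[tuple[str, str]]) -> list[dict]:
    """Convert (text, label) pairs to a common record format, skipping unmapped labels."""
    unified = []
    for text, label in records:
        shared = unify_label(corpus, label)
        if shared is not None:
            unified.append({"source": corpus, "text": text, "emotion": shared})
    return unified
```

Keeping the name of the source corpus in every record makes it possible to compare corpora after they have been merged.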


Implicit Emotions Shared Task (2018)

For this shared task, which took place at WASSA 2018, we collected data with properties similar to those of the ISEAR data, but via distant supervision on Twitter. These data therefore mainly consist of event descriptions without the explicit mention of an emotion word.

The test data is freely available. Contact me for a password to directly access the training data.

Authors: Roman Klinger, Saif Mohammad, Alexandra Balahur, Veronique Hoste, Orphee de Clercq

Data source: Twitter
Annotation procedure: weak/distant
Paper
Download


SSEC Corpus: Annotation of SemEval 2016 Stance Sentiment Corpus with Emotions (2018)

We reannotated the existing SemEval 2016 corpus, a resource already labeled with stances and sentiment, with emotions in a multiclass setting. This enables comparisons of these annotation layers. We publish all annotations of all annotators.

Authors: Hendrik Schuff, Jeremy Barnes, Julian Mohme, Sebastian Pado, Roman Klinger

Data source: Tweets from SemEval Stance Corpus 2016
Annotation procedure: Experts
Paper
Direct Download
More information
Alternative location


Emotion Analysis from Text and Images (2017)

Emotion analysis in social media might need to consider images together with the text which refers to them, for instance on Twitter. For analyzing this complementarity, we collected a corpus of Tweets which contain images. It is automatically labeled based on hashtags. We only provide Tweet-IDs. If you need help with downloading the corresponding data via the Twitter API, contact us.

Author: Roman Klinger

Data source: Twitter
Annotation procedure: Distant labeling
Paper
Downloads:
Alternative location
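
As a rough illustration of labeling based on hashtags, the following Python sketch assigns an emotion to a Tweet only when its hashtags point to exactly one emotion. The hashtag list and the function `distant_label` are assumptions made for this example; the hashtags and rules used for the corpus may differ.

```python
import re
from typing import Optional

# Hypothetical hashtag-to-emotion mapping; the hashtags used for the corpus may differ.
HASHTAG_TO_EMOTION = {"#happy": "joy", "#angry": "anger", "#sad": "sadness"}

HASHTAG_PATTERN = re.compile(r"#\w+")


def distant_label(tweet: str) -> Optional[str]:
    """Return an emotion if the tweet's hashtags indicate exactly one emotion, else None."""
    emotions = {
        HASHTAG_TO_EMOTION[tag.lower()]
        for tag in HASHTAG_PATTERN.findall(tweet)
        if tag.lower() in HASHTAG_TO_EMOTION
    }
    # Tweets with no known hashtag or with conflicting hashtags stay unlabeled.
    return emotions.pop() if len(emotions) == 1 else None
```

Discarding Tweets with conflicting hashtags is one possible design choice; it trades corpus size for less noisy labels.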


Relational Emotion Corpora

Emotion relation corpus for the recognition of emotional relations of characters in fan fiction (2019)

Semantic role labeling of emotion events is a challenging task. In this corpus, we simplify this to a binary relation extraction task, in which character mention pairs are labeled with directed emotional relations between them, i.e., a character is either an emotion experiencer or the cause of an emotion.

Authors: Evgeny Kim, Roman Klinger

Data source: Fan Fiction
Annotation procedure: Expert Annotation
Paper
Download
Alternative location
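
The task formulation described above can be pictured with a small Python sketch of a directed relation between two character mentions. The class `EmotionRelation`, the function `relation_direction`, and the returned strings are hypothetical names chosen for this example and do not reflect the corpus file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionRelation:
    """A directed emotional relation between two character mentions."""
    experiencer: str  # character who feels the emotion
    cause: str        # character who causes the emotion
    emotion: str      # illustrative label, e.g. "anger"


def relation_direction(first: str, second: str, relations: set[EmotionRelation]) -> str:
    """Describe the direction of the relation for an ordered pair of mentions."""
    forward = any(r.experiencer == first and r.cause == second for r in relations)
    backward = any(r.experiencer == second and r.cause == first for r in relations)
    if forward and backward:
        return "both"
    if forward:
        return "first_is_experiencer"
    if backward:
        return "second_is_experiencer"
    return "none"
```

Because the relation is directed, swapping the two mentions changes the result, which is what distinguishes the experiencer from the cause.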


Emotion Communication Channels (2019)

The author of a fictional text can let the characters of a story express emotions in different ways, for instance through facial expressions, body movements, or voice. With this corpus, we provide a resource in which we annotated these communication channels. This corpus is an extension of the emotion relation corpus mentioned above.

Authors: Evgeny Kim, Roman Klinger

Data source: Fan Fiction
Annotation procedure: Expert Annotation
Paper
Download
Alternative location


REMAN and GoodNewsEveryone: Emotion Corpora for Semantic Roles of Emotion Events (2019)

Emotions are commonly expressed in the context of a mention of an experiencer (which can be the author of a text) and with specific trigger words, and the text can describe the target and the stimulus of the emotion. We publish two corpora with such annotations, one of literature from Project Gutenberg and one of news headlines (additionally annotated with the reader perspective of emotions).

Authors: Laura Bostan, Evgeny Kim, Roman Klinger

Corpus 1: REMAN

Data source: Literature
Annotation procedure: Expert Annotation
Paper
Download
Alternative location

Corpus 2: GoodNewsEveryone

Data source: News headlines
Annotation procedure: Expert Annotation
Paper
Download


Resources and Dictionaries for Emotion Analysis

IMS Participation in EmoInt 2018

We participated in the shared task on emotion intensity prediction at WASSA in 2018 and scored second. Our model is a comparatively standard neural architecture informed by different dictionaries of emotions, abstractness, concreteness, valence, and arousal. We make all these resources and our implementation available.

Authors: Maximilian Koeper, Evgeny Kim, Roman Klinger

Data source: EmoInt data set, automatically generated dictionaries
Annotation procedure: Automatic
Paper
Resource download
Code
Alternative location


German Emotion Dictionaries created for the Analysis of Franz Kafka’s Texts (2016)

We manually created German dictionaries for emotion analysis in Kafka’s Schloss and Amerika. These dictionaries are more specific than general-purpose dictionaries and might perform worse on other texts. However, they might be a good starting point for related text analyses.

Authors: Roman Klinger, Surayya Samat Suliya

Data source: Manually collected words from Franz Kafka’s texts
Procedure: Manual
Paper
Download
More information


Irony, Sarcasm and Satire

Twitter Corpus to compare irony to sarcasm (2016)

The concepts of irony and sarcasm are often used interchangeably, though they are not the same. With this corpus (and paper), we analyze whether a difference between these concepts can be found empirically on Twitter. We publish the Tweets themselves, together with meta information.

Authors: Jennifer Ling, Roman Klinger

Data source: social media
Annotation procedure: Distant labeling
Paper
Download
More information


German Satire Detection Corpus (2019)

We publish the first German corpus for satire detection. It is also the first available corpus that records the source of each article, which enables training models with adversarial methods so that they do not overfit to such confounding variables.

Authors: Robert McHardy, Heike Adel, Roman Klinger

Data source: Regular and satirical news
Annotation procedure: Distant labeling
Paper
Download
Alternative location


Resources for Sentiment Analysis and Opinion Mining

SCARE: German Corpus for Aspect-based Sentiment Analysis in App-Reviews (2016)

There are not many resources for aspect-based sentiment analysis in German. We contribute a corpus of Google Play reviews annotated with subjective phrases, aspects, and their relation.

Authors: Mario Saenger, Roman Klinger

Data source: Google Play Reviews
Annotation procedure: Experts
Paper
Download


USAGE: German and English Corpora for Aspect-based Sentiment Analysis in Product Reviews (2014)

There are not many resources for aspect-based sentiment analysis in German. We contribute a corpus of Amazon reviews annotated with subjective phrases, aspects, and their relation.

Authors: Roman Klinger

Data source: Amazon reviews
Annotation procedure: Experts
Paper
Download


Resources for Biomedical and Chemical Text Mining

Corpus and resources for the detection of miRNA mentions in scientific text (2014)

Authors: Shweta Bagewadi, Tamara Bobic, Martin Hofmann-Apitius, Juliane Fluck, Roman Klinger


Weakly labeled corpus for protein-protein and drug-drug interactions (2012)

Authors: Philippe Thomas, Tamara Bobic, Martin Hofmann-Apitius, Ulf Leser, Roman Klinger


Corpus for testing normalization of variation mentions (2011)

Authors: Philippe E Thomas, Roman Klinger, Laura I Furlong, Martin Hofmann-Apitius, Christoph Friedrich


Corpus of Medline abstracts annotated with chemical entities (2008)

Authors: Corinna Kolarik, Roman Klinger, Christoph M. Friedrich, Martin Hofmann-Apitius, and Juliane Fluck


Corpus of Medline abstracts annotated with IUPAC entities (2009)

Authors: Roman Klinger, Corinna Kolářik, Juliane Fluck, Martin Hofmann-Apitius, Christoph M. Friedrich


Other Resources

Obituary Corpus Annotated for Logical Zones (2020)

Authors: Valentino Sabbatino, Laura Bostan, Roman Klinger

Paper (accepted for LREC 2020)
Alternative location
Data Download
- You need a password to access the data. Send us an email clearly stating that you will not redistribute the data and that you will use it only for research.
