Roman Klinger’s Homepage

Conference Report: KONVENS 2024

2024-10-12T12:01:00+02:00

KONVENS

In September 2024, I participated in the KONVENS – the “Konferenz zur Verarbeitung natürlicher Sprache” (Conference on Natural Language Processing) in Vienna.

KONVENS is the computational linguistics and natural language processing conference in the German speaking countries. Various countries have such more local CL/NLP conferences, complementing the large and global conferences by the ACL, the COLING, or LREC, which have different foci, but are always very international. (there are also many other venues, like machine learning, language models, and AI focused events, but given that KONVENS is CL/NLP, I only contrast it to this field here).

Other examples for established more regional conferences are the NoDaLiDa (Nordic Conference on Computational Linguistics), CliC-it (Italy), or the CLIN (Netherlands).

You may ask: Why would I go to such a regional conference? (and by the way, all of these conferences are international these days, and the language spoken there is English, but the focus is a bit more regional)

I think there are a couple of reasons:

There are papers that fit better to KONVENS than to larger, global venues. In NLP, we mostly publish at conferences. Regional conferences also publish proceedings as larger venues do, which typically also go into the ACL Anthology the main paper repository in the field (and all open access). The reputation of these regional conferences is lower than EMNLP or ACL, but, as with focus workshops, there are papers which find a more interested audience here. For example, if you work on the German language, it’s more likely that you find German speaking people at KONVENS.
You don’t need travel so far. Sure, UAE or Miami might be nice for some, but for others, traveling there is not an option. Be it visa issues or are not feeling comfortable with the legal situation in a place (some readers might find this a euphemism, it can be pretty bad for some people in some countries), or they are hesitant to travel far, by plane.
Sometimes there is no funding available to go to a distant conference. With KONVENS and other regional venues, also papers that have been written based on, for instance, Master’s thesis, where the main author might not have an affiliation, could be published.
Networking. It’s so much easier to enter a new field in smaller conferences than in bigger ones, and you meet people who are typically geographically closer to you. This makes it easier to collaborate, based on discussions that may take place at the conference. Networking is the main reason I participate in these conferences.

KONVENS 2024

KONVENS 2024 took place in Vienna, and has been organized not only by the German Society for Computational Linguistics (GSCL) but jointly with the Austrian Research Institute for Artificial Intelligence (OFAI) and the Austrian Society for Artificial Intelligence (ASAI). The main local organizer has been the University of Vienna.

The conference received 57 submissions and accepted 39 papers. During the conference, there were 30 poster presentations and 9 oral presentations. Most papers came from Germany (70), Austria (20), and Switzerland (14). Authors from other countries contributed 7 more papers. In addition, there were three invited talks (Leonie Weissweiler who is a postdoc at UT Austin; Sebastian Schuster from UC London; Jana Diesner from TU Munich). The conference was complemented by a set of (partially as large as the main conference) workshops: GermEval Shared Task 2: Statement Segmentation in German Easy Language (StaGE), Workshop on Linguistic Insights from and for Multimodal Language Processing (LIMO), and Workshop on Computational Linguistics for Political Text Analysis (CPSS), and GERMS-DETECT Sexism Detection in German Online News Fora (GERMS-DETECT).

My Favorite Contributions

All of the invited talks were awesome. I’d like to point out the presentation by Sebastian Schuster (because I found it most relevant for my own work), who explained limitations of large language models based on inference tasks that are easy for humans and difficult for machines. The main paper his talk was based on is Kim and Schuster (2023), which also won a best paper award at the recent ACL in Toronto 2023. The task is to follow a description how entities are moved from one box to another, and the model needs to say in which box which entity is.

The whole proceedings of KONVENS are available in the ACL Anthology.

Under the assumption that you might be reading this because you have similar research interests as I do, I’d like to point out papers, that I personally found particularly interesting and relevant (for my work).

Hellwig et al. (2024) report on a German restaurant review dataset, annotated for aspect-based sentiment analysis. There are a couple of German sentiment corpora (for instance our own corpora USAGE Klinger and Cimiano (2014) and SCARE Sänger et al. (2016)), but in contrast to English, there is not a lot, and the restaurant domain did, as far as I know, not receive any attention yet. The resource consists of more than 3000 manually annotated reviews.
Language models are often used for text classification now, and offer themselves as a training data efficient method, via prompting. Kluge and Kähler (2024) present experiments on indexing medical book titles via prompting. The authors work German National Library, so I assume that this paper reports not only on a purely academic work, but on something that has practical relevance for their direct environment. Subject indexing is an interesting and challenging task, sometimes considered extreme classification, because you need to decide for many labels which are fitting. While the paper does not provide statistics on the inventory of possible labels used here, I assume that the set is large.
Petersen-Frey and Biemann (2024) present a method on quotation and attribution – the task is to detect speech in written text and attribute it to the speaker (“Roman said ‘this is true’”). We worked on speaker and quotation identification a while ago (Scheible, Klinger, and Padó (2016)) and my former collaborators continued to contribute to the topic (e.g., Papay and Padó (2020)). Petersen-Frey and Biemann (2024) approach the task in a structured prediction framework.
While a lot of efforts go into mitigating gender bias in representations (see Sun et al. (2019) for a survey), Gross et al. (2024) take a different approach: they induce gender bias in language models to then be able to study the effects in a controlled environment.
With the increasing popularity of populist parties, some research goes into analysing the language of populists in contrast to other political parties. While we know that populists use particular rhethoric strategies (to convince people without having actually good arguments) more frequently than other parties, there is not too much work on the language complexity. Zanotto, Frassinelli, and Butt (2024) investigate the hypothesis that populists use simpler language (for instance to have a larger outreach). They do, however, not find any significant effects, but confirm the more frequent use of persuasion tactics.

Awards

I cannot write this blog post without mentioning that my Ph.D. student Enrica Troiano won the award of the GSCL for the best thesis in the years 2023. Her thesis is on bringing event analysis and emotion analysis together. In contrast to the various papers we wrote, is really a nice aggregation of the work, and worth reading (Troiano (2023)).

Venue and Place

The conference took place in Vienna - a city I should have visited more often already. Now, from my new workplace in Bamberg, this is reachable with a short train trip (4 hours from Nurnberg). Of course, I brought my bicycle, so I could commute from the hotel to the conference venue by bike. Unfortunately, there was a storm and rain warning, so towards the end, cycling around became a bit challenging. Actually, when I traveled back, I took one of the last three trains that made it to Germany, before the track was shut down for a couple of days. Read more about this storm here.

The conference itself took place at the University of Vienna, in a pretty modern lecture hall. The poster sessions were right in the lobby, so no long commutes between places for various parts of the program.

The social event was a small walk through the vineyards and dinner in a beergarden. I prefer more vegetarian-friendly places and non-seated dinners at conferences, but the place was very nice.

Bibliography

Gross, Stephanie, Brigitte Krenn, Craig Lincoln, and Lena Holzwarth. 2024. “Analysing Effects of Inducing Gender Bias in Language Models.” In Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024), edited by Pedro Henrique Luz de Araujo, Andreas Baumann, Dagmar Gromann, Brigitte Krenn, Benjamin Roth, and Michael Wiegand, 222–30. Vienna, Austria: Association for Computational Linguistics. https://aclanthology.org/2024.konvens-main.24.

Hellwig, Nils Constantin, Jakob Fehle, Markus Bink, and Christian Wolff. 2024. “GERestaurant: A German Dataset of Annotated Restaurant Reviews for Aspect-Based Sentiment Analysis.” In Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024), edited by Pedro Henrique Luz de Araujo, Andreas Baumann, Dagmar Gromann, Brigitte Krenn, Benjamin Roth, and Michael Wiegand, 123–33. Vienna, Austria: Association for Computational Linguistics. https://aclanthology.org/2024.konvens-main.14.

Kim, Najoung, and Sebastian Schuster. 2023. “Entity Tracking in Language Models.” In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), edited by Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, 3835–55. Toronto, Canada: Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.acl-long.213.

Klinger, Roman, and Philipp Cimiano. 2014. “The USAGE Review Corpus for Fine Grained Multi Lingual Opinion Analysis.” In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), edited by Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, 2211–18. Reykjavik, Iceland: European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2014/pdf/85_Paper.pdf.

Kluge, Lisa, and Maximilian Kähler. 2024. “Few-Shot Prompting for Subject Indexing of German Medical Book Titles.” In Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024), edited by Pedro Henrique Luz de Araujo, Andreas Baumann, Dagmar Gromann, Brigitte Krenn, Benjamin Roth, and Michael Wiegand, 141–48. Vienna, Austria: Association for Computational Linguistics. https://aclanthology.org/2024.konvens-main.16.

Papay, Sean, and Sebastian Padó. 2020. “RiQuA: A Corpus of Rich Quotation Annotation for English Literary Text.” In Proceedings of the Twelfth Language Resources and Evaluation Conference, edited by Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, et al., 835–41. Marseille, France: European Language Resources Association. https://aclanthology.org/2020.lrec-1.104.

Petersen-Frey, Fynn, and Chris Biemann. 2024. “Fine-Grained Quotation Detection and Attribution in German News Articles.” In Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024), edited by Pedro Henrique Luz de Araujo, Andreas Baumann, Dagmar Gromann, Brigitte Krenn, Benjamin Roth, and Michael Wiegand, 196–208. Vienna, Austria: Association for Computational Linguistics. https://aclanthology.org/2024.konvens-main.22.

Sänger, Mario, Ulf Leser, Steffen Kemmerer, Peter Adolphs, and Roman Klinger. 2016. “SCARE ― the Sentiment Corpus of App Reviews with Fine-Grained Annotations in German.” In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), edited by Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, et al., 1114–21. Portorož, Slovenia: European Language Resources Association (ELRA). https://aclanthology.org/L16-1178.

Scheible, Christian, Roman Klinger, and Sebastian Padó. 2016. “Model Architectures for Quotation Detection.” In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), edited by Katrin Erk and Noah A. Smith, 1736–45. Berlin, Germany: Association for Computational Linguistics. https://doi.org/10.18653/v1/P16-1164.

Sun, Tony, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang. 2019. “Mitigating Gender Bias in Natural Language Processing: Literature Review.” In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, edited by Anna Korhonen, David Traum, and Lluı́s Màrquez, 1630–40. Florence, Italy: Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1159.

Troiano, Enrica. 2023. “Where Are Emotions in Text? A Human-Based and Computational Investigation of Emotion Recognition and Generation.” PhD thesis, University of Stuttgart. https://elib.uni-stuttgart.de/handle/11682/13671.

Zanotto, Sergio E., Diego Frassinelli, and Miriam Butt. 2024. “Language Complexity in Populist Rhetoric.” In Proceedings of the 4th Workshop on Computational Linguistics for the Political and Social Sciences: Long and Short Papers, edited by Christopher Klamm, Gabriella Lapesa, Simone Paolo Ponzetto, Ines Rehbein, and Indira Sen, 61–80. Vienna, Austria: Association for Computational Linguistics. https://aclanthology.org/2024.cpss-1.5.

[Download this post as PDF]

Conference Report: LREC-COLING 2024

2024-05-26T00:01:00+02:00

End of May 2024, I participated in the Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING) in Turin. Both COLING and LREC enrich the landscape of competitive conferences to publish in natural language processing and computational linguistics. While ACL, EMNLP, NAACL and EACL have a tendendency to aim at focusing on accepting high impact papers, also by keeping the acceptance rate low (~25%), both COLING and LREC are traditionally more inclusive. COLING and LREC recently had acceptance rates around 28% and 65%, respectively. While COLING also has been a bit higher in the past, these numbers are generally pretty typical for these venues.

The conferences LREC and COLING happened together this year, and the general chairs explained this to be a one time event to reschedule LREC to happen every even year and COLING every odd year, while both so far took place in even years. Joining these two conferences was interesting for authors who submitted, because it was not really clear what to expect. Also the organizers seem to have been surprised by the numbers of submitted papers.

Overall, there have been 3,471 submissions, with 1,554 acceptances. Out of those 275 were presented in talks, 837 as poster, and 442 remotely. Therefore, the acceptance rate was 44%. It’s difficult to say, but it might be that tracks that were more LREC-like had a higher acceptance rate and more COLING-style tracks had a lower one. I suspect that this is the case because the track with most acceptances is in the track “Corpora and Annotation”. LREC’s idea has always been to optimize for high recall here, given that resources may have an impact in low-resource languages without showing a high impact overall across the community. However, I’d like to note that LREC only started in 2020 to review its papers! Until 2018, extended abstracts were submitted and reviewed, and accepted abstracts were invited to submit a full paper, which did not get reviewed again. I am quite happy that this has been changed. The overall quality of papers, posters, and presentations has been comparable to other conferences, but before 2018, I’ve seen a couple of presentations at LREC where a review of the full paper might have had the chance to improve the quality of the work.

In the opening session, the program chairs also shared information on the countries from where most papers came (ranked list: China, USA, German, France, Japan, UK, Rep. of Korea, Spain, Italy, India). While China is, since a couple of years, having more and more papers in the NLP venues, I was a bit surprised to see quite many papers from Korea, which I think I did not before. Maybe the reason is that COLING 2022 was in Korea and made the conference more popular in this part of the world. It’s been quite interesting to also see many papers who worked on Korean. There were also some differences between countries in the acceptance rates, but I am not sure if these are just artefacts, so I don’t want to republish the overall highest numbers of acceptance rates (because the numbers of submissions were low in these places). The overall number of submissions is also roughly mirrored in the numbers of participants: China (472), USA (313), Germany (286), Italy (237), France (221), UK (143), Japan (141), Korea (91). I was also very happy to see that there were 89 scholars from Ukraine.

Overall, the conference felt very much like COLING and LREC together - one could clearly see the origin of this joint conference, and I liked this a lot.

As usual in our field, most papers were presented as posters, and LREC-COLING made no difference regarding the difference between the quality of orally presented papers and posters: there is none. Therefore, posters are often much more interactive than presentations, and its great to have discussions. I still like to go to presentations, particularly for topics where I am not an expert. For me, oral presentations are better to learn something new I don’t know a lot about. I don’t feel comfortable with asking a poster presenter for very basic introductions while they want to talk about their most recent research.

Tutorials

For the first time in my life, I’ve been asked to be a tutorial co-chair. I did act as a senior area chair a couple of times, but that’s a more guided process. I was very happy to do that together with Naoaki Okazaki who had experience already as a tutorial chair. Without him, I would not have been able to do this job, I learned a lot from him.

Due to his experience, nearly everything went very well with the tutorials, as far as I can say. We did select a good set of tutorials who attracted people from various areas. We received 20 submissions from which we selected 13 to be taught at the conference. Out of those three were introductory (one to an adjacent topic), and the majority presented cutting-edge topics. Unsurprisingly, a popular topic is large-language models, which are covered by multiple tutorials with varying perspectives on multimodality, evaluation, knowledge editing and control, hallucination, and bias. Other tutorials cover argument mining, semantic web, dialogue systems, semantic parsing, inclusion in NLP systems, and applications in chemistry. You can find the tutorial summaries at Klinger et al. (2024). I did attend two tutorials, one on knowledge editing and one on recognizing and mitigating hallucinations.

Only one thing did not go well: For one tutorial, the presenters did not come on site but presented entirely virtual; something that we did not intend. We believe that we communicated that, for each tutorial, at least one presenter needed to be on site. It’s currently not clear to me what the reason for his presumable misunderstanding is, but for future tutorials, I would suggest to ensure already at submission time to have people tick a box that they will come to the conference, if the proposal is accepted. Also, it might be a good idea to check with the local organizers if the presenters actually registered to the conference early enough.

Overall, if you participated in the tutorials as a teacher or participant, let me know if you have any feedback. LREC-COLING will compile a summary document to be handed over to the next organizing team and I would make sure to pass along any constructive feedback.

Own Contributions from Bamberg and Stuttgart

Stuttgart was very well presented in Turin, as usual, but as this was the first time for me to be at a conference with my Bamberg affiliation, I will focus on mentioning the contributions that came from Bamberg.

We had two papers in which my group was involved:

Velutharambath, Wührl, and Klinger (2024) presents our Defabel corpus in which we asked people to argue for a given statement (“Convince me that camels store water in the hump.”) Depending on their own belief, we labeled the argument as deceptive or not. By doing so, we have a corpus in which deceptive arguments and “honest” arguments were created for the same statements. Our intend was to disentangle fact-checking and deception detection.
Wemmer, Labat, and Klinger (2024) describes the corpus creation of customer agent and dream corpora annotated with cumulative emotion labels. Most emotion corpora are either for the whole text, for isolated sentences, or for sentences in context. We compiled a corpus with annotations in which the raters only had access to the prior context, which is the realistic setting how we also read text or talk to other people – we cannot look into the future!

I was happy to also see another contribution from Bamberg, namely from the group of Andreas Henrich:

Fruth, Jegan, and Henrich (2024) discuss in their paper published at the “Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context” a reinforcement learning based approach to German text simplification. Noteworthy is that they also tackle hallucination to some degree, namely by checking of any named entities are included that have not been in the original non-simplified text.

My Favorite Contributions

I found quite a set of talks and papers very interesting. This only reflects my personal opinion, and that I do not mention a particular paper probably only means that I did not have the time to see its presentation. There are many interesting papers in the proceedings, I did not go through all of them yet.

Invited Talks

Before I mention my favorite papers, I’d like to say something about the invited talks. There were three of them with quite different foci. I’ll mention the two here that I found most interesting.

Roger Levy talked about mistakes that humans and language models do; in the same way or in different ways. He also gave possible explanations for both humans and models. His talk was full of interesting text completion examples, for instance “The children went outside to…” or “The squirrel stored some nuts in the…” – where in the latter case apparently many people answer “tree”.

Michele Loporcaro talked about differences in the dialect in italy. I found this inspiring, not only because there was barely anything in the talk that I knew before (not a linguist…), but also because it gave me an interesting example for linguistic research to which I am not often enough exposed yet.

Papers

I found the following papers particularly interesting. I selected these papers based on my own interests. Given that you read this here on my blog there is some chance that you share some research interests with me and hopefully find my selection useful. Still, I want to point out that it’s absolutely not a negative opinion statement if I did not include a paper here, despite it being related to my interests. I probably just missed it.

Biomedical and Health NLP:

Raithel et al. (2024) create a pharmacovigilance corpus across multiple languages. It is annotated with drug names, changes in medication, and side effects, as well as causal relations. Interestingly, the baseline experiments also include cross-lingual experiments (training on multiple languages and testing in a zero-shot setting on another one). The performance scores are similar to monolingual experiments, sometimes even higher. The paper might have some overlap to our BEAR corpus, but we had a different focus, namely the goal to develop an entity and argumentative claim resource (Wührl and Klinger (2022),Wuehrl et al. (2024)).
Giannouris et al. (2024) describe an approach in the Determit workshop to automatically summarize clinical trial reports in plain language. I’ve been interested in biomedical text summarization for laypeople for a while, and our FIBISS project has also been motivated with such challenges in mind. They contribute an interesting and valuable resource on the topic.

Ensembles/Aggregations of annotators:

Basile, Franco-Salvador, and Rosso (2024) is a bit of a demo paper. They present a Bayesian approach to annotator aggregation with an integration of the STAN language to specify directed probabilistic models. First of all, I was quite happy to see some probabilistic graphical model work at the conference, and secondly, this really looks like a useful approach. We’ll definitely have a look!
Flor Miriam Plaza del Arco presented work in the NLPerspectives workshop in which she and her colleagues showed that the annotation-aggregation method MACE can be used to build ensembles of language models which are better than simpler aggregation methods, like majority vote (Plaza-del-Arco, Nozza, and Hovy (2024)). That’s interesting because LLMs are not really diverse as humans are, but the method still works for aggregation Instruction-tuned LMs show specialization in different tasks. We talked potential future work in which the components of the ensemble could be conditioned on personas to explicitly make the ensemble more diverse as humans are in annotation tasks. I am curious to see how this goes!

Corpus collection and Analysis:

Jin et al. (2024) report on their study on bragging in social media. They find that rich males brag more about their leisure time while low income people focus more on the self. Very interesting analysis. I am wondering if these results could impact general social media happiness analysis.
Dick et al. (2024) describe their collected corpus of various ways to formulate in a gender-inclusive manner in German. They include comparably known cases like “Arbeiter:innen”, but also nominalized particles (Lehrende) and abstract nouns (Lehrkraefte). They collected these latter cases pretty much manually if I understood their work correctly. I think that their corpus would be an interesting resource to build an automatic system that can find unknown and rare cases of such inclusive language. I sometimes feel a bit challenged to always formulate gender-inclusive, and I’d like to learn from other people how they do that in rare, less established cases than “Studierende”.
Fiorentini, Forlano, and Nese (2024) create a corpus of italian WhatsApp messages. An interesing approach: The authors collected their own whatsapp messages, including voice messages and asked the interlocutors for consent. The resources seems not to be available yet, but I am super curious. I remember that there has been a paper on trust in social media platforms some while ago and this resource might be an interesting opportunity to study such effects computationally (Nemec Zlatolas et al. (2019)).
Troiano and Vossen (2024) created the CLAUSE-ATLAS corpus. They aim at a full (clause-level) annotation of events and emotions in some books. This can of course not be done with reasonable costs manually. They therefore only annotate beginnings of chapter manually and the rest automatically and analyze the agreement between human annotators and a large language model. They find that the agreement is comparable.
Maladry et al. (2024) build, to best of my knowledge, the first irony-labeled corpus in which annotators were asked for their confidence that the text is actually ironic. They formulated the labels as a rating scale. Interestingly, automatic systems are better with predicting irony on the instances in which humans were confident. That result is in line with our findings for emotion analysis a couple of years ago (Troiano, Padó, and Klinger (2021)), where we also showed that humans can quite well predict the inter-annotator agreement for the instances they annotated.

Arguments and News:

Feger and Dietze (2024) build an argument corpus in which the discussion is kept as a tree. They label Arguments as Statements and Reasons and Non-arguments as Notification or None. I think it might be nice to also see persuasiveness labels for the arguments in comparison to each other in each tree.
Song and Wang (2024) build an automatic system to persuade people, here in a specific context, namely to make donations. Their system is a chatbot that can automatically recognize which persuasion strategy might be most promising. They consider “credibility appeal”, “foot-in-the-door”, and “emotional appeal”. Again, that’s super relevant for our EMCONA project and we’ll consider to use this for our work.
Pu et al. (2024) built a system to automatically generate news reports out of scientific texts. Their idea is similar to our analysis in our recent work in Wuehrl et al. (2024). Its impressive that they were able to automatize such complex task! I’d be curious to understand if their automatic system makes the same changes to the text and scientific claims that we found (making the articles more sensational, or simplifying correlation reports to causations).
Nejadgholi et al. (2024) create counter-stereotype statements. I put this in this category of argument mining, because I’d consider counter-stereotype statements as attempts to convince the dialogue partner to change a stance. The work is nicely grounded in categories of stereotypes, like counter-facts and broadening universals. I am also here wondering if a convincingness study would make sense.
Kalashnikova, Vasilescu, and Devillers (2024) describe a wizard-of-oz study in which they nudge people carefully to change their opinion or emotion. They compare smart assistance, robots, and humans and … human nudges are most successful.

LLM-Specific things:

Rao et al. (2024) develop a hierarchy of jailbreak attempts to LLMs. I did not too much look into possibilities to trick LLM to do things they are supposed not to do (like to leak training data), and the authors provide a set of possible approaches. It is interesting to see these weaknesses of existing models.
Addlesee, Lemon, and Eshghi (2024) describe a study that showcases possible differences how LLM answer requests to how humans would do that. Particularly, they put incomplete questions into a LLM and check if the behaviour is human-like. My favorite example from the poster was: “What is the zip code of…” and the LLM answers “of Nevada?”.
Pucci, Ranaldi, and Freitas (2024) report on an experiment on the importance of the order of instructions with varying difficulty. Instead of just using answer length or such proxies to assess the difficulty, they rely on the concept of Bloom’s taxonomy (remembering, understanding, applying, creating/evaluating/analyzing) and show that fine-tuning a LLM in order of these categories in increasing difficulty level leads to better results. This paper is a beautiful example that imports knowledge from psychology and humanities into machine learning.

Emotions:

Li, Peng, and Hsu (2024) describe a chat system that can help people to regulate emotions. This is the first work I am aware of that builds on top of emotion regulation theories. Their system learns that guilt can be tackled with curiosity, and fear with admiration. Very impressive example of how an analysis of a data-driven system confirms knowledge that we have from other fields, confirming the learning approach.
Bhaumik et al. (2024) describe a corpus and modeling effort of detecting agendas on social media. What do people intend with a particular post? I find this related to our research interest in the EMCONA project, in which we want to understand how people use emotions to persuade people. However, their work is more general and focuses on agendas that are less explicitly formulated in the annotated task.
Christop (2024) described her effort on building a speech corpus in Polish, labeled with emotions. Her intend is to use it later for text-to-speech systems. The data has been created by asking actors to show specific given emotional states.
Plaza-del-Arco et al. (2024) nicely complements my recent paper on the current state of research on event-centric emotion analysis (Klinger (2023)). The coverage by Flor and her colleagues is much broader than mine, and they particularly point out the subjective nature of emotions needs to be considered more. Further, there is quite a large set of emotion models in psychology that has not been considered yet.
Prochnow et al. (2024) desribe an (automatically generated) data set of idioms with emotion labels.
Cortal (2024) reports on an emotion labeled corpus of dreams. Interestingly, emotions in this dreambank corpus are mostly expressed quite explicitly. The corpus also contains some semantic role annotations, making it one of the few corpora with structured emotion annotations. We also worked on this for a while, with the REMAN and GoodNewsEveryone corpora (Kim and Klinger (2018),Bostan, Kim, and Klinger (2020)), amongst others. It might be interesting to see how literature and news annotations compare to those in dreams, and if emotion role labeling systems could be transferred between these very different domains.

Other things:

Arimoto et al. (2024) develop a long-term chat system to study how a perception of intimacy between the human and the chat agent develops.
Jon and Bojar (2024) use an optimization method to find translations that show a high evaluation score, but are wrong.

Awards

The Best papers of LREC-COLING are:

Bafna et al. (2024): “When Your Cousin Has the Right Connections: Unsupervised Bilingual Lexicon Induction for Related Data-Imbalanced Languages”
Someya, Yoshida, and Oseki (2024): “Targeted Syntactic Evaluation on the Chomsky Hierarchy”

Venue and Place

The conference took place in Turin, a nice city which is not too touristic. It has an acceptable cycling infrastructure which I used to go from downtown to the conference center every day. The cars seem not to be used to bicycles yet and did not check at all if there is a bike when they turned into an intersection, but the infrastructure was preventing bad incidence. Definitely not a perfect infrastructure, but much better than in Stuttgart, so I enjoyed cycling in Turin a lot.

The conference center was the old Fiat factory Lingotto which now has, next to the conference center, also a mall and a car museum. I am not a car fan, but the test track on the roof was pretty impressive.

The conference center itself was pretty nice (and huge!). The poster sessions were in a separate hall with a lot of space. While the venue has not been as charming as in Iceland (LREC 2014), Marrakesh (LREC 2008), or Santa Fe (COLING 2018), I enjoyed the venue being close to the city.

Altogether, I was very happy with the whole conference, and I am looking forward to the next LREC 2025 and the COLING 2026.

Bibliography

Addlesee, Angus, Oliver Lemon, and Arash Eshghi. 2024. “Clarifying Completions: Evaluating How LLMs Respond to Incomplete Questions.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 3242–49. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.288.

Arimoto, Tsunehiro, Hiroaki Sugiyama, Hiromi Narimatsu, and Masahiro Mizukami. 2024. “Comparison of the Intimacy Process Between Real and Acting-Based Long-Term Text Chats.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 3639–44. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.322.

Bafna, Niyati, Cristina España-Bonet, Josef van Genabith, Benoı̂t Sagot, and Rachel Bawden. 2024. “When Your Cousin Has the Right Connections: Unsupervised Bilingual Lexicon Induction for Related Data-Imbalanced Languages.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 17544–56. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.1526.

Basile, Angelo, Marc Franco-Salvador, and Paolo Rosso. 2024. “PyRater: A Python Toolkit for Annotation Analysis.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 13356–62. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.1169.

Bhaumik, Ankita, Ning Sa, Gregorios Katsios, and Tomek Strzalkowski. 2024. “Social Convos: Capturing Agendas and Emotions on Social Media.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 14984–94. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.1303.

Bostan, Laura Ana Maria, Evgeny Kim, and Roman Klinger. 2020. “GoodNewsEveryone: A Corpus of News Headlines Annotated with Emotions, Semantic Roles, and Reader Perception.” In Proceedings of the Twelfth Language Resources and Evaluation Conference, edited by Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, et al., 1554–66. Marseille, France: European Language Resources Association. https://aclanthology.org/2020.lrec-1.194.

Christop, Iwona. 2024. “NEMO: Dataset of Emotional Speech in Polish.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 12111–16. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.1059.

Cortal, Gustave. 2024. “Sequence-to-Sequence Language Models for Character and Emotion Detection in Dream Narratives.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 14717–28. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.1282.

Dick, Anna-Katharina, Matthias Drews, Valentin Pickard, and Victoria Pierz. 2024. “GIL-GALaD: Gender Inclusive Language - German Auto-Assembled Large Database.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 7740–45. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.684.

Feger, Marc, and Stefan Dietze. 2024. “TACO – Twitter Arguments from COnversations.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 15522–29. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.1349.

Fiorentini, Ilaria, Marco Forlano, and Nicholas Nese. 2024. “Towards the WhAP Corpus: A Resource for the Study of Italian on WhatsApp.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 16659–63. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.1448.

Fruth, Leon, Robin Jegan, and Andreas Henrich. 2024. “An Approach Towards Unsupervised Text Simplification on Paragraph-Level for German Texts.” In Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024, edited by Giorgio Maria Di Nunzio, Federica Vezzani, Liana Ermakova, Hosein Azarbonyad, and Jaap Kamps, 77–89. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.determit-1.8.

Giannouris, Polydoros, Theodoros Myridis, Tatiana Passali, and Grigorios Tsoumakas. 2024. “Plain Language Summarization of Clinical Trials.” In Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024, edited by Giorgio Maria Di Nunzio, Federica Vezzani, Liana Ermakova, Hosein Azarbonyad, and Jaap Kamps, 60–67. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.determit-1.6.

Jin, Mali, Daniel Preotiuc-Pietro, A. Seza Doğruöz, and Nikolaos Aletras. 2024. “Who Is Bragging More Online? A Large Scale Analysis of Bragging in Social Media.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 17575–87. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.1529.

Jon, Josef, and Ondřej Bojar. 2024. “GAATME: A Genetic Algorithm for Adversarial Translation Metrics Evaluation.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 7562–69. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.668.

Kalashnikova, Natalia, Ioana Vasilescu, and Laurence Devillers. 2024. “Linguistic Nudges and Verbal Interaction with Robots, Smart-Speakers, and Humans.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 10555–64. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.923.

Kim, Evgeny, and Roman Klinger. 2018. “Who Feels What and Why? Annotation of a Literature Corpus with Semantic Roles of Emotions.” In Proceedings of the 27th International Conference on Computational Linguistics, edited by Emily M. Bender, Leon Derczynski, and Pierre Isabelle, 1345–59. Santa Fe, New Mexico, USA: Association for Computational Linguistics. https://aclanthology.org/C18-1114.

Klinger, Roman. 2023. “Where Are We in Event-Centric Emotion Analysis? Bridging Emotion Role Labeling and Appraisal-Based Approaches.” In Proceedings of the Big Picture Workshop, edited by Yanai Elazar, Allyson Ettinger, Nora Kassner, Sebastian Ruder, and Noah A. Smith, 1–17. Singapore: Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.bigpicture-1.1.

Klinger, Roman, Naozaki Okazaki, Nicoletta Calzolari, and Min-Yen Kan, eds. 2024. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024): Tutorial Summaries. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-tutorials.0.

Li, Junlin, Bo Peng, and Yu-Yin Hsu. 2024. “Emstremo: Adapting Emotional Support Response with Enhanced Emotion-Strategy Integrated Selection.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 5794–5805. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.514.

Maladry, Aaron, Alessandra Teresa Cignarella, Els Lefever, Cynthia van Hee, and Veronique Hoste. 2024. “Human and System Perspectives on the Expression of Irony: An Analysis of Likelihood Labels and Rationales.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 8372–82. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.734.

Nejadgholi, Isar, Kathleen C. Fraser, Anna Kerkhof, and Svetlana Kiritchenko. 2024. “Challenging Negative Gender Stereotypes: A Study on the Effectiveness of Automated Counter-Stereotypes.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 3005–15. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.268.

Nemec Zlatolas, L., T. Welzer, M. lbl, M. ko, and A. ć. 2019. “A Model of Perception of Privacy, Trust, and Self-Disclosure on Online Social Networks.” Entropy (Basel) 21 (8).

Plaza-del-Arco, Flor Miriam, Alba A. Cercas Curry, Amanda Cercas Curry, and Dirk Hovy. 2024. “Emotion Analysis in NLP: Trends, Gaps and Roadmap for Future Directions.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 5696–5710. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.506.

Plaza-del-Arco, Flor Miriam, Debora Nozza, and Dirk Hovy. 2024. “Wisdom of Instruction-Tuned Language Model Crowds. Exploring Model Label Variation.” In Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives) @ LREC-COLING 2024, edited by Gavin Abercrombie, Valerio Basile, Davide Bernadi, Shiran Dudy, Simona Frenda, Lucy Havens, and Sara Tonelli, 19–30. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.nlperspectives-1.2.

Prochnow, Alexander, Johannes E. Bendler, Caroline Lange, Foivos Ioannis Tzavellos, Bas Marco Göritzer, Marijn ten Thij, and Riza Batista-Navarro. 2024. “IDEM: The IDioms with EMotions Dataset for Emotion Recognition.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 8569–79. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.752.

Pu, Dongqi, Yifan Wang, Jia E. Loy, and Vera Demberg. 2024. “SciNews: From Scholarly Complexities to Public Narratives – a Dataset for Scientific News Report Generation.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 14429–44. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.1258.

Pucci, Giulia, Leonardo Ranaldi, and Andres Freitas. 2024. “Does the Order Matter? Curriculum Learning over Languages.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 5212–20. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.464.

Raithel, Lisa, Hui-Syuan Yeh, Shuntaro Yada, Cyril Grouin, Thomas Lavergne, Aurélie Névéol, Patrick Paroubek, et al. 2024. “A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions Across Languages.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 395–414. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.36.

Rao, Abhinav Sukumar, Atharva Roshan Naik, Sachin Vashistha, Somak Aditya, and Monojit Choudhury. 2024. “Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 16802–30. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.1462.

Someya, Taiga, Ryo Yoshida, and Yohei Oseki. 2024. “Targeted Syntactic Evaluation on the Chomsky Hierarchy.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 15595–605. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.1356.

Song, Yuhan, and Houfeng Wang. 2024. “Would You Like to Make a Donation? A Dialogue System to Persuade You to Donate.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 17707–17. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.1540.

Troiano, Enrica, Sebastian Padó, and Roman Klinger. 2021. “Emotion Ratings: How Intensity, Annotation Confidence and Agreements Are Entangled.” In Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, edited by Orphee De Clercq, Alexandra Balahur, Joao Sedoc, Valentin Barriere, Shabnam Tafreshi, Sven Buechel, and Veronique Hoste, 40–49. Online: Association for Computational Linguistics. https://aclanthology.org/2021.wassa-1.5.

Troiano, Enrica, and Piek T. J. M. Vossen. 2024. “CLAUSE-ATLAS: A Corpus of Narrative Information to Scale up Computational Literary Analysis.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 3283–96. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.292.

Velutharambath, Aswathy, Amelie Wührl, and Roman Klinger. 2024. “Can Factual Statements Be Deceptive? The DeFaBel Corpus of Belief-Based Deception.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 2708–23. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.243.

Wemmer, Eileen, Sofie Labat, and Roman Klinger. 2024. “EmoProgress: Cumulated Emotion Progression Analysis in Dreams and Customer Service Dialogues.” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), edited by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, 5660–77. Torino, Italia: ELRA; ICCL. https://aclanthology.org/2024.lrec-main.503.

Wuehrl, Amelie, Yarik Menchaca Resendiz, Lara Grimminger, and Roman Klinger. 2024. “What Makes Medical Claims (Un)verifiable? Analyzing Entity and Relation Properties for Fact Verification.” In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), edited by Yvette Graham and Matthew Purver, 2046–58. St. Julian’s, Malta: Association for Computational Linguistics. https://aclanthology.org/2024.eacl-long.124.

Wührl, Amelie, and Roman Klinger. 2022. “Recovering Patient Journeys: A Corpus of Biomedical Entities and Relations on Twitter (BEAR).” In Proceedings of the Thirteenth Language Resources and Evaluation Conference, edited by Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, et al., 4439–50. Marseille, France: European Language Resources Association. https://aclanthology.org/2022.lrec-1.472.

[Download this post as PDF]

How to apply to a researcher position

2023-12-24T09:00:00+01:00

The process to apply for positions at Universities differs a lot between different countries. I wrote some observations what’s special about the German system in a previous post. One particular aspect of the German system is that there are rarely “Ph.D. programs”. You apply for a researcher position which is funded by some source (a third-party funded research project, or a university-funded position, or something else). There is rarely a centrally organized admission process. Essentially, you directly apply to your potential Ph.D. supervisor, and this person will decide who to hire.

Now, after I decided to move to Bamberg, I have some positions to offer. I’d like to provide some guideance what distinguishes a good from a not-so-good application.

Does your application look like you send it like that to multiple places?

It is totally understandable that you apply to multiple positions. Nevertheless, make your application appear to be tailored exactly for the position you are applying for. Address the correct person, mention the correct place, but less obvious: Really explain why you apply for this position at this university, with this supervisor. For instance, formulations like “your esteemed institution” or “after a long period of research where to apply” or “your PhD program” sound not very targeted.

Some anecdotes: I received applications which mentioned the wrong name in the letter, I received applications from people who work in entirely different research areas (like… chemistry), and I received applications which mention the wrong University.

The application explains why you are interested in this position.

Explain why you are interested in this position. The supervisor wants somebody who is really interested in this job. The supervisor probably developed a research plan for this job to get money for the project, and is very likely super-excited about the ideas. The candidate should, given the little information they have access to, at least show some enthusiasm. If you need more information, ask the person you are applying to. Showing your interest in this particular project is probably more important than your past work. That’s also important, but for a different reason.

Again, some anecdotes: I sometimes receive applications which explain why the position is a good opportunity to develop. The candidates explain that such Ph.D. position helps them to do X or Y, and that it will make them an independent researcher. That’s not unimportant, it’s good to see that you have such goals. But these things are, again, not very targeted.

The application explains your qualification for the position.

It is important that you are genuinely interested in the position you apply for. Your supervisor wants to share the enthusiasm of the research with you, but of course they also want you to be able to succeed. For that, they want to know why and how you are qualified. Which previous work made you the ideal candidate for this position? Do not just list everything you did. Explain why some specific experience will be valuable for the position you are applying for.

For instance, if you apply with me for an emotion analysis from literature position (nope, none open in the moment, sorry), it is not that relevant to hear about your experience in text–to–image generation. If you wrote a paper about it, that might, however be relevant, because experience in writing papers generalizes across concrete fields. Try to be as specific as you can. It’s also not terrible if you don’t have awesome previous experiences that make you super-qualified, but explain what you have to offer.

My application is structured as requested in the job post.

Sounds so simple, but I receive quite many applications that do not provide all information that is asked for. That’s a no go. Do you provide all documents that are requested? Do you put them together in one PDF file (if that’s requested)? Failing to follow the instructions might look like you are not genuinely interested in the position. Or worse: that you are not able to read the instructions carefully. It would be a pity if your application looks worse than it needs to be.

It is surprisingly common to see applications that do not mention the position the application is for. It’s common to receive mails with multiple files as attachments (despite me asking specifically for just one PDF file).

Every person going through such applications has a specific approach to organize that. It might be that you don’t agree that having one PDF file which includes the motivation letter, the CV, and additional information is the best thing to send, but please make it easy for the person who makes the decision.

In my case, it is very unlikely that you receive an invitation to an interview if you don’t follow the explanations how to apply. I might ask you for a fixed application, but I might also not do that.

Anything else?

Is there something that’s still unclear to you? Please let me know, and I’ll edit this post to provide more information.

Move to University of Bamberg

2023-12-08T09:00:00+01:00

I mentioned in a previous blog post already that I move to the University of Bamberg in March 2024. I will be the Chair for Foundations of Natural Language Processing and Full Professor at the Faculty for Information Systems and Applied Computer Science.

With this blog post, I would like to tell you about a bit about this decision and my future plans in Bamberg.

My time at IMS, University of Stuttgart

I came to the IMS in 2014, following an invitation by Sebastian Padó to substitute him as the chair for theoretical computational linguistics. I’ve been very happy about this offer, despite being a happy postdoc in Bielefeld. I felt that such a substitute professor position gives me the opportunity to learn more about what it is to be a professor. Even more importantly, I was able to come to the IMS, which was as is a very visible place at which a lot of great research happened.

I was not trained as a computational linguist, I’ve been trained as a computer scientist and always felt a bit like a foreigner in this community. I didn’t know too much about linguistics and what’s computational about it, and I did not know too many people in the field. My impression at this time was that essentially everybody from Stuttgart, Saarland, Potsdam, Heidelberg (all the big CL places in Germany) knew each other, and I was incredibly happy to get to know the IMS and hoped to also be able to become a member of this (national and internationa) CL community.

Long story short - I think this worked out incredibly well. I met awesome people right from the start, all the group leaders and groups, including people who were more influenced by linguistics and philosophy. I was fortunate enough to be able to get a tenured “lecturer” position associated with Sebastian Padó’s chair. I am incredibly thankful to him and all other members of the TCL group and the IMS - I learned so much and got quite a good feeling for what NLP and CL is about by now, and started to contribute to the field myself.

I also had the chance to write my own grant proposals. I’ve been lucky enough to get my first proposal accepted on emotion analysis in 2018 without a revise and resubmit round, and that was only possible with the help of people in the institute who helped me with very valuable feedback. I’ve been quite successful with proposals since then; and I am sure that this is at least partially due to a privileged situation I had with awesome people around me who supported me with critical questions and feedback. I cannot value this enough. With these proposals and some people that I was able to supervise on positions that I essentially got granted by Sebastian, I was able to have my own group of PhD students, Postdocs, and Bachelor and Master students; and I enjoyed to work with all of them; I was able to write my habilitation to be able to formally supervise them, and finally this year (2023) got anointed as apl. Prof. (which is essentially a professor title without being one).

Moving to Bamberg

I am very grateful for all the opportunities I got in Stuttgart, and I do not take them for granted. I am aware that it was only partially my own doing. Nevertheless, I decided to leave the IMS for another position. In Germany, it is very difficult to step up inside the same institution for various reasons which are grounded in the system and are outside of the institute’s control. With a W3 position I get in Bamberg, I will have a yearly lump sum, an own secretary, my own large office space, a lab, and positions that are paid by the university. This allows me, I believe, to speed up my research and increase the impact I can have on the research community and on society. I am sad about leaving the IMS, but I see a huge chance that I can develop something good in Bamberg, which, by the way, also offers great opportunities, but more about that later.

In addition, there is the private aspect. I will communite again between my main private life location and my work life location. That will be stressful, and I am grateful to my wife that she supported me all the time to do this step. At no moment in time she told me that I shouldn’t do that, despite all the doubts I had.

However, despite the communite situation that I will be facing, Bamberg will also be very good for me. I am not a huge fan of individual transportation due to it’s space and resource requirements, and Stuttgart is quite clearly a car city, despite having great public transportation. The development of bicycle infrastructure, while it clearly developed, falls far behind what happens in other cities. Bamberg is not awesome when it comes to cycling infrastructure, but it is pretty good. Also: the city of Bamberg makes use of it’s river - I am really looking forward to enjoying evenings at the water.

Plans for Bamberg

Work-wise Bamberg will offer a lot. While Stuttgart is a great environment for computational linguistics and NLP, I never managed to start collaborations outside of the IMS in Stuttgart – I don’t really know why. My guess is that the IMS just offered enough opportunities in it’s own, but I am not sure.

In Bamberg, my group Foundations of NLP (which I might call “BamNLP”) will be one of many other new and established AI research groups, including some on dialogue systems and language generation, explainable AI, cognitive machine learning, and AI system engineering, computer graphics, visualization, media informatics and many more. Further, Bamberg has a focus on the humanities, they have social sciences, psychology, a linguistics department.

This gives me the opportunity to build a group at the intersection between these fields. While the University of Bamberg has a huge focus on AI, I would like to connect various fields inside and outside of this AI focus.

I like to see myself like this:

My plan is to be connected to psychology (something that the University of Stuttgart could not offer) and digital humanities, computational social sciences and linguistics. I will collaborate with other AI and computer science areas. Of course all these areas already talk to each other.

I am looking forward to this next step in my professional life. If you made it until here: thank you! If you want to talk about what I wrote: write me. If you want to work together with me research or application-wise: let me know.

What to do to become a full professor?

2023-10-20T10:00:00+02:00

I recently received an offer for a full professor position, which I accepted. I will write a bit more about that in a couple of weeks. Personally, the process to get this position was challenging from many different angles, and I think that it would have been easier for me if somebody wrote down how it all worked out for them. That’s what I would like to do with this blog post. I think there are three phases of becoming a professor (in Germany), and I will say a couple of things about each of them: (1) Qualifying to become a professor, (2) Applying (and getting rejected), (3) Applying and going through the process of being accepted.

With this information, I’d like to help people who want to understand what they need to do to be able to become a professor (see 1). I hope this helps people who apply without any success (see 2). And I want to help those who might receive an offer (see 3). This whole text is heavily Germany-focused, but some aspects might be similar or the same in other places in the world. It’s also biased towards my own experiences in computer sciences and natural language processing, and of course other people or people in other research areas might have other opinions.

Before we start: what is a professor?

The term “professor” is, in Germany, nearly exclusively used for full professors (salary level W3) or associate professors (salary level W2) at Universities. However, there are other positions that have a similar profile in what people do in their everyday job. Probably you don’t really care about the “title”. What most people who want to enter this system care about is the possibility to do their own research, follow their own ideas, with a large amount of independence and freedom (nobody tells them what to do research on). Finding such jobs is also possible outside of academia, of course. In Germany, you could be a research group leader in a research association, for instance the Leibniz Association, the Fraunhofer Society, the Max-Planck Society, or the Hemholtz Association. Some of these organizations have a stronger focus on foundational research (Max-Planck), some focus on applied research (Fraunhofer). Some have a substantial funding, others need to get (more) money from industry or other third parties. Some nearly solely have time-limited contracts, in others, you can get a tenured position with a higher probability. And of course, there is also great research happening in industry.

What I am talking about here are, however, tenured positions at Universities, where teaching and research are both obligatory, at least to some degree. These positions can be obtained in various ways. I won’t talk too much about Universities of Applied Research (FH or HAW) because I know too little about them (I’ve just not personally been exposed to this career option). Instead, I focus on “research universities”. Here, the term “professor” is typically equated to a chair holder, something like the head of a department. These positions often come with substantial long-term funding for PhD students or postdocs to hire, technical staff, and a secretary. In addition to this “proper” professor positions, there are other tenured positions at universities, but these are quite diverse and it’s hard to say anything about them without inappropriately generalizing. If you have questions about that, send me a mail. I will, in the following, focus on professor positions at universities according to the salary ranks W2 and W3. That’s what’s typically understood to be a “professor” in Germany.

1. Qualifying for Professor Positions (or another similar position)

The typical way to become a professor in most research areas is to study a topic based on your interest (I studied computer science) and then get a Ph.D. degree to gather practical research experience under guideance but with some degree of independence. Commonly, the Ph.D. is in an area very closely related to what you studied (I stumbled into text mining and NLP, accidentially). What to expect from a Ph.D. student is hard to generalize, but typically it involves writing multiple papers, going to conferences to present the own work. It’s typically highly competitive to get these papers into conferences and journal, so once this all is achieved and you defended your Ph.D. thesis in front of a commitee: Great, now you are a doctor. What’s next?

In computer science and related areas, it is not at all expected that you do a postdoc before going to industry - you will very likely find a nice job in some small or big company that is interesting. If you want to go for a professor position, you need to develop yourself into an independent researcher. And, importantly, you need to do it in a way that is visible, that people perceive you as an independent researcher (that’s not necessarily the same thing).

That means, you need to develop the skill to identify research questions. You might already have done that during the Ph.D., for instance by defining Master’s theses topics. It’s also fine to work on research questions yourself and, at some point, look back to identify a common theme how things come together. If you come up with an idea that is bigger than something you can do yourself, it might be worth writing a grant proposal to hire somebody. The prerequisites to do that differ by funding agency, and I won’t go into detail here, but writing (and getting) a third-party funded grant shows that you are able to develop a research idea and plan that is successfully reviewed by independent reviewers. That’s a huge argument to convince others to perceive you as an independent researcher. Having good papers accepted is of course another one. By the way: being independent doesn’t mean you cannot accept any help.

Once you are a postdoc with some history of defining own research ideas and perhaps you even got some grants, you can go for applying for professor positions. Depending on the field, it might also be helpful to put the past research together in a “second book”, the habilitation. That will give you the right to be a Ph.D. student reviewer. In some areas (for instance some that are more humanities-oriented) a habilitation might also be expected. I must say that the habilitation for me was a “no-brainer” in comparison to the Ph.D. In the Ph.D., I was quite stressed to do things that somehow fit together in a Ph.D. thesis. When working towards the habilitation, I did nearly never think about how to put things together, because I already had a quite clear vision what I want to do. I only needed to put it together in writing.

Postdoc positions are also not the only way to go. Others are

becoming a junior professor (assistant professor), but getting such positions directly after the Ph.D. is (in Germany) not too likely (but not impossible).
applying for grants to lead a junior research group.

These are nice, but require at least some degree of independence and ability to define research questions. Most people will likely postdoc a bit before going this way. Then, however, it’s a great way to develop an own research profile and it supports you in doing so more than most postdoc positions. if you have the chance – go for it! If you are not sure if you are ready for that step yet, I’d suggest to try. Learning from the process is already quite valuable.

Technically, in the end, you need to able to show a “habilitation equivalent qualification” as a prerequisite for being hired as a professor. That involves excellent research and the ability to teach well (the original law for Baden-Wuerttemberg for instance can be found here).

2. Applying and Failing

Applying

Great, you are now an independent researcher and already supervised some Ph.D. students or led some projects. Let’s apply for professor positions.

These positions are typically published in various ways. I personally like the job market of the newspaper Die Zeit. Alternatives include the mailing lists of the Deutscher Hochschulverband. Once you find a call for applications that seems to suit you, you need to apply. The call will likely say something like “with the typical documents”. That’s the first challenge. These documents clearly contain:

A motivation letter
Your curriculum vitae, including education, jobs, publication lists, invited talks, awards, grants, scientific service.
Lists of teaching experience
Lists of supervised students (on all levels, particularly Ph.D.)
Copies of certificates

Commonly, the hiring committee also wants to see:

Concept paper for research at the new place
Concept paper for teaching at the new place

These two documents are, together with the motivation letter, probably the most difficult things to write. Make sure that they show your excellence but also how you fit into the new environment.

The committee might also ask for additional documents, for instance teaching evaluations or your most important and influential three papers, perhaps with explanations why you find them relevant. Sometimes they also want you to fill standardized forms (what’s your h score, how many papers did you write). My last application document was a PDF with altogether 468 pages. They asked for it… ;-)

Invitation

Once your application documents are evaluated positively, you will be invited. The meeting typically consists of a scientific presentation and an explanation of the future planned work and how that fits into the new environment. The talk is typically public to the university. That also means that at this moment in time, it is likely that somehow people at your current university might learn about your application. That’s not desirable, but sometimes cannot be avoided. The community is too small. Part of the presentation is sometimes also a teaching unit, where you need to give a lecture on a topic of your choice or on a predefined topic. Often, this part is just something like 20 minutes long.

After the presentation, there will be a session in which the committee asks you questions. Questions that I remember to have heard often are the following (these are explicitly not questions that describe the situation in one specific university or hiring process I was part of):

Why is your research important? Why is it excellent?
How do you complement the work of professor X who we already have?
Great work, but how do you plan to work together with professor Y?
With whom do you want to work together?
Which network do you contribute to our institution?
Do you have plans for bigger grants? Which ones? What’s the topic?
Do you want to involve yourself in the administration of the faculty/university/department? How?
Why did you not achieve Z yet?
How do you motivate your Ph.D. students?
What do you do if one of your Ph.D. students develops a problem with alcohol?
What do you do if one of your Ph.D. students is sad because of frequent paper rejections?
How do you plan to support minorities in the field?
How will you make the study programs at our university more attractive to students?

After that, you can ask questions to the committee. And after that, there is typically a session in which you talk to student representative who can also ask questions. They often focus on the teaching experiences and style. They often want to know that you actually care about educating them well.

Waiting and Failing

After that, things take a while, definitely months, sometimes (a low number of) years. If you don’t hear anything after a couple of weeks, it’s likely that you are out of the process. Sometimes, I received feedback informally about the application once everything was over. The negative feedback that I heard over the years was:

Your work does not fit to what we want.
We did not see that you were enthusiastic about your plans.
You appeared to be too informal for such prestigious position.
You appeared to be too little approachable to the students.
Your work is not as excellent as other people’s work who applied.
You clearly did not prepare well.
The students did not like you.
Your presentation was not entertaining, you did not even make a single joke.
We did not have the impression that you actually want this position.

I am not kidding. Every single sentence I heard in some place.

You see that this is quite personal, and often it hurt, particularly because I found some of this criticism inappropriate or even wrong. But that’s difficult - the interview is an extreme situation, and even if you are an extreme extravert, it might be different in this situation (or the other way around). That means: you need to practice. Try to get invited to such interviews, even if you think that you have no chance. I needed around 10 attempts to succeed.

What I took from it was essentially: No need to try to appear to be a person who you not are (I did do that). Be authentic (I did only do that in the more recent applications).

It can also help to make everything easier if you talked to the head of the committee before applying. Send them a mail or call them on the phone. It’s normal to do that. They won’t be surprised.

3. Applying and Succeeding

The following things I only know as a candidate. I never was part of a search committee, so some things might be (slightly) wrong.

Once you are successful in the interview and the committee things that you will be a good fit, they compile a so-called list. This list contains one or multiple people. Funnily, the list can only have three positions (at least in some universities or states), so the entries are called 1, 2, 3a, 3b, 3c,… Lists of length 3 are, however, common. These lists might be compiled strategically: The first person might be the big shot the university really wants to have and at later ranks there are people who are more likely to accept an offer, such that the university does not need to write a call again and go through the whole process. I am not sure how often that actually happens, but I heard such stories.

Part of the list creation is a process in which external reviews are acquired. To do so, other professors are asked to write reviews about the candidates, sometimes in a comparative manner. I have never seen such reviews, so I cannot say too much about that. I heard that sometimes these reviews are quite personal, sometimes they are wrong. I believe that this is also a difficult business, because people who know you too well might have a conflict of interest. Those who don’t might be slightly outside of your main research field.

Once you are on the first position of this list (or people higher in this list reject an offer) you will receive a letter from the president with the call (“dem Ruf”). This is an invitation to start negotiations and you are expected to prepare concept papers for research and teaching in which you explain what you plan to do and how many positions you need, how many rooms, how much money to get started, and how much money you want to have as a yearly budget. These concept papers can be based on the concept papers you prepared for the application, but they need to be more concrete. Every item needs to be explained clearly. It has been very helpful for me to have seen such documents that other people prepared in the past. I won’t share mine with you publicly, but perhaps you have friends who recently succeeded in getting a professor position who you could ask? Also the DHV was very helpful in reviewing draft version of these documents and giving feedback. They also publish average numbers that you can expect to get (which look, from my perspective, very high, because they include people’s successes who are in the system for longer than I am).

These documents are read by the chancellor (the person who is responsible for money) and the president of the University. Then, you have a meeting with them and you talk about the various items. They will tell you what they can offer, and you can try to negotiate. Once this meeting is over, they will send a formal offer that you can accept or reject. Or you try to further negotiate. That’s it.

After that, all other applicants for the position are informed and they can formally complain to not have been considered appropriately. Hope that this does not happen, because it can delay the process.

Once you got the offer, the bureaucratic process starts to get you the position. It’s tedious, but not likely to fail. Congrats :-).

New Grant Proposal Accepted: INPROMPT

2023-07-20T10:00:00+02:00

I got the information a couple of days ago that I got a new grant proposal accepted by the DFG. It’s a Sachbeihilfe, requesting funding for one Ph.D. student and substantial additional support for user studies.

The project’s name is “Interactive Prompt Optimization with the Human in the Loop for Natural Language Understanding Model Development and Intervention” (INPROMPT).

The work will start from the motivation of few-shot or zero-shot settings for the creation of models in algorithmic natural language understanding. A currently modern and popular approach to develop models without too much annotated data is to use pre-trained neural language models and use a prompt to generate a word that describes an instance of text. For example, you can do sentiment polarity classification by entering a text instance such as “The person is very satisfied with the product.” associated with a prompt and check whether the sentence “The product is good” or “The product is bad” results in a higher probability.

Creating such prompts has the advantage that it does not necessarily require technical expertise, but creating good prompts is still not trivial. Existing research has approached the problem from (at least) two perspectives: (1) adapting existing language models using (few) annotated data points and manually generated prompt sets, and (2) using data-driven automatic prompt generation.

We will combine these two directions and start with the typical situation in which a language comprehension task is formulated vaguely, a more precise specification is still missing, and no annotated (but certainly non-annotated) texts are available. Our goal is to develop and analyze systems that automatically guide domain experts without technical training in machine learning to create well-functioning prompts.

To do this, we use optimization methods that change prompts iteratively and estimate their quality with the help of a target function. This estimation is based on automatic predictions on text instances, based on the readability of the prompt, and based on the conclusiveness of an explanation of the decision-making. In our project, the objective function based on these factors is not automatically evaluated, but replaced by a “human in the loop”. However, in order to study the problem of iterative optimization of prompts on a larger scale, we also simulate human decisions using automatic approximations of the human objective function.

We expect that our project will significantly improve the transparency of prompt-based models and contribute to the democratization of the use of machine learning algorithms.

My current plan to is start the project latest in April 2024.

The Wisscomm Project I am a Scientist

2023-06-26T10:00:00+02:00

I am not too experienced with communicating my research to people outside of the research community (except for, for instance, this event). Of course we do that when we teach, but that’s not accessible to a broader audience.

Recently, I have been contacted by the initiative “I’m a scientist - Get me out of here”. This is a research communication project for school children.

This initative also uses an instagram channel, where they explain more about what they do.

Goals and Procedure

I’m-a-Scientist has, as far as I understand, two goals:

Give school children the possibility to ask researchers questions and get answers directly from them.
Show to them that doing research is a possible job choice.

To do that, they have an online platform in which they allow teachers to register and ask researchers to register. Apparently, they do that in rounds where each round is specific to a topic. There was already quite a set of topics in the past, including Social Media and AI and Medicine. I was part of Does AI communicate?. In the list of researchers of this round, one can find a profile of each researcher which describes a bit what they are doing and what they are working on. Mine is here.

Children could then ask questions in two ways: (1) asynchronously and (2) synchronously. The asynchronous approach allowed them to put a question on the platform and some (or all?) researchers on the platform got a notification mail that there is a new question. These could then be answered. I did for instance answer this question.

In addition, there was a synchronous text chat, in which students were put together in a chat system and could ask questions to around 3 researchers who signed up for this time slot of 30 minutes. There was only text and no other modality available. In addition, the live chats had a moderator, and the teacher was also in the room and could write. Except for the researchers, everybody was anonymous.

Own Observations

Overall, I liked this whole thing a lot. I learned a lot what children find interesting and also it was very surprising how different the answers by other researchers (from a similar research field or a different field) were from the own answers.

I’d like to share my impressions in a bit more detail in the following:

General

The general topic was “Does AI communicate”. I don’t know how the organizers come up with these topics, but I liked it. It’s broad enough to attract many people’s attention but still specific enough that, I guess, many people have an idea what this is about. For the case that you read this in a couple of years: we are here a couple of months after ChatGPT and other Instruction-tuned models were made publicly available.

Next to the challenge to find a good name for such a topic-focused round, there were, of course, other challenges to be solved by the organizers. I believe that finding teachers who want to participate with their class is one of them, but I don’t know anything about the process. What I found interesting is the choice of researchers who were involved. I would say they were from very varying fields (see for yourself):

Researchers who work in communication sciences and want to understand the impact of generative language models on human communication.
Researchers who study the impact of AI on the society and the future.
Computer scientists, natural language processing people, computational linguistics, and digital humanities scholars.
People at the border to art, who want to understand how AI can be used to support creative writing.

I found this selection really good, but interestingly, I was quite often (not always) the only technical person in the chat rooms. However, this did also not really matter: most of the questions were not about the technical background of what makes an “AI”. That’s probably not too suprising: “AI” is, in my perspective, not a concept of interest. It is a buzz word that combines many things, including generative language models, optimization, search, machine learning, or logic. Conflating all of that in one word creates an abstraction that makes it hard to focus on specific aspects. I do not really like that, but I also see that one needs simplified concepts to be able to talk about them.

However, the questions that I have seen show that this abstraction made it sometimes hard to answer them. I do not criticize anybody here – I wouldn’t know how make it better - but I often felt challenged.

Many of the questions that I have seen in the chat platform were about:

Can an AI take over the world? Can it be dangerous? Can it have a will to survive and replace humans?
- I mostly answered this with “no”, but that it can be used for harmful purposes by people. I typically focused on the aspect that computer systems (including “AIs”) are tools that are used by people. That seems to be something that is not really something that is general knowledge. I knew that this opinion/fear exists, but it seemed to really be wide spread.
Are you an AI?
- I tried to briefly explain the Turing test.
How does an AI work?
- I never answered this question because other’s were always quicker, and I was quite happy about this. The explanations by other people were mostly focusing on supervised learning but did not include reinforcement learning, crowdsourcing, or pretraining of models. That would of course also have been totally out of scope.
Why is it difficult to understand how an AI works?
- I tried to answer that this is not a conceptual issue, but more of a challenge to deal with a complex system.

What I would like to know more is - how have the teachers been prepared, and were they instructed to prepare their students. My impression was that there were huge differences in preparation, both quality and quantity. I think it would be great if the students would know about in advance to whom they are talking - and I’d also like to know more about the students in advance. I will ask the organizers about that and update this post when I receive an answer.

Synchronous Chat

This was an interesting experience, with the synchronous chat. It felt a bit old-fashioned, without emojis, avatars, or direct messages. The threading was also quite basic; perhaps a more hierarchical presentation of the chat would have been a good idea. However, the advantage of a linear chat was that it was easy to understand, and the moderator sometimes paused the questions such that the researchers were able to answer. One feature that I really liked was that, after typing a specific number of characters, the window turned red. A very intuitive way to tell me to keep my answer short.

I was not too happy with the teachers in the chat rooms. One teacher answered in one situation that “even his children” were able to recognize a specific property of an AI. I found that patronizing, and it reminded me what I did not like about school. Another teacher was apologizing for the behaviour of a child, who “tried to be funny”. Come on - what’s wrong with being funny?

Altogether, the chats could have been a bit longer. They were limited to 30 minutes, which was barely enough. There was also no real dialogue. The school children shot their questions and we tried to answer as quickly as we could. I think that smaller groups could have also helped.

Asynchronous Questions

This feature was also used a lot! I, personally, however, did not like it too much. I was always late; when I clicked on a question for which I received an notification, there were already tons of answers. Perhaps this could be structured more in a debate mode, where diverse answers could be grouped, and people can also answer to each other. I’d also like to be able to ask a question to the person who asked; for clarification. But I see that such forum style would make things much more complicated. Perhaps that was just my own perception. I liked the chats more.

Summary

Altogether, I must say that participating in I-am-a-scientist was a great experience. If you, as a researcher, have the chance to do that, I can only recommend it. It helps to think about the own work and the impact it has on other people.

How to enter the academic system in Germany after the Master level

2023-06-02T12:00:00+02:00

I regularly receive mails asking if I might have an open PhD student or Postdoc position to offer. These mails come from all over the world, and of course it cannot be expected that everybody knows how such positions can be funded in the various countries in the world. To contribute to some better understanding how this works in Germany, I’d like to describe my perspective. This might be different for other disciplines than computer science and can also differ between different universities or even groups.

PhD students

In contrast to some other countries, PhD students are typically not admitted by a centralized committee and then assigned a supervisor. In contrast, the supervisor is typically the person who selects the PhD student and hires them. There is commonly also not one deadline per year when people apply for grad school. This follows from how PhD students are funded in Germany. There are generally two different ways get money for PhD research work:

Money from the University
No money from the University

In the first case, you receive money from the University, you are typically hired as a researcher. Technically, this does not mean that you are hired to do your PhD! It means that you are hired for a job. In the ideal (and typical) situation, your job overlaps to nearly 100% with the work that you need to do for your PhD. But that also depends on where the money comes from.

If your funding comes directly from the University (more concretely, money that a professor or another form of group lead (principle investigator, PI) got from the University to fund their research) the PI is pretty free to decide what the work is about. Often, the PI will give you a lot of freedom, but sometimes they might also micro-manage a PhD student. It really depends on the situation. Make sure that you and your potential supervisor have a similiar opinion on how things work. The university might also require that you need to teach a bit. Sometimes, you also need to do some administrative work. These things should be clarified with the PI and you have some interest to understand what the opinion of the PI is how much time you will have to do research. Most people I know who work as PIs are interested to ensure that you can nearly focus 100% on research, but it’s good to clarify that.

If the money comes from some funding agency, you are paid to do research in a project. If it’s a DFG project (“Sachbeihilfe”), you are supposed to nearly do research only. The DFG does allow that you do some teaching, but that’s actually limited by them to ensure that you focus on the research. DFG money is typically a very good funding source, because there is not too much overhead in reporting. The money could also come from EU or BMBF (or some foundations) and the situations depend on the concrete project. Often, there is more interaction with other project partners involved than in DFG project, but that also really depends on the concrete project.

Depending on the discipline, it is sometimes common to give 50%, 65%, or 75% project position to PhD students. The argument is sometimes, that a PhD thesis is not project work, and if the project work takes 100% of a persons time, there would not be enough time to work on the PhD. This situation is one reason for the outcry by German researchers who complain about problematic working environments (see an article about #ichbinhanna) This is something that a PI can barely change, despite the fact that ideally the research in the project is actually the content of the PhD work. In computer sciences, it is common to get a 100% position for PhD students.

If you are funded by the University, from a project or directly with money from the PI/institution, the PI will likely publish a job post on mailing lists, the university’s career page, or in social media. It might make sense to ask a PI with whom you would like to work with where they publish their calls, and if they expect to publish something soon. It probably does not make sense to ask “do you currently have an open position” – it’s typically just not very likely that this is the case.

In the second case mentioned above, you are not funded by the university. That possibility exists because technically doing the PhD and working as a researcher at a University are two different things. You could work at a company and enroll at the University as a student to work on the PhD, you could receive some scholarship from some source (e.g., DAAD), or you even work at some other university as a project member or researcher. In such cases, it makes sense that you just contact a potential supervisor at some university. You could tell them that you work on X and that this fits well to their work. You could ask them if they would be willing to guide them in the PhD process. Depending on the situation of the PI, they might have the capacity to do that or not. But please, write this mail especially for this person – don’t send the same mail to many people without making clear that you really selected this person carefully. Sometimes I receive such mails with the wrong name in the opening - that’s annoying and not likely to be answered.

Postdoc

As a postdoc, the situation is mostly the same: You can work funded by the University directly, by a project or by a scholarship. In addition, you could write your own project proposal to fund yourself. If that’s something that you want to do, you could contact a person you would like to work with. They would probably know how to fund people and guide you in the process of applying for money for a position. However, note that this process takes time. Preparing a grant proposal easily takes half a year, followed by at least half a year until you receive a decision. If that’s something that you want to do, talk to people very, very early.

Professor

I am not a full professor, so I am the wrong person to ask ;-). I found, however, this page quite nice, which explains the situation in Bavaria, but I think that’s not very different in other states in Germany (despite that some people will now make the joke that Bavaria is indeed very different from the rest of Germany).

To understand the academic ranks in Germany, this Wikipedia page might be helpful.

Conference Report: The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023)

2023-05-12T12:00:00+02:00

Last week, May 2-6, 2023, the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL) took place in Dubrovnik. I think this has been my first EACL conference, and the program was very interesting, with a good mixture of recent state-of-the-art research on fashionable topics while maintaining a good diversity across various research fields. It’s been one of my favorite conferences that I attended so far.

The conference hat 281 papers accepted, out of which there were 229 long and 41 short papers. The acceptance rate was 24,1%. Such a comparably low rate leads to many good papers not finding a space in the conference. EMNLP invented the model of the Findings of ACL/EMNLP/EACL/NAACL, in which papers are included that might not fit in the main conference but are still worth publishing. In some conferences these papers are presented as posters, sometimes invited to be presented in workshops, and sometimes presented only as videos online. For EACL, all 201 Findings papers (149 long) were presented in a video on the Underline platform, and some papers were additionally presented in workshops.

I am wondering if the invitations to present in workshops shaped the perception that Findings papers are somehow between the main conference and the workshops - I heard this opinion multiple times at the conference. Personally, my perception is more that Findings papers are in the same category as main conference papers and not more similar to workshop papers. Workshop papers are not worse – they are more focused on specific topics. Findings papers often lack this special focus and therefore they are not submitted to a dedicated workshop but to the main conference.

Next to many oral presentations, there were very nice poster sessions. At *ACL/EMNLP conferences, poster papers are considered to be of the same quality as those presented as talks: there is no difference in the proceedings.

The program of the conference, with talks and posters, has been complemented by tutorials and so-called Birds-of-a-feather (BoF) sessions, in which people interested in a specific topic introduced each other, as a networking event. This format has been introduced during the COVID-online conference times (I think) and still exists. It’s actually quite nice to get in touch with a subcommunity that one did not know yet. I did participate in such sessions for the first time and can only recommend it.

Contributions from University of Stuttgart

Tutorials

The IMS at the University of Stuttgart had many contributions, and it felt very nice to be at the conference with so many nice colleagues. I did not have this experience as a PhD student (where I was essentially the only person targeting ACL conferences for publications in the group), and I really appreciated it. It is so much easier to meet new people if you already know many.

The IMS contributed two tutorials. I was part of the Emotion Analysis Tutorial (Štajner and Klinger 2023), offered together with Sanja Štajner. It’s been my first tutorial, as usual, I did not plan enough time for the material that I wanted to cover. Thanks to Sanja, who was flexible enough with her timing, we did not overrun too much.

Another tutorial with substantial involvement by IMS people was given by Gabriella Lapesa, Eva Maria Vecchi, Serena Villata, and Henning Wachsmuth (Lapesa et al. 2023). The topic was argument mining, and it unfortunately took place in parallel, which was a pity because the same people might have been interested in both. Luckily the tutorials were recorded and will be online on the underline platform.

Papers

We further had a set of papers in the main conference, the workshops, and Findings.

Wuehrl, Grimminger, and Klinger (2023) proposed a real-world pipeline for biomedical fact-checking, based on the idea that reformulated claims can be better checked against scientific text than the original formulation of a claim as it occurs in social media. Miletic and Schulte im Walde (2023) showed how compositionality information can be extracted from BERT. Eichel, Schlipf, and Schulte im Walde (2023) investigate how LLM can be prompted for plausability with applications in the material sciences. Falk and Lapesa (2023) show how adapters can be used to efficiently predict argument quality, based on a large set of datasets and quality dimensions. Nikolaev and Padó (2023) study representation biases in sentence transformers, Gaser et al. (2023) explore segmentation approaches for neural machine translation with code-switching, and Väth, Vanderlyn, and Vu (2023) show how dialog systems allow for more complex tree-like conversations with intelligent agents.

My Favorite Contributions

In addition to the IMS contributions to the conference, I found a set of talks and papers very interesting. This only reflects my personal opinion, and that I do not mention a particular paper probably only means that I did not have the time to go to its presentation. There were many interesting papers in the program, I did not go through all of them yet.

Invited Talks

Before I say something about the papers I liked a lot, I would like to point out two of the three invited talks. Joyce Chai talked about embodied AI. As a student of computer science, I often heard the phrase “intelligence needs a body”, and I must say, I never really understood. Now, with this talk, and the nice demonstration videos that Joyce showed, I finally got a grasp of what’s behind this phrase. Full understanding of the whole context is only possible in multimodal interactions. That does not mean that every researcher needs to work on multimodal interaction analysis, but there needs to be such integration efforts to not miss important aspects. I found that very intuitive.

The other keynote, given by Edward Grefenstette included discussions on the efficiency of LLM and their future use. He mentioned work by Lyle et al. (2020) who studied why LLM can actually generalize. Apparently, it is crucial to only have one epoch during pre-training. On a more entertaining side, he pointed out that LLM currently mostly fail with pragmatics (“Have you seen my phone” – “Yes, I have seen your phone.”)

Papers

My Favorite papers:

Eisenschlos et al. (2023) study how LLM can learn new words in-context at inference time and develop a method to measure such word acquisition (by prompt-based coreference resolution with new words). This paper also won a best paper award.
Ishibashi et al. (2023) analyze the robustness of prompts by prompt pertubation. One interesting finding is that manual prompts are more robust than automatically learned prompts in few-shot settings. Very interesting study to get a better understanding what “good prompts” are.

We know that LLM tend to hallucinate content. This can also happen during machine translation (I did not know that, and it sounds pretty scary!). Understanding such hallucinations is the topic of the work by Guerreiro, Voita, and Martins (2023). They also propose a method to mitigate the issue by regularization during inference.

Govindarajan et al. (2023) point out that there is no such thing as unbiased language! They look at interpersonal bias and emotion.
Zhong, Dhuliawala, and Stoehr (2023) study a task that sounds like it should be super-straight-forward to solve: extract mentions of counts from social media (here: victim counts). Apparently, the task is really difficult, because models need to understand enumerations, implicit references, next to actual mentions of numbers. This is a very interesting paper, because it shows another case where general models fail and that specifically developed models for particular tasks are important.
Mohammad (2023) wrote a paper about how to use emotion lexicons and how to build them. The paper style is worth pointing out: it’s written in a question-answer style, and I think that this is very accessable.
Most of our language models are huge these days, and luckily, there are methods to compress them, such that they can not only be used on GPU clusters. Du et al. (2023) show that such compression comes with disadvantages: it reinforces biases.
Recently, several methods have been proposed to automatically find well-performing prompts (for instance Shin et al. (2020), Ding et al. (2022)). In their paper, Prasad et al. (2023) focus on instruction tuning without a need to calculate gradients.
Narayanan Venkit et al. (2023) look into nationality bias (instead of a bias towards particular languages).
Parmar et al. (2023) (also outstanding paper) study if instructions in crowd-sourcing tasks create biases (unfortunately, the answer is yes).
Cortal et al. (2023) build on top of previous work that we published, on the emotion component process model and appraisal theories for emotion analysis, particularly Casel, Heindl, and Klinger (2021). They build on top of another appraisal theory that focuses on the cognitive component of emotions and create a corpus in French, with a focus on emotion regulation. This is the first work that I have seen that puts emotion regulation into focus in NLP work.

Awards

I won’t go into detail regarding the paper awards, these papers have already been evaluated by other people to be interesting, I’ll just list them here:

Outstanding papers:

Best papers:

Venue and Place

At the end of this blog post, I’d like to comment on the location. Personally, I like to categorize conference locations into three types:

Nice conference centers in some city with hotels around, without a hotel directly associated to the venue. An example was COLING 2018 in a very nice place downtown Santa Fe. The advantage is: such conference locations are nice! The disadvantage is: participants do not tend to hang out at the venue.
Conference hotels downtown of some big city. An example was NAACL-HLT 2019 in Minneapolis or ACL 2014 in Baltimore. If one can effort the conference hotel, that’s nice because people hang around also outside of the conference schedule. Unfortunately, these hotels are sometimes prohibitively expensive, and then people are just elsewhere - and in contrast to (1), the conference center is typically not even nice but arbitrary.
Conference hotels somewhere where nothing else is that might motivate one to be elsewhere. This has been the setup of RANLP 2009 and RANLP2011 (not sure about later). Of course both places had a lot to offer outside of the conference hotel, but given that these places were pretty empty outside of the respective season, one ran into conference people everywhere.

The conference hotel Valamar Lacroma Dubrovnik Hotel of EACL 2023 was something in-between. It was slightly too expensive such that all particants would decide to stay there, but downtown was sufficiently far away such that people did not go elsewhere during breaks, even if they had the accomodation elsewhere. I must say that I found this to be a very good setup. The city of Dubrovnik was beautiful, and in the evening we ran into hundreds of EACL people. But the conference venue/hotel had enough to offer that one could also stay there and talk.

Bibliography

Belouadi, Jonas, and Steffen Eger. 2023. “UScore: An Effective Approach to Fully Unsupervised Evaluation Metrics for Machine Translation.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 358–74. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.27.

Casel, Felix, Amelie Heindl, and Roman Klinger. 2021. “Emotion Recognition Under Consideration of the Emotion Component Process Model.” In Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021), 49–61. Düsseldorf, Germany: KONVENS 2021 Organizers. https://aclanthology.org/2021.konvens-1.5.

Chen, Yunmo, William Gantt, Weiwei Gu, Tongfei Chen, Aaron White, and Benjamin Van Durme. 2023. “Iterative Document-Level Information Extraction via Imitation Learning.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 1858–74. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.136.

Lyle, Clare, Lisa Schut, Binxin Ru, Yarin Gal, and Mark van der Wilk. 2020. “A Bayesian Perspective on Training Speed and Model Selection.” In 34th Conference on Neural Information Processing Systems (NeurIPS 2020). Vancouver, Canada. https://proceedings.neurips.cc/paper/2020/file/75a7c30fc0063c4952d7eb044a3c0897-Paper.pdf.

Cortal, Gustave, Alain Finkel, Patrick Paroubek, and Lina Ye. 2023. “Emotion Recognition Based on Psychological Components in Guided Narratives for Emotion Regulation.” In Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 72–81. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.latechclfl-1.8.

Ding, Ning, Shengding Hu, Weilin Zhao, Yulin Chen, Zhiyuan Liu, Haitao Zheng, and Maosong Sun. 2022. “OpenPrompt: An Open-Source Framework for Prompt-Learning.” In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 105–13. Dublin, Ireland: Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-demo.10.

Du, Mengnan, Subhabrata Mukherjee, Yu Cheng, Milad Shokouhi, Xia Hu, and Ahmed Hassan Awadallah. 2023. “Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 1766–78. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.129.

Eichel, Annerose, Helena Schlipf, and Sabine Schulte im Walde. 2023. “Made of Steel? Learning Plausible Materials for Components in the Vehicle Repair Domain.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 1420–35. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.104.

Eisenschlos, Julian Martin, Jeremy R. Cole, Fangyu Liu, and William W. Cohen. 2023. “WinoDict: Probing Language Models for in-Context Word Acquisition.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 94–102. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.7.

Epure, Elena, and Romain Hennequin. 2023. “A Human Subject Study of Named Entity Recognition in Conversational Music Recommendation Queries.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 1281–96. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.92.

Falk, Neele, and Gabriella Lapesa. 2023. “Bridging Argument Quality and Deliberative Quality Annotations with Adapters.” In Findings of the Association for Computational Linguistics: EACL 2023, 2469–88. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.findings-eacl.187.

Gaser, Marwa, Manuel Mager, Injy Hamed, Nizar Habash, Slim Abdennadher, and Ngoc Thang Vu. 2023. “Exploring Segmentation Approaches for Neural Machine Translation of Code-Switched Egyptian Arabic-English Text.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 3523–38. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.256.

Govindarajan, Venkata Subrahmanyan, Katherine Atwell, Barea Sinno, Malihe Alikhani, David I. Beaver, and Junyi Jessy Li. 2023. “How People Talk about Each Other: Modeling Generalized Intergroup Bias and Emotion.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2496–506. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.183.

Guerreiro, Nuno M., Elena Voita, and André Martins. 2023. “Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 1059–75. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.75.

Ishibashi, Yoichi, Danushka Bollegala, Katsuhito Sudoh, and Satoshi Nakamura. 2023. “Evaluating the Robustness of Discrete Prompts.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2373–84. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.174.

Krishna, Kalpesh, Erin Bransom, Bailey Kuehl, Mohit Iyyer, Pradeep Dasigi, Arman Cohan, and Kyle Lo. 2023. “LongEval: Guidelines for Human Evaluation of Faithfulness in Long-Form Summarization.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 1650–69. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.121.

Lapesa, Gabriella, Eva Maria Vecchi, Serena Villata, and Henning Wachsmuth. 2023. “Mining, Assessing, and Improving Arguments in NLP and the Social Sciences.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts, 1–6. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-tutorials.1.

Lu, Hongyuan, and Wai Lam. 2023. “PCC: Paraphrasing with Bottom-k Sampling and Cyclic Learning for Curriculum Data Augmentation.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 68–82. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.5.

Madusanka, Tharindu, Riza Batista-navarro, and Ian Pratt-hartmann. 2023. “Identifying the Limits of Transformers When Performing Model-Checking with Natural Language.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 3539–50. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.257.

Miletic, Filip, and Sabine Schulte im Walde. 2023. “A Systematic Search for Compound Semantics in Pretrained BERT Architectures.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 1499–1512. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.110.

Milich, Marcel, and Alan Akbik. 2023. “ZELDA: A Comprehensive Benchmark for Supervised Entity Disambiguation.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2061–72. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.151.

Misra, Kanishka, Julia Rayz, and Allyson Ettinger. 2023. “COMPS: Conceptual Minimal Pair Sentences for Testing Robust Property Knowledge and Its Inheritance in Pre-Trained Language Models.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2928–49. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.213.

Mohammad, Saif. 2023. “Best Practices in the Creation and Use of Emotion Lexicons.” In Findings of the Association for Computational Linguistics: EACL 2023, 1825–36. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.findings-eacl.136.

Narayanan Venkit, Pranav, Sanjana Gautam, Ruchi Panchanadikar, Ting-Hao Huang, and Shomir Wilson. 2023. “Nationality Bias in Text Generation.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 116–22. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.9.

Nguyen, Laura, Thomas Scialom, Benjamin Piwowarski, and Jacopo Staiano. 2023. “LoRaLay: A Multilingual and Multimodal Dataset for Long Range and Layout-Aware Summarization.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 636–51. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.46.

Nikolaev, Dmitry, and Sebastian Padó. 2023. “Representation Biases in Sentence Transformers.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 3701–16. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.268.

Parmar, Mihir, Swaroop Mishra, Mor Geva, and Chitta Baral. 2023. “Don’t Blame the Annotator: Bias Already Starts in the Annotation Instructions.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 1779–89. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.130.

Prasad, Archiki, Peter Hase, Xiang Zhou, and Mohit Bansal. 2023. “GrIPS: Gradient-Free, Edit-Based Instruction Search for Prompting Large Language Models.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 3845–64. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.277.

Shin, Taylor, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, and Sameer Singh. 2020. “AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts.” In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 4222–35. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.346.

Štajner, Sanja, and Roman Klinger. 2023. “Emotion Analysis from Texts.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts, 7–12. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-tutorials.2.

Väth, Dirk, Lindsey Vanderlyn, and Ngoc Thang Vu. 2023. “Conversational Tree Search: A New Hybrid Dialog Task.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 1264–80. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.91.

Winata, Genta Indra, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, et al. 2023. “NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 815–34. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.57.

Wuehrl, Amelie, Lara Grimminger, and Roman Klinger. 2023. “An Entity-Based Claim Extraction Pipeline for Real-World Biomedical Fact-Checking.” In Proceedings of the Sixth Fact Extraction and VERification Workshop (FEVER), 29–37. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.fever-1.3.

Zhong, Mian, Shehzaad Dhuliawala, and Niklas Stoehr. 2023. “Extracting Victim Counts from Text.” In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 1925–40. Dubrovnik, Croatia: Association for Computational Linguistics. https://aclanthology.org/2023.eacl-main.141.

[Download this post as PDF]

Emotion Role Labeling and Stimulus Detection: An Overview

2023-04-24T12:00:00+02:00

The Need for Structured Emotion Analysis

Emotion analysis, as I described it in my previous blog post which focused on appraisal theories, typically focuses on predictions based on a predefined textual unit. This could be a sentence, a Tweet, or a paragraph. For instance, given the text

I am very happy to be able to meet Donald Trump.

one could task an automatic emotion analysis system to output:

the emotion that the person writing the text felt at writing time or wants to express (author emotion),
the emotion that a person might develop based on the text (reader emotion),
the emotion that is explicitly mentioned in the text (text level emotion).

For (1), it’s pretty clear that the author of the text “I” feels joy. The text expresses that quite clearly, which also fits (3). To understand what emotion the reader might feel (2) depends on various aspects. If they like the author and Donald Trump, for instance, there might be some increased chance that sharing the joy of the author is more likely than a negative emotion. I also think that (3) is not really a task - it’s more an underspecified setup, in which the decision whose emotion is to be detected is left open.

There has been some work on the question of perspective - the reader vs. the writer (Buechel and Hahn 2017). What we cannot do, however, with such text classification approach is to extract the emotion of other mentioned entities, here, “Donald Trump”. The difference between the tasks is that there is always an author and a reader, but the entities are flexible parts of the text. To assign an emotion to an entity, we first need to know which entities in the text are mentioned.

If text classification is sufficient depends on the actual task. For social media analysis, extracting the emotion of authors of full tweets makes a lot of sense. For literature, the author’s emotion obviously is not that relevant, similarly for the journalist’s emotion when writing a news headline. For some domains, it is much more intuitive to look at the emotions of entities that are mentioned. Given, for instance, the following news headline (from Bostan, Kim, and Klinger (2020))

A couple infuriated officials by landing 
their helicopter in the middle of a nature reserve.

the question which emotion the author felt is probably irrelevant – it’s a journalist, they don’t feel anything about it, and if they do, it probably doesn’t matter. We might be interested to understand what emotion is caused in a reader, for instance to improve recommendation systems (to only read good news; or to find headlines which are suitable for clickbait). Still, arguably, in such tasks the emotions that are felt by interacting entities are more relevant for analysis of news. Here, we would like to know that “officials” are described to be angry. We could also try to infer an emotion of “A couple” - perhaps they were pretty happy (anyway, they have a helicopter).

Coming back to the example of “The sorrows of the young Werther” that have been mentioned earlier, finding out which emotions are ascribed to entities in a novel clearly requires more than just text-level classification, to not just be a straight oversimplification.

Finally, we might also want to know which event (object, or other person) is described to cause the emotion. Being able to do that would allow us to automatically extract social network representations (Barth et al. 2018) and understand which stimuli are often described to cause a specific emotion.

Structured Sentiment Analysis

This is all pretty related to another more popular task that you might have heard of: aspect-based sentiment analysis. Trying to understand not only if a text is positive, but what aspect is described to be positive, who the opinion holder is, and which words express this opinion. This is now an established task in sentiment analysis. As a recent example, the SemEval Shared Task on Structured Sentiment Analysis (Barnes et al. 2022) aimed at detecting parts of the text corresponding to the opinion holder, the expression, and the target, as the organizers illustrate in the following Figure.

The setup of structured sentiment analysis or aspect-based sentiment analysis is older and more established than structured emotion analysis. However, transfering sentiment analysis to emotion analysis is not entirely straight-forward. One reason is that tasks do not perfectly align:

Detecting an opinion holder in sentiment analysis totally makes sense. Such thing like an “emotion holder” does, however not really exist. It would be the person experiencing or feeling an emotion, to whom we could refer as the emoter (we could also say feeler, but that word is more ambiguous). This also shows one difference between emotion analysis and sentiment analysis in the sense of opinion analysis - expressing an emotion is often not a voluntary process (sometimes not even conscious), while this is more often the case for an opinion. Also, opinions could develop more out of a conscious cognitive process.
The aspect/target in sentiment analysis might correspond to two things in emotion analysis. It can be a target, I can be angry at something or someone, that is not necessarily the cause of that emotion. I can be angry at a friend, because she did eat my emergency supply of chocolate. But I cannot be sad at somebody. In emotion analysis, we care more about the stimulus or cause of an emotion.
The evaluative phrase in sentiment analysis (something is good or bad) pretty clearly corresponds to emotion words (something makes someone sad or happy).

To understand the relations between these tasks, we conducted the project SEAT (Structured Multi-Domain Emotion Analysis from Text) between 2017 and 2021, to which CEAT is the successor (which started in 2021).

Data Sets and Methods for Full Structured Emotion Analysis

There are now a couple of data sets available to develop systems that detect emoters and causes. Recently, the project SRL4E aggregated several of them into a common format (Campagnano, Conia, and Navigli 2022), including the ones by Gao et al. (2017), Liew, Turtle, and Liddy (2016), Mohammad, Zhu, and Martin (2014), and Aman and Szpakowicz (2007). I will focus in this blog post on our own work, namely Kim and Klinger (2018), Kim and Klinger (2019a), and Bostan, Kim, and Klinger (2020). Not part of SRL4E is x-enVENT (Troiano et al. 2022), because it has been published more recently, but we will also talk about this.

The two corpora by Kim and Klinger (2018) and Bostan, Kim, and Klinger (2020) aimed at developing resources that enable the development of models that recognize emotion labels for all potential emoters mentioned in the text and the relations between them (that one entity is part of a target or a cause):

There are two main differences in these data:

Annotation Procedure (crowdsourcing vs. carefully trained annotators)
Domain (Literature vs. News headlines)

The REMAN corpus

When we started with the annotation of the REMAN corpus, we were involved in the CRETA Project, a platform that combined multiple projects from the digital humanities. There was some focus on literary studies, and therefore we decided to annotate literature. We chose Project Gutenberg, because of its relative diversity and accessability. However, literature comes with challenges - it’s not exactly written to communicate facts concisely and clearly. Emotion causes and the associated roles can be distributed across longer text passages, and we expected the annotation to be difficult, because of the artistic style. This lead to some decisions:

Each data instance consists of three sentences, in which the one in the middle is selected to contain the emotion expression. The sentences around would only be annotated for roles.
We performed the annotation with students from our institute with whom we could interactively discuss the task (before fixing the annotation guidelines and letting them annotate independently).

These decisions lead to quite some ok inter-annotator agreement, but was still clearly below tasks that are more factual. Particularly detecting the cause spans was challenging. We attributed this to the subjective nature of emotions and the domain being quite challenging.

The GoodNewsEveryone corpus

For the GoodNewsEveryone corpus, we decided therefore to move to crowdsourcing, to be able to retrieve multiple subjective annotations which were then aggregated. Emotion role labeling is a structured task, and this required a multi-step annotation procedure. To not make the data more difficult than necessary, we chose a domain that is characterized by short instances: news headlines. This came, however, with another set of difficulties, namely the missing context. We did not anticipate that it would be so hard for annotators to interpret specific events. That was particularly the case when annotators from the US were tasked to annotate UK headlines (or the other way around).

Simplifying Role Labeling to Stimulus Detection or Entity-Specific Predictions

To our knowledge, up until today, there is no work on fully extracting emotion role graphs automatically. The most popular subtask is arguably emotion cause/stimulus detection, in which the part of the text is to be detected that describes what caused an emotion. In Mandarin, it is common to formulate the task as clause classification. It seems that in English, stimuli are often described with non-consecutive text passages or cannot be mapped clearly to clauses. Therefore, in English, it is more common to detect emotion stimuli on the token level (L. A. M. Oberländer and Klinger 2020). We also worked on stimulus detection quite a bit, as part of the corpus papers mentioned above, and additionally in German (Doan Dang, Oberländer, and Klinger 2021). We also wanted to understand if knowledge about the roles can improve emotion classification (L. Oberländer, Reich, and Klinger 2020) (yes), and how emotions are actually ascribed to a character in literature (Kim and Klinger 2019b) (depends on the emotion category).

We decided to additionally follow another research direction. While, clearly, the emotion stimulus plays an important role as the trigger to the affective sensation, there is no emotion without the person experiencing it. If we believe that emotions help us in surviving in a social world, we also need to put the entity that feels something on the spot. Our first attempt was Kim and Klinger (2019a), in which we annotated the data based on entity relations.

We left it to the automatic model to figure out which parts of the text are important to decide which emotion somebody feels and took the stance that the relation between characters is important to be analyzed. We did that with a pipeline model, which detects entities, assigns emotions, and aggregates them in a graph.

This work follows the motivation to analyze social networks most directly, because in the evaluation of the model, we evaluated on the network level - it was fine to miss some entity relation, as long as we find it somewhere in the text.

The second and more recent paper acted as an aggregating element between our work of appraisal-based emotion analysis and emotion role labeling (Troiano et al. 2022). We went back to in-house annotations based on trained experts, because we wanted to acquire entity-specific emotion and appraisal annotations which we needed to first develop together with annotators. Therefore, this paper also acted as a preliminary study to Troiano, Oberländer, and Klinger (2023) which we already discussed in a previous blog post. Here, we reannotated a corpus of event reports (based on Troiano, Padó, and Klinger (2019) and Hofmann et al. (2020)), but only from the perspective of the author (the person who lived through the event and told us about that), but in addition from the perspective of every person participating in the event.

We left the relation between entities underspecified, but in the analysis of the data, that can be quite clearly observed, on the emotion and the appraisal-level. For instance, when one person is annotated to feel responsible, that decreases the probability that the other person is also responsible. As self/other-responsibility is an appraisal dimension known to be relevant for the development of guilt, shame, and pride, this also influences the emotion. We also did perform automatic modeling experiments, which very clearly showed that simple text classification does not entirely capture the emotional content of text - it conflates multiple emotional dimensions into one (Wegge et al. 2022).

Summary

In this blog post, I summarized the work we performed on emotion role labeling which is a way to represent emotions described in text. In contrast to text classification, it is more powerful to accurately represent what’s in the text, but the modeling is also more challenging. Because of that, various subtasks have been defined, including experiencer specific emotion detection and stimulus detection; which both focus just on one role.

Why is this important? Most of what I wrote about is about resources, and only a bit about modeling and automatic systems. Before automatic systems can be developed, we need corpora, not only to train models, but also as a process to understand the phenomenon. I think that the emotion role labeling formalism is powerful enough to represent all relevant aspects of emotions as they are expressed in text, but it is challenging to create high-quality corpora. Further, it is challenging because sometimes, a simulus cannot be exactly located in text. Emotions do not develop just based on one single event that can be referred to with a name or a short text. That might be ok in news data, but in literature, an event can be described with many more words, perhaps stretching over pages or even a whole book.

What comes next? Some data sets exist now, and we have a good understanding of the challenges in annotation. For each subtask, there also exist various modeling approaches. We have also seen that emotion classification and role detection influence each other (L. A. M. Oberländer and Klinger 2020). While emotion stimulus detection and emotion classification is very commonly addressed as a joint modeling task now in Mandarin (Xia and Ding 2019), we do not yet have joint models that find all roles and emotion categories together. Such structured prediction models might not only provide a better understanding of what’s expressed in text than single predictions, the quality of models on each subtask might also improve, because the various variables interact. In my opinion, developing such models is still one of the most important tasks in emotion analysis from text. This will not only help to develop better performing natural language understanding systems. It can also contribute to develop a better understanding of the realization of emotions in text.

Acknowledgements

I would like to thank all my collaborators with whom I worked on role labeling and emotions. These are (in no specific order) Laura Oberländer, Evgeny Kim, Bao Minh Doan Dang, Kevin Reich, Max Wegge, Valentino Sabbatino, Amelie Heindl, Jeremy Barnes, Tornike Tsereteli, and Enrica Troiano.

This work has been funded by the German Research Council (DFG) in the project SEAT, KL 2869/1-1.

Bibliography

Aman, Saima, and Stan Szpakowicz. 2007. “Identifying Expressions of Emotion in Text.” In Text, Speech and Dialogue, edited by Václav Matoušek and Pavel Mautner, 196–205. Berlin, Heidelberg: Springer Berlin Heidelberg.

Barnes, Jeremy, Laura Oberlaender, Enrica Troiano, Andrey Kutuzov, Jan Buchmann, Rodrigo Agerri, Lilja Øvrelid, and Erik Velldal. 2022. “SemEval 2022 Task 10: Structured Sentiment Analysis.” In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), 1280–95. Seattle, United States: Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.semeval-1.180.

Barth, Florian, Evgeny Kim, Sandra Murr, and Roman Klinger. 2018. “A Reporting Tool for Relational Visualization and Analysis of Character Mentions in Literature.” In Book of Abstracts – Digital Humanities Im Deutschsprachigen Raum. Cologne, Germany. http://dhd2018.uni-koeln.de/wp-content/uploads/boa-DHd2018-web-ISBN.pdf.

Bostan, Laura Ana Maria, Evgeny Kim, and Roman Klinger. 2020. “GoodNewsEveryone: A Corpus of News Headlines Annotated with Emotions, Semantic Roles, and Reader Perception.” In Proceedings of the 12th Language Resources and Evaluation Conference, 1554–66. Marseille, France: European Language Resources Association. https://www.aclanthology.org/2020.lrec-1.194.

Buechel, Sven, and Udo Hahn. 2017. “EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis.” In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 578–85. Valencia, Spain: Association for Computational Linguistics. https://aclanthology.org/E17-2092.

Campagnano, Cesare, Simone Conia, and Roberto Navigli. 2022. “SRL4E – Semantic Role Labeling for Emotions: A Unified Evaluation Framework.” In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 4586–4601. Dublin, Ireland: Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-long.314.

Doan Dang, Bao Minh, Laura Oberländer, and Roman Klinger. 2021. “Emotion Stimulus Detection in German News Headlines.” In Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021), 73–85. Düsseldorf, Germany: KONVENS 2021 Organizers. https://aclanthology.org/2021.konvens-1.7.

Gao, Qinghong, Jiannan Hu, Ruifeng Xu, Gui Lin, Yulan He, Qin Lu, and Kam-Fai Wong. 2017. “Overview of NTCIR-13 ECA Task,” 6.

Hofmann, Jan, Enrica Troiano, Kai Sassenberg, and Roman Klinger. 2020. “Appraisal Theories for Emotion Classification in Text.” In Proceedings of the 28th International Conference on Computational Linguistics, 125–38. Barcelona, Spain (Online): International Committee on Computational Linguistics. https://doi.org/10.18653/v1/2020.coling-main.11.

Kim, Evgeny, and Roman Klinger. 2018. “Who Feels What and Why? Annotation of a Literature Corpus with Semantic Roles of Emotions.” In Proceedings of the 27th International Conference on Computational Linguistics, 1345–59. Santa Fe, New Mexico, USA: Association for Computational Linguistics. https://aclanthology.org/C18-1114.

———. 2019a. “Frowning Frodo, Wincing Leia, and a Seriously Great Friendship: Learning to Classify Emotional Relationships of Fictional Characters.” In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 647–53. Minneapolis, Minnesota: Association for Computational Linguistics. https://www.aclanthology.org/N19-1067.

———. 2019b. “An Analysis of Emotion Communication Channels in Fan-Fiction: Towards Emotional Storytelling.” In Proceedings of the Second Workshop on Storytelling, 56–64. Florence, Italy: Association for Computational Linguistics. https://doi.org/10.18653/v1/W19-3406.

Liew, Jasy Suet Yan, Howard R. Turtle, and Elizabeth D. Liddy. 2016. “EmoTweet-28: A Fine-Grained Emotion Corpus for Sentiment Analysis.” In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), 1149–56. Portorož, Slovenia: European Language Resources Association (ELRA). https://aclanthology.org/L16-1183.

Mohammad, Saif, Xiaodan Zhu, and Joel Martin. 2014. “Semantic Role Labeling of Emotions in Tweets.” In Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 32–41. Baltimore, Maryland: Association for Computational Linguistics. https://doi.org/10.3115/v1/W14-2607.

Oberländer, Laura Ana Maria, and Roman Klinger. 2020. “Token Sequence Labeling Vs. Clause Classification for English Emotion Stimulus Detection.” In Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics, 58–70. Barcelona, Spain (Online): Association for Computational Linguistics. http://www.romanklinger.de/publications/OberlaenderKlingerSTARSEM2020.pdf.

Oberländer, Laura, Kevin Reich, and Roman Klinger. 2020. “Experiencers, Stimuli, or Targets: Which Semantic Roles Enable Machine Learning to Infer the Emotions?” In Proceedings of the Third Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media. Barcelona, Spain: Association for Computational Linguistics. https://www.aclanthology.org/2020.peoples-1.12/.

Troiano, Enrica, Laura Ana Maria Oberlaender, Maximilian Wegge, and Roman Klinger. 2022. “X-enVENT: A Corpus of Event Descriptions with Experiencer-Specific Emotion and Appraisal Annotations.” In Proceedings of the Language Resources and Evaluation Conference, 1365–75. Marseille, France: European Language Resources Association. https://aclanthology.org/2022.lrec-1.146.

Troiano, Enrica, Laura Oberländer, and Roman Klinger. 2023. “Dimensional Modeling of Emotions in Text with Appraisal Theories: Corpus Creation, Annotation Reliability, and Prediction.” Computational Linguistics 49 (1). https://doi.org/10.1162/coli_a_00461.

Troiano, Enrica, Sebastian Padó, and Roman Klinger. 2019. “Crowdsourcing and Validating Event-Focused Emotion Corpora for German and English.” In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 4005–11. Florence, Italy: Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1391.

Wegge, Maximilian, Enrica Troiano, Laura Ana Maria Oberlaender, and Roman Klinger. 2022. “Experiencer-Specific Emotion and Appraisal Prediction.” In Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS), 25–32. Abu Dhabi, UAE: Association for Computational Linguistics. https://aclanthology.org/2022.nlpcss-1.3.

Xia, Rui, and Zixiang Ding. 2019. “Emotion-Cause Pair Extraction: A New Task to Emotion Analysis in Texts.” In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 1003–12. Florence, Italy: Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1096.

[Download this post as PDF]