Chloé Clavel

Senior Researcher (Directrice de recherche), INRIA Paris

Resources

SOFTWARE resources

Models and software:

LogiTorch is a PyTorch-based library for logical reasoning on natural language, developed by Chadi Helwé during his Ph.D. It consists of:

  • Textual logical reasoning datasets
  • Implementations of different logical reasoning neural architectures
  • A simple and clean API that can be used with PyTorch Lightning

dialign: software developed by Guillaume Dubuisson-Duplessis during his post-doc for computing generic measures of verbal alignment in dyadic dialogue, based on sequential pattern mining

ProtoSeq: a model developed by Gaël Guibon during his post-doc for few-shot emotion recognition in conversation with sequential prototypical networks

Evaluation metrics

Linguistic-diversity: Python scripts developed by Yanzhu Guo during her PhD for analyzing the linguistic diversity among collections of texts, specifically measuring lexical, semantic, and syntactic diversity. Code for the paper [The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text](https://aclanthology.org/2024.findings-naacl.228/) (Guo et al., Findings 2024)

BaryScore, InfoLM: new automatic metrics developed by Pierre Colombo during his PhD for evaluating Natural Language Generation

Demo

Meta-learning demo: a demo developed with Gaël Guibon (post-doc) and the Hi!Paris engineering team to illustrate the collaboration with SNCF on meta-learning for emotion recognition in conversations

Corpora and benchmarks

SubstanReview

The SubstanReview dataset was developed by Yanzhu Guo. It consists of 550 annotated peer reviews (440 in the train set and 110 in the test set). It is released in the JSON Lines format, with two fields per record:

« review » includes the textual content of the peer review.

« label » includes the annotated spans of claim-evidence pairs. A claim span and an evidence span form a pair if they are assigned the same number. For example, « Jus_neg_1 » is linked to « Eval_neg_1 ».
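As a rough illustration of how such a file might be read, here is a minimal Python sketch. Only the field names « review » and « label » and the tag names « Eval_neg_1 » / « Jus_neg_1 » come from the description above. The rest is assumed for illustration: that each label entry is a [start, end, tag] triple with character offsets, that « Eval » marks the claim and « Jus » the evidence, and that spans are matched on the part of the tag after the prefix (here « neg_1 »). The sample record and the helper names are hypothetical.

```python
import json
from collections import defaultdict


def load_jsonl(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def pair_spans(review, label):
    """Group annotated spans into (claim, evidence) pairs.

    Assumption: `label` is a list of [start, end, tag] triples, where
    start/end are character offsets into `review` and tag looks like
    "Eval_neg_1" (claim) or "Jus_neg_1" (evidence).
    """
    groups = defaultdict(dict)
    for start, end, tag in label:
        # "Eval_neg_1" -> kind "Eval", key "neg_1"
        kind, _, key = tag.partition("_")
        groups[key][kind] = review[start:end]
    # Keep only keys that have both a claim and an evidence span.
    return [(g["Eval"], g["Jus"]) for g in groups.values()
            if "Eval" in g and "Jus" in g]


# Hypothetical record, written in the assumed layout.
line = ('{"review": "The method is weak. It ignores prior work.", '
        '"label": [[0, 19, "Eval_neg_1"], [20, 42, "Jus_neg_1"]]}')
record = json.loads(line)
pairs = pair_spans(record["review"], record["label"])
print(pairs)
```

If the released files store spans differently (for instance as token indices), only the slicing inside `pair_spans` would need to change.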

Resources for the paper: [Automatic Analysis of Substantiation in Scientific Peer Reviews](https://aclanthology.org/2023.findings-emnlp.684/) (Guo et al., Findings 2023)

HANNA Benchmark

HANNA is a large annotated dataset of Human-ANnotated NArratives for Automatic Story Generation (ASG) evaluation. HANNA contains annotations for 1,056 stories generated from 96 prompts from the WritingPrompts dataset. Each story was annotated by 3 raters on 6 criteria (Relevance, Coherence, Empathy, Surprise, Engagement, and Complexity), for a total of 19,008 annotations.
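The annotation total follows directly from the figures above. The short Python check below uses only those figures; the variable names are illustrative, and the per-prompt value is simply an average.

```python
# Figures quoted in the HANNA description.
n_stories = 1056
n_prompts = 96
n_raters = 3
n_criteria = 6

# Each story receives one score per rater and per criterion.
scores_per_story = n_raters * n_criteria
total_annotations = n_stories * scores_per_story

# Average number of stories per prompt.
stories_per_prompt = n_stories / n_prompts

print(total_annotations)   # 19008
print(scores_per_story)    # 18
print(stories_per_prompt)  # 11.0
```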

Additionally, we release the scores of those 1,056 stories evaluated by 72 automatic metrics and annotated by 4 different Large Language Models (Beluga-13B, Llama-13B, Mistral-7B, ChatGPT).

Resources for the paper « Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation », published in TACL.

Authors: Cyril Chhun, Fabian Suchanek and Chloé Clavel.

[Note: resources for the paper « Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation », accepted at COLING 2022, can be accessed in the coling branch here.]

3MT_French dataset

The 3MT_French dataset contains a set of public speaking annotations collected on a crowd-sourcing platform through a novel annotation scheme and protocol. Global evaluation, persuasiveness, perceived self-confidence of the speaker and audience engagement were annotated on different time windows (i.e., the beginning, middle or end of the presentation, or the full video). This new resource will be useful to researchers working on public speaking assessment and training. It will allow them to fine-tune the analysis of presentations from a novel perspective, relying on socio-cognitive theories rarely studied before in this context, such as first impressions and primacy and recency theories. More details about the dataset can be found here: https://doi.org/10.21203/rs.3.rs-2122814/v1

TURIN Annotations of Vernissage dataset

Annotations of trust on the Vernissage dataset (human-robot interactions).

Hulcelle, M., Varni, G., Rollet, N., & Clavel, C. (2021, September). TURIN: A coding system for Trust in hUman Robot INteraction. In 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII) (pp. 1-8). IEEE.

Annotations of sentiments of POM dataset

This dataset is an annotated variant of the Persuasive Opinion Multimedia (POM) corpus. It was developed for the opinion prediction task and includes opinion annotations at the expression and word levels. Expression-level annotations label the textual span of the opinion. Word-level annotations (e.g. holder, target, polarity) label the word components of the opinion. Further details can be found in (Garcia et al. 2019 (1)).

MIAM Benchmark

The Multilingual dIalogAct benchMark (MIAM) is a collection of resources for training, evaluating, and analyzing natural language understanding systems specifically designed for spoken language. Datasets are in English, French, German, Italian and Spanish. They cover a variety of domains including spontaneous speech, scripted scenarios, and joint task completion. All datasets contain dialogue act labels.

[Code-switched inspired losses for spoken dialog representations](https://aclanthology.org/2021.emnlp-main.656) (Colombo et al., EMNLP 2021)

SILICONE Benchmark

The Sequence labellIng evaLuatIon benChmark fOr spoken laNguagE (SILICONE) is a collection of resources for training, evaluating, and analyzing natural language understanding systems specifically designed for spoken language. All datasets are in English and cover a variety of domains including daily life, scripted scenarios, joint task completion, phone call conversations, and television dialogues. Some datasets additionally include emotion and/or sentiment labels.

Emile Chapuis, Pierre Colombo, Matteo Manica, Matthieu Labeau, Chloé Clavel, Hierarchical Pre-training for Sequence Labelling in Spoken Dialog, Findings of EMNLP 2020.

POTUS corpus

The POTUS Corpus is composed of high-quality audio-video files of political addresses to the American people. The database is divided into two sets:

  • speeches of former president Barack Obama to the American people.
  • videos of these same speeches given by a virtual agent named Rodrigue. This embodied conversational agent (ECA) reproduces the original address as closely as possible using social signals automatically extracted from the original one.

Both sets are annotated for social attitudes, providing information about the stance observed in each file. The corpus also provides the social signals automatically extracted from Obama’s addresses, which were used to generate Rodrigue’s.

Janssoone, Thomas, Kévin Bailly, Gaël Richard, and Chloé Clavel. « The POTUS Corpus, a database of weekly addresses for the study of stance in politics and virtual agents. » In Proceedings of The 12th Language Resources and Evaluation Conference, pp. 1546-1553. 2020.

UE-HRI dataset

Recordings of interactions between humans and the robot Pepper

Atef Ben Youssef, Miriam Bilac, Slim Essid, Chloé Clavel, Angelica Lim, Marine Chamoux, UE-HRI: A New Dataset for the Study of User Engagement in Spontaneous Human-Robot Interactions. In Proceedings of the ACM International Conference on Multimodal Interaction (ICMI17), Glasgow, Scotland, November 2017.

SAFE Corpus

Situation Analysis in a Fictional and Emotional Corpus: The SAFE corpus contains 7 hours of audiovisual sequences in English extracted from fiction movies of various genres (action, thriller, historical drama). The movies were chosen according to the following criteria: they illustrate various threat situations and contain realistic portrayals of strong emotional manifestations. A corpus sequence corresponds to a movie section illustrating one type of situation – kidnapping, physical aggression, flood, etc. The corpus was annotated according to a task-dependent and audio-based annotation strategy, with various levels of accuracy defined in order to describe emotional diversity in fiction.

How to download?

  1. Read the user license: User License
  2. Download the SAFE corpus (I HAVE READ AND AGREED TO THE FOLLOWING TERMS AND CONDITIONS)

References:

C. Clavel, I. Vasilescu, and L. Devillers. Fiction supports for realistic portrayals of fear-type emotional manifestations. Computer Speech and Language, 25(1):63–83, January 2011.

C. Clavel, I. Vasilescu, L. Devillers, T. Ehrette, and G. Richard. SAFE corpus: fear-type emotions detection for surveillance application. In Proc. of LREC, pages 1099–1104, Genoa, 2006.

Annotation tools:

EZCAT (an Easy Conversation Annotation Tool): a conversation annotation tool developed by Gaël Guibon during his post-doc in collaboration with SNCF

noytext: a text annotation web application developed by Luis Gasco during his research stay at Telecom-ParisTech for opinion analysis in social networks.