Chloé Clavel

Professor in Affective Computing


Software resources

EZCAT (Easy Conversation Annotation Tool): a tool for annotating conversations, developed by Gaël Guibon during his post-doc in collaboration with SNCF.

BaryScore and InfoLM: new automatic metrics for evaluating natural language generation, developed by Pierre Colombo during his PhD.

dialign: software developed by Guillaume Dubuisson-Duplessis during his post-doc for computing generic measures of verbal alignment in dyadic dialogue, based on sequential pattern mining.

noytext: a web application for text annotation, developed by Luis Gasco during his research stay at Telecom-ParisTech, for opinion analysis in social networks.

Corpora and benchmarks


SILICONE benchmark

The Sequence labellIng evaLuatIon benChmark fOr spoken laNguagE (SILICONE) is a collection of resources for training, evaluating, and analyzing natural language understanding systems designed specifically for spoken language. All datasets are in English and cover a variety of domains, including daily life, scripted scenarios, joint task completion, phone call conversations, and television dialogues. Some datasets also include emotion and/or sentiment labels.

Emile Chapuis, Pierre Colombo, Matteo Manica, Matthieu Labeau, and Chloé Clavel. Hierarchical Pre-training for Sequence Labelling in Spoken Dialog. In Findings of EMNLP 2020.

POTUS corpus

The POTUS Corpus is composed of high-quality audio-video files of political addresses to the American people. The database is divided into two sets:

  • addresses of former President Barack Obama to the American people;
  • videos of the same speeches delivered by a virtual agent named Rodrigue. This embodied conversational agent (ECA) reproduces each original address as closely as possible, using social signals automatically extracted from it.

Both sets are annotated for social attitudes, providing information about the stance observed in each file. The corpus also includes the social signals automatically extracted from Obama’s addresses and used to generate Rodrigue’s.

Thomas Janssoone, Kévin Bailly, Gaël Richard, and Chloé Clavel. The POTUS Corpus, a Database of Weekly Addresses for the Study of Stance in Politics and Virtual Agents. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC), pages 1546–1553, 2020.

UE-HRI dataset

Recordings of spontaneous interactions between humans and the robot Pepper, for the study of user engagement.

Atef Ben Youssef, Miriam Bilac, Slim Essid, Chloé Clavel, Angelica Lim, and Marine Chamoux. UE-HRI: A New Dataset for the Study of User Engagement in Spontaneous Human-Robot Interactions. In Proceedings of the ACM International Conference on Multimodal Interaction (ICMI '17), Glasgow, Scotland, November 2017.

SAFE Corpus

SAFE (Situation Analysis in a Fictional and Emotional Corpus) contains 7 hours of audiovisual sequences in English extracted from fiction movies of various genres (action, thriller, historical drama). The movies were chosen because they illustrate various threat situations and contain realistic portrayals of strong emotional manifestations. Each corpus sequence corresponds to a movie section illustrating one type of situation (kidnapping, physical aggression, flood, etc.). The corpus was annotated using a task-dependent, audio-based annotation strategy, with several levels of accuracy defined to describe the emotional diversity found in fiction.

How to download?

  1. Read the user license: User License

References:

C. Clavel, I. Vasilescu, and L. Devillers. Fiction supports for realistic portrayals of fear-type emotional manifestations. Computer Speech and Language, 25(1):63–83, January 2011.

C. Clavel, I. Vasilescu, L. Devillers, T. Ehrette, and G. Richard. SAFE corpus: fear-type emotions detection for surveillance application. In Proceedings of LREC, pages 1099–1104, Genoa, 2006.