Core Concepts
This paper presents an enhanced version of the MEDIA benchmark dataset for French Spoken Language Understanding (SLU), with newly added intent annotations. It also provides baseline results for joint intent classification and slot-filling models on this enhanced dataset, using both manual transcriptions and automatic speech recognition outputs.
Abstract
The paper presents an enhanced version of the MEDIA benchmark dataset for French Spoken Language Understanding (SLU). The original MEDIA dataset has been distributed by ELRA since 2005 and has been free for academic research since 2020; it is used mainly by the French research community. It is annotated with semantic concepts (slots) but not with intents.
To extend the use of MEDIA to more tasks and use cases, the authors built an enhanced version of the dataset annotated with intents. They used a semi-automatic approach based on a tri-training algorithm with Transformer-based models to obtain these intent annotations. The resulting enhanced MEDIA dataset contains 11 intent labels, with some utterances associated with multiple intents.
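The agreement step at the heart of tri-training can be sketched as follows. This is a minimal illustration of the generic technique, not the authors' pipeline: the three classifiers here are hypothetical keyword rules standing in for the Transformer-based models, the intent names "reservation" and "other" are invented for the example, and retraining and manual verification are left out.

```python
from typing import Callable, Dict, List, Tuple

# A classifier maps an utterance to a single intent label (hypothetical interface).
Classifier = Callable[[str], str]


def tri_training_round(
    models: List[Classifier], unlabeled: List[str]
) -> Dict[int, List[Tuple[str, str]]]:
    """For each model i, collect the unlabeled utterances on which the
    two OTHER models agree, together with the agreed label."""
    if len(models) != 3:
        raise ValueError("tri-training uses exactly three classifiers")
    pseudo: Dict[int, List[Tuple[str, str]]] = {0: [], 1: [], 2: []}
    for utt in unlabeled:
        preds = [m(utt) for m in models]
        for i in range(3):
            j, k = [x for x in range(3) if x != i]
            if preds[j] == preds[k]:
                # Models j and k agree, so their label becomes a
                # pseudo-label in model i's next training set.
                pseudo[i].append((utt, preds[j]))
    return pseudo


# Toy stand-ins for the three models (assumption: simple keyword rules).
def model_a(u: str) -> str:
    return "reservation" if "réserver" in u else "other"


def model_b(u: str) -> str:
    return "reservation" if "chambre" in u else "other"


def model_c(u: str) -> str:
    return "reservation" if "hôtel" in u else "other"


unlabeled = ["je veux réserver une chambre", "quel est le prix"]
pseudo_labels = tri_training_round([model_a, model_b, model_c], unlabeled)
```

In a full loop, each model would be retrained on its original labeled data plus its pseudo-labeled examples, and the round would repeat until the pseudo-label sets stop changing.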
The paper then presents baseline results for joint intent classification and slot-filling models on this enhanced MEDIA dataset, using both manual transcriptions and automatic speech recognition (ASR) outputs. For manual transcriptions, the authors explore different French Transformer models (CamemBERT, FrALBERT, FlauBERT) and compare cascade and end-to-end architectures. The results show that end-to-end models perform better on intent classification, while cascade models are more competitive on the slot-filling task.
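To make the joint task concrete, the sketch below shows one way an utterance could carry both kinds of annotation: a set of intents (several may apply) and one slot tag per token. It assumes BIO tagging for slots, which the summary does not specify; the slot names are illustrative, and the intent name "reservation" is hypothetical rather than one of the 11 labels from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class JointExample:
    tokens: List[str]
    slot_tags: List[str]  # one BIO tag per token (assumed encoding)
    intents: Set[str]     # multi-label: an utterance may have several intents


def decode_slots(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn per-token BIO tags into (slot_name, text) spans."""
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span begins; close the open one first.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [tok]
        elif tag.startswith("I-") and label == tag[2:]:
            # Continuation of the currently open span.
            words.append(tok)
        else:
            # "O" tag, or an I- tag that does not continue the open span:
            # close any open span and ignore this token.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


example = JointExample(
    tokens=["je", "voudrais", "réserver", "deux", "chambres", "à", "Paris"],
    slot_tags=[
        "O", "O", "B-command-tache",
        "B-nombre-chambre", "I-nombre-chambre",
        "O", "B-localisation-ville",
    ],
    intents={"reservation"},  # hypothetical intent label
)
slots = decode_slots(example.tokens, example.slot_tags)
```

A joint model predicts both `intents` and `slot_tags` for the same input, typically from a shared encoder, which is why the two tasks can be trained and evaluated together.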
For ASR outputs, the authors use a cascade approach with a speech encoder followed by the joint intent and slot-filling model. They also explore different speech encoders, including SAMU-XLSR, SAMU-XLSR_IT⊕FR, and LeBenchmark FR 3k large. The results indicate that end-to-end models can achieve state-of-the-art performance on the slot-filling task for the MEDIA 2022 full version.
Overall, this work provides a new enhanced version of the MEDIA dataset and establishes baselines for joint intent classification and slot-filling on this dataset, paving the way for further research in French Spoken Language Understanding.
Stats
The MEDIA dataset comprises 1258 official recorded dialogues from 250 different speakers, totaling about 70 hours of conversation.
The MEDIA dataset has 83 attributes and 19 specifiers, resulting in 1121 possible attribute/specifier pairs.
The enhanced MEDIA dataset contains 11 intent labels, with some utterances associated with multiple intents.