The researchers describe a general framework for automatically generating scenarios for goal-oriented dialogue systems. They outline a method for preprocessing dialogue data in JSON format, compare two intent extraction methods based on BERTopic and Latent Dirichlet Allocation, and evaluate two approaches to classifying user utterances in goal-oriented dialogue systems: logistic regression and BERT transformer models.
The preprocessing method combines dialogue data from the MultiWOZ 2.2 dataset, extracts the utterance, intent, and speaker for each turn, and produces a dataset with 9 intent categories. The BERTopic method is used to identify additional intent categories beyond those in the original dataset.
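The extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the MultiWOZ 2.2 layout in which each dialogue holds a list of `turns`, each turn carries `speaker`, `utterance`, and `frames`, and a frame's `state` may contain an `active_intent`. The function name `flatten_dialogues` and the inline sample are hypothetical.

```python
import json


def flatten_dialogues(dialogues):
    """Flatten parsed dialogues into rows of speaker, utterance, and intent.

    Assumes a MultiWOZ 2.2-style structure; the first active intent that is
    not "NONE" is kept for each turn, otherwise the intent is None.
    """
    rows = []
    for dialogue in dialogues:
        for turn in dialogue.get("turns", []):
            intents = []
            for frame in turn.get("frames", []):
                intent = frame.get("state", {}).get("active_intent", "NONE")
                if intent != "NONE":
                    intents.append(intent)
            rows.append({
                "speaker": turn.get("speaker"),
                "utterance": turn.get("utterance"),
                "intent": intents[0] if intents else None,
            })
    return rows


# Hypothetical two-turn sample written in the assumed layout.
sample = json.loads("""
[
  {
    "dialogue_id": "example.json",
    "turns": [
      {
        "speaker": "USER",
        "utterance": "I need a cheap hotel in the north.",
        "frames": [
          {"service": "hotel", "state": {"active_intent": "find_hotel"}},
          {"service": "taxi", "state": {"active_intent": "NONE"}}
        ]
      },
      {
        "speaker": "SYSTEM",
        "utterance": "I found two options. Would you like to book one?",
        "frames": []
      }
    ]
  }
]
""")

rows = flatten_dialogues(sample)
for row in rows:
    print(row)
```

In practice the list passed to `flatten_dialogues` would come from loading and concatenating the dataset's JSON files; how the paper merges the files and handles turns with several active intents is not stated in this summary.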
The comparison of the classification approaches shows that the BERT-based model with the bert-base-uncased configuration outperforms the logistic regression models on precision (0.80), F1-score (0.78), and Matthews correlation coefficient (0.74). The paper provides examples of intent classification for various user utterances.
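To make the three reported metrics concrete, the sketch below computes precision, F1-score, and the multiclass Matthews correlation coefficient on a toy set of labels. It rests on assumptions: the summary does not say which averaging the paper uses, so macro averaging is chosen here for illustration, and the labels and predictions are invented, not taken from the paper.

```python
from collections import Counter
from math import sqrt


def macro_precision_f1(y_true, y_pred):
    """Return macro-averaged precision and F1 over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, f1_scores = [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        f1_scores.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(f1_scores) / n


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from label counts."""
    s = len(y_true)                                   # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    t_counts = Counter(y_true)                        # true count per class
    p_counts = Counter(y_pred)                        # predicted count per class
    labels = set(t_counts) | set(p_counts)
    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    denominator = sqrt(
        (s * s - sum(v * v for v in p_counts.values()))
        * (s * s - sum(v * v for v in t_counts.values()))
    )
    return numerator / denominator if denominator else 0.0


# Toy labels for illustration only.
y_true = ["find_hotel", "find_hotel", "book_train",
          "book_train", "find_taxi", "find_taxi"]
y_pred = ["find_hotel", "book_train", "book_train",
          "book_train", "find_taxi", "find_taxi"]

precision, f1 = macro_precision_f1(y_true, y_pred)
mcc = multiclass_mcc(y_true, y_pred)
print(f"precision={precision:.3f} f1={f1:.3f} mcc={mcc:.3f}")
```

Unlike precision and F1, the Matthews correlation coefficient uses the full confusion matrix in a single score, which makes it a useful complement when intent classes are imbalanced.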
The researchers plan to further investigate methods for extracting scenario blocks and to develop a model that generates a scenario graph for goal-oriented dialogue systems in different application domains while maintaining dialogue context.
Source: arxiv.org