Core Concepts
The proposed SpeechDPR is an end-to-end model for open-domain spoken question answering that achieves competitive performance through knowledge distillation from a teacher model.
Abstract
ABSTRACT
Spoken question answering (SQA) is essential for machines to answer users' questions by finding the answer span within a given spoken passage.
Open-domain SQA (openSQA) additionally requires retrieving passages from a spoken archive before performing SQA.
SpeechDPR is an end-to-end framework for the retrieval component of openSQA that requires no manual transcriptions.
INTRODUCTION
SQA aims to find the answer span in audio waveforms.
OpenSQA involves finding passages containing answers from a large spoken dataset.
Textual question answering (TQA) tasks are usually solved with a cascaded pipeline built around a text retriever.
PROPOSED APPROACH
The SpeechDPR model consists of an SSL speech encoder, a feature processor, and separate question and passage sentence encoders.
Knowledge distillation from the Cascading Teacher model improves training performance.
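The interaction between the two sentence encoders and the teacher can be illustrated with a small sketch. It is not the paper's implementation: it assumes that question and passage embeddings are compared by dot product and that the student is trained to match the teacher's softmax distribution over candidate passages with a KL-divergence loss. The function names, the temperature parameter, and the toy vectors are all hypothetical.

```python
import math


def similarity_scores(question_vec, passage_vecs):
    """Dot-product similarity between one question and each passage (assumed scoring rule)."""
    return [sum(q * p for q, p in zip(question_vec, vec)) for vec in passage_vecs]


def log_softmax(scores, temperature=1.0):
    """Numerically stable log-softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_z for s in scaled]


def distillation_loss(student_scores, teacher_scores, temperature=1.0):
    """KL(teacher || student) over the candidate passages (assumed objective)."""
    log_t = log_softmax(teacher_scores, temperature)
    log_s = log_softmax(student_scores, temperature)
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))


# Toy example: a 2-dimensional question embedding and two passage embeddings.
question = [1.0, 0.0]
passages = [[0.9, 0.1], [0.1, 0.9]]
student = similarity_scores(question, passages)

# Hypothetical teacher scores for the same two passages.
teacher = [2.0, 0.5]
loss = distillation_loss(student, teacher)
```

The loss is zero when the student ranks the candidates with exactly the teacher's distribution and grows as the two distributions diverge, which is the property a distillation objective of this kind relies on.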
EXPERIMENTS
The data setup follows openTQA research and uses the SLUE-SQA-5 and Spoken Wikipedia datasets.
Evaluation is based on top-K retrieval accuracy and the frame-level F1 (FF1) score for openSQA tasks.
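The two metrics can be sketched as follows. This is a simplified illustration under stated assumptions, not the official evaluation script: top-K accuracy is taken to be the fraction of questions with at least one gold passage among the top K retrieved, and the frame-level F1 is approximated on continuous time spans in seconds rather than on discrete frames. All function and variable names are hypothetical.

```python
def top_k_accuracy(ranked_ids_per_question, gold_ids_per_question, k):
    """Fraction of questions with at least one gold passage in the top-k results."""
    if not ranked_ids_per_question:
        return 0.0
    hits = 0
    for ranked, gold in zip(ranked_ids_per_question, gold_ids_per_question):
        if any(pid in gold for pid in ranked[:k]):
            hits += 1
    return hits / len(ranked_ids_per_question)


def span_f1(pred, gold):
    """F1 of the temporal overlap between a predicted and a gold (start, end) span."""
    overlap = max(0.0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    if overlap == 0.0:
        return 0.0
    precision = overlap / (pred[1] - pred[0])
    recall = overlap / (gold[1] - gold[0])
    return 2 * precision * recall / (precision + recall)


# Two questions: the first has its gold passage at rank 2, the second misses it.
ranked = [["a", "b", "c"], ["d", "e", "f"]]
gold = [{"b"}, {"x"}]
acc_at_2 = top_k_accuracy(ranked, gold, k=2)  # 0.5

# Predicted answer span 1.0-3.0 s against gold span 2.0-4.0 s.
f1 = span_f1((1.0, 3.0), (2.0, 4.0))  # 0.5
```

In the second example the spans overlap for one second out of two on each side, so precision and recall are both 0.5 and so is their harmonic mean.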
RESULTS
Retrieval results:
SpeechDPR achieves competitive accuracy compared to cascading baselines.
OpenSQA results:
SpeechDPR and the baselines reach similar FF1 scores, but an ensemble model outperforms both.
CONCLUSION
SpeechDPR offers a robust solution for semantic retrieval in openSQA tasks without relying on ASR modules directly.
Stats
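Compared with the cascaded model of UASR (unsupervised ASR) and TDR (text dense retriever), SpeechDPR achieved competitive accuracy.
Preliminary experiments showed that distilling knowledge from the Cascading Teacher is important for SpeechDPR.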