Towards a General-Purpose Encoder for Speech, Audio Tagging, and Speaker Verification
A novel two-stage multi-task learning framework is proposed for building a general-purpose speech and audio encoder that jointly supports automatic speech recognition (ASR), audio tagging, and speaker verification.