Robust Target-Speaker Voice Activity Detection Tolerant to Speaker Profile Errors
The proposed Profile-Error-Tolerant Target-Speaker Voice Activity Detection (PET-TSVAD) model is robust to speaker profile errors introduced by first-pass diarization, and it outperforms existing TS-VAD models on both the VoxConverse and DIHARD-I datasets.