Basic concepts
Integrating contrastive learning on intermediate feature maps within a multi-scale feature aggregation architecture significantly improves speaker verification accuracy by enhancing the discriminative power of speaker embeddings.
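The idea can be sketched as a supervised contrastive (SupCon-style) loss applied to the pooled output of each intermediate feature map, averaged across scales. This is a minimal illustration with hypothetical function names (`supcon_loss`, `mfcon_loss`) and a plain temporal-average pooling step; the exact MFCon formulation in the paper may differ.

```python
import numpy as np

def supcon_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss over a batch of embeddings (sketch)."""
    # L2-normalize so similarity is cosine similarity scaled by temperature.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / tau
    n = len(labels)
    total, count = 0.0, 0
    for i in range(n):
        # Positives: other samples in the batch from the same speaker.
        mask = labels == labels[i]
        mask[i] = False
        positives = np.where(mask)[0]
        if len(positives) == 0:
            continue
        # Denominator runs over all samples except the anchor itself.
        others = [a for a in range(n) if a != i]
        lse = np.log(np.sum(np.exp(sim[i, others])))
        total += -np.mean(sim[i, positives] - lse)
        count += 1
    return total / max(count, 1)

def mfcon_loss(feature_maps, labels, tau=0.1):
    """Apply the contrastive loss at every intermediate scale (sketch).

    feature_maps: list of (batch, time, dim) arrays, one per encoder layer.
    """
    total = 0.0
    for fm in feature_maps:
        pooled = fm.mean(axis=1)  # simple temporal average pooling
        total += supcon_loss(pooled, labels, tau)
    return total / len(feature_maps)
```

In training, this auxiliary multi-scale term would be combined with the usual classification objective (e.g. AM-Softmax) so that each intermediate layer, not just the final embedding, is pushed toward speaker-discriminative representations.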
Statistics
MFCon loss achieves a 9.05% improvement in equal error rate (EER) compared to the standard MFA-Conformer on the VoxCeleb1-O test set.
MFCon achieves an EER of 2.52% on the VoxCeleb1-O benchmark.
AM-Softmax achieves an EER of 2.65% on the VoxCeleb1-O benchmark.
AM-SupCon achieves an EER of 2.56% on the VoxCeleb1-O benchmark.
Combining MFCon with AM-SupCon achieves an EER of 2.41% on the VoxCeleb1-O benchmark.