Dynamic Cross Attention for Audio-Visual Person Verification
Key Results
The paper reports that "Extensive experiments are conducted on the Voxceleb1 dataset."
It states that "Results indicate that the proposed model consistently improves the performance on multiple variants of cross-attention while outperforming the state-of-the-art methods."
It further notes that "The proposed DCA model adds more flexibility to the CA framework and improves the fusion performance even when the modalities exhibit weak complementary relationships."
On the magnitude of the gains: "The performance boost of the proposed model is more emphasized in CA [23] than JCA [9] with a relative improvement of 9.3% for CA and 2.9% for JCA in terms of EER."
In summary: "The proposed DCA model consistently boosts the performance of both variants of CA: CA [23] and JCA [9], while achieving state-of-the-art performance."
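As an aside, the relative improvement figures quoted above follow the standard definition: (baseline EER − improved EER) / baseline EER. A minimal sketch of that arithmetic, using hypothetical EER values chosen purely for illustration (the paper's actual per-method EER numbers are not reproduced here):

```python
def relative_improvement(baseline_eer: float, improved_eer: float) -> float:
    """Relative reduction in Equal Error Rate (EER), as a fraction of the baseline."""
    return (baseline_eer - improved_eer) / baseline_eer

# Hypothetical EER percentages, for illustration only (not from the paper):
baseline = 2.15   # EER (%) of a cross-attention baseline
improved = 1.95   # EER (%) after adding a dynamic selection mechanism
print(f"{relative_improvement(baseline, improved):.1%}")  # → 9.3%
```

A ~0.2 point absolute EER drop from a ~2 point baseline thus corresponds to a roughly 9% relative improvement, which is how a seemingly small absolute change can be reported as a sizeable relative gain.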
Quotations
"Audio and visual modalities are often expected to complement each other, which plays a crucial role in outperforming unimodal approaches."
"Dynamic Cross Attention (DCA) model can dynamically select the cross-attended or unattended features on the fly based on strong or weak complementary relationships."
"The proposed DCA model adds more flexibility to the CA framework and improves the fusion performance even when modalities exhibit weak complementary relationships."
"The performance boost of the proposed model is more emphasized in CA [23] than JCA [9] with a relative improvement of 9.3% for CA and 2.9% for JCA in terms of EER."
"The proposed DCA model consistently boosts the performance of both variants of CA: CA [23] and JCA [9], while achieving state-of-the-art performance."