Key Concepts
Dual-encoder (DE) models can match or outperform SOTA methods on extreme multi-label classification (XMC) tasks when trained with a decoupled softmax loss or a soft top-k operator-based loss.
Summary
The content discusses the use of dual-encoder models for extreme multi-label classification tasks. It highlights the limitations of existing contrastive losses and proposes new loss functions to improve performance. The study includes experiments on synthetic datasets and large benchmarks, showcasing the effectiveness of the proposed approach.
Directory:
- Abstract
  - DE models are effective in retrieval tasks but underexplored in XMC.
  - Proposed decoupled softmax loss and soft top-k operator-based loss.
- Introduction
  - DE models for openQA systems.
  - XMC scenarios require memorization and generalization.
- Background: Multi-Label Classification
  - Definition of query-document relevance distribution.
  - Description of DE models and classification networks.
- Improved Training of Dual-Encoder Models
  - Limitations of standard contrastive losses for XMC problems.
  - Proposal of DecoupledSoftmax loss and SoftTop-k operator-based loss.
- Experiments
  - Comparison with existing XMC methods on various datasets.
- Conclusions & Limitations
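
The outline's point about the limitations of standard contrastive losses can be illustrated with a small sketch. This summary does not give the formula for the DecoupledSoftmax loss, so the code below rests on an assumption: that "decoupled" means each positive document is normalized only against itself and the negatives, so positives for the same query no longer compete with each other. The function names, scores, and labels are all hypothetical.

```python
import math


def _logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def softmax_loss(scores, labels):
    """Standard multi-label softmax loss for a single query.

    Every positive is normalized against ALL documents, so positives
    also compete with one another in the denominator.
    """
    log_z = _logsumexp(scores)
    return sum(log_z - s for s, y in zip(scores, labels) if y == 1)


def decoupled_softmax_loss(scores, labels):
    """Assumed form of a decoupled softmax loss (not confirmed by the summary).

    Each positive is normalized against itself plus the negatives only;
    the other positives are left out of its denominator.
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    loss = 0.0
    for s, y in zip(scores, labels):
        if y == 1:
            loss += _logsumexp([s] + negatives) - s
    return loss


# Hypothetical query with two relevant documents scored far above one negative.
scores = [10.0, 10.0, -10.0]
labels = [1, 1, 0]
standard = softmax_loss(scores, labels)
decoupled = decoupled_softmax_loss(scores, labels)
```

Under this assumed form, the standard loss stays near 2·log 2 for the example above even though both relevant documents are ranked correctly, because the two positives split the probability mass. The decoupled variant goes to nearly zero. With a single positive per query the two losses coincide.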
Statistics
Current empirical evidence indicates that DE models fall significantly short on XMC benchmarks, where SOTA methods scale the number of learnable parameters linearly with the total number of classes (documents in the corpus) by employing a per-class classification head.
When trained with the proposed loss functions, standard DE models alone can match or outperform SOTA methods by up to 2% at Precision@1, even on the largest XMC datasets, while being 20× smaller in terms of trainable parameters.
Quotes
"Our work shows that pure DE models can indeed match or even outperform SOTA XMC methods by up to 2% even on the largest public XMC benchmarks while being 20× smaller in model size."