MatchXML is an efficient framework for extreme multi-label text classification that uses dense label embeddings and fine-tuned Transformer models. The method outperforms competitors in both accuracy and training speed across various datasets.
The paper addresses the challenges of eXtreme Multi-label text Classification (XMC) and proposes MatchXML as a solution. It focuses on three components: generating dense label embeddings, building hierarchical label trees, and matching texts to labels in a bipartite graph. Experimental results show superior performance compared to existing methods.
Key points include the use of label2vec to train semantic dense label embeddings, the construction of a Hierarchical Label Tree, and the formulation of multi-label text classification as a text-label matching problem. MatchXML achieves state-of-the-art accuracies on multiple datasets by combining sparse TF-IDF features with dense vector features.
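To make the feature combination concrete, here is a minimal sketch of joining a sparse TF-IDF vector with a dense embedding. It assumes each block is L2-normalized separately and the two are then concatenated; the summary does not state how MatchXML weights or normalizes the two parts, so that choice, along with the names `l2_normalize` and `combine_features`, is hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a dense list of floats to unit L2 norm (zero vectors pass through)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def combine_features(tfidf, dense, vocab_size):
    """Concatenate a sparse TF-IDF vector with a dense embedding.

    tfidf: dict mapping term index -> weight (sparse representation)
    dense: list of floats, e.g. a text embedding from a Transformer
    Returns a sparse dict over vocab_size + len(dense) dimensions.

    Assumption: each block is L2-normalized on its own before
    concatenation, so neither block dominates by scale alone.
    """
    norm = math.sqrt(sum(w * w for w in tfidf.values()))
    combined = {i: w / norm for i, w in tfidf.items()} if norm > 0 else dict(tfidf)
    # Dense dimensions are appended after the TF-IDF vocabulary.
    for j, v in enumerate(l2_normalize(dense)):
        if v != 0.0:
            combined[vocab_size + j] = v
    return combined
```

A linear classifier trained on the combined vector can then draw on both exact term matches (the sparse part) and semantic similarity (the dense part).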
The proposed method has four stages: training dense label vectors, constructing a Hierarchical Label Tree, fine-tuning Transformer models, and extracting static sentence embeddings. By combining these different types of features, MatchXML demonstrates improved performance in extreme multi-label text classification tasks.
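The tree-construction stage can be illustrated with a small sketch that recursively splits labels into two equal-size groups based on their dense embeddings. This is an assumption about the general approach (a balanced 2-means with dot-product similarity), not the authors' implementation; `balanced_split`, `build_tree`, and the `max_leaf` parameter are hypothetical names.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def balanced_split(ids, emb, rng, iters=10):
    """Split label ids into two halves whose sizes differ by at most one."""
    c0, c1 = (emb[i] for i in rng.sample(ids, 2))
    for _ in range(iters):
        # Rank labels by how strongly they prefer centroid 0 over centroid 1,
        # then cut the ranking in the middle to keep the split balanced.
        ranked = sorted(ids, key=lambda i: dot(emb[i], c1) - dot(emb[i], c0))
        half = len(ranked) // 2
        left, right = ranked[:half], ranked[half:]
        c0 = mean([emb[i] for i in left])
        c1 = mean([emb[i] for i in right])
    return left, right


def build_tree(ids, emb, max_leaf, rng=None):
    """Recursively cluster labels until each leaf holds at most max_leaf labels."""
    rng = rng or random.Random(0)
    if len(ids) <= max_leaf:
        return {"labels": sorted(ids), "children": []}
    left, right = balanced_split(ids, emb, rng)
    return {
        "labels": sorted(ids),
        "children": [
            build_tree(left, emb, max_leaf, rng),
            build_tree(right, emb, max_leaf, rng),
        ],
    }
```

Calling `build_tree(list(range(len(emb))), emb, max_leaf=100)` on a list of label vectors returns a nested dictionary whose leaves are small groups of semantically related labels, which lets a classifier narrow down candidates level by level instead of scoring every label.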
Key insights extracted from the paper by Hui Ye, Rajsh... (arxiv.org, 03-12-2024)
https://arxiv.org/pdf/2308.13139.pdf