Hierarchical text classification is an essential task in text mining. TELEClass enriches the label taxonomy with topical terms and outperforms both prior weakly-supervised methods and LLM-based zero-shot prompting on public datasets.
Hierarchical text classification assigns each document to multiple classes organized in a hierarchical label taxonomy. TELEClass enriches the label taxonomy with class-indicative terms mined from the corpus, and it uses LLMs for data annotation and data creation tailored to the hierarchical label space. The method outperforms previous approaches on two public datasets.
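This summary does not spell out how the class-indicative terms are mined. As a rough, hypothetical illustration of enriching taxonomy nodes with corpus terms, the sketch below ranks the terms that co-occur with a class name by a simple lift score: a term's document frequency among documents that mention the class name, relative to its document frequency in the whole corpus. The function names, the whitespace tokenizer, the top_k and min_df parameters, and the scoring rule are all assumptions for illustration, not TELEClass's actual procedure.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def mine_class_terms(class_name, corpus, top_k=3, min_df=2):
    """Hypothetical term miner: rank terms by lift, i.e. how much more often
    they occur in documents mentioning `class_name` than in the whole corpus.
    This is an illustrative stand-in, not TELEClass's actual procedure."""
    docs = [set(tokenize(d)) for d in corpus]
    name = class_name.lower()
    matched = [d for d in docs if name in d]
    if not matched:
        return []  # the class name never occurs, so there is no evidence
    df_all = Counter(t for d in docs for t in d)         # document frequency overall
    df_matched = Counter(t for d in matched for t in d)  # document frequency in matched docs
    scores = {}
    for term, df_m in df_matched.items():
        if term == name or df_m < min_df:
            continue  # skip the class name itself and rare co-occurrences
        # lift = P(term | doc mentions class) / P(term)
        scores[term] = (df_m / len(matched)) / (df_all[term] / len(docs))
    # Highest lift first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def enrich_taxonomy(taxonomy, corpus, top_k=3):
    """Attach mined terms to every node; `taxonomy` maps parent -> children."""
    nodes = set(taxonomy) | {c for children in taxonomy.values() for c in children}
    return {node: mine_class_terms(node, corpus, top_k) for node in sorted(nodes)}


if __name__ == "__main__":
    corpus = [
        "sports football match goal",
        "sports football team goal",
        "sports basketball team score",
        "politics election vote",
        "politics election team debate",
        "weather rain team",
    ]
    print(mine_class_terms("football", corpus))  # ['goal', 'sports']
```

A lift score favours terms that are distinctive for a class rather than merely frequent. As the toy output shows, it can also surface a parent-class name such as "sports", which a hierarchy-aware miner would likely want to filter out.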
Most earlier works focus on fully supervised or semi-supervised methods that require human-annotated data. TELEClass minimizes supervision by using only the class name of each node in the taxonomy. Large language models (LLMs) show competitive performance, but with zero-shot prompting they struggle in hierarchical settings because the structural information of the label taxonomy is lost.
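To make the "class names only" setting concrete, here is a minimal keyword-matching baseline, not TELEClass itself, that predicts a root-to-leaf path by descending the taxonomy greedily. Each child is scored by how many document tokens match the class names (and any optionally mined terms) in that child's whole subtree, so the hierarchy itself contributes evidence. The toy taxonomy, the function names, and the greedy subtree-scoring rule are hypothetical assumptions.

```python
# Hypothetical taxonomy: parent -> children. Leaves have no entry.
TAXONOMY = {
    "root": ["sports", "politics"],
    "sports": ["football", "basketball"],
    "politics": ["election", "policy"],
}


def subtree_keywords(node, taxonomy, enriched):
    """Class names and any mined terms of `node` and all of its descendants."""
    keywords = {node} | set(enriched.get(node, []))
    for child in taxonomy.get(node, []):
        keywords |= subtree_keywords(child, taxonomy, enriched)
    return keywords


def classify_top_down(text, taxonomy, enriched=None, root="root"):
    """Greedy top-down path prediction using only class names (plus optional
    mined terms) as supervision. Returns the predicted path below `root`."""
    enriched = enriched or {}
    tokens = text.lower().split()
    path, node = [], root
    while taxonomy.get(node):  # stop once we reach a leaf
        children = taxonomy[node]
        # Score each child by how many tokens hit its subtree's keywords,
        # so a mention of "football" also counts as evidence for "sports".
        scores = {}
        for child in children:
            keywords = subtree_keywords(child, taxonomy, enriched)
            scores[child] = sum(t in keywords for t in tokens)
        best = max(children, key=scores.get)  # ties go to the first child listed
        if scores[best] == 0:
            break  # no evidence for any child: stop at the current level
        path.append(best)
        node = best
    return path


if __name__ == "__main__":
    print(classify_top_down("the football match ended with a late goal", TAXONOMY))
    # ['sports', 'football']
```

Scoring a child by its subtree keeps the parent-child relations in play, which is exactly the structural information a flat list of class names would discard; a greedy single-path descent is simply the most basic way to use it.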
The proposed TELEClass method combines taxonomy enrichment with LLM-based enhancement for weakly-supervised hierarchical text classification. Experiments on public datasets show that it outperforms existing methods.
Source: arxiv.org