Core Concept
Hierarchical text classification with minimal supervision using taxonomy enrichment and LLM enhancement.
Summary
Hierarchical text classification is a crucial task in natural language processing. TELEClass enriches the label taxonomy with class-indicative terms mined from the corpus to improve classifier training. It also uses large language models (LLMs) for data annotation and for data creation tailored to the hierarchical label space. With these components, TELEClass outperforms previous weakly-supervised methods on public datasets.
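To make the idea of mining class-indicative terms concrete, here is a minimal sketch. It is not the scoring used by TELEClass: the function `mine_indicative_terms`, the whitespace tokenizer, and the "popularity times distinctiveness" score are illustrative assumptions, and the sketch assumes each class already has a few documents loosely assigned to it.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, whitespace split, letters only.
    return [w for w in text.lower().split() if w.isalpha()]


def mine_indicative_terms(docs_by_class, top_k=3):
    """Return up to top_k terms per class, ranked by a simple heuristic.

    Hypothetical score, not the paper's formula:
      popularity      = log(1 + document frequency of the term in the class)
      distinctiveness = share of the term's document frequency in this class
    """
    # Document frequency of every term inside each class.
    df = {
        cls: Counter(t for doc in docs for t in set(tokenize(doc)))
        for cls, docs in docs_by_class.items()
    }
    enriched = {}
    for cls in docs_by_class:
        scores = {}
        for term, n in df[cls].items():
            total = sum(df[other][term] for other in df)
            scores[term] = math.log(1 + n) * (n / total)
        # Sort by score (descending), breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        enriched[cls] = ranked[:top_k]
    return enriched


# Toy corpus (made up for illustration).
docs_by_class = {
    "hair care": ["shampoo for dry hair", "conditioner and shampoo set"],
    "baby food": ["organic puree for baby", "baby formula and puree"],
}
enriched = mine_indicative_terms(docs_by_class)
```

Terms shared across classes, such as "for" and "and", score low because only part of their document frequency falls in any one class, so they drop out of the top-ranked terms for each class.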
Statistics
Hierarchical text classification aims to categorize each document into a set of classes in a label taxonomy.
Most earlier works focus on fully or semi-supervised methods that require human-annotated data.
Large language models show competitive performance through zero-shot prompting but struggle in hierarchical settings.
TELEClass can outperform previous weakly-supervised methods and LLM-based zero-shot prompting methods on two datasets.
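The phrase "a set of classes in a label taxonomy" can be illustrated with a small data-structure sketch. The taxonomy below is a made-up toy example. The helper `with_ancestors` is hypothetical, and it assumes that a document's label set includes every ancestor of each assigned class, which the summary above does not state.

```python
def with_ancestors(labels, parents):
    """Expand a set of classes with all of their ancestors in the taxonomy."""
    closed = set()
    stack = list(labels)
    while stack:
        node = stack.pop()
        if node in closed:
            continue  # already visited via another path
        closed.add(node)
        # Nodes without an entry are treated as roots.
        stack.extend(parents.get(node, []))
    return closed


# Toy taxonomy (assumption): child -> list of parents.
# A list is used so that a class may have more than one parent.
PARENTS = {
    "shampoo": ["hair care"],
    "hair care": ["beauty"],
    "baby shampoo": ["hair care", "baby care"],
}

labels = with_ancestors({"baby shampoo"}, PARENTS)
```

Storing parents as a list keeps the sketch usable for taxonomies that are directed acyclic graphs rather than strict trees.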