Core Concept
The effectiveness of active learning techniques for text classification depends on factors such as text representation and classifier choice, so they help only in specific circumstances.
Summary
In this empirical study, the authors evaluate active learning (AL) techniques for text classification across roughly 1,000 experiments. They find that AL is effective only in certain situations, because its performance depends on factors such as text representation and classifier choice. The study emphasizes assessing AL techniques with metrics that align with real-world expectations.
1. Introduction
Active Learning (AL) aims to optimize labeling budgets.
The effectiveness of AL techniques varies across datasets and classifiers.
Choice of text representation and classifier impacts AL effectiveness.
2. Previous Work
AL has attracted research contributions but still faces challenges.
NLP domain poses additional challenges due to varied text representations.
3. Batch Active Learning - Overview
Pseudo-code provided for batch AL setting.
Model selection and calibration are crucial steps.
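The batch setting described above can be sketched as a simple loop. This is a minimal illustration, not the pseudo-code from the paper: the function name `batch_active_learning` and the `train` and `query` callables are hypothetical, and the sketch assumes that model selection and calibration happen inside `train` each time the labelled set grows.

```python
import random
from typing import Callable, List


def batch_active_learning(
    n_pool: int,
    seed_size: int,
    batch_size: int,
    budget: int,
    train: Callable[[List[int]], object],
    query: Callable[[object, List[int], int], List[int]],
    rng: random.Random,
) -> List[int]:
    """Return the pool indices chosen for labelling, in selection order.

    train(labelled)            -> a fitted model (assumed to also cover model
                                  selection and calibration on the labelled set)
    query(model, unlabelled, k) -> up to k indices drawn from `unlabelled`
    """
    budget = min(budget, n_pool)
    # Seed set: sampled uniformly at random, since no model exists yet.
    labelled = rng.sample(range(n_pool), min(seed_size, budget))
    labelled_set = set(labelled)

    while len(labelled) < budget:
        model = train(labelled)  # refit on everything labelled so far
        unlabelled = [i for i in range(n_pool) if i not in labelled_set]
        k = min(batch_size, budget - len(labelled))  # do not overshoot the budget
        picked = query(model, unlabelled, k)
        if not picked:  # the strategy has nothing left to propose
            break
        labelled.extend(picked)
        labelled_set.update(picked)
    return labelled
```

Passing a `query` that ignores the model and samples at random yields the random-sampling baseline with the same seed set, batch size, and budget, so that only the query strategy differs between runs.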
4. Comparison Methodology
Experiments vary classifiers, representations, batch sizes, seed sizes, and query strategies.
Detailed breakdown of prediction pipelines and query strategies used.
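To show how varying these five dimensions multiplies the number of runs, the following sketch expands a configuration grid. The dimension names come from the summary above; every value in `GRID` is a placeholder, since this outline does not list the actual classifiers, representations, sizes, or strategies, and `expand_grid` is a hypothetical helper.

```python
from itertools import product
from typing import Dict, List

# Placeholder levels only: the real values are not given in this outline.
GRID: Dict[str, list] = {
    "classifier": ["clf_a", "clf_b"],
    "representation": ["rep_a", "rep_b"],
    "batch_size": [10, 20],
    "seed_size": [10, 20],
    "query_strategy": ["random", "strategy_a"],
}


def expand_grid(grid: Dict[str, list]) -> List[dict]:
    """Return one configuration dict per combination of grid values."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


configs = expand_grid(GRID)
```

With two levels per dimension this toy grid already yields 2^5 = 32 configurations, before multiplying by datasets and repeated trials.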
5. Reproducibility Experiments (Appendix)
Replication experiments conducted for CAL, REAL, DAL methods.
Results compared with reported findings from original papers.
Statistics
In cases where labelling is expensive, using AL is cost-efficient compared to random sampling because the model reaches greater accuracy with a smaller number of labelled instances.
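One way to check this claim for a given setup is to compare the two learning curves at matching labelled-set sizes. The sketch below computes the relative gain of AL over random sampling; the function names and the choice of relative gain as the measure are assumptions made for illustration, not necessarily the metric used in the study.

```python
from typing import Dict


def relative_gain(
    al_scores: Dict[int, float], random_scores: Dict[int, float]
) -> Dict[int, float]:
    """Relative improvement of AL over random sampling.

    Both arguments map a labelled-set size to a score such as accuracy.
    Only sizes present in both curves are compared.
    """
    gains = {}
    for n in sorted(set(al_scores) & set(random_scores)):
        base = random_scores[n]
        if base == 0:
            continue  # relative gain is undefined for a zero baseline
        gains[n] = (al_scores[n] - base) / base
    return gains


def mean_gain(gains: Dict[int, float]) -> float:
    """Average the per-size gains; returns 0.0 for an empty comparison."""
    return sum(gains.values()) / len(gains) if gains else 0.0
```

A positive gain at a given size means AL reached a higher score than random sampling with the same number of labels; gains near zero or below indicate a setting where AL did not pay off.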
Quotes
"We show that AL is only effective in a narrow set of circumstances."