Core Concepts
Large language models (LLMs) can be effectively used to label data and train smaller, more efficient edge classifiers for computational social science tasks, even with limited human intervention and resources.
Stats
RED-CT outperforms direct LLM labeling in six of eight tests.
RED-CT outperforms its base classifiers in all tests.
The system requires expert labels for 10% or less of the data.
Labeling the SemEval2016 dataset with GPT-4 could cost over $30 USD.
Confidence-informed sampling led to an average improvement of 6.75% over the base classifier using GPT-labeled data.
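The confidence-informed sampling mentioned above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's exact implementation: the idea is to spend the limited expert-labeling budget on the samples where the classifier is least confident. The function name and `budget` parameter are hypothetical.

```python
import numpy as np

def confidence_informed_sample(probs, budget):
    """Return indices of the `budget` samples with the lowest top-class
    probability, i.e., where the classifier is least confident.
    `probs` is an (n_samples, n_classes) array of predicted probabilities."""
    confidence = probs.max(axis=1)          # top-class probability per sample
    return np.argsort(confidence)[:budget]  # least-confident sample indices

# Toy example: 5 samples, 2 classes
probs = np.array([
    [0.95, 0.05],   # very confident
    [0.55, 0.45],   # uncertain
    [0.80, 0.20],
    [0.51, 0.49],   # most uncertain
    [0.70, 0.30],
])
idx = confidence_informed_sample(probs, budget=2)
print(sorted(idx.tolist()))  # the two least-confident samples: [1, 3]
```

The selected samples would then be sent to experts, their LLM-generated labels replaced with expert labels, and the edge classifier retrained on the corrected set.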