核心概念
Large language models struggle to effectively distill text by removing forbidden variables while preserving other semantic content, posing challenges for computational social science investigations.
統計
"The dataset is approximately balanced in both sentiment and topic labels."
"Sentiment Accuracy ↓"
"Topic Accuracy ↑"
"Train Test Original Reviews LLM Classifier Accuracy"
"Train Test Distilled Reviews Classifier Accuracy"
引用
"No distillation"
"Mean projection"
"Mistral 7B"
"GPT 4"