Key Concepts
Performance-Guided Knowledge Distillation (PGKD) uses large language models (LLMs) to improve the accuracy and efficiency of smaller models for multi-class text classification, particularly when labeled data is limited, while significantly reducing inference cost and latency relative to the LLMs.
Statistics
BERT-base + PGKD is up to 130X faster than LLMs for inference on the same classification task.
BERT-base + PGKD is 25X less expensive than LLMs for inference on the same classification task.
Claude Sonnet inference costs $0.38 per batch, for inputs averaging 1k tokens.
LLaMA 3 8B inference costs $0.06 per batch.
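The per-batch figures above can be compared with a little arithmetic. The sketch below is illustrative only: it computes the cost ratio between the two quoted LLMs, then the per-batch cost that a 25X reduction would imply. The source does not say which LLM the 25X figure is measured against, so using Claude Sonnet as the baseline is an assumption, and the names `ASSUMED_COST_FACTOR`, `cost_ratio`, and `implied_student_cost` are hypothetical.

```python
# Per-batch inference costs quoted in the statistics above (USD).
CLAUDE_SONNET_COST = 0.38
LLAMA3_8B_COST = 0.06

# Assumption: the "25X less expensive" figure is taken relative to
# Claude Sonnet. The source does not state the baseline model.
ASSUMED_COST_FACTOR = 25


def cost_ratio(cost_a: float, cost_b: float) -> float:
    """Return how many times more expensive cost_a is than cost_b."""
    return cost_a / cost_b


def implied_student_cost(baseline_cost: float, factor: float) -> float:
    """Return the per-batch cost implied by a `factor`-fold reduction."""
    return baseline_cost / factor


llm_ratio = cost_ratio(CLAUDE_SONNET_COST, LLAMA3_8B_COST)
student_cost = implied_student_cost(CLAUDE_SONNET_COST, ASSUMED_COST_FACTOR)

print(f"Claude Sonnet vs LLaMA 3 8B: {llm_ratio:.1f}x")
print(f"Implied BERT-base + PGKD cost per batch: ${student_cost:.4f}")
```

Under that assumption, the implied student cost is a derived estimate rather than a figure reported by the source. A different baseline would give a different value.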