Key Concepts
Mutual information maximization enhances chain-of-thought (CoT) distillation, improving reasoning in smaller student models.
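As a rough illustration of the idea (not the paper's exact objective), the sketch below estimates an InfoNCE-style lower bound on the mutual information between the student's label-prediction and rationale-generation representations; the function name, temperature, and loss weighting are illustrative assumptions.

```python
# Minimal sketch, assuming an InfoNCE-style MI lower bound between the two
# task representations of a CoT-distilled student; names and hyperparameters
# are illustrative, not taken from the paper.
import torch
import torch.nn.functional as F


def infonce_mi_lower_bound(label_repr: torch.Tensor,
                           rationale_repr: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """Return a loss whose minimization maximizes a lower bound on
    I(label_repr; rationale_repr) over a batch.

    label_repr, rationale_repr: (batch, dim) pooled hidden states from the
    label-prediction and rationale-generation heads of the student.
    """
    z1 = F.normalize(label_repr, dim=-1)
    z2 = F.normalize(rationale_repr, dim=-1)
    logits = z1 @ z2.t() / temperature                    # (batch, batch) similarities
    targets = torch.arange(z1.size(0), device=z1.device)  # matched pairs on the diagonal
    # Cross-entropy against the diagonal targets is the InfoNCE objective;
    # minimizing it pushes the MI lower bound up.
    return F.cross_entropy(logits, targets)


# Hypothetical usage alongside the usual multi-task distillation losses:
# total_loss = label_loss + rationale_loss + mi_weight * infonce_mi_lower_bound(h_label, h_rationale)
```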
Statistics
"Our method outperforms DSS on ANLI, CQA, and SVAMP."
"Our method achieves an ECE of 4.35 in e-SNLI, significantly lower than DSS’s 8.54."
Quotes
"Our findings offer insightful guidance for future research on language model distillation."
"Our methodology demonstrably outperforms existing benchmarks across multiple datasets."