This paper proposes a novel knowledge distillation framework called Block-wise Logit Distillation (Block-KD) that bridges the gap between logit-based and feature-based distillation methods, achieving superior performance by implicitly aligning features through a series of intermediate "stepping-stone" models.
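A minimal sketch of the stepping-stone idea, assuming the teacher and student are split into the same number of blocks and that feature shapes are made compatible upstream (e.g., by projectors folded into the student blocks); `kd_loss`, `student_blocks`, `teacher_blocks`, and the two heads are illustrative names, not the paper's API.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Standard temperature-scaled KL divergence on logits."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T

def blockwise_logit_distillation(x, student_blocks, teacher_blocks,
                                 student_head, teacher_head, T=4.0):
    """Stepping-stone sketch: the student's first k blocks feed the teacher's
    remaining blocks, and every hybrid model is supervised with logit KD."""
    with torch.no_grad():
        t_feat = x
        for blk in teacher_blocks:
            t_feat = blk(t_feat)
        teacher_logits = teacher_head(t_feat)

    losses = []
    s_feat = x
    for k, s_blk in enumerate(student_blocks):
        s_feat = s_blk(s_feat)
        # Hybrid "stepping-stone": student blocks 0..k, then teacher blocks k+1..end
        # (assumes feature shapes are made compatible, e.g. by built-in projectors).
        h_feat = s_feat
        for t_blk in teacher_blocks[k + 1:]:
            h_feat = t_blk(h_feat)
        losses.append(kd_loss(teacher_head(h_feat), teacher_logits, T))

    # Plain logit KD on the full student as well.
    losses.append(kd_loss(student_head(s_feat), teacher_logits, T))
    return sum(losses) / len(losses)
```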
This paper presents a multi-level feature distillation (MLFD) technique that combines the knowledge of multiple teacher models trained on different datasets and transfers it to a single student model, achieving performance gains over models trained on a single dataset.
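A minimal sketch of the multi-teacher feature-matching idea, assuming one linear projector per teacher and a simple weighted L2 objective; the module name, projector design, and weighting are assumptions, and a multi-level variant would apply this at several network depths.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTeacherFeatureDistiller(nn.Module):
    """Project the student's features into each teacher's feature space and
    match them with a weighted L2 loss (one projector per teacher)."""
    def __init__(self, student_dim, teacher_dims):
        super().__init__()
        self.projectors = nn.ModuleList(nn.Linear(student_dim, d) for d in teacher_dims)

    def forward(self, student_feat, teacher_feats, weights=None):
        weights = weights or [1.0] * len(teacher_feats)
        loss = torch.zeros((), device=student_feat.device)
        for proj, t_feat, w in zip(self.projectors, teacher_feats, weights):
            loss = loss + w * F.mse_loss(proj(student_feat), t_feat.detach())
        return loss
```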
In high-dimensional linear regression, strategically crafting a "weak teacher" model for knowledge distillation can outperform training with true labels, but it cannot fundamentally change the data scaling law.
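A toy illustration of the setting (not the paper's weak-teacher construction or its proof): a heavily shrunken ridge estimate plays the role of the weak teacher, and a student fit on its predictions is compared against a student fit on the noisy true labels; whether distillation wins depends on the noise level and regularization, which is what the analysis characterizes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 200, 400, 0.5                    # more dimensions than samples
beta = rng.normal(size=d) / np.sqrt(d)         # true parameter
X = rng.normal(size=(n, d))
y = X @ beta + sigma * rng.normal(size=n)      # noisy labels

def ridge(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

beta_teacher = ridge(X, y, lam=50.0)           # "weak teacher": heavily shrunken fit

beta_true = ridge(X, y, lam=1.0)               # student trained on true labels
beta_distilled = ridge(X, X @ beta_teacher, lam=1.0)  # student trained on teacher labels

X_test = rng.normal(size=(2000, d))
for name, b in [("true labels", beta_true), ("distilled", beta_distilled)]:
    print(f"{name:12s} excess risk: {np.mean((X_test @ (b - beta)) ** 2):.4f}")
```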
SIKeD, a novel iterative knowledge distillation technique, enhances the mathematical reasoning abilities of smaller language models by addressing the limitations of traditional distillation methods and enabling the models to effectively learn and select from multiple reasoning strategies.
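How SIKeD schedules its iterations is best taken from the paper; the toy sketch below only shows the generic shape such an objective can take, mixing a loss on teacher-generated rationales with a loss on the student's own correct generations and annealing the mixing weight across iterations (all names and the annealing schedule are assumptions).

```python
import torch
import torch.nn.functional as F

def mixed_distillation_loss(teacher_data_logits, teacher_data_labels,
                            self_data_logits, self_data_labels, alpha):
    """Convex mix of the loss on teacher-generated data and the loss on the
    student's own correct generations (alpha annealed across iterations)."""
    loss_teacher = F.cross_entropy(teacher_data_logits, teacher_data_labels)
    loss_self = F.cross_entropy(self_data_logits, self_data_labels)
    return alpha * loss_teacher + (1.0 - alpha) * loss_self

# Toy usage: each iteration, the student re-generates solutions with several
# strategies, only correct ones are kept, and alpha is decayed so on-policy
# data gradually dominates.
for alpha in (1.0, 0.7, 0.5):
    logits_t, labels_t = torch.randn(8, 100), torch.randint(0, 100, (8,))
    logits_s, labels_s = torch.randn(8, 100), torch.randint(0, 100, (8,))
    loss = mixed_distillation_loss(logits_t, labels_t, logits_s, labels_s, alpha)
    # ...backprop `loss` through the student here...
```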
The CAKD framework enhances knowledge distillation in neural networks by decoupling the Kullback-Leibler (KL) divergence loss function, allowing for targeted emphasis on critical elements and improving knowledge transfer efficiency from teacher to student models.
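CAKD's exact decoupling is defined in the paper; the sketch below shows one common way a KL distillation loss is decoupled (a DKD-style split into a target-class term and a non-target-class term, each with its own weight), which illustrates how targeted emphasis on specific parts of the teacher's distribution can be implemented.

```python
import torch
import torch.nn.functional as F

def decoupled_kd_loss(student_logits, teacher_logits, targets,
                      alpha=1.0, beta=2.0, T=4.0):
    """Split the KL loss into a target-class (binary) term and a non-target-class
    term, so each part of the teacher's distribution can be weighted separately."""
    p_s = F.softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    mask = F.one_hot(targets, num_classes=p_s.size(1)).bool()

    # Binary (target vs. rest) distributions.
    bin_s = torch.stack([p_s[mask], 1 - p_s[mask]], dim=1)
    bin_t = torch.stack([p_t[mask], 1 - p_t[mask]], dim=1)
    target_term = F.kl_div(bin_s.log(), bin_t, reduction="batchmean")

    # Distribution over non-target classes only (target logit masked out).
    non_s = F.log_softmax(student_logits / T - 1000.0 * mask.float(), dim=1)
    non_t = F.softmax(teacher_logits / T - 1000.0 * mask.float(), dim=1)
    non_target_term = F.kl_div(non_s, non_t, reduction="batchmean")

    return (alpha * target_term + beta * non_target_term) * T * T
```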
This paper introduces PCKD, a novel knowledge distillation method for convolutional neural networks that improves student network performance by transferring knowledge from teacher networks using a category contrastive learning approach and a preview-based learning strategy to handle samples of varying difficulty.
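A rough sketch of the two ingredients named above, under loose assumptions: category contrast is approximated by an InfoNCE-style pull toward per-class centers (e.g., derived from teacher features), and "preview" is approximated by down-weighting samples the teacher itself finds hard; the actual PCKD formulation and schedule are the paper's.

```python
import torch
import torch.nn.functional as F

def category_contrastive_loss(student_feat, targets, class_centers, tau=0.1):
    """Pull each student feature toward its own class center and push it away
    from the other centers (InfoNCE over classes)."""
    s = F.normalize(student_feat, dim=1)
    c = F.normalize(class_centers, dim=1)          # (num_classes, feat_dim)
    return F.cross_entropy(s @ c.t() / tau, targets)

def preview_weights(teacher_logits, targets, gamma=2.0):
    """Down-weight samples the teacher finds hard, so easy samples dominate early
    ('preview'); a real schedule would phase harder samples in over time."""
    with torch.no_grad():
        p_correct = F.softmax(teacher_logits, dim=1).gather(1, targets[:, None]).squeeze(1)
    return p_correct.pow(gamma)

# Typical use: rescale per-sample CE/KD losses, e.g.
#   loss = (preview_weights(t_logits, y) * per_sample_loss).mean()
```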
This paper introduces TAS, a novel knowledge distillation method that uses a hybrid assistant model to bridge the gap between teacher and student networks with different architectures, enabling efficient knowledge transfer in cross-architecture knowledge distillation (CAKD).
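A loose sketch of what a hybrid assistant can look like when the teacher is a CNN and the student is a transformer, under the assumption that the assistant reuses the teacher's convolutional stem and stacks student-style token blocks on top; TAS's actual assistant design and training recipe are the paper's.

```python
import torch
import torch.nn as nn

class HybridAssistant(nn.Module):
    """Assistant that mixes the two families: the teacher's convolutional stages
    produce a feature map, which is tokenized and processed by student-style
    transformer blocks before classification."""
    def __init__(self, teacher_stem, student_style_blocks, embed_dim, num_classes):
        super().__init__()
        self.stem = teacher_stem                  # early CNN stages from the teacher
        self.blocks = student_style_blocks        # transformer blocks shaped like the student
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        f = self.stem(x)                          # (B, C, H, W)
        tokens = f.flatten(2).transpose(1, 2)     # (B, H*W, C) token sequence
        tokens = self.blocks(tokens)
        return self.head(tokens.mean(dim=1))      # mean-pooled classification logits
```

Distillation can then run teacher to assistant and assistant to student (or jointly), so each hop crosses only one kind of architectural gap.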
Dual augmentation in knowledge distillation, where different augmentations are applied to teacher and student models, improves the transfer of invariant representations, leading to more robust and generalizable student models, especially in same-architecture settings.
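A minimal sketch of dual augmentation, assuming image classification with torchvision-style transforms: the teacher and student receive different views of the same batch, and the usual temperature-scaled KL loss is applied across views; the specific augmentations chosen here are illustrative.

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

# Two deliberately different augmentation pipelines (choices are illustrative);
# `image` is assumed to be a float tensor batch in [0, 1].
teacher_aug = transforms.Compose([transforms.RandomResizedCrop(32),
                                  transforms.RandomHorizontalFlip()])
student_aug = transforms.Compose([transforms.RandomResizedCrop(32),
                                  transforms.ColorJitter(0.4, 0.4, 0.4)])

def dual_augmentation_kd_loss(image, teacher, student, T=4.0):
    """Teacher and student see different views of the same images, so matching
    their outputs rewards augmentation-invariant representations."""
    with torch.no_grad():
        t_logits = teacher(teacher_aug(image))
    s_logits = student(student_aug(image))
    return F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * T * T
```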
Knowledge distillation (KD) methods based on Kullback-Leibler (KL) divergence often struggle to effectively transfer knowledge from larger, more accurate teacher models to smaller student models due to capacity mismatch and the implicit alteration of inter-class relationships. This paper introduces Correlation Matching Knowledge Distillation (CMKD), a novel approach that leverages both Pearson and Spearman correlation coefficients to address these limitations and achieve more efficient and robust distillation from stronger teacher models.
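A sketch of correlation-based logit matching as described above: a row-wise Pearson term plus a Spearman term computed as Pearson on ranks (plain argsort ranks are not differentiable, so a practical implementation would need a soft ranking); how CMKD combines or gates the two terms is the paper's design.

```python
import torch

def pearson_corr_loss(student_logits, teacher_logits, eps=1e-8):
    """1 - row-wise Pearson correlation between student and teacher logits."""
    s = student_logits - student_logits.mean(dim=1, keepdim=True)
    t = teacher_logits - teacher_logits.mean(dim=1, keepdim=True)
    corr = (s * t).sum(dim=1) / (s.norm(dim=1) * t.norm(dim=1) + eps)
    return (1.0 - corr).mean()

def spearman_corr_loss(student_logits, teacher_logits):
    """Spearman = Pearson on ranks; hard argsort ranks are non-differentiable,
    so this version is only illustrative."""
    s_rank = student_logits.argsort(dim=1).argsort(dim=1).float()
    t_rank = teacher_logits.argsort(dim=1).argsort(dim=1).float()
    return pearson_corr_loss(s_rank, t_rank)

def correlation_matching_loss(student_logits, teacher_logits, lam=0.5):
    """Illustrative combination of the two correlation terms."""
    return lam * pearson_corr_loss(student_logits, teacher_logits) \
        + (1.0 - lam) * spearman_corr_loss(student_logits, teacher_logits)
```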
Progressive distillation, a technique where a student model learns from intermediate checkpoints of a teacher model, accelerates training by implicitly providing a curriculum of easier-to-learn subtasks, as demonstrated through theoretical analysis and empirical results on sparse parity, probabilistic context-free grammars (PCFGs), and real-world language modeling tasks.
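A minimal sketch of progressive distillation, assuming a list of intermediate teacher checkpoints ordered from early to final and a standard temperature-scaled KL objective; the optimizer and hyperparameters are placeholders.

```python
import torch
import torch.nn.functional as F

def progressive_distillation(student, teacher_checkpoints, loader,
                             epochs_per_ckpt=1, T=2.0, lr=1e-3):
    """Distill the student against a sequence of intermediate teacher checkpoints
    (ordered early -> final) instead of only the fully trained teacher."""
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    for teacher in teacher_checkpoints:
        teacher.eval()
        for _ in range(epochs_per_ckpt):
            for x, _ in loader:                   # labels unused: pure logit KD
                with torch.no_grad():
                    t_logits = teacher(x)
                s_logits = student(x)
                loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                                F.softmax(t_logits / T, dim=1),
                                reduction="batchmean") * T * T
                opt.zero_grad()
                loss.backward()
                opt.step()
    return student
```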