This research proposes a novel knowledge distillation technique using a specialized Mixture-of-Experts (MoE) model, called Routing-by-Memory (RbM), to improve the efficiency of node classification in Graph Neural Networks (GNNs) while maintaining accuracy.
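The RbM routing mechanism itself is not reproduced here; the sketch below only illustrates the general setup it builds on, as I read it: distilling a GNN teacher's precomputed node logits into a small mixture-of-experts student that runs on node features alone, with a learned router mixing experts per node. All module names, sizes, and loss weights are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEStudent(nn.Module):
    """Small MoE MLP that classifies nodes from raw features (no graph needed at inference)."""
    def __init__(self, in_dim, num_classes, num_experts=4, hidden=64):
        super().__init__()
        self.router = nn.Linear(in_dim, num_experts)   # soft routing scores per node
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_classes))
            for _ in range(num_experts)
        )

    def forward(self, x):
        gate = F.softmax(self.router(x), dim=-1)                        # [N, E]
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)   # [N, E, C]
        return (gate.unsqueeze(-1) * expert_out).sum(dim=1)             # [N, C]

def distill_step(student, x, teacher_logits, labels, opt, T=2.0, alpha=0.5):
    """One step: cross-entropy on labels + KL to the (precomputed) GNN teacher logits."""
    s_logits = student(x)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    ce = F.cross_entropy(s_logits, labels)
    loss = alpha * kd + (1 - alpha) * ce
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```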
Successful knowledge distillation depends on sufficient sampling of the teacher model's output space and decision boundaries, and surprisingly, even unconventional datasets like unoptimized synthetic imagery can be effective when these criteria are met.
Multi-perspective Contrastive Logit Distillation (MCLD) applies contrastive learning to teacher and student logits compared from multiple perspectives, improving knowledge transfer between the networks and yielding better performance and transferability without relying heavily on the classification task loss.
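MCLD's exact "multi-perspective" construction is not reproduced here; as a rough illustration, the sketch below applies a single InfoNCE-style contrastive term directly to the logits, treating the teacher and student views of the same sample as a positive pair and other samples in the batch as negatives. The temperature and the cosine normalization are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_logit_loss(student_logits, teacher_logits, tau=0.1):
    """InfoNCE over logit vectors: the student logits of sample i should be closest
    to the teacher logits of the same sample i within the batch."""
    s = F.normalize(student_logits, dim=-1)   # [B, C]
    t = F.normalize(teacher_logits, dim=-1)   # [B, C]
    sim = s @ t.t() / tau                     # [B, B] pairwise similarities
    targets = torch.arange(s.size(0), device=s.device)
    # Symmetric loss: match student -> teacher and teacher -> student.
    return 0.5 * (F.cross_entropy(sim, targets) + F.cross_entropy(sim.t(), targets))
```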
Combining probability-level and logit-level knowledge distillation losses can hinder performance due to conflicting gradients; the proposed Dual-Head Knowledge Distillation (DHKD) method overcomes this by using separate classification heads for each loss, improving knowledge transfer and student model accuracy.
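A minimal sketch of the dual-head idea as described: a shared student backbone feeds two separate classification heads, one trained with cross-entropy plus a probability-level (KL) distillation loss and the other with a logit-level (MSE) loss, so the conflicting gradients never meet in a single head. Layer shapes, temperatures, and loss weights are assumptions; which head (or combination) is used at test time is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHeadStudent(nn.Module):
    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.backbone = backbone                             # any feature extractor
        self.head_prob = nn.Linear(feat_dim, num_classes)    # trained with CE + KL
        self.head_logit = nn.Linear(feat_dim, num_classes)   # trained with logit MSE

    def forward(self, x):
        h = self.backbone(x)
        return self.head_prob(h), self.head_logit(h)

def dhkd_loss(student, x, teacher_logits, labels, T=4.0, beta=1.0):
    logits_p, logits_l = student(x)
    ce = F.cross_entropy(logits_p, labels)
    kl = F.kl_div(F.log_softmax(logits_p / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    mse = F.mse_loss(logits_l, teacher_logits)   # logit-level loss isolated on the second head
    return ce + kl + beta * mse
```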
This research paper introduces a novel information-theoretic framework for quantifying and optimizing the transfer of task-relevant knowledge during knowledge distillation in machine learning.
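The paper's own framework is not reproduced here; as one concrete, widely used instance of information-theoretic distillation, the sketch below maximizes a variational lower bound on the mutual information between teacher and student features by modelling the teacher feature as a Gaussian whose mean is predicted from the student feature (in the spirit of variational information distillation). All architectural details are assumptions.

```python
import torch
import torch.nn as nn

class VariationalMI(nn.Module):
    """Lower-bounds I(teacher_feat; student_feat) via -E[log q(t | s)] with a Gaussian q."""
    def __init__(self, s_dim, t_dim):
        super().__init__()
        self.mean = nn.Sequential(nn.Linear(s_dim, t_dim), nn.ReLU(), nn.Linear(t_dim, t_dim))
        self.log_var = nn.Parameter(torch.zeros(t_dim))   # per-dimension variance of q

    def forward(self, s_feat, t_feat):
        mu = self.mean(s_feat)
        var = self.log_var.exp()
        # Negative Gaussian log-likelihood (up to a constant);
        # minimizing it tightens the mutual-information lower bound.
        nll = 0.5 * (((t_feat - mu) ** 2) / var + self.log_var).sum(dim=-1)
        return nll.mean()
```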
To distill knowledge efficiently from closed large language models (LLMs), this work introduces Proxy-KD, a knowledge distillation technique that relies on a proxy model. Proxy-KD first aligns the proxy model with the black-box LLM and then uses it to transfer knowledge to a small LLM. Experiments show that Proxy-KD outperforms existing black-box and white-box knowledge distillation methods, opening new possibilities for exploiting closed LLMs.
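A minimal sketch of the two-stage flow described above, assuming Hugging Face-style causal-LM models and representing the closed LLM by a hypothetical `query_blackbox_llm` stub (closed APIs expose generated text, not logits): stage 1 aligns an open proxy model to the black-box teacher's generations, stage 2 distills the aligned proxy, which does expose logits, into the small student with a token-level KL. Function names, losses, and the overall loop are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def query_blackbox_llm(prompt: str) -> str:
    """Hypothetical stand-in for the closed LLM's text-only API."""
    raise NotImplementedError

# --- Stage 1: align the white-box proxy to the black-box teacher -----------------
def align_proxy_step(proxy, tokenizer, prompt, optimizer):
    """Fine-tune the proxy on the black-box teacher's generations (behavioral alignment)."""
    target = query_blackbox_llm(prompt)
    ids = tokenizer(prompt + target, return_tensors="pt").input_ids
    out = proxy(input_ids=ids, labels=ids)        # standard causal-LM loss on teacher text
    optimizer.zero_grad()
    out.loss.backward()
    optimizer.step()
    return out.loss.item()

# --- Stage 2: distill the aligned proxy (logits now available) into the student ---
def distill_from_proxy_step(student, proxy, tokenizer, prompt, optimizer, T=1.0):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        t_logits = proxy(input_ids=ids).logits
    s_logits = student(input_ids=ids).logits
    loss = F.kl_div(F.log_softmax(s_logits / T, dim=-1).flatten(0, 1),
                    F.softmax(t_logits / T, dim=-1).flatten(0, 1),
                    reduction="batchmean") * T * T
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```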
Over-parameterizing student models during knowledge distillation using Matrix Product Operators (MPO) enhances their performance without increasing inference latency, effectively transferring knowledge from larger teacher models.
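A minimal sketch of the core trick as I read it: during training the student's linear layers are stored as MPO factors, which can hold more parameters than the dense matrix when the bond dimension is large, and at deployment the factors are contracted back into a single dense weight so the forward pass costs exactly as much as a plain `nn.Linear`. The two-core factorization and the shapes below are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MPOLinear(nn.Module):
    """Linear layer stored as two MPO cores; contracted to a dense weight for inference."""
    def __init__(self, in_dims=(16, 16), out_dims=(16, 16), bond=64):
        super().__init__()
        i1, i2 = in_dims
        o1, o2 = out_dims
        # Over-parameterized during training when `bond` is large:
        # i1*o1*bond + bond*i2*o2 can exceed (i1*i2)*(o1*o2).
        self.core1 = nn.Parameter(torch.randn(i1, o1, bond) * 0.02)
        self.core2 = nn.Parameter(torch.randn(bond, i2, o2) * 0.02)
        self.bias = nn.Parameter(torch.zeros(o1 * o2))
        self.register_buffer("dense_weight", None)

    def contract(self):
        # Contract the cores into a weight of shape (out, in) = (o1*o2, i1*i2).
        w = torch.einsum("aor,rbp->opab", self.core1, self.core2)
        return w.reshape(self.core1.shape[1] * self.core2.shape[2],
                         self.core1.shape[0] * self.core2.shape[1])

    def freeze_for_inference(self):
        """Collapse the MPO back to a dense matrix: inference latency equals a plain Linear."""
        self.dense_weight = self.contract().detach()

    def forward(self, x):                             # x: [batch, i1*i2]
        w = self.dense_weight if self.dense_weight is not None else self.contract()
        return F.linear(x, w, self.bias)
```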
Performance-Guided Knowledge Distillation (PGKD) uses large language models (LLMs) to improve the accuracy of smaller models on multi-class text classification, particularly when labeled data is scarce, while significantly reducing inference cost and latency.
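One way such a loop might look in practice, sketched with a hypothetical `llm_annotate` stub for the large model and scikit-learn for the small student; the performance-guided part is reduced here to "send the worst-performing classes back to the LLM for more labels." Everything below is an illustrative assumption, not the paper's exact protocol.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

def llm_annotate(texts, classes):
    """Hypothetical stand-in for prompting the LLM teacher to label texts."""
    raise NotImplementedError

def pgkd_round(labeled_texts, labels, unlabeled_texts, val_texts, val_labels, classes):
    # 1) Train the cheap student on everything labeled so far.
    student = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    student.fit(labeled_texts, labels)

    # 2) Measure per-class validation performance to guide the next annotation batch.
    preds = student.predict(val_texts)
    per_class_f1 = f1_score(val_labels, preds, labels=classes, average=None)
    hard_classes = [c for c, f in zip(classes, per_class_f1) if f < np.median(per_class_f1)]

    # 3) Ask the LLM teacher for labels, keeping examples it assigns to weak classes.
    new_labels = llm_annotate(unlabeled_texts, classes)
    keep = [i for i, y in enumerate(new_labels) if y in hard_classes]
    return ([*labeled_texts, *(unlabeled_texts[i] for i in keep)],
            [*labels, *(new_labels[i] for i in keep)],
            student)
```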
This paper proposes a novel knowledge distillation framework called Block-wise Logit Distillation (Block-KD) that bridges the gap between logit-based and feature-based distillation methods, achieving superior performance by implicitly aligning features through a series of intermediate "stepping-stone" models.
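A rough sketch of the stepping-stone idea as described: the student's features after block k are projected and pushed through the teacher's remaining blocks, giving hybrid logits that are distilled against the teacher's own logits alongside the usual logit KD and cross-entropy. Projector design, loss weights, and the assumption that each model's last block ends in its classifier are mine; teacher parameters are assumed frozen so only the student and projectors are updated.

```python
import torch
import torch.nn.functional as F

def kd_kl(s_logits, t_logits, T=4.0):
    return F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                    F.softmax(t_logits / T, dim=-1),
                    reduction="batchmean") * T * T

def block_kd_loss(student_blocks, teacher_blocks, projectors, x, labels, lam=1.0):
    """student_blocks / teacher_blocks: lists of nn.Module with matching block boundaries;
    projectors[k] maps student block-k features into the teacher's block-k feature space."""
    with torch.no_grad():
        t = x
        for blk in teacher_blocks:
            t = blk(t)
        t_logits = t                                  # the last teacher block ends in the classifier

    loss, s_feat = 0.0, x
    for k, blk in enumerate(student_blocks):
        s_feat = blk(s_feat)
        if k < len(student_blocks) - 1:
            # "Stepping stone": student prefix + teacher suffix -> hybrid logits.
            h = projectors[k](s_feat)
            for t_blk in teacher_blocks[k + 1:]:
                h = t_blk(h)
            loss = loss + lam * kd_kl(h, t_logits)
    s_logits = s_feat                                 # the last student block ends in the classifier
    return loss + kd_kl(s_logits, t_logits) + F.cross_entropy(s_logits, labels)
```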
This work presents Multi-Level Feature Distillation (MLFD), which combines the knowledge of multiple teacher models trained on different datasets and transfers it to a single student model, achieving performance gains over models trained on a single dataset.
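A minimal sketch of the multi-teacher setup described above: each teacher was trained on its own dataset, the student mimics every teacher's intermediate features through a dedicated projection head, and a task loss on the student's own labels is added on top. Reducing each teacher to a single feature level and the loss weights are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTeacherFeatureKD(nn.Module):
    def __init__(self, student, student_feat_dim, teacher_feat_dims):
        super().__init__()
        self.student = student                        # assumed to return (features, logits)
        # One projection head per teacher so mismatched feature sizes can be aligned.
        self.projectors = nn.ModuleList(
            nn.Linear(student_feat_dim, d) for d in teacher_feat_dims
        )

    def forward(self, x, teacher_feats, labels, alpha=1.0):
        s_feat, s_logits = self.student(x)
        feat_loss = sum(
            F.mse_loss(proj(s_feat), t_feat)          # match each teacher's feature space
            for proj, t_feat in zip(self.projectors, teacher_feats)
        )
        return F.cross_entropy(s_logits, labels) + alpha * feat_loss
```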