This work introduces SEER-MoE, a two-stage framework that reduces the memory footprint and compute requirements of pre-trained Mixture-of-Experts (MoE) models. The first stage prunes the total number of experts, guided by heavy-hitters counting, while the second stage employs a regularization-based fine-tuning strategy to recover the accuracy lost to pruning and to reduce the number of experts activated during inference.
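A minimal sketch of the heavy-hitters counting idea behind the first stage: experts are ranked by how often the router selects them on a small calibration set, and the least-used experts are pruned. The function names, the `keep_ratio` parameter, and the top-k counting rule below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def heavy_hitter_counts(router_logits, top_k=2):
    """Count how often each expert appears in the top-k routing choices.

    router_logits: (num_tokens, num_experts) gating scores collected on a
    small calibration set (random values stand in for a real model here).
    """
    topk = np.argsort(-router_logits, axis=-1)[:, :top_k]
    return np.bincount(topk.ravel(), minlength=router_logits.shape[-1])

def prune_experts(counts, keep_ratio=0.75):
    """Keep only the most frequently activated ('heavy hitter') experts."""
    num_keep = max(1, int(len(counts) * keep_ratio))
    return np.argsort(-counts)[:num_keep]

# Toy calibration pass: 1,000 tokens routed over 8 experts.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 8))
kept = prune_experts(heavy_hitter_counts(logits, top_k=2), keep_ratio=0.75)
print("experts kept:", sorted(kept.tolist()))
```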
FFN-SkipLLM is a fine-grained skipping strategy that can bypass roughly 25-30% of the feed-forward network (FFN) blocks in autoregressive large language models (LLMs) with only a marginal change in performance on knowledge-intensive tasks.
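The sketch below illustrates the general idea of input-adaptive FFN skipping: an FFN block is bypassed when the hidden state has barely changed after attention. The cosine-similarity criterion and the 0.98 threshold are assumptions made for this sketch, not FFN-SkipLLM's published rule.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def decoder_layer(h, attn, ffn, skip_threshold=0.98):
    """One decoder layer with an input-adaptive FFN skip.

    If the hidden state barely changes after attention (cosine similarity
    above skip_threshold), the FFN block is bypassed for this token.
    """
    h_attn = h + attn(h)                    # residual attention
    if cosine(h, h_attn) > skip_threshold:  # representation has saturated
        return h_attn                       # skip the FFN entirely
    return h_attn + ffn(h_attn)             # residual FFN

# Toy layer: the attention barely perturbs h, so this call skips the FFN.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(64, 256)) * 0.02
W2 = rng.normal(size=(256, 64)) * 0.02
attn = lambda x: 0.01 * x
ffn = lambda x: np.maximum(x @ W1, 0) @ W2
out = decoder_layer(rng.normal(size=64), attn, ffn)
print(out.shape)
```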
Eigenpruning is a method that removes singular values from weight matrices in large language models (LLMs) to improve their performance on specific tasks. This approach is inspired by interpretability methods that aim to automatically find subnetworks of a model that can effectively solve a given task.
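A small sketch of the core operation, shown on a random matrix: factor a weight matrix with SVD, zero a subset of its singular values, and rebuild it. The paper selects which components to remove using a task-driven criterion; dropping the smallest singular values here is only a placeholder for that selection step.

```python
import numpy as np

def eigenprune(W, num_remove):
    """Zero out `num_remove` singular values of a weight matrix and rebuild it."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s[len(s) - num_remove:] = 0.0       # placeholder: drop the smallest components
    return (U * s) @ Vt                 # reconstruct U @ diag(s) @ Vt

rng = np.random.default_rng(2)
W = rng.normal(size=(128, 128))
W_pruned = eigenprune(W, num_remove=32)
print(np.linalg.matrix_rank(W_pruned))  # rank drops from 128 to 96
```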
The CMAT framework introduces a structured environment where individual agents with specialized roles and capabilities work together to process information, make decisions, and solve complex tasks, enabling more scalable and flexible training of language models.
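A toy sketch of role-specialized agents cooperating on a shared context; the role names (`planner`, `executor`, `reviewer`) and the sequential hand-off are illustrative assumptions rather than CMAT's actual agent set or training procedure.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Agent:
    """A role-specialized agent: a name plus a policy over the shared context."""
    role: str
    act: Callable[[str], str]

def run_pipeline(task: str, agents: List[Agent]) -> str:
    """Pass the task through each specialized agent in turn, accumulating context."""
    context = task
    for agent in agents:
        context = f"{context}\n[{agent.role}] {agent.act(context)}"
    return context

# Toy stand-ins for LLM-backed roles; real agents would call a language model.
planner = Agent("planner", lambda ctx: "break the task into steps")
executor = Agent("executor", lambda ctx: "carry out each step")
reviewer = Agent("reviewer", lambda ctx: "check the result and suggest fixes")

print(run_pipeline("summarize a report", [planner, executor, reviewer]))
```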
ComplexityNet is a framework that uses fine-tuned smaller models to assess task complexity and allocate each task to the most appropriate large language model, reducing computational resource usage by 90% while maintaining high code generation accuracy.
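A minimal sketch of complexity-based routing, with placeholder functions standing in for the fine-tuned complexity classifier and the LLM tiers; the complexity levels and the routing table are assumptions made for illustration.

```python
def route_by_complexity(task, classify, models):
    """Send a task to the cheapest model deemed adequate for its complexity.

    `classify` stands in for the fine-tuned small model that predicts a
    complexity level; `models` maps each level to an LLM backend.
    """
    level = classify(task)          # e.g. "low" or "high"
    return models[level](task)

# Toy complexity classifier and model tiers (placeholders for real LLM calls).
classify = lambda task: "high" if len(task.split()) > 20 else "low"
models = {
    "low":  lambda t: f"[small model] handled: {t!r}",
    "high": lambda t: f"[large model] handled: {t!r}",
}
print(route_by_complexity("write a hello world function", classify, models))
```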