Kernel looping, a novel compiler optimization technique, significantly enhances the inference performance of large language models on reconfigurable dataflow architectures by eliminating synchronization overheads and maximizing memory bandwidth utilization.
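One way to picture the transformation: instead of dispatching the same decode kernel once per token and synchronizing with the host each time, consecutive identical calls are fused into a single kernel that loops on-device. The sketch below is only a conceptual Python illustration with hypothetical `launch_kernel` / `sync` helpers standing in for device dispatch; the real technique is a compiler pass on the dataflow hardware, not host code.

```python
import time

# Hypothetical device API, used only to illustrate where the overhead sits.
def launch_kernel(step_fn, state):
    return step_fn(state)          # stand-in for dispatching one kernel call

def sync():
    time.sleep(0.0001)             # stand-in for host/device synchronization cost

def decode_step(state):
    return state + 1               # toy per-token decode kernel

# Baseline: one kernel launch and one synchronization per generated token.
def decode_unfused(state, num_tokens):
    for _ in range(num_tokens):
        state = launch_kernel(decode_step, state)
        sync()                     # overhead paid on every iteration
    return state

# Kernel looping, conceptually: the loop moves inside a single fused kernel,
# so launch and synchronization overhead is paid once for the whole sequence.
def decode_fused(state, num_tokens):
    def looped_kernel(s):
        for _ in range(num_tokens):
            s = decode_step(s)     # loop runs on-device, no host round trips
        return s
    state = launch_kernel(looped_kernel, state)
    sync()                         # single synchronization at the end
    return state
```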
Jointly fine-tuning a high-level planner with a low-level language model, using a novel soft-selection method for action embeddings, improves language modeling performance, notably reducing perplexity.
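One plausible reading of "soft selection" is a softmax-weighted mixture over a table of action embeddings, which keeps the planner-to-LM interface differentiable so both components can be trained jointly. The PyTorch sketch below illustrates that idea under our own assumptions about module names and shapes; it is not the paper's implementation.

```python
import torch
import torch.nn as nn

class SoftActionSelector(nn.Module):
    """Differentiable 'soft selection' over a table of action embeddings (illustrative)."""

    def __init__(self, num_actions: int, d_model: int):
        super().__init__()
        self.action_embeddings = nn.Embedding(num_actions, d_model)

    def forward(self, planner_logits: torch.Tensor) -> torch.Tensor:
        # planner_logits: (batch, num_actions) scores from the high-level planner.
        weights = torch.softmax(planner_logits, dim=-1)        # (batch, num_actions)
        # Weighted mixture of action embeddings instead of a hard argmax pick,
        # so gradients flow from the language model back into the planner.
        return weights @ self.action_embeddings.weight         # (batch, d_model)

# Toy usage: the mixed action embedding would be injected into the low-level
# language model's input so planner and LM can be fine-tuned end to end.
selector = SoftActionSelector(num_actions=16, d_model=64)
logits = torch.randn(2, 16, requires_grad=True)
action_vec = selector(logits)
action_vec.sum().backward()   # gradients reach both the embedding table and the planner logits
```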
Addax is a novel optimization algorithm designed for fine-tuning large language models (LLMs) that addresses the memory limitations of traditional methods like Adam while achieving faster convergence and better performance than memory-efficient alternatives like MeZO.
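Roughly, Addax splits each update between a MeZO-style zeroth-order gradient estimate (two forward passes with a shared random perturbation, no backprop memory) and an ordinary first-order gradient on a smaller slice of data, then takes a combined step. The sketch below is a simplified single-tensor illustration under those assumptions, not the authors' implementation; the mixing weight `alpha` and the data split are made up for the example.

```python
import torch

def zeroth_order_grad(loss_fn, param, data, eps=1e-3):
    """MeZO-style estimate: two forward passes with a shared random perturbation z."""
    z = torch.randn_like(param)
    with torch.no_grad():
        loss_plus = loss_fn(param + eps * z, data)
        loss_minus = loss_fn(param - eps * z, data)
    return (loss_plus - loss_minus) / (2 * eps) * z   # projected-gradient estimate

def first_order_grad(loss_fn, param, data):
    """Ordinary backprop gradient on a (small) slice of the batch."""
    p = param.detach().clone().requires_grad_(True)
    loss_fn(p, data).backward()
    return p.grad

def addax_like_step(loss_fn, param, zo_data, fo_data, lr=1e-2, alpha=0.5):
    """Simplified combined update mixing zeroth- and first-order gradients (illustrative)."""
    g_zo = zeroth_order_grad(loss_fn, param, zo_data)
    g_fo = first_order_grad(loss_fn, param, fo_data)
    with torch.no_grad():
        param -= lr * (alpha * g_fo + (1 - alpha) * g_zo)
    return param

# Toy least-squares objective standing in for a language-model loss.
def loss_fn(w, batch):
    X, y = batch
    return ((X @ w - y) ** 2).mean()

w = torch.zeros(8)
X, y = torch.randn(64, 8), torch.randn(64)
for _ in range(100):
    w = addax_like_step(loss_fn, w, zo_data=(X[:32], y[:32]), fo_data=(X[32:], y[32:]))
```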
When optimizing modular natural language processing (NLP) systems built with large language models (LLMs), the BetterTogether strategy, which combines LLM weight fine-tuning with prompt optimization, performs substantially better than either method used on its own.
Alternating between prompt optimization and fine-tuning (BetterTogether approach) significantly improves the performance of modular language model pipelines across various NLP tasks.
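The alternation itself is simple to state: optimize prompts with the LM weights frozen, then fine-tune the weights with the optimized prompts fixed, and optionally repeat. The sketch below shows only this control flow, using hypothetical `optimize_prompts` and `finetune_weights` helpers; it is a schematic of the BetterTogether idea, not the DSPy implementation.

```python
from typing import Any, Callable

def better_together_loop(
    pipeline: Any,
    trainset: list,
    metric: Callable[[Any, Any], float],
    optimize_prompts: Callable,   # hypothetical: e.g. a few-shot/instruction optimizer
    finetune_weights: Callable,   # hypothetical: supervised fine-tuning on pipeline traces
    rounds: int = 2,
):
    """Schematic BetterTogether-style alternation (illustrative, not the paper's code)."""
    for _ in range(rounds):
        # Step 1: hold the LM weights fixed, search over prompts/demonstrations.
        pipeline = optimize_prompts(pipeline, trainset, metric)
        # Step 2: hold the optimized prompts fixed, fine-tune the LM weights on
        # data/traces produced by the prompt-optimized pipeline.
        pipeline = finetune_weights(pipeline, trainset, metric)
    return pipeline
```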
Smart is a framework that minimizes the inference cost of large language models while providing accuracy guarantees.
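The one-line description suggests a cost/accuracy trade-off: among several candidate models, serve each workload with the cheapest model whose estimated accuracy still clears a user-specified target. The sketch below is a generic illustration of that idea, with made-up model names, costs, and a simple normal-approximation confidence bound; it is an assumption about the mechanism, not Smart's actual procedure.

```python
import math

def accuracy_lower_bound(correct: int, total: int, z: float = 1.645) -> float:
    """One-sided normal-approximation lower bound on accuracy from a profiling sample."""
    if total == 0:
        return 0.0
    p = correct / total
    return p - z * math.sqrt(p * (1 - p) / total)

def pick_cheapest_model(profile: dict, target_accuracy: float) -> str:
    """Choose the cheapest model whose accuracy lower bound meets the target (illustrative).

    profile maps model name -> (cost_per_call, correct, total), where 'correct'
    counts agreement with the reference (most capable) model on profiling queries.
    """
    eligible = [
        (cost, name)
        for name, (cost, correct, total) in profile.items()
        if accuracy_lower_bound(correct, total) >= target_accuracy
    ]
    if not eligible:
        return max(profile, key=lambda n: profile[n][0])  # fall back to the priciest model
    return min(eligible)[1]

# Hypothetical profiling results: (cost per call, agreements, profiled queries).
profile = {
    "small":  (0.1, 86, 100),
    "medium": (0.4, 95, 100),
    "large":  (1.0, 100, 100),   # reference model agrees with itself
}
print(pick_cheapest_model(profile, target_accuracy=0.90))   # -> "medium"
```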