FRUGAL, a novel optimization framework, enhances the training of large language models by combining state-full optimization (e.g., Adam) on a select subset of parameters with state-free methods (e.g., signSGD) on the remaining parameters; this enables efficient exploration of the high-dimensional parameter space while achieving near-state-of-the-art performance with a significantly reduced memory footprint.
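A minimal NumPy sketch of this splitting idea, assuming a fixed boolean mask selects the state-full subset (FRUGAL itself works with structured subspaces that change over training); `frugal_step` and the buffer layout are illustrative, not the paper's actual interface:

```python
import numpy as np

def frugal_step(param, grad, state, mask, lr=1e-3,
                beta1=0.9, beta2=0.999, eps=1e-8):
    """One FRUGAL-style update on a flat parameter vector.

    Coordinates where `mask` is True form the state-full subset and are
    updated with Adam; they are the only ones that carry m/v buffers.
    All other coordinates take a state-free signSGD step.
    """
    state["t"] += 1
    t = state["t"]
    g = grad[mask]
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    state["v"] = beta2 * state["v"] + (1 - beta2) * g**2
    m_hat = state["m"] / (1 - beta1**t)
    v_hat = state["v"] / (1 - beta2**t)
    param[mask] -= lr * m_hat / (np.sqrt(v_hat) + eps)

    # State-free branch: a plain sign step, no optimizer memory at all.
    param[~mask] -= lr * np.sign(grad[~mask])
    return param

# Keep Adam state for only 25% of a 1000-dim parameter vector:
# optimizer memory shrinks from 2n floats to 2 * 0.25n.
rng = np.random.default_rng(0)
n, k = 1000, 250
param = rng.standard_normal(n)
mask = np.zeros(n, dtype=bool)
mask[rng.choice(n, size=k, replace=False)] = True
state = {"m": np.zeros(k), "v": np.zeros(k), "t": 0}
param = frugal_step(param, rng.standard_normal(n), state, mask)
```

The memory saving scales directly with the state-full fraction: the sign-based branch needs no buffers at all, so only the chosen subset pays Adam's 2x state overhead.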
Quantizing the eigenvector matrices of the preconditioners in second-order optimizers like Shampoo, rather than the preconditioners themselves, significantly reduces memory usage while maintaining performance comparable to 32-bit counterparts.
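The NumPy sketch below illustrates why the eigenvector matrix is the friendlier quantization target: its entries are bounded in [-1, 1] (columns are orthonormal), so a coarse 4-bit quantizer loses little, while the small eigenvalue vector stays in full precision. The `quantize_4bit`/`dequantize` helpers and the single per-matrix scale are simplifications for illustration, not the paper's exact scheme:

```python
import numpy as np

def quantize_4bit(x):
    """Symmetric linear quantization to the signed 4-bit range [-7, 7]."""
    scale = np.abs(x).max() / 7.0
    q = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# A Shampoo-style preconditioner statistic L = G G^T + eps * I.
rng = np.random.default_rng(0)
G = rng.standard_normal((64, 64)).astype(np.float32)
L = G @ G.T + 1e-4 * np.eye(64, dtype=np.float32)
eigvals, U = np.linalg.eigh(L)

# Store the well-conditioned eigenvector matrix in 4 bits; keep the
# tiny eigenvalue vector (64 floats here) in full precision.
qU, s = quantize_4bit(U)
U_hat = dequantize(qU, s)

# Rebuild the inverse-fourth-root preconditioner from the 4-bit factors
# and compare against the full-precision reference.
P_hat = U_hat @ np.diag(eigvals**-0.25) @ U_hat.T
P_ref = U @ np.diag(eigvals**-0.25) @ U.T
print(np.linalg.norm(P_hat - P_ref) / np.linalg.norm(P_ref))
```

Quantizing the preconditioner directly would have to cover its full dynamic range, which spans the spectrum of L; the orthonormal factor sidesteps that entirely.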
This paper introduces H-Fac, a novel adaptive optimizer that factorizes its moment estimates to address the high memory overhead of traditional deep learning optimizers, achieving sublinear memory costs while maintaining competitive performance.
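As a rough illustration of where sublinear memory comes from, the sketch below maintains rank-1 row/column statistics in place of a full second-moment matrix, giving O(n + m) optimizer state for an (n, m) weight. `factored_moment_step` is an Adafactor-style stand-in for the factorization idea, not H-Fac's actual update rules:

```python
import numpy as np

def factored_moment_step(W, grad, state, lr=1e-2, beta2=0.999, eps=1e-30):
    """Rank-1 factored second-moment update for an (n, m) weight matrix.

    Instead of a full n*m second-moment buffer, keep a row vector r (n,)
    and a column vector c (m,): O(n + m) optimizer memory in place of
    O(n * m).
    """
    state["t"] += 1
    g2 = grad**2 + eps
    state["r"] = beta2 * state["r"] + (1 - beta2) * g2.sum(axis=1)
    state["c"] = beta2 * state["c"] + (1 - beta2) * g2.sum(axis=0)
    # Rank-1 reconstruction of the second moment, with bias correction.
    v_hat = np.outer(state["r"], state["c"]) / state["r"].sum()
    v_hat /= 1.0 - beta2 ** state["t"]
    W -= lr * grad / np.sqrt(v_hat)
    return W

# O(n + m) state for a 256 x 128 weight matrix: 384 floats vs. 32768.
rng = np.random.default_rng(0)
n, m = 256, 128
W = rng.standard_normal((n, m))
state = {"r": np.zeros(n), "c": np.zeros(m), "t": 0}
W = factored_moment_step(W, rng.standard_normal((n, m)), state)
```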