Efficient Training of Large Language Models from Scratch on a 16 GB GPU Using the Quantized GaLore (Q-GaLore) Technique
A new technique called Q-GaLore (Quantized GaLore) enables pre-training of 7B-parameter Large Language Models (LLMs) from scratch on a single 16 GB GPU. It combines GaLore's projection of gradients into low-rank subspaces, which shrinks the optimizer state, with quantization of the model weights to INT8 and of the projection matrices themselves to INT4.
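To make the mechanism concrete, below is a minimal PyTorch sketch of the core idea: keep weights in quantized form, project the full gradient onto its top singular directions, update in that low-rank subspace, and re-quantize. This is an illustration under simplifying assumptions, not the Q-GaLore implementation: plain SGD stands in for Adam-in-subspace, the INT8 quantization is simulated per-tensor, INT4 projection matrices and layer-adaptive subspace refreshes are omitted, and all names (`rank`, `update_proj_every`, the helper functions) are hypothetical.

```python
# Minimal sketch of low-rank gradient projection over quantized weights.
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor INT8 quantization: returns int8 values and a scale."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    """Recover an approximate float32 tensor from INT8 values and scale."""
    return q.to(torch.float32) * scale

torch.manual_seed(0)
w_q, w_scale = quantize_int8(torch.randn(256, 256))  # toy layer, stored in INT8

rank = 8                # low-rank subspace dimension (illustrative choice)
update_proj_every = 50  # how often to refresh the projector (illustrative choice)
lr = 1e-2
proj = None             # projection matrix P, shape (256, rank)

for step in range(200):
    w = dequantize(w_q, w_scale)       # materialize weights for forward/backward
    w.requires_grad_(True)
    x = torch.randn(32, 256)
    loss = (x @ w).pow(2).mean()       # dummy loss standing in for the LM loss
    loss.backward()
    grad = w.grad                      # full-rank gradient, shape (256, 256)

    # Periodically recompute the projector from the gradient's top singular
    # vectors; GaLore's observation is that the gradient is approximately low-rank.
    if step % update_proj_every == 0:
        U, _, _ = torch.linalg.svd(grad, full_matrices=False)
        proj = U[:, :rank]

    # Optimizer state would live in the rank-r subspace: (rank, 256)
    # instead of (256, 256), which is where the memory saving comes from.
    low_rank_grad = proj.T @ grad      # project the gradient down
    update = proj @ low_rank_grad      # project the update back up

    # Apply the update, then immediately re-quantize, so full-precision
    # weights never persist between steps.
    w_q, w_scale = quantize_int8(w.detach() - lr * update)
```

The saving is easiest to see in the optimizer state: for an Adam-style optimizer the moment buffers would be kept at the projected shape `(rank, 256)` rather than the full `(256, 256)`, and the weights themselves occupy one byte per parameter instead of four.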