MiniCPM: Efficient Small Language Models with Scalable Training Strategies
MiniCPM, a series of small language models with 1.2B and 2.4B non-embedding parameters, demonstrates capabilities on par with 7B-13B large language models through meticulous model wind tunnel experiments, a novel Warmup-Stable-Decay (WSD) learning rate scheduler, and a two-stage pre-training strategy.
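To make the scheduler concrete, the following is a minimal sketch of a Warmup-Stable-Decay schedule: a linear warmup to the peak learning rate, a long stable phase held at that peak, and a final decay phase. The function name `wsd_lr`, the phase lengths, and the exponential half-life decay are illustrative assumptions, not the paper's exact configuration.

```python
def wsd_lr(step: int, peak_lr: float, warmup: int, stable: int,
           decay_half_life: float, min_lr: float = 0.0) -> float:
    """Illustrative Warmup-Stable-Decay (WSD) learning rate schedule.

    Three phases: linear warmup to peak_lr, a stable phase held at
    peak_lr, then a decay phase. The exponential half-life decay used
    here is one plausible choice of decay function; the paper's exact
    decay form and phase lengths may differ.
    """
    if step < warmup:
        # Phase 1: linear warmup from 0 to the peak learning rate.
        return peak_lr * step / warmup
    if step < warmup + stable:
        # Phase 2: hold the peak learning rate constant.
        return peak_lr
    # Phase 3: exponential decay, halving every decay_half_life steps,
    # floored at min_lr.
    decayed = peak_lr * 0.5 ** ((step - warmup - stable) / decay_half_life)
    return max(min_lr, decayed)


# Example usage with hypothetical hyperparameters:
# lr = wsd_lr(step, peak_lr=1e-2, warmup=2_000, stable=80_000,
#             decay_half_life=4_000, min_lr=1e-4)
```

A key property of this schedule shape is that the stable phase can be extended indefinitely, with the short decay phase applied only when a checkpoint is finalized, which is what makes the training-length decision flexible compared with a cosine schedule.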