Incorporating margin values into the reward model's training objective substantially improves how well the model captures human preferences.
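A minimal sketch of how such a margin is typically added to a pairwise reward-model ranking loss, assuming a PyTorch setup and per-pair margins derived from annotator preference strength (the function and variable names are illustrative, not taken from the summary above):

```python
import torch
import torch.nn.functional as F

def margin_ranking_loss(chosen_rewards: torch.Tensor,
                        rejected_rewards: torch.Tensor,
                        margins: torch.Tensor) -> torch.Tensor:
    """Pairwise reward-model loss with a per-pair preference margin.

    chosen_rewards / rejected_rewards: scalar rewards assigned to the
    preferred and dispreferred responses of each comparison pair.
    margins: how strongly annotators preferred the chosen response; a larger
    margin demands a larger reward gap before the loss vanishes.
    """
    # -log sigmoid(r_chosen - r_rejected - margin): the pair only stops
    # contributing loss once the chosen reward exceeds the rejected reward
    # by at least the margin.
    return -F.logsigmoid(chosen_rewards - rejected_rewards - margins).mean()

# Toy usage: three comparison pairs with annotator-derived margins.
chosen = torch.tensor([1.2, 0.4, 2.0])
rejected = torch.tensor([0.9, 0.5, 0.1])
margins = torch.tensor([0.0, 0.5, 1.0])
print(margin_ranking_loss(chosen, rejected, margins))
```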
TinyLLM proposes a knowledge distillation paradigm in which a small student language model learns reasoning capabilities from multiple large teacher LLMs, enabling the student to outperform its teachers while using significantly fewer parameters.
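As a rough illustration of the general multi-teacher idea (a generic sketch, not TinyLLM's actual objective, which distills teacher-generated rationales): the student can be trained on a weighted mix of ground-truth cross-entropy and divergence terms toward each teacher's token distribution. The tensor shapes, weights, and temperature below are assumptions.

```python
import torch
import torch.nn.functional as F

def multi_teacher_distillation_loss(student_logits: torch.Tensor,
                                    teacher_logits_list: list[torch.Tensor],
                                    target_ids: torch.Tensor,
                                    alpha: float = 0.5,
                                    temperature: float = 2.0) -> torch.Tensor:
    """Combine ground-truth cross-entropy with KL terms toward several teachers.

    student_logits: (batch, seq_len, vocab) student predictions.
    teacher_logits_list: one (batch, seq_len, vocab) tensor per teacher.
    target_ids: (batch, seq_len) gold next-token ids.
    """
    # Supervised term: standard next-token cross-entropy against the labels.
    ce = F.cross_entropy(student_logits.flatten(0, 1), target_ids.flatten())

    # Distillation term: average per-token KL from each teacher's softened
    # distribution to the student's, scaled by temperature^2 as is conventional.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1).flatten(0, 1)
    kd = 0.0
    for teacher_logits in teacher_logits_list:
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1).flatten(0, 1)
        kd = kd + F.kl_div(log_p_student, p_teacher,
                           reduction="batchmean") * temperature ** 2
    kd = kd / len(teacher_logits_list)

    return alpha * ce + (1 - alpha) * kd
```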
CroissantLLM is a 1.3B parameter language model pre-trained on a balanced corpus of 1.5T English and French tokens, designed to provide high-performance and resource-efficient bilingual capabilities.