01.AI introduces the Yi model family, a series of language and multimodal models with strong capabilities. The family is built on pretrained base language models and extended to chat models, 200K long-context models, depth-upscaled models, and vision-language models. The performance of the Yi models is attributed primarily to data quality resulting from extensive data-engineering efforts. For pretraining, 3.1 trillion tokens of English and Chinese corpora are constructed with a cascaded data deduplication and quality-filtering pipeline. For finetuning, a small-scale (fewer than 10K) instruction dataset is meticulously polished, with every instance verified directly by machine learning engineers. The vision-language model combines the chat language model with a vision transformer encoder and is trained to align visual representations with the semantic space of the language model. Lightweight continual pretraining extends the context length to 200K, demonstrating strong retrieval performance. Increasing the depth of a pretrained checkpoint through continual pretraining further improves performance.
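The depth-upscaling step can be pictured as duplicating a contiguous block of decoder layers from a pretrained checkpoint and then continuing pretraining on the deeper model. Below is a minimal sketch of that idea, assuming a Llama-style decoder whose blocks live in `model.model.layers`; the checkpoint id and the duplicated layer range are illustrative assumptions, not the exact recipe used for Yi.

```python
# Sketch: depth upscaling by duplicating a block of transformer layers.
# Assumptions: Llama-style architecture (as in Yi), hypothetical layer range.
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-6B")  # assumed checkpoint id

layers = model.model.layers                      # original decoder blocks (ModuleList)
start, end = 8, 24                               # contiguous block to duplicate (assumption)
duplicated = [copy.deepcopy(layers[i]) for i in range(start, end)]

# Splice the copies back in: keep the prefix, insert the duplicates, keep the suffix.
new_layers = list(layers[:end]) + duplicated + list(layers[end:])
model.model.layers = nn.ModuleList(new_layers)
model.config.num_hidden_layers = len(new_layers)

# The deeper model is then continually pretrained so the duplicated layers
# can specialize rather than remain exact copies of their sources.
```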