LAVIMO introduces a framework for tri-modal learning that adds human-centric videos as a third modality to improve the alignment between the text and motion modalities.