A hierarchical learning framework using large language models and semantic keypoints enables generalizable clothes manipulation across diverse clothes categories and tasks.