Multimodal Large Language Model for 3D Human Pose Estimation and Reasoning
ChatPose is a multimodal Large Language Model (LLM) that can directly generate 3D human poses, represented as SMPL parameters, from text or image inputs, and can reason about human poses using its general world knowledge.
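For readers unfamiliar with SMPL, the sketch below shows roughly what "SMPL parameters" contain. It assumes the standard SMPL body parameterization: 24 joints (a root plus 23 body joints), each rotated by a 3-value axis-angle vector (72 pose values in total), plus 10 shape coefficients ("betas"). The field names follow the common `global_orient` / `body_pose` / `betas` convention, but `SMPLParams`, `axis_angle_to_matrix`, and `joint_rotations` are hypothetical helpers for illustration, not ChatPose's API. The exact format ChatPose emits is not described here.

```python
import math
from dataclasses import dataclass

# Assumed standard SMPL layout (not taken from ChatPose itself).
NUM_JOINTS = 24  # root (pelvis) + 23 body joints
NUM_BETAS = 10   # commonly used number of shape coefficients


@dataclass
class SMPLParams:
    """Hypothetical container for one SMPL body."""
    global_orient: list  # 3 values: axis-angle rotation of the root joint
    body_pose: list      # 69 values: 23 joints x 3 axis-angle values
    betas: list          # 10 shape coefficients

    def __post_init__(self):
        # Validate the expected sizes of each parameter group.
        if len(self.global_orient) != 3:
            raise ValueError("global_orient must have 3 values")
        if len(self.body_pose) != 3 * (NUM_JOINTS - 1):
            raise ValueError("body_pose must have 69 values")
        if len(self.betas) != NUM_BETAS:
            raise ValueError("betas must have 10 values")

    @classmethod
    def zeros(cls):
        """Rest pose (all rotations zero) with the mean body shape."""
        return cls([0.0] * 3, [0.0] * 69, [0.0] * NUM_BETAS)

    def full_pose(self):
        """Concatenate root and body rotations into the 72-value pose vector."""
        return list(self.global_orient) + list(self.body_pose)


def axis_angle_to_matrix(v):
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix."""
    theta = math.sqrt(sum(c * c for c in v))
    if theta < 1e-8:
        # Near-zero rotation: return the identity matrix.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in v)  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    C = 1.0 - c
    # R = cos(theta) I + sin(theta) [k]_x + (1 - cos(theta)) k k^T
    return [
        [c + kx * kx * C, kx * ky * C - kz * s, kx * kz * C + ky * s],
        [ky * kx * C + kz * s, c + ky * ky * C, ky * kz * C - kx * s],
        [kz * kx * C - ky * s, kz * ky * C + kx * s, c + kz * kz * C],
    ]


def joint_rotations(params):
    """Per-joint 3x3 rotation matrices, one for each of the 24 joints."""
    full = params.full_pose()
    return [axis_angle_to_matrix(full[3 * j:3 * j + 3]) for j in range(NUM_JOINTS)]
```

In the rest pose every joint rotation is the identity; rotating the root by pi/2 about the z-axis, for example, maps the body's x-axis onto its y-axis. A full SMPL implementation would additionally use the shape coefficients and the learned model template to produce a mesh, which this sketch deliberately omits.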