A method for generating realistic and diverse human-object interactions in 3D scenes, controlled by text prompts.