Generating Human-Human Interactions from Textual Descriptions by Leveraging Individual Motion Details
in2IN is a novel diffusion model architecture that generates human-human motion interactions conditioned on both the overall interaction description and individual descriptions of the actions each person performs. This dual conditioning enables precise control over the intra-personal dynamics within the generated interactions.
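As a rough illustration of conditioning on two kinds of text at once, the sketch below combines an interaction-conditioned and an individual-conditioned noise prediction in the style of classifier-free guidance. This is an assumption made for illustration only: the summary above does not say how in2IN merges its conditions, and the names `guided_prediction`, `toy_denoiser`, `w_inter`, and `w_indiv` are hypothetical, as is the use of scalar stand-ins for text embeddings.

```python
from typing import Callable, List, Optional, Tuple

Motion = List[float]
# Hypothetical denoiser signature: (noisy motion, interaction cond, per-person conds) -> predicted noise
Denoiser = Callable[[Motion, Optional[float], Optional[Tuple[float, float]]], Motion]


def guided_prediction(
    denoiser: Denoiser,
    x: Motion,
    interaction: float,
    individuals: Tuple[float, float],
    w_inter: float = 2.0,
    w_indiv: float = 1.0,
) -> Motion:
    """Blend three denoiser outputs with one guidance weight per condition (assumed scheme)."""
    uncond = denoiser(x, None, None)          # no text conditioning
    inter = denoiser(x, interaction, None)    # overall interaction description only
    indiv = denoiser(x, None, individuals)    # per-person descriptions only
    # Move away from the unconditional prediction along each conditional direction.
    return [
        u + w_inter * (a - u) + w_indiv * (b - u)
        for u, a, b in zip(uncond, inter, indiv)
    ]


def toy_denoiser(
    x: Motion,
    interaction: Optional[float],
    individuals: Optional[Tuple[float, float]],
) -> Motion:
    """Stand-in for a trained network: shifts the input by the active conditions."""
    offset = 0.0
    if interaction is not None:
        offset += interaction
    if individuals is not None:
        offset += sum(individuals) / 2
    return [v + offset for v in x]


if __name__ == "__main__":
    print(guided_prediction(toy_denoiser, [0.0, 1.0], 1.0, (2.0, 4.0)))
```

In this toy setup, raising `w_indiv` relative to `w_inter` pulls the result toward the per-person descriptions, which is one plausible way to expose control over individual dynamics; setting both weights to zero returns the unconditional prediction.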