Generating Realistic Images of Actions and Object Transformations from Text Prompts
Given an input image and a text prompt describing an action or a desired final state, our method, GenHowTo, generates images that preserve the environment of the input image while transforming the objects according to the prompt.