Core Concepts
DEADiff balances text controllability against style similarity, addressing the loss of text control that encoder-based stylization methods cause in text-to-image diffusion models.
Abstract
DEADiff introduces a dual decoupling representation extraction mechanism and a disentangled conditioning mechanism to enhance text control capabilities while maintaining style fidelity. It outperforms state-of-the-art methods in style similarity, image quality, text alignment, and overall user preference.
Key Points:
DEADiff addresses the decline in text controllability of existing encoder-based stylized diffusion models.
The model aims to keep strong stylization capability without sacrificing text control.
It introduces a dual decoupling representation extraction mechanism and a disentangled conditioning mechanism to improve text alignment.
Quantitative comparisons show superior performance in style similarity, image quality, and user preference.
Stats
DEADiff is reported to attain the best visual stylization results among the compared methods, both quantitatively and qualitatively.
The method achieves a style similarity of 0.229 and an image quality score of 5.840.
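This summary does not say how the style similarity score is computed. As a rough illustration only, one common way to score style similarity is the cosine similarity between an embedding of the generated image and an embedding of the style reference image. The sketch below assumes that approach; the function name `cosine_similarity` and the toy vectors are hypothetical, and in practice the embeddings would come from an image encoder rather than being written by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy stand-ins for image embeddings (hypothetical values, not from the paper).
generated = [0.2, 0.1, 0.7]
reference = [0.3, 0.0, 0.6]
score = cosine_similarity(generated, reference)
```

Under this assumed metric, a higher score means the generated image's embedding points in a direction closer to the reference's, so scores are comparable only between methods evaluated with the same encoder.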
Quotes
"A zebra to the right of a fire hydrant"
"A puppy sitting on a sofa"
"A motorcycle"