The paper examines the effectiveness of knowledge localization across various open-source text-to-image models. It first observes that while causal tracing is effective for early Stable Diffusion variants, it generalizes poorly to newer text-to-image models such as DeepFloyd and SD-XL when used to localize control points associated with visual attributes.
To address this limitation, the paper introduces LOCOGEN, a method that effectively identifies attribute-controlling locations within the UNet across diverse text-to-image models. Using these identified locations, the paper then evaluates the efficacy of closed-form model editing across a range of text-to-image models with LOCOEDIT.
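One way to read the layer-localization idea is as an intervention sweep: for each candidate UNet cross-attention layer, alter the text conditioning at that layer alone and score how strongly the targeted visual attribute changes in the output. The sketch below illustrates only that ranking step; the layer names and the per-layer scoring function are hypothetical stand-ins, not the paper's actual API.

```python
def localize_layers(layers, score_attribute_change):
    """Rank candidate layers by how strongly an intervention at each
    layer changes the targeted visual attribute (illustrative sketch).

    `score_attribute_change(layer)` is assumed to return a scalar
    measuring attribute change when only that layer's text conditioning
    is swapped; higher means the layer controls the attribute more.
    """
    scores = {layer: score_attribute_change(layer) for layer in layers}
    return sorted(scores, key=scores.get, reverse=True)

# Toy example with made-up per-layer scores (hypothetical values).
toy_scores = {"ca_0": 0.1, "ca_5": 0.9, "ca_9": 0.3}
ranked = localize_layers(list(toy_scores), toy_scores.get)
# ranked == ["ca_5", "ca_9", "ca_0"]
```

In practice the scoring function would involve generating images with and without the per-layer intervention and comparing them, which is far more expensive than this toy ranking.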
Notably, for specific visual attributes such as "style", the paper finds that knowledge can even be traced to a small subset of neurons and subsequently edited by applying a simple dropout layer, underscoring the potential of neuron-level model editing.
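The neuron-level edit described above amounts to suppressing a small identified set of neurons, akin to a fixed dropout mask. The following minimal sketch illustrates that idea on a plain activation list; the function name and the activation values are hypothetical, not taken from the paper.

```python
def apply_neuron_dropout(activations, neuron_indices):
    """Zero out a chosen subset of neurons (illustrative sketch).

    If attribute knowledge such as "style" is traced to a few neurons,
    editing reduces to suppressing those neurons' activations, i.e.
    applying a deterministic dropout mask at inference time.
    """
    masked = list(activations)
    for i in neuron_indices:
        masked[i] = 0.0  # suppress the identified neuron
    return masked

# Example: suppress neurons 1 and 3 in a toy activation vector.
acts = [0.5, 2.1, -0.3, 1.7, 0.9]
edited = apply_neuron_dropout(acts, {1, 3})
# edited == [0.5, 0.0, -0.3, 0.0, 0.9]
```

In a real model this masking would be applied to the identified layer's hidden activations during the forward pass rather than to a Python list.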
by Samyadeep Ba... at arxiv.org, 05-03-2024
https://arxiv.org/pdf/2405.01008.pdf