OpenBias, a novel pipeline, identifies and quantifies biases in text-to-image generative models without relying on a predefined set of biases.
A novel dual-branch diffusion model, PanFusion, is proposed to generate high-quality and consistent 360° panoramic images from text prompts by leveraging the global layout guidance of the panorama branch and the rich prior knowledge of perspective image generation in the Stable Diffusion model.
Contrastive Adapter Training (CAT) is a simple yet effective strategy for enhancing adapter training in diffusion models, helping preserve the base model's original knowledge when introducing adapters for personalized image generation.
This paper introduces a novel object-conditioned Energy-Based Attention Map Alignment (EBAMA) method to address incorrect attribute binding and catastrophic object neglect in text-to-image diffusion models.
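The idea of scoring attention alignment with an energy can be illustrated by a toy sketch. Everything below is an illustrative assumption, not the paper's actual formulation: attention maps are flattened lists, the binding term is a negative cosine similarity between an attribute token's map and its object token's map, and the neglect term rewards each object for attending strongly somewhere.

```python
import math

def cosine(a, b):
    # Cosine similarity between two flattened attention maps.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def alignment_energy(attr_map, obj_map, object_maps):
    # Toy energy: low when the attribute's attention overlaps its object's
    # (good binding) and when every object attends strongly to some region
    # (no neglect). A real method would minimize this over latents at
    # sampling time; here we only evaluate it.
    binding = -cosine(attr_map, obj_map)
    neglect = -sum(max(m) for m in object_maps)
    return binding + neglect

# Toy 2x2 attention maps, flattened row-major (hypothetical values).
red = [0.9, 0.1, 0.0, 0.0]    # attribute token "red"
ball = [0.8, 0.2, 0.0, 0.0]   # object token "ball", well aligned with "red"
cat = [0.1, 0.1, 0.2, 0.6]    # second object token, attended to elsewhere

e_good = alignment_energy(red, ball, [ball, cat])
# Misaligned binding plus a diffuse (neglected) object yields higher energy.
e_bad = alignment_energy(red, [0.0, 0.0, 0.1, 0.9], [[0.25] * 4, cat])
assert e_good < e_bad
```

Lower energy here corresponds to the well-bound, nothing-neglected layout, which is the property such an objective is meant to encourage during sampling.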
λ-ECLIPSE is a resource-efficient prior-training strategy that enables fast and effective multi-subject-driven personalized text-to-image generation without relying on diffusion models.
SAFEGEN, a text-agnostic framework, can effectively mitigate the generation of sexually explicit content by text-to-image models, even under adversarial prompts, by removing unsafe visual representations from the model.
This paper introduces a rich human feedback dataset (RichHF-18K) and a multimodal transformer model (RAHF) to provide detailed and interpretable evaluations of text-to-image generation models. The dataset contains fine-grained scores, implausibility/misalignment image regions, and misaligned keywords, which can be used to train the RAHF model to automatically predict such rich feedback on generated images.
Ranni introduces a semantic panel as a middleware between text and image, enabling accurate translation of natural language descriptions into visual concepts and allowing for intuitive image editing through panel manipulation.
This paper proposes a method for detecting unauthorized data usage in text-to-image diffusion models by planting injected memorization into the protected dataset, so that models trained on it exhibit detectable memorization behaviors.
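The detection logic can be sketched at a toy level. All of the following is a hypothetical simplification, not the paper's actual technique: images are bit lists, the planted signal is a fixed trigger pattern, and detection just thresholds how faithfully a model's generations reproduce that trigger.

```python
def similarity(img, trigger):
    # Fraction of trigger bits that the image reproduces (toy metric).
    hits = sum(1 for p, t in zip(img, trigger) if t == 1 and p == 1)
    return hits / sum(trigger)

def detect_unauthorized_use(generated_images, trigger, threshold=0.8):
    # A model that memorized the planted signal reproduces the trigger
    # far more often than chance; compare the mean similarity of its
    # generations against a decision threshold.
    avg = sum(similarity(img, trigger) for img in generated_images) / len(generated_images)
    return avg >= threshold

trigger = [1, 0, 1, 0, 1, 0, 1, 0]  # planted pattern (hypothetical)

# A model trained on the protected data regurgitates the trigger.
memorizing = [[1, 0, 1, 0, 1, 0, 1, 0] for _ in range(5)]
# A model that never saw the protected data does not.
clean = [[0] * 8 for _ in range(5)]

assert detect_unauthorized_use(memorizing, trigger) is True
assert detect_unauthorized_use(clean, trigger) is False
```

In practice the planted signal and the detection statistic would be far subtler (e.g., a hypothesis test over learned features rather than exact bit matches), but the protect-then-test structure is the same.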
SmartControl is a novel text-to-image generation method that can adaptively handle situations where there are disagreements between visual conditions and text prompts, by predicting a local control scale map to relax the constraints in conflicting regions.
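The effect of a local control scale map can be shown with a small sketch. The values and the "predictor" below are illustrative assumptions (a learned network would produce the scale map in the actual method): disagreement between the visual condition and the prompt is mapped to a per-pixel scale in [0, 1], which attenuates the control residual in conflicting regions.

```python
def control_scale_map(disagreement):
    # Stand-in for a learned predictor: relax control (lower scale)
    # wherever the visual condition conflicts with the text prompt.
    return [[max(0.0, min(1.0, 1.0 - d)) for d in row] for row in disagreement]

def apply_control(features, control_residual, scale_map):
    # Add the control signal to the features, attenuated per pixel.
    return [[f + s * c for f, c, s in zip(frow, crow, srow)]
            for frow, crow, srow in zip(features, control_residual, scale_map)]

# 2x2 toy example with high condition/prompt disagreement at the top-left.
disagreement = [[0.9, 0.1], [0.0, 0.2]]
scale = control_scale_map(disagreement)

features = [[0.5, 0.5], [0.5, 0.5]]
control = [[1.0, 1.0], [1.0, 1.0]]
out = apply_control(features, control, scale)
# The conflicting region receives much weaker control guidance.
assert out[0][0] < out[0][1]
```

Relaxing the constraint only where it conflicts lets the rest of the image still follow the visual condition faithfully.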