The paper proposes a new task called "responsible visual editing," which involves modifying specific concepts within an image to make the image more responsible while minimizing changes to the rest of it. The authors divide the task into three subtasks: safety, fairness, and privacy, covering a wide range of risks in real-world scenarios.
To tackle the challenges of responsible visual editing, the authors propose Cognitive Editor (CoEditor), which harnesses a large multimodal model (LMM) through a two-stage cognitive process: (1) a perceptual cognitive process that determines what needs to be modified, and (2) a behavioral cognitive process that strategizes how to modify it.
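This summary does not include the paper's prompts or pseudocode, so the Python sketch below only illustrates how such a two-stage process might be wired together. The functions `query_lmm`, `apply_edit`, and `coeditor_sketch`, along with the prompt wording, are hypothetical placeholders for illustration, not CoEditor's actual interface.

```python
# A minimal sketch of the two-stage cognitive process described above.
# All names and prompts here are assumptions: the paper's actual LMM
# backbone, prompts, and editing model are not specified in this summary.

def query_lmm(image_path: str, prompt: str) -> str:
    """Placeholder for a call to a large multimodal model."""
    raise NotImplementedError("Wire this to an LMM of your choice.")

def apply_edit(image_path: str, source_concept: str, target_concept: str) -> str:
    """Placeholder for an image-editing backend (e.g., a diffusion editor)."""
    raise NotImplementedError("Wire this to an editing model of your choice.")

def coeditor_sketch(image_path: str, risky_concept: str) -> str:
    # Stage 1: perceptual cognitive process -- decide WHAT to modify.
    # The LMM grounds the abstract risky concept (e.g., "violence")
    # in concrete image content.
    what = query_lmm(
        image_path,
        f"The image may convey the risky concept '{risky_concept}'. "
        "Which object or region conveys it? Answer with a short phrase.",
    )

    # Stage 2: behavioral cognitive process -- decide HOW to modify.
    # The LMM proposes a replacement that removes the risk while
    # changing as little of the image as possible.
    how = query_lmm(
        image_path,
        f"'{what}' conveys the risky concept '{risky_concept}'. "
        "Suggest a harmless concept to replace it with, preserving the "
        "rest of the image. Answer with a short phrase.",
    )

    # Execute the plan with the editing backend.
    return apply_edit(image_path, source_concept=what, target_concept=how)
```

Separating "what" from "how" lets the LMM first reason about abstract risk before committing to a concrete edit, which is the intuition behind the paper's two-stage design.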
The authors also create a transparent and public dataset called AltBear, which uses fictional teddy bears as the protagonists to convey risky content, significantly reducing potential ethical risks compared to using real human images.
Experiments show that CoEditor significantly outperforms baseline models in responsible image editing, validating the effectiveness of comprehending abstract concepts and planning modifications before editing. The authors also find that the AltBear dataset corresponds well to the harmful content found in real images, offering a consistent experimental evaluation and a safer benchmark for future research.
Key insights distilled from the paper by Minheng Ni et al. (arXiv, 2024-04-09): https://arxiv.org/pdf/2404.05580.pdf