Core Concepts
Responsible visual editing aims to automatically modify specific harmful concepts within an image to render it more responsible while minimizing changes.
Summary
The paper proposes a new task called "responsible visual editing" which involves modifying specific concepts within an image to make it more responsible while minimizing changes. The authors divide this task into three subtasks: safety, fairness, and privacy, covering a wide range of risks in real-world scenarios.
To tackle the challenges of responsible visual editing, the authors propose a Cognitive Editor (CoEditor) that harnesses large multimodal models (LMMs) through a two-stage cognitive process: (1) a perceptual cognitive process to focus on what needs to be modified and (2) a behavioral cognitive process to strategize how to modify it.
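The two-stage process can be pictured as a simple pipeline: first locate the abstract concept in the image, then turn that localization into a concrete edit instruction. The sketch below is purely illustrative and is not the authors' implementation; all function names are hypothetical, and the two stages are stubbed out where a real system would query an LMM.

```python
from dataclasses import dataclass


@dataclass
class EditPlan:
    """Output of the two-stage cognitive process."""
    target: str       # what to modify (from the perceptual stage)
    instruction: str  # how to modify it (from the behavioral stage)


def perceptual_stage(image_desc: str, concept: str) -> str:
    # Stub: a real system would prompt an LMM with the image to ground
    # the abstract concept (e.g., "violence") in a concrete region.
    return f"region depicting {concept} in: {image_desc}"


def behavioral_stage(target: str, concept: str) -> str:
    # Stub: a real system would ask the LMM to turn the located target
    # into an explicit, minimal edit instruction.
    return f"replace {target} with a benign alternative"


def coeditor_pipeline(image_desc: str, concept: str) -> EditPlan:
    """Perceive what to modify, then strategize how to modify it."""
    target = perceptual_stage(image_desc, concept)
    instruction = behavioral_stage(target, concept)
    return EditPlan(target, instruction)
```

The point of the separation is that the editing model never receives the abstract concept directly; it only sees the concrete instruction produced by the behavioral stage, which is what makes vague goals like "make this image less violent" actionable.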
The authors also create a transparent and public dataset called AltBear, which uses fictional teddy bears as the protagonists to convey risky content, significantly reducing potential ethical risks compared to using real human images.
Experiments show that CoEditor significantly outperforms baseline models in responsible image editing, validating the effectiveness of comprehending abstract concepts and strategizing modification. The authors also find that the AltBear dataset corresponds well to the harmful content in real images, offering a consistent experimental evaluation and a safer benchmark for future research.
Stats
"With recent advancements in visual synthesis, there is a growing risk of encountering images with detrimental effects, such as hate, discrimination, or privacy violations."
"We are increasingly likely to encounter images that may contain harmful content, such as hate, discrimination, or privacy violations."
"Existing editing models require clear user instructions to make specific adjustments in the images, e.g., editing hat to "change the blue hat into red"."
"In responsible image editing, the concept that needs to be edited is often abstract, e.g., editing violence to "make an image look less violent", making it challenging to locate what needs to be modified and plan how to modify it."
Citations
"We formulate this problem as a new task, responsible visual editing."
"To tackle these challenges, we propose a Cognitive Editor (CoEditor) that harnesses large multimodal models (LMM) through a two-stage cognitive process, (1) a perceptual cognitive process to focus on what needs to be modified and (2) a behavioral cognitive process to strategize how to modify."
"To mitigate the negative implications of harmful images on research, we create a transparent and public dataset, AltBear, which expresses harmful information using teddy bears instead of humans."