Core Concepts
TIP-Editor enables accurate 3D scene editing by leveraging both text prompts and image prompts, achieving high-quality results that closely match the specified appearance and location.
Abstract
The paper presents TIP-Editor, a versatile 3D scene editing framework that allows users to perform various editing operations (e.g., object insertion, object replacement, re-texturing, and stylization) guided by both text prompts and image prompts.
Key highlights:
- TIP-Editor employs a novel stepwise 2D personalization strategy with two parts: a scene personalization step that adds a localization loss, and a separate content personalization step that fits the reference image using LoRA. Together, these enable accurate control over both location and appearance.
- The framework adopts 3D Gaussian splatting (GS) to represent the 3D scene, which facilitates precise local editing due to its explicit point data structure.
- Extensive experiments demonstrate that TIP-Editor consistently outperforms existing text-driven 3D editing methods in terms of editing quality, visual fidelity, and user satisfaction.
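To illustrate why an explicit point representation makes local editing straightforward, here is a minimal sketch, not the paper's implementation. It assumes each Gaussian is reduced to a 3D center and one editable attribute (a color), and that the user's region is an axis-aligned box. The helper names (`inside_box`, `edit_mask`, `apply_local_update`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def inside_box(center: Sequence[float], box_min: Sequence[float],
               box_max: Sequence[float]) -> bool:
    """Return True if a Gaussian center lies inside the axis-aligned box."""
    return all(lo <= c <= hi for c, lo, hi in zip(center, box_min, box_max))


def edit_mask(centers: Sequence[Vec3], box_min: Sequence[float],
              box_max: Sequence[float]) -> List[bool]:
    """Mark which Gaussians are allowed to change during editing."""
    return [inside_box(c, box_min, box_max) for c in centers]


def apply_local_update(values: Sequence[Vec3], proposed: Sequence[Vec3],
                       mask: Sequence[bool]) -> List[Vec3]:
    """Accept proposed values only where the mask is True.

    Gaussians outside the box keep their original values, so the
    background stays unchanged.
    """
    return [new if m else old for old, new, m in zip(values, proposed, mask)]


# Three Gaussians; the last one sits outside the editing box.
centers = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (2.0, 0.0, 0.0)]
colors = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.3, 0.3, 0.3)]
proposed = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

mask = edit_mask(centers, (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
edited = apply_local_update(colors, proposed, mask)
```

Because every primitive has an explicit position, the editable region is a simple membership test; an implicit representation would offer no such per-primitive handle.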
Stats
"Text-driven 3D scene editing has gained significant attention owing to its convenience and user-friendliness."
"Existing methods still lack accurate control of the specified appearance and location of the editing result due to the inherent limitations of the text description."
"TIP-Editor employs a stepwise 2D personalization strategy to better learn the representation of the existing scene and the reference image, in which a localization loss is proposed to encourage correct object placement as specified by the bounding box."
"TIP-Editor utilizes explicit and flexible 3D Gaussian splatting (GS) as the 3D representation to facilitate local editing while keeping the background unchanged."
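The statements above do not give the form of the localization loss. One plausible shape, shown here purely as an assumption, rewards a strong attention response for the new object's token inside the 2D projection of the bounding box and penalizes attention that leaks outside it. The function name, the `lam` weight, and the half-open box convention are all hypothetical.

```python
from typing import List, Sequence, Tuple


def localization_loss(attn: Sequence[Sequence[float]],
                      box: Tuple[int, int, int, int],
                      lam: float = 1.0) -> float:
    """Assumed localization loss on a 2D attention map with values in [0, 1].

    box is (row0, col0, row1, col1), half-open on the upper bounds.
    The first term is small when some pixel inside the box responds strongly;
    the second term is the mean squared attention outside the box.
    """
    r0, c0, r1, c1 = box
    inside: List[float] = []
    outside: List[float] = []
    for r, row in enumerate(attn):
        for c, value in enumerate(row):
            if r0 <= r < r1 and c0 <= c < c1:
                inside.append(value)
            else:
                outside.append(value)
    if not inside:
        raise ValueError("box does not overlap the attention map")
    peak_term = 1.0 - max(inside)
    leak_term = sum(v * v for v in outside) / len(outside) if outside else 0.0
    return peak_term + lam * leak_term


box = (1, 1, 2, 2)  # a single pixel at row 1, column 1
well_placed = [[0.0, 0.0], [0.0, 1.0]]  # attention peaks inside the box
misplaced = [[1.0, 0.0], [0.0, 0.0]]    # attention peaks outside the box

loss_good = localization_loss(well_placed, box)
loss_bad = localization_loss(misplaced, box)
```

Under this assumed form, the well-placed map scores lower than the misplaced one, which is the behavior a placement-encouraging loss needs.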
Quotes
"TIP-Editor excels in precise and high-quality localized editing given a 3D bounding box, and allows the users to perform various types of editing on a 3D scene, such as object insertion, whole object replacement, part-level object editing, combination of these editing types (i.e. sequential editing), and stylization."
"The editing process is guided by not only the text but also one reference image, which serves as the complement of the textual description and results in more accurate editing control."