Core Concepts
By decomposing the appearance of a 3D scene into low-frequency and high-frequency components, the proposed method enables high-fidelity and transferable photorealistic editing of 3D scenes based on text instructions.
Summary
The paper presents a novel approach for high-fidelity and transferable photorealistic editing of 3D scenes represented by neural radiance fields (NeRFs). The key insight is that the low-frequency components of images, which predominantly define the appearance style, exhibit enhanced multi-view consistency after editing compared to their high-frequency counterparts.
The proposed framework comprises two main branches: a high-frequency branch that preserves the content details, and a low-frequency branch that performs the style editing in the feature space. The low-frequency branch first extracts the low-frequency feature from the full scene feature map using a low-pass filter. Then, a stylization network edits the low-frequency feature according to the desired style. Finally, the edited low-frequency component is blended with the high-frequency details from the original scene to obtain the high-fidelity edited image.
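The decompose-edit-blend pipeline above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it uses a separable box filter as a stand-in for the low-pass filter, operates directly on image pixels rather than the scene feature map, and takes a hypothetical `stylize` callable in place of the learned stylization network.

```python
import numpy as np

def low_pass(image, kernel_size=9):
    """Separable box-filter low-pass; a stand-in for the paper's low-pass filter."""
    k = np.ones(kernel_size) / kernel_size
    blurred = image.astype(float)
    # Convolve along each spatial axis separately (separable filtering).
    for axis in (0, 1):
        blurred = np.apply_along_axis(
            lambda m: np.convolve(m, k, mode="same"), axis, blurred)
    return blurred

def frequency_edit(image, stylize):
    """Edit the low-frequency component while keeping high-frequency detail."""
    low = low_pass(image)
    high = image - low            # residual high-frequency detail
    edited_low = stylize(low)     # hypothetical stylization step
    return edited_low + high      # blend edited style with original detail
```

With an identity `stylize`, the blend reconstructs the input exactly, which is what preserves fine detail when the low-frequency component is edited.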
This frequency-decomposed approach offers several advantages:
It enables high-fidelity editing by preserving the high-frequency details while performing consistent style transfer in the low-frequency space.
It allows for controllable editing by interpolating between the original and edited low-frequency features.
The trained stylization module can be directly transferred to novel scenes without retraining, significantly reducing the workload of 3D scene editing.
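The controllable-editing advantage above amounts to linear interpolation between the original and edited low-frequency components before blending back the high-frequency detail. A minimal sketch, assuming the components are simple arrays (the paper interpolates its learned features):

```python
import numpy as np

def controllable_edit(low_original, low_edited, high, alpha):
    """Interpolate between original and edited low-frequency components.

    alpha = 0.0 reproduces the original appearance; alpha = 1.0 applies
    the full edit; intermediate values give partial stylization.
    """
    blended_low = (1.0 - alpha) * low_original + alpha * low_edited
    return blended_low + high  # reattach high-frequency detail
```

Sweeping `alpha` from 0 to 1 yields a smooth transition between the original and fully edited renderings.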
The experiments demonstrate the superior performance of the proposed method in terms of multi-view consistency, image quality, and sharpness compared to previous NeRF editing approaches.
Statistics
The paper does not provide any specific numerical data or statistics. The key results are presented through qualitative visualizations and comparisons with baseline methods.