ConRF achieves zero-shot controlled stylization of 3D scenes, conditioned on either text or visual input.