The paper introduces AMMNet, a novel framework for semantic scene completion. It addresses limitations in feature learning and overfitting by utilizing cross-modal modulation and adversarial training. Extensive experiments demonstrate superior performance compared to state-of-the-art methods on NYU and NYUCAD datasets.
The study reveals that multi-modal models fail to fully unleash the potential of individual modalities compared to single-modal models. By incorporating cross-modal modulation, AMMNet significantly improves SSC-mIoU by 3.5% on NYU and 3.3% on NYUCAD.
Adversarial training in AMMNet effectively prevents overfitting, leading to steadily increasing performance on both training and validation sets. The proposed framework outperforms existing methods by large margins, showcasing its effectiveness in semantic scene completion.
A otro idioma
del contenido fuente
arxiv.org
Ideas clave extraídas de
by Fengyun Wang... a las arxiv.org 03-13-2024
https://arxiv.org/pdf/2403.07560.pdfConsultas más profundas