The paper introduces AMMNet, a framework for semantic scene completion (SSC). It addresses two limitations of existing methods, weak feature learning and overfitting, through cross-modal modulation and adversarial training. Experiments on the NYU and NYUCAD datasets show that it outperforms state-of-the-art methods.
The study finds that multi-modal models do not fully exploit the potential of each individual modality when compared with single-modal models. With cross-modal modulation, AMMNet improves SSC-mIoU by 3.5% on NYU and 3.3% on NYUCAD.
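To make the SSC-mIoU figures above concrete, here is a minimal sketch of a mean intersection-over-union computation over per-voxel class labels. It is a generic illustration, not the paper's evaluation code: the function name `mean_iou`, the `ignore_label` value of 255, and the choice to skip classes absent from both prediction and ground truth are all assumptions made for this example.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean IoU over classes, given flat sequences of per-voxel class ids.

    Assumptions (not taken from the paper): voxels whose ground-truth label
    equals `ignore_label` are excluded, and classes that appear in neither
    the prediction nor the ground truth are left out of the mean.
    """
    inter = [0] * num_classes  # voxels where prediction and target agree on class c
    union = [0] * num_classes  # voxels where prediction or target is class c
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatched voxel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with two classes and four voxels.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
print(round(mean_iou(pred, target, num_classes=2), 3))  # 0.583
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is about 0.583. A reported gain such as 3.5% on NYU is a difference between two such averages computed over the benchmark's semantic classes.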
Adversarial training in AMMNet mitigates overfitting, so performance increases steadily on both the training and validation sets. Overall, the framework outperforms existing methods by large margins.
Source: Fengyun Wang... (arxiv.org, 03-13-2024)
https://arxiv.org/pdf/2403.07560.pdf