The paper introduces AMMNet, a novel framework for semantic scene completion. It addresses ineffective feature learning and overfitting through cross-modal modulation and adversarial training. Extensive experiments demonstrate superior performance over state-of-the-art methods on the NYU and NYUCAD datasets.
The study reveals that multi-modal models often fail to realize the full potential of each modality, at times underperforming their single-modal counterparts. By incorporating cross-modal modulation, AMMNet improves SSC-mIoU by 3.5% on NYU and 3.3% on NYUCAD.
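The summary does not detail how the modulation is computed. As a minimal sketch, one common form of cross-modal modulation is FiLM-style feature-wise scaling and shifting, where features from one modality (e.g., depth) predict per-channel scale and shift parameters applied to the other modality's features (e.g., RGB). The function names, shapes, and weights below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def modulate(rgb_feat, depth_feat, w_gamma, w_beta):
    """FiLM-style modulation sketch: depth features predict per-channel
    scale (gamma) and shift (beta) applied to RGB features.
    rgb_feat: (N, C), depth_feat: (N, D), w_gamma/w_beta: (D, C)."""
    gamma = depth_feat @ w_gamma  # predicted per-channel scale offsets
    beta = depth_feat @ w_beta    # predicted per-channel shifts
    # (1 + gamma) keeps modulation near identity when gamma is small
    return (1.0 + gamma) * rgb_feat + beta

rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 8))
depth = rng.standard_normal((4, 16))
# Zero weights -> gamma = beta = 0 -> identity modulation (sanity check)
w_g = np.zeros((16, 8))
w_b = np.zeros((16, 8))
out = modulate(rgb, depth, w_g, w_b)
assert np.allclose(out, rgb)
```

In practice the `w_gamma`/`w_beta` projections would be learned layers, so the depth stream can amplify or suppress RGB channels adaptively per sample.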
Adversarial training in AMMNet effectively prevents overfitting, leading to steadily increasing performance on both training and validation sets. The proposed framework outperforms existing methods by large margins, showcasing its effectiveness in semantic scene completion.
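The mechanics of the adversarial component are not given in the summary. As a generic sketch under the usual GAN formulation, a discriminator would be trained to tell ground-truth scene volumes from completed ones, while the completion network is trained to fool it; the stand-in logistic scores and loss shapes below are assumptions for illustration only:

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy of predicted probabilities p against a scalar target."""
    p = np.clip(p, 1e-7, 1 - 1e-7)  # avoid log(0)
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)).mean())

# Stand-in discriminator outputs (probabilities of "real") on a mini-batch:
d_real = np.array([0.9, 0.8])  # scores on ground-truth scenes
d_fake = np.array([0.2, 0.1])  # scores on network-completed scenes

# Discriminator objective: push real scores toward 1, fake scores toward 0.
d_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)
# Generator (completion network) objective: make fakes score as real.
g_loss = bce(d_fake, 1.0)
```

The regularizing effect comes from the generator being penalized whenever its completions are distinguishable from real scenes, which discourages memorizing training-set artifacts.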
Key insights distilled from the source content by Fengyun Wang et al., arxiv.org, 03-13-2024. https://arxiv.org/pdf/2403.07560.pdf