Basic Concepts
The authors propose the Adversarial Modality Modulation Network (AMMNet) to address ineffective feature learning and overfitting in semantic scene completion (SSC), achieving significant performance improvements.
Summary
The paper introduces AMMNet, a novel framework for semantic scene completion. It addresses limitations in feature learning and overfitting by utilizing cross-modal modulation and adversarial training. Extensive experiments demonstrate superior performance compared to state-of-the-art methods on NYU and NYUCAD datasets.
The study reveals that, in joint training, multi-modal models fail to fully unleash the potential of each individual modality compared to single-modal models. By incorporating cross-modal modulation, AMMNet improves SSC-mIoU by 3.5% on NYU and 3.3% on NYUCAD.
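The paper does not spell out the modulation mechanics in this summary, but the idea of conditioning one modality's features on another can be sketched with a generic FiLM-style scale-and-shift, where per-channel coefficients (here the hypothetical `gamma` and `beta`, assumed to be predicted from the other modality) recalibrate the target features:

```python
def cross_modal_modulate(feat, gamma, beta):
    """FiLM-style modulation: scale and shift each channel of `feat`.

    `gamma` and `beta` are per-channel coefficients assumed to be
    predicted from the other modality (e.g. RGB modulating TSDF).
    This is an illustrative sketch, not the paper's exact operator.
    """
    return [g * f + b for f, g, b in zip(feat, gamma, beta)]


# Example: two-channel TSDF feature modulated by RGB-derived coefficients
tsdf_feat = [1.0, 2.0]
modulated = cross_modal_modulate(tsdf_feat, gamma=[2.0, 0.5], beta=[0.1, 0.0])
```

The design intuition is that gradients flowing through `gamma` and `beta` give each encoder a learning signal shaped by the other modality, which is how cross-modal modulation can strengthen the individual encoders.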
Adversarial training in AMMNet effectively prevents overfitting, leading to steadily increasing performance on both training and validation sets. The proposed framework outperforms existing methods by large margins, showcasing its effectiveness in semantic scene completion.
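The summary does not give AMMNet's exact adversarial objective, but the standard two-player setup it builds on can be sketched with binary cross-entropy losses: a discriminator learns to separate ground-truth scene completions from generated ones, while the SSC model (as generator) is pushed to produce completions the discriminator accepts. All function names here are illustrative:

```python
import math


def bce(p, y):
    """Binary cross-entropy for one predicted probability p and label y."""
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    # Discriminator: score real completions toward 1, generated toward 0
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    # SSC model (generator): fool the discriminator into outputting 1
    return bce(d_fake, 1.0)
```

Because the discriminator keeps adapting to the generator's current outputs, it supplies a moving training target; this is the sense in which adversarial training "dynamically stimulates continuous evolution" rather than letting the model settle into memorizing the limited scene data.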
Stats
Performance drops of multi-modal encoders relative to their single-modal counterparts confirm that joint training fails to fully exploit each modality.
Deep SSC models trained with limited scene data are prone to overfitting.
Employing the multi-modal RGB encoder led to a performance drop of 0.37% in terms of SSC-mIoU compared to utilizing the single-modal RGB encoder.
Adopting the multi-modal TSDF encoder incurred a 0.51% decrease in SSC-mIoU compared to the single-modal TSDF encoder.
The baseline model reached its best validation score early in training; in later epochs, the training and validation curves diverged increasingly, indicating overfitting.
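The divergence pattern described above can be detected mechanically; a minimal sketch (generic early-warning check, not anything from the paper) flags epochs where training loss keeps falling while validation loss rises:

```python
def diverging(train_losses, val_losses, patience=3):
    """Flag overfitting: train loss falls while val loss rises for
    `patience` consecutive epochs. Illustrative heuristic only."""
    streak = 0
    for i in range(1, len(train_losses)):
        if train_losses[i] < train_losses[i - 1] and val_losses[i] > val_losses[i - 1]:
            streak += 1
            if streak >= patience:
                return True
        else:
            streak = 0
    return False
```

Under this check, the baseline's behavior (train loss still improving, validation loss climbing) would trip the flag, whereas AMMNet's steadily improving curves on both sets would not.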
Quotes
"Our method demonstrates significantly enhanced encoder capabilities."
"Diverging training/validation curves indicate overfitting issues."
"Adversarial training scheme dynamically stimulates continuous evolution of models."