Core Concepts
How attention values are normalized in Slot Attention strongly affects its ability to generalize to numbers of objects and slots not seen during training. Alternative normalizations can outperform the original weighted mean.
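The difference between the original weighted mean and a weighted sum can be made concrete with a small NumPy sketch of a single slot update. This is an illustration under stated assumptions, not the authors' implementation: the function name `slot_updates`, the unlearned dot-product attention, and the choice of 1/N (the number of inputs) as the fixed weighted-sum scale are all hypothetical.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_updates(q, k, v, normalization="weighted_mean", eps=1e-8):
    """One attention-based slot update (illustrative sketch).

    q: (K, D) slot queries; k, v: (N, D) input keys and values.
    Returns (updates, weights) with shapes (K, D) and (N, K).
    """
    n_inputs, dim = k.shape
    logits = k @ q.T / np.sqrt(dim)      # (N, K)
    attn = softmax(logits, axis=1)       # slots compete for each input

    if normalization == "weighted_mean":
        # Original scheme: each slot's weights over the inputs sum to 1,
        # so the update is a per-slot weighted average of the values.
        weights = attn / (attn.sum(axis=0, keepdims=True) + eps)
    elif normalization == "weighted_sum":
        # Assumed variant: divide by a fixed constant (here N) instead of
        # the per-slot total, so the update scales with the attention
        # mass the slot received.
        weights = attn / n_inputs
    else:
        raise ValueError(f"unknown normalization: {normalization}")

    updates = weights.T @ v              # (K, D)
    return updates, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))   # 3 slots
k = rng.normal(size=(5, 4))   # 5 inputs
v = rng.normal(size=(5, 4))
u_mean, w_mean = slot_updates(q, k, v, "weighted_mean")
u_sum, w_sum = slot_updates(q, k, v, "weighted_sum")
```

Under the weighted mean, every slot's update has the same overall scale no matter how many inputs it attends to. Under the weighted sum in this sketch, a slot that captures more inputs receives a proportionally larger update, so the update magnitude retains information about how much of the input the slot covers.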
Stats
On the MOVi-C10 dataset with 11 slots, the weighted sum normalization achieves a higher foreground adjusted Rand index (F-ARI) than both the baseline and the layer normalization variant.
Models trained on the filtered MOVi-C6 dataset with 7 slots and the weighted sum normalization outperform those trained on the full MOVi-C10 dataset with 11 slots, suggesting that training on fewer objects with fewer slots may reduce computational cost without hurting performance.
In zero-shot transfer to the MOVi-D dataset, the batch normalization variant achieves the highest F-ARI score of the normalization methods compared.