How Sparse and Hierarchical Data Structures Enable Deep Networks to Learn Efficiently
Incorporating sparsity into hierarchical generative models naturally yields classification tasks that are insensitive to the exact position of local features, which amounts to insensitivity to discrete versions of diffeomorphisms. The observed correlation between insensitivity to diffeomorphisms and good performance is explained by the fact that the hierarchical representation crucial for achieving high performance is learnt at precisely the same number of training points at which insensitivity to diffeomorphisms emerges.
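To make the idea concrete, the following is a minimal toy sketch (an illustration of the general mechanism, not the paper's actual model) of a sparse hierarchical generative task: a class label expands through fixed production rules into a string of low-level features, and sparsity is modelled by inserting uninformative blank tokens at random positions. The blanks shift the informative features around without changing the label, a discrete analogue of applying a diffeomorphism. All rule tables, vocabularies, and function names here are hypothetical.

```python
import random

random.seed(0)

BLANK = 0                      # uninformative token (models sparsity)
# Hypothetical production rules: high-level feature -> pair of low-level features
RULES = {1: (1, 2), 2: (3, 4), 3: (2, 3), 4: (4, 1)}
# Hypothetical classes: label -> pair of high-level features
CLASSES = {0: (1, 2), 1: (3, 4)}

def generate(label, sparsity=0.5):
    """Sample an input for `label`, inserting blanks at random positions."""
    seq = []
    for high in CLASSES[label]:
        for low in RULES[high]:
            # With probability `sparsity`, insert a blank before this feature,
            # displacing all subsequent informative features.
            if random.random() < sparsity:
                seq.append(BLANK)
            seq.append(low)
    return seq

def classify(seq):
    """Recover the label from the informative features alone, ignoring blanks."""
    feats = tuple(t for t in seq if t != BLANK)
    for label, highs in CLASSES.items():
        expansion = tuple(low for h in highs for low in RULES[h])
        if feats == expansion:
            return label
    return None

# Two samples of class 0 generally place their blanks differently, yet both
# classify correctly: the task depends on the features, not their positions.
a, b = generate(0), generate(0)
print(a, b, classify(a), classify(b))
```

In this sketch, a classifier that keys on absolute positions would see `a` and `b` as different, while one that is insensitive to the blank-induced displacements assigns both the same label, which is the sense in which sparsity makes the task diffeomorphism-insensitive.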