Core Concepts
The degree of feature collapse increases during the forward propagation of Residual Neural Networks, forming a progressive feedforward collapse phenomenon.
Abstract
The paper investigates the behavior of features in intermediate layers of Residual Neural Networks (ResNets), proposing a novel conjecture called progressive feedforward collapse (PFC). PFC claims that the degree of feature collapse increases during the forward propagation of ResNets.
The key highlights are:
The authors extend the statistics used in Neural Collapse (NC) to intermediate layers and define three PFC metrics to measure the degree of collapse: variability collapse, convergence to a simplex equiangular tight frame, and nearest class center accuracy (a rough computational sketch of these metrics follows the highlights below).
Under the geodesic curve assumption, which models the forward propagation of ResNets as a straight line in the Wasserstein space, the authors prove that the PFC metrics monotonically decrease across depth at the terminal phase of training.
To better illustrate the PFC phenomenon, the authors propose a novel surrogate model called the multilayer unconstrained feature model (MUFM). MUFM treats intermediate-layer features as optimization variables and connects them with an optimal transport regularizer, in contrast to the unconstrained feature model (UFM), which only considers the last-layer features (a hedged sketch of one such objective appears after the overview below).
Empirical results on various datasets support the PFC conjecture, showing that the PFC metrics indeed decrease monotonically across layers. The authors also demonstrate the trade-off, in the optimal solutions of MUFM, between staying close to the input data and converging to the simplex equiangular tight frame.
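As a concrete illustration of the three PFC metrics, the sketch below computes per-layer analogues with NumPy. It is a minimal, hypothetical implementation following the standard Neural Collapse statistics that the paper extends to intermediate layers; the exact normalizations and sign conventions used in the paper may differ.

import numpy as np

def pfc_metrics(H, y, num_classes):
    # H: (n_samples, dim) features from one layer; y: integer class labels.
    # Returns three per-layer quantities that are all expected to shrink
    # with depth under the PFC conjecture (sign conventions are assumptions).
    global_mean = H.mean(axis=0)
    class_means = np.stack([H[y == c].mean(axis=0) for c in range(num_classes)])
    centered_means = class_means - global_mean                      # (C, d)

    # 1) Variability collapse: within-class scatter relative to
    #    between-class scatter, i.e. tr(Sigma_W) / tr(Sigma_B).
    within = np.mean([np.mean(np.sum((H[y == c] - class_means[c]) ** 2, axis=1))
                      for c in range(num_classes)])
    between = np.mean(np.sum(centered_means ** 2, axis=1))
    variability = within / between

    # 2) Convergence to a simplex ETF: distance between the normalized Gram
    #    matrix of centered class means and the ideal simplex ETF Gram matrix.
    M = centered_means / np.linalg.norm(centered_means, axis=1, keepdims=True)
    gram = M @ M.T
    C = num_classes
    etf_gram = (np.eye(C) - np.ones((C, C)) / C) * C / (C - 1)
    etf_dist = np.linalg.norm(gram / np.linalg.norm(gram)
                              - etf_gram / np.linalg.norm(etf_gram))

    # 3) Nearest class center (NCC) mismatch rate (1 - NCC accuracy).
    d2 = ((H[:, None, :] - class_means[None, :, :]) ** 2).sum(axis=-1)  # (n, C)
    ncc_error = (d2.argmin(axis=1) != y).mean()

    return variability, etf_dist, ncc_error

Evaluated layer by layer on the same batch, a monotone decrease of all three values with depth would be consistent with the PFC conjecture.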
Overall, the study extends the understanding of Neural Collapse to the intermediate layers of ResNets, providing insights into how ResNets progressively transform the input data into the final simplex equiangular tight frame during forward propagation.
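To make the contrast with UFM concrete, one plausible reading of such a multilayer objective is sketched below. The notation here ($H_l$, $W$, $\lambda$, $\mathcal{L}_{\mathrm{CE}}$, and the use of a squared-Frobenius coupling as a stand-in for the optimal transport regularizer, with $H_0$ set to the input data $X$) is an assumption for illustration, not the paper's exact formulation:

\min_{W,\,H_1,\dots,H_L}\ \mathcal{L}_{\mathrm{CE}}\!\left(W H_L,\, Y\right) \;+\; \lambda \sum_{l=0}^{L-1} \lVert H_{l+1} - H_l \rVert_F^2, \qquad H_0 := X.

Under this reading, a small $\lambda$ lets the free features collapse toward the simplex equiangular tight frame favored by the classification loss, while a large $\lambda$ anchors every layer's features to the data, matching the trade-off quoted below.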
Stats
The paper does not report specific numerical statistics; its analysis is primarily theoretical, supported by qualitative empirical observations.
Quotes
"At the terminal phase of ResNet training, NC emerges at the last-layer features. There exists an order in the degree of collapse (measured by three PFC metrics) of each layer before effective depth."
"Under the geodesic curve assumption, the PFC metrics monotonically decrease along the straight line. For each layer, these metrics start at random initialization and converge to PFC at the final epoch."
"Increasing the coefficient of the optimal transport regularizer λ makes the solutions of MUFM move away from the simplex equiangular tight frame and become closer to the initial data."