This article proposes two novel analyses, Principal Relevant Component Analysis (PRCA) and Disentangled Relevant Subspace Analysis (DRSA), which extract from the activations of a neural network the subspaces that are maximally relevant to its prediction strategy. These disentangled subspaces enable more informative and structured explanations of the model's decision-making process.
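To make the idea concrete, here is a minimal sketch of a PRCA-style computation, under the assumption that relevance factorizes as an elementwise product of an activation vector and a "context" vector (e.g. a gradient coming from the layers above); the eigendecomposition form and all variable names are illustrative, not the paper's exact formulation.

```python
import numpy as np

def prca(A, C, k=2):
    """Sketch of a PRCA-style subspace extraction (assumed form).

    A : (n, d) activations at some layer, one row per example.
    C : (n, d) matching "context" vectors (e.g. gradients of the
        output w.r.t. the activations), so that relevance ~ A * C.
    k : number of relevant components to extract.
    """
    # Symmetrized cross-covariance between activations and contexts.
    M = (A.T @ C + C.T @ A) / (2 * len(A))
    # Top-k eigenvectors span the subspace carrying the most relevance.
    eigvals, eigvecs = np.linalg.eigh(M)
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return U  # (d, k) orthonormal basis of the relevant subspace

# Toy usage with random data standing in for real activations/gradients.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 16))
C = rng.normal(size=(1000, 16)) + 0.5 * A  # correlate contexts with activations
print(prca(A, C, k=2).shape)  # (16, 2)
```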
Neural networks exhibit permutation symmetry, where reordering neurons in each layer does not change the underlying function they compute. This contributes to the non-convexity of the networks' loss landscapes. Recent work has argued that permutation symmetries are the only sources of non-convexity, meaning there are essentially no loss barriers between trained networks if they are permuted appropriately. This work refines these arguments into three distinct claims of increasing strength, and provides empirical evidence for the strongest claim.
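The symmetry itself is easy to check numerically: permuting a hidden layer's units, together with the matching rows and columns of the adjacent weight matrices, leaves the computed function unchanged. A minimal NumPy demonstration (the two-layer MLP here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

# A small two-layer MLP: x -> relu(W1 x + b1) -> W2 h + b2
W1, b1 = rng.normal(size=(32, 8)), rng.normal(size=32)
W2, b2 = rng.normal(size=(4, 32)), rng.normal(size=4)

def mlp(x, W1, b1, W2, b2):
    return W2 @ relu(W1 @ x + b1) + b2

# Permute the 32 hidden units: reorder rows of (W1, b1) and columns of W2.
P = rng.permutation(32)
x = rng.normal(size=8)
y_original = mlp(x, W1, b1, W2, b2)
y_permuted = mlp(x, W1[P], b1[P], W2[:, P], b2)

print(np.allclose(y_original, y_permuted))  # True: same function
```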
This paper proposes a method to compute a tight upper bound on the local Lipschitz constant of feedforward neural networks with ReLU activation functions, and derives a condition under which the computed bound is provably exact.
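To see why the local constant is tractable for ReLU networks: in the interior of an input's linear region the network is affine, so the local Lipschitz constant there is the spectral norm of the Jacobian induced by the fixed activation pattern. The sketch below illustrates only this observation; the paper's exactness-verification condition is not reproduced.

```python
import numpy as np

def local_lipschitz_relu(x, weights, biases):
    """Spectral norm of the Jacobian of a ReLU net at input x.

    In the interior of x's linear region, this equals the local
    Lipschitz constant (w.r.t. the Euclidean norm).
    """
    J = np.eye(len(x))
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        z = W @ a + b
        mask = (z > 0).astype(float)      # activation pattern fixed at x
        J = (W * mask[:, None]) @ J       # D W J, with D = diag(mask)
        a = np.maximum(z, 0.0)
    J = weights[-1] @ J                   # final linear layer
    return np.linalg.norm(J, 2)

rng = np.random.default_rng(1)
weights = [rng.normal(size=(16, 8)), rng.normal(size=(16, 16)), rng.normal(size=(1, 16))]
biases = [rng.normal(size=16), rng.normal(size=16), rng.normal(size=1)]
print(local_lipschitz_relu(rng.normal(size=8), weights, biases))
```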
The authors present a compositional approach that efficiently estimates tight upper bounds on the Lipschitz constant of deep feedforward neural networks by decomposing the large matrix-verification problem into smaller sub-problems that can be solved layer by layer.
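The simplest compositional bound of this kind is the product of per-layer spectral norms, valid because ReLU is 1-Lipschitz and Lip(f ∘ g) ≤ Lip(f) · Lip(g); the paper tightens this baseline while keeping the layer-by-layer structure. For reference, the naive baseline:

```python
import numpy as np

def naive_lipschitz_bound(weights):
    """Product of layer spectral norms: a valid but often loose
    upper bound on the Lipschitz constant of a ReLU network."""
    return np.prod([np.linalg.norm(W, 2) for W in weights])

rng = np.random.default_rng(2)
weights = [rng.normal(size=(64, 32)), rng.normal(size=(64, 64)), rng.normal(size=(10, 64))]
print(naive_lipschitz_bound(weights))
```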
The Fisher information matrix characterizes the local geometry of the parameter space in neural networks. Due to its high computational cost, practitioners often use random estimators and evaluate only the diagonal entries. This work examines two such estimators, deriving bounds on their accuracy and sample complexity based on the variances associated with different parameter groups and the non-linearity of the network.
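A typical random estimator of this kind samples labels from the model's own predictive distribution and averages squared log-likelihood gradients, keeping only the diagonal. A hedged sketch for a linear softmax model (chosen so the gradients are analytic; this is not necessarily the paper's exact pair of estimators):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_fisher_diagonal(W, X, n_samples=10, rng=None):
    """Monte Carlo estimate of the Fisher diagonal for p(y|x) = softmax(Wx).

    Labels are sampled from the model itself (the "true" Fisher),
    and squared log-likelihood gradients are averaged.
    """
    if rng is None:
        rng = np.random.default_rng()
    diag = np.zeros_like(W)
    for x in X:
        p = softmax(W @ x)
        for _ in range(n_samples):
            y = rng.choice(len(p), p=p)
            # Gradient of log p(y|x) w.r.t. W: (onehot(y) - p) outer x.
            g = (np.eye(len(p))[y] - p)[:, None] * x[None, :]
            diag += g ** 2
    return diag / (len(X) * n_samples)

rng = np.random.default_rng(3)
W = rng.normal(size=(5, 8))
X = rng.normal(size=(100, 8))
print(mc_fisher_diagonal(W, X, rng=rng).shape)  # (5, 8)
```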
The authors develop new local and global notions of topological complexity for fully-connected feedforward ReLU neural network functions, drawing on algebraic topology and a piecewise-linear version of Morse theory.
The Boolean Mean Dimension (BMD) can be used as a proxy for the sensitivity and complexity of neural network functions. The BMD exhibits a peak around the interpolation threshold, coinciding with the generalization error peak, and then decreases as the model becomes more overparameterized.
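On the Boolean cube, the mean dimension equals total influence divided by variance, and both quantities can be estimated by Monte Carlo coordinate flips, which makes the BMD cheap to track across model sizes. A sketch of such an estimator (the flip identity is standard Boolean Fourier analysis; applying it to a trained network's restriction to {−1, +1}^n is the assumed use case):

```python
import numpy as np

def boolean_mean_dimension(f, n, n_samples=5000, rng=None):
    """Monte Carlo estimate of the Boolean Mean Dimension of f on {-1,+1}^n.

    Uses BMD = (sum of coordinate influences) / variance, where
    influence_i = E[((f(x) - f(x with bit i flipped)) / 2)^2].
    """
    if rng is None:
        rng = np.random.default_rng()
    X = rng.choice([-1.0, 1.0], size=(n_samples, n))
    fx = f(X)
    total_influence = 0.0
    for i in range(n):
        X_flip = X.copy()
        X_flip[:, i] *= -1
        total_influence += np.mean(((fx - f(X_flip)) / 2) ** 2)
    return total_influence / np.var(fx)

# Sanity checks: a dictator function has BMD 1, full parity has BMD n.
print(boolean_mean_dimension(lambda X: X[:, 0], n=8))             # ~1.0
print(boolean_mean_dimension(lambda X: np.prod(X, axis=1), n=8))  # ~8.0
```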