Core Concepts
This paper extends a geometric framework for studying deep neural networks to handle piecewise differentiable activation functions such as ReLU, together with more complex architectures: convolutional layers, residual blocks, and recurrent networks. It introduces random walk-based algorithms to explore the equivalence classes of the input and representation manifolds, whose dimensions can vary because of the non-differentiability.
Abstract
The paper extends a previous geometric framework for analyzing deep neural networks to handle more complex architectures and piecewise differentiable activation functions such as ReLU.
Key highlights:
Generalizes the previous results to convolutional layers, residual blocks, and recurrent networks by using Cartesian products and tensor products of maps (see the residual-block sketch after this list).
Relaxes the smoothness assumption on the activation functions, allowing piecewise differentiable functions like ReLU.
Introduces random walk-based algorithms to explore the equivalence classes of the input and representation manifolds, which can have varying dimensions due to the non-differentiability.
Provides numerical experiments on image classification and thermodynamic problems to validate the developed theory.
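As a concrete illustration of the product-of-maps viewpoint in the first highlight, the sketch below wraps an arbitrary map g as a residual block x -> x + g(x) and checks that its Jacobian is the identity plus the Jacobian of g. The weights and the use of JAX are illustrative assumptions, not taken from the paper.

```python
import jax
import jax.numpy as jnp

def residual_block(g):
    """Wrap a map g as a residual block x -> x + g(x)."""
    return lambda x: x + g(x)

# Hypothetical inner map, for illustration only.
W = jnp.array([[0.2, -0.4],
               [0.1, 0.3]])
g = lambda x: jax.nn.relu(W @ x)
f = residual_block(g)

x = jnp.array([1.0, -0.5])
# The Jacobian of x + g(x) is I + Dg(x), so the pullback construction
# described below applies to residual blocks unchanged.
assert jnp.allclose(jax.jacobian(f)(x), jnp.eye(2) + jax.jacobian(g)(x))
```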
The core idea is to model the neural network as a sequence of maps between manifolds, where the output manifold is equipped with a Riemannian metric. The pullback of this metric through the network induces a (possibly degenerate) metric on the input and representation manifolds: if J is the Jacobian of the map from a given layer to the output and G is the output metric, the pulled-back metric is J^T G J, which can be rank-deficient. This makes it possible to study the geometry of both the data and the network's transformations.
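A minimal sketch of this pullback construction, assuming a Euclidean metric on the output and a small hypothetical ReLU network (the weights, names, and the JAX framing are illustrative, not the paper's implementation):

```python
import jax
import jax.numpy as jnp

def relu_net(x):
    # Hypothetical two-layer ReLU network; weights chosen for illustration.
    W1 = jnp.array([[1.0, -0.5],
                    [0.3, 0.8],
                    [-0.2, 0.4]])
    W2 = jnp.array([[0.6, -0.1, 0.9]])
    return W2 @ jax.nn.relu(W1 @ x)

def pullback_metric(f, x, G=None):
    """Pullback of the output metric G through f at x: J(x)^T G J(x)."""
    J = jax.jacobian(f)(x)          # Jacobian of the network at x
    if G is None:
        G = jnp.eye(J.shape[0])     # default: Euclidean metric on the output
    return J.T @ G @ J              # possibly degenerate (rank-deficient)

x = jnp.array([0.5, -1.0])
g = pullback_metric(relu_net, x)    # 2x2 matrix with rank at most 1 here
```

The kernel of this matrix spans the directions that leave the output unchanged to first order; these are exactly the directions the random-walk exploration below moves along.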
The key challenge addressed is that with non-differentiable activations, the equivalence classes (sets of points mapped to the same output) can have varying dimensions. Random walk-based algorithms are proposed to explore these classes efficiently, without suffering from the curse of dimensionality.
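The paper's algorithms are not reproduced here, so the following is only a sketch of one plausible ingredient, under stated assumptions: a random-walk step that projects a random direction onto the null space of the local Jacobian, i.e. onto the tangent directions of the equivalence class, whose dimension can change from point to point. The function names, tolerances, and SVD-based projection are assumptions for illustration; relu_net is the toy network from the sketch above.

```python
import jax
import jax.numpy as jnp

def equivalence_class_step(f, x, key, step_size=1e-2, tol=1e-6):
    """One random-walk step that stays (to first order) inside the
    equivalence class of x under f, by moving only along the null
    space of the Jacobian. The null-space dimension may vary with x."""
    J = jax.jacobian(f)(x)
    _, s, Vt = jnp.linalg.svd(J, full_matrices=True)
    rank = int(jnp.sum(s > tol))               # numerical rank of J at x
    null_basis = Vt[rank:]                     # rows span ker J
    v = jax.random.normal(key, x.shape)        # random candidate direction
    v_null = null_basis.T @ (null_basis @ v)   # project onto ker J
    norm = jnp.linalg.norm(v_null)
    return x if norm < tol else x + step_size * v_null / norm

# Example walk: repeated small steps trace out one equivalence class.
key = jax.random.PRNGKey(0)
x = jnp.array([0.5, -1.0])
for _ in range(100):
    key, subkey = jax.random.split(key)
    x = equivalence_class_step(relu_net, x, subkey)
```

For piecewise linear networks such as ReLU, the output is exactly constant within each linear region, so first-order null-space steps stay on the equivalence class until a region boundary is crossed, which is precisely where the class dimension can change.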