Core Concepts
Deep neural networks can be effectively represented and analyzed as complex networks, providing insights into their internal dynamics and decision-making processes beyond the traditional input-output relationship.
Abstract
This paper proposes a unified framework for representing and analyzing deep neural networks (DNNs) through the lens of Complex Network Theory (CNT). The authors introduce and formalize a set of CNT metrics that can be applied to various DNN architectures, including Fully Connected (FC), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and AutoEncoders (AEs).
The key contributions are:
Extension of existing CNT metrics to account for the effect of input data, moving beyond a purely topological analysis. The new metrics include Neurons Strength and Neurons Activation, which capture the influence of the input distribution on the network's internal dynamics.
Adaptation of the CNT metrics to handle the unique characteristics of CNNs and RNNs, which leverage architectural inductive biases like local connectivity and sequential processing.
Extensive experiments on MNIST and CIFAR10 datasets, analyzing the patterns and trends observed in the CNT metrics across different architectures, activation functions, and network depths.
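The input-aware metrics described above can be illustrated with a minimal sketch. This assumes the standard CNT definition of node strength (the sum of weights on edges incident to a node) and treats Neurons Strength as its input-weighted analogue; the function names and exact formulas here are illustrative choices, not the paper's formalization:

```python
import numpy as np

def node_strength(W):
    """Purely topological CNT metric: for a layer with weight matrix W
    (shape: in_dim x out_dim), the strength of each output node is the
    sum of its incoming edge weights."""
    return W.sum(axis=0)

def neuron_strength(W, x):
    """Input-aware analogue (illustrative): weight each incoming edge by
    the activation it carries, so the metric reflects the input
    distribution, not just the topology."""
    return (W * x[:, None]).sum(axis=0)

def neuron_activation(W, x, b, act=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Post-nonlinearity output of each neuron for input x, here with a
    sigmoid activation by default."""
    return act(neuron_strength(W, x) + b)
```

With an all-ones input, the input-weighted Neurons Strength collapses back to the topological Node Strength, which is one way to see the former as a strict generalization of the latter.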
The results reveal distinct dynamics and training patterns for the various DNN architectures. For example, CNNs exhibit localized learning through multimodal Nodes and Neurons Strength, while deeper AEs exhibit learning dynamics that diverge from those of shallow models. The authors also find that non-linear activations lead to asymmetric metric distributions, enabling complex pattern learning. Overall, the proposed framework provides a method rooted in physics for interpreting DNNs, offering insights beyond both the traditional input-output relationship and purely topological CNT analysis.
Stats
The authors report the following key findings:
For Fully Connected networks, there is an abundance of negative or near-zero weights, indicating an over-parametrized network.
Convolutional Neural Networks exhibit multimodal Nodes and Neurons Strength, suggesting localized learning of patterns like edge detection.
Recurrent Neural Networks show a pronounced Kurtosis in the Neurons Strength distribution, and their Neurons Activation is distributed not just at the extremes of the sigmoid function, but also significantly in the middle.
Deeper AutoEncoder architectures do not seem to leverage the same "building blocks" as shallow models, suggesting that data compression is a dynamic process heavily influenced by the number of layers.
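The first and third findings above rest on simple distributional statistics of the learned parameters. A minimal sketch of how they might be computed (the threshold `eps` and the function names are illustrative assumptions, not values from the paper):

```python
import numpy as np

def near_zero_fraction(weights, eps=1e-2):
    """Fraction of weights with magnitude below eps: a rough indicator of
    over-parametrization when it is large."""
    w = np.ravel(weights)
    return float(np.mean(np.abs(w) < eps))

def excess_kurtosis(values):
    """Excess kurtosis (zero for a normal distribution); a heavy-tailed
    Neurons Strength distribution, as reported for RNNs, yields a large
    positive value."""
    v = np.ravel(values)
    z = (v - v.mean()) / v.std()
    return float(np.mean(z**4) - 3.0)
```

Applying `near_zero_fraction` per layer of a trained FC network, and `excess_kurtosis` to the Neurons Strength values of an RNN, would reproduce the kind of evidence summarized above.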
Quotes
"CNNs' Neurons Activation centres around zero, with low variance compared to other architectures. Conversely, non-linear activations skew the distribution of the metrics toward zero. Both ReLU- and sigmoid-activated networks leverage negatively activated neurons to discriminate between different classes."
"Unlike FCs and RNNs, CNNs with non-linear activations learn discriminative patterns for classification, mostly in negative regions of the activation function."