The paper presents a novel viewpoint on neural network feature learning, framing a network as a mixture of simple experts where each expert corresponds to a path through the network. This view lets the authors introduce "active path regions", which are simpler and more interpretable than the commonly studied "activation pattern regions".
The key insights are:
Neural networks, including ReLU networks, can be viewed as a mixture of simple experts where each expert is an indicator function for a region in the input space described as an intersection of half-spaces.
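This mixture-of-paths view can be made concrete for a one-hidden-layer ReLU network, where each path runs through a single hidden unit and its region is a single half-space (deeper networks give intersections of half-spaces). The following is an illustrative sketch, not the paper's exact construction or notation; the variable names and layer sizes are assumptions.

```python
import numpy as np

# Illustrative sketch: a one-hidden-layer ReLU network rewritten as a mixture
# of simple "experts", one per path through a hidden unit. Each expert is a
# linear map times the indicator function of the half-space where its unit
# is active. Sizes and names are assumptions for illustration.

rng = np.random.default_rng(1)
d, h = 3, 5                          # input dimension, hidden width
W1 = rng.standard_normal((h, d))
w2 = rng.standard_normal(h)

def relu_net(x):
    # Standard forward pass.
    return w2 @ np.maximum(W1 @ x, 0.0)

def mixture_of_paths(x):
    # Path i contributes w2[i] * (W1[i] @ x) on the half-space {x : W1[i] @ x > 0}.
    out = 0.0
    for i in range(h):
        active = float(W1[i] @ x > 0)        # indicator of the half-space
        out += active * w2[i] * (W1[i] @ x)  # expert's linear prediction
    return out

x = rng.standard_normal(d)
# The two views agree exactly, since ReLU(z) = z * 1[z > 0].
```

With more layers, a path passes through one unit per layer and is active only when every gate along it fires, so its region becomes an intersection of half-spaces, as described above.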
The authors introduce a new architecture, the Deep Linearly Gated Network (DLGN), which sits between deep linear networks and ReLU networks. Unlike deep linear networks, DLGNs can learn non-linear features; unlike ReLU networks, these features remain simple: each feature is an indicator function for a half-space region.
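A minimal sketch of why DLGN features are half-space indicators, assuming the common DLGN formulation in which a purely linear gating network decides which units fire while a separate gated value network carries the signal. Layer sizes, hard gating, and all variable names here are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

# Hedged sketch of a 2-layer DLGN forward pass. The gating path is purely
# linear, so each gate's pre-activation is a linear function of the input and
# each gate is the indicator of a single half-space in input space.

rng = np.random.default_rng(0)
d, h = 3, 4                      # input dim, hidden width (illustrative)

# Gating network: deep *linear* network; its sign pattern supplies the gates.
Wg1, Wg2 = rng.standard_normal((h, d)), rng.standard_normal((h, h))

# Value network: linear maps whose outputs are masked elementwise by the gates.
Wv1, Wv2 = rng.standard_normal((h, d)), rng.standard_normal((h, h))
wv3 = rng.standard_normal(h)

def dlgn_forward(x):
    # Gates: compositions of linear maps are linear, so each gate tests a
    # half-space of the input, even at depth 2.
    g1 = (Wg1 @ x > 0).astype(float)
    g2 = (Wg2 @ (Wg1 @ x) > 0).astype(float)
    # Value path: linear computation, gated elementwise.
    v1 = (Wv1 @ x) * g1
    v2 = (Wv2 @ v1) * g2
    return wv3 @ v2

x = rng.standard_normal(d)
```

One consequence of this structure: positively rescaling the input leaves every gate unchanged, so the output scales linearly, which is not true of a generic ReLU network's feature map composition.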
Analyzing the "overlap kernel" of the active path regions reveals that both ReLU networks and DLGNs learn features that focus on the lower-frequency regions of the target function early in training. This suggests a plausible mechanism for how the neural tangent kernel changes during training to become better suited to the task.
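One natural reading of an overlap kernel is the fraction of paths that are simultaneously active on two inputs. The sketch below is a hedged illustration under that assumption, for a 2-layer ReLU-style network where a path (i, j) is active only when both of its gates fire; the function names and normalization are not taken from the paper.

```python
import numpy as np

# Hedged sketch of an "overlap kernel": the fraction of paths active on both
# of two inputs. A path is active iff every gate along it fires, i.e. the
# input lies in the path's intersection of half-spaces.

rng = np.random.default_rng(2)
d, h = 3, 6
W1, W2 = rng.standard_normal((h, d)), rng.standard_normal((h, h))

def active_paths(x):
    # Gate pattern per layer; path (i, j) is active iff gates i and j fire.
    g1 = W1 @ x > 0
    g2 = W2 @ np.maximum(W1 @ x, 0.0) > 0
    return np.outer(g1, g2)          # boolean h x h matrix of path activity

def overlap_kernel(x, y):
    # Fraction of paths active on both inputs.
    return np.mean(active_paths(x) & active_paths(y))

x = rng.standard_normal(d)
y = rng.standard_normal(d)
```

By construction this kernel is symmetric, and an input's overlap with any other point can never exceed its self-overlap, mirroring the usual properties one expects of a similarity kernel.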
The simple structure of DLGN active regions allows for a comprehensive global visualization of the learned features, unlike the local visualizations typically used for ReLU networks.
Overall, the paper provides a novel and interpretable perspective on feature learning in neural networks, bridging the gap between the two extreme viewpoints of neural networks as kernel methods versus intricate hierarchical feature learners.
By Mahesh Lorik..., arxiv.org, 04-09-2024
https://arxiv.org/pdf/2404.04312.pdf