Worschech, R., & Rosenow, B. (2024). Analyzing Neural Scaling Laws in Two-Layer Networks with Power-Law Data Spectra. arXiv preprint arXiv:2410.09005.
This paper investigates the impact of power-law data spectra, a common characteristic of real-world datasets, on the learning dynamics and generalization error of two-layer neural networks. The authors give a theoretical analysis of how neural scaling laws, which relate network performance to quantities such as training-set size and model size, are affected by this more realistic data structure.
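For orientation, the sketch below (synthetic numbers, not results from the paper) shows what a neural scaling law means in practice: test error falling as a power law of the training-set size, with the exponent read off as the slope of a log-log fit.

```python
# Illustration only: synthetic errors following error ~ c * P**(-beta),
# where P is the number of training examples. The exponent beta is
# recovered as the (negative) slope of a straight-line fit in log-log space.
import numpy as np

P = np.array([1e3, 1e4, 1e5, 1e6])   # training-set sizes (illustrative)
err = 2.0 * P ** (-0.35)             # synthetic test errors obeying a power law

slope, intercept = np.polyfit(np.log(P), np.log(err), 1)
print(f"fitted exponent beta ~ {-slope:.2f}")  # ~0.35 by construction
```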
The study employs a student-teacher framework in which both teacher and student are two-layer neural networks. Using techniques from statistical mechanics, the authors analyze one-pass stochastic gradient descent. Data with power-law spectra are modeled as Gaussian-distributed inputs whose covariance matrix has power-law eigenvalues. The analysis focuses on the generalization error and its dependence on the power-law exponent of the data covariance matrix.
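A minimal numerical sketch of this setup follows. The sizes, tanh activations, fixed second-layer weights, and learning rate are illustrative assumptions, not choices taken from the paper: Gaussian inputs with covariance eigenvalues decaying as k^(-alpha), a fixed two-layer teacher generating labels, and a two-layer student trained by one-pass SGD on fresh examples.

```python
# Minimal sketch (not the authors' code) of a student-teacher setup with
# power-law data spectra and one-pass (online) SGD.
import numpy as np

rng = np.random.default_rng(0)

N = 200          # input dimension (illustrative)
K = 2            # hidden units in teacher and student (illustrative)
alpha = 1.5      # power-law exponent of the data covariance spectrum
eta = 0.05       # SGD learning rate (illustrative)
steps = 20000

# Covariance with eigenvalues lambda_k ~ k^(-alpha); inputs x ~ N(0, Sigma).
sqrt_eigvals = np.sqrt(np.arange(1, N + 1, dtype=float) ** (-alpha))

def sample_input():
    # Sample in the eigenbasis: independent Gaussians scaled by sqrt(lambda_k).
    return sqrt_eigvals * rng.standard_normal(N)

g = np.tanh  # hidden activation (an assumption; the paper's choice may differ)

# Fixed teacher and trainable student, both two-layer networks.
W_teacher = rng.standard_normal((K, N)) / np.sqrt(N)
W_student = rng.standard_normal((K, N)) / np.sqrt(N)
v = np.ones(K)   # second-layer weights, kept fixed for simplicity

def forward(W, x):
    return v @ g(W @ x)

def gen_error(n_test=2000):
    # Monte-Carlo estimate of the generalization error.
    xs = [sample_input() for _ in range(n_test)]
    return np.mean([0.5 * (forward(W_student, x) - forward(W_teacher, x)) ** 2
                    for x in xs])

# One-pass SGD: every example is fresh, used once, then discarded.
for t in range(steps):
    x = sample_input()
    y = forward(W_teacher, x)
    h = W_student @ x
    delta = v @ g(h) - y
    # Gradient of 0.5 * delta**2 w.r.t. the first-layer weights (g' = 1 - tanh^2).
    grad_W = np.outer(delta * v * (1.0 - g(h) ** 2), x)
    W_student -= (eta / N) * grad_W
    if t % 5000 == 0:
        print(f"step {t:6d}  generalization error ~ {gen_error():.4f}")
```

Tracking the generalization error over such a run is how the dependence on the power-law exponent alpha would be probed numerically.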
The main finding is that power-law spectra in the data shape both the learning dynamics and the generalization performance of two-layer networks: the derived analytical expressions and the resulting scaling laws make explicit how the decay of the generalization error depends on the power-law exponent of the data covariance.
This work contributes to the theoretical understanding of neural scaling laws, moving beyond simplified data assumptions to incorporate the complexities of real-world datasets. The findings have implications for optimizing network architectures and hyperparameters for improved learning and generalization.
The study focuses on two-layer networks, and further research is needed to explore the impact of power-law data spectra on deeper architectures. Additionally, investigating the effects of other realistic data properties, such as non-Gaussian distributions, would be beneficial.