Core Concepts
Neural networks exhibit different frequency preferences when trained with a data loss versus a model loss to solve partial differential equations, which leads to a loss-jump phenomenon and multi-stage descent behavior.
Abstract
The paper investigates how the choice of loss function affects the training dynamics and convergence of neural networks that solve partial differential equations (PDEs). The authors discover a "loss-jump" phenomenon: when the loss function is switched from data loss to model loss, which incorporates higher-order derivative information, the neural network solution immediately deviates significantly from the exact solution.
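As a rough illustration of why a switch between the two losses can produce a jump, the sketch below evaluates both losses for the one-dimensional Poisson problem listed under Stats. It assumes that the data loss is the mean squared error against the exact solution and that the model loss is the mean squared PDE residual (boundary terms are omitted). The exact solution sin(x) + sin(10x) was obtained by inspection and is not stated in the source. The perturbation amplitude `eps` and frequency `k` are hypothetical values chosen only for the demonstration.

```python
import numpy as np

def exact(x):
    # Candidate exact solution of u'' = f with u(0) = u(2*pi) = 0 (derived by inspection).
    return np.sin(x) + np.sin(10 * x)

def source(x):
    # Right-hand side f(x) of the Poisson problem.
    return -np.sin(x) - 100 * np.sin(10 * x)

def candidate(x, eps, k):
    # Stand-in for a network output: exact solution plus a small high-frequency error.
    return exact(x) + eps * np.sin(k * x)

def candidate_xx(x, eps, k):
    # Analytic second derivative of the candidate.
    return source(x) - eps * k**2 * np.sin(k * x)

def data_loss(x, eps, k):
    # Assumed form: mean squared error against the exact solution.
    return np.mean((candidate(x, eps, k) - exact(x)) ** 2)

def model_loss(x, eps, k):
    # Assumed form: mean squared PDE residual u'' - f.
    return np.mean((candidate_xx(x, eps, k) - source(x)) ** 2)

# Uniform grid on [0, 2*pi); eps and k are hypothetical demonstration values.
x = np.linspace(0.0, 2.0 * np.pi, 4000, endpoint=False)
eps, k = 1e-3, 20

d = data_loss(x, eps, k)
m = model_loss(x, eps, k)
print(f"data loss  = {d:.3e}")
print(f"model loss = {m:.3e}")
```

In this toy setting a candidate that looks well converged under the data loss has a model loss several orders of magnitude larger, because the residual differentiates the error twice. This only shows how the two losses score the same function; it does not reproduce the training dynamics studied in the paper.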
The authors further observe a "multi-stage descent" phenomenon, which suggests that model loss constrains low-frequency errors only weakly. Through theoretical analysis, they show that neural networks trained with model loss exhibit a frequency bias that differs from the well-established frequency principle (low frequencies are fitted first) observed under data loss. Within a certain frequency range, high frequencies converge faster under model loss, while low and medium frequencies are still prioritized when the entire spectrum is considered.
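One heuristic way to see why the two losses can weight frequencies differently is to expand the error in sine modes for the one-dimensional Poisson problem u'' = f on [0, 2π]. This is a back-of-the-envelope calculation, not the authors' derivation. It assumes that the error contains only integer-frequency sine modes with hypothetical coefficients a_k, that the data loss is the mean squared error, and that the model loss is the mean squared residual.

```latex
\begin{align}
e(x) &= u_\theta(x) - u(x) = \sum_{k \ge 1} a_k \sin(kx), \qquad x \in [0, 2\pi], \\
L_{\text{data}} &= \frac{1}{2\pi} \int_0^{2\pi} e(x)^2 \, dx
                 = \frac{1}{2} \sum_{k \ge 1} a_k^2, \\
L_{\text{model}} &= \frac{1}{2\pi} \int_0^{2\pi} \bigl(u_\theta''(x) - f(x)\bigr)^2 \, dx
                  = \frac{1}{2\pi} \int_0^{2\pi} e''(x)^2 \, dx
                  = \frac{1}{2} \sum_{k \ge 1} k^4 a_k^2.
\end{align}
```

Under these assumptions the model loss weights mode k by a factor k^4 relative to the data loss, so low-frequency errors contribute comparatively little to it. The calculation describes only how each loss scores a given error. It does not capture the network's own spectral bias, which according to the summary still favors low and medium frequencies over the entire spectrum.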
These findings shed light on the complex interplay between loss functions, frequency bias, and convergence behavior of neural networks in solving PDEs. The authors suggest that understanding these mechanisms can contribute to the development of more robust and efficient neural network-based PDE solvers and advance the theoretical foundations of this field.
Stats
The neural network used has 3 hidden layers, each with 320 neurons, and employs tanh and cubic activation functions.
The Poisson problem has the form: Δu(x) = -sin(x) - 100sin(10x), x ∈ [0, 2π], u(0) = u(2π) = 0.
The Burgers equation has the form: u_t + u u_x - (0.01/π) u_xx = 0, x ∈ [-1, 1], t ∈ [0, 1], u(x, 0) = -sin(x), u(x, t) = 0 at x = -1, 1.
The heat equation has the form: u_t - u_xx = 0, x ∈ [0, 1], t ∈ [0, 1], u(x, 0) = sin(πx), u(x, t) = 0 at x = 0, 1.
The diffusion equation has the form: u_t - u_xx = e^(-t) [(3/2) sin(2x) + (8/3) sin(3x) + (15/4) sin(4x) + (63/8) sin(8x)], x ∈ [-π, π], t ∈ [0, 1], u(x, 0) = sin(x) + (1/2) sin(2x) + (1/3) sin(3x) + (1/4) sin(4x) + (1/8) sin(8x), u(x, t) = 0 at x = -π, π.
The wave equation has the form: u_tt - u_xx = 0, x ∈ [-π, π], t ∈ [0, 10], u(x, 0) = sin(x), u(x, t) = sin(t) at x = -π, π.
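The source does not state closed-form solutions for these benchmark problems. For the Poisson, heat, and diffusion problems, candidates can be found by inspection and checked numerically. The sketch below does this with central finite differences. The candidate solutions, the step size `H`, and the random sample points are assumptions of this sketch and are not taken from the paper.

```python
import numpy as np

H = 1e-4                  # finite-difference step (arbitrary choice for this check)
MODES = (1, 2, 3, 4, 8)   # sine modes appearing in the diffusion problem

def dxx(u, x, t):
    # Central second difference in x.
    return (u(x + H, t) - 2.0 * u(x, t) + u(x - H, t)) / H**2

def dt(u, x, t):
    # Central first difference in t.
    return (u(x, t + H) - u(x, t - H)) / (2.0 * H)

def poisson_u(x, t=0.0):
    # Candidate for u'' = -sin(x) - 100 sin(10x); t is unused.
    return np.sin(x) + np.sin(10 * x)

def poisson_f(x):
    return -np.sin(x) - 100 * np.sin(10 * x)

def heat_u(x, t):
    # Candidate for u_t - u_xx = 0 with u(x, 0) = sin(pi x).
    return np.exp(-np.pi**2 * t) * np.sin(np.pi * x)

def diffusion_u(x, t):
    # Candidate: each initial mode sin(kx)/k decays like exp(-t).
    return np.exp(-t) * sum(np.sin(k * x) / k for k in MODES)

def diffusion_f(x, t):
    # Source term as listed; the coefficient of sin(kx) equals k - 1/k.
    return np.exp(-t) * (1.5 * np.sin(2 * x) + (8 / 3) * np.sin(3 * x)
                         + 3.75 * np.sin(4 * x) + (63 / 8) * np.sin(8 * x))

rng = np.random.default_rng(0)

xp = rng.uniform(0.0, 2.0 * np.pi, 200)
poisson_res = np.max(np.abs(dxx(poisson_u, xp, 0.0) - poisson_f(xp)))

xh, th = rng.uniform(0.0, 1.0, 200), rng.uniform(0.0, 1.0, 200)
heat_res = np.max(np.abs(dt(heat_u, xh, th) - dxx(heat_u, xh, th)))

xd, td = rng.uniform(-np.pi, np.pi, 200), rng.uniform(0.0, 1.0, 200)
diffusion_res = np.max(np.abs(dt(diffusion_u, xd, td)
                              - dxx(diffusion_u, xd, td)
                              - diffusion_f(xd, td)))

print("max |residual|: Poisson", poisson_res, "heat", heat_res, "diffusion", diffusion_res)
```

Residuals at the level of the finite-difference error indicate that the candidates satisfy the listed equations. The Burgers and wave problems are left out: the former has no simple elementary closed form, and the latter is listed without an initial velocity.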
Quotes
"When switching the loss function from the data loss to the model loss, which includes different orders of derivative information, the neural network solution significantly deviates from the exact solution immediately."
"The multi-stage descent phenomenon suggests that model loss imposes weak constraints on low-frequency errors, which could have implications for the accuracy and reliability of the trained networks."
"Within a certain frequency range, high frequencies converge faster under model loss, while low and medium frequencies are prioritized when considering the entire spectrum."