The paper investigates how the choice of loss function affects the training dynamics and convergence properties of neural networks used to solve partial differential equations (PDEs). The authors identify a "loss-jump" phenomenon: when the loss function is switched from a data loss to a model loss, which incorporates higher-order derivative information, the neural network solution immediately and significantly deviates from the exact solution.
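To make the data-loss versus model-loss distinction concrete, the following is a minimal sketch, not the authors' code: it assumes a 1D Poisson problem u''(x) = f(x) and a small fully connected network (neither is specified in this summary), first fitting sampled solution values (data loss) and then switching to the PDE-residual (model) loss, the setting in which the loss-jump appears.

```python
import torch

# Small fully connected network (architecture is an assumption, not from the paper).
net = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)

def data_loss(x, u_exact):
    # Supervised fit to sampled values of the exact solution.
    return torch.mean((net(x) - u_exact) ** 2)

def model_loss(x, f):
    # PDE-residual ("model") loss: penalize u_xx - f, computed with autograd.
    # Boundary terms are omitted here for brevity.
    x = x.clone().requires_grad_(True)
    u = net(x)
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return torch.mean((u_xx - f) ** 2)

# Example problem: u(x) = sin(pi x), so f(x) = -pi^2 sin(pi x).
x = torch.linspace(0.0, 1.0, 200).unsqueeze(1)
u_exact = torch.sin(torch.pi * x)
f = -(torch.pi ** 2) * torch.sin(torch.pi * x)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Pretrain on the data loss, then switch to the model loss (the "loss-jump" setup).
for step in range(2000):
    opt.zero_grad()
    loss = data_loss(x, u_exact) if step < 1000 else model_loss(x, f)
    loss.backward()
    opt.step()
```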
The authors further observe a "multi-stage descent" phenomenon and note that the network's ability to constrain low-frequency information is weak when trained with model loss functions. Through theoretical analysis, they show that networks trained with model loss exhibit a frequency bias that differs from the well-established frequency principle observed under data loss: within a certain frequency range, higher frequencies converge faster under model loss, whereas over the entire spectrum low and medium frequencies are still prioritized.
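One simple way to observe such frequency-dependent convergence, offered here as a hedged sketch rather than the authors' exact diagnostic, is to take the discrete Fourier transform of the error on a uniform grid and track the magnitude of individual frequency components over training:

```python
import numpy as np

def frequency_errors(u_pred, u_exact, k_max=10):
    # Magnitude of the error at frequencies k = 0 .. k_max on a uniform grid.
    # Recording this at each training step shows which frequencies converge first.
    err_hat = np.fft.rfft(u_pred - u_exact)
    return np.abs(err_hat[: k_max + 1])
```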
These findings shed light on the complex interplay between loss functions, frequency bias, and convergence behavior of neural networks in solving PDEs. The authors suggest that understanding these mechanisms can contribute to the development of more robust and efficient neural network-based PDE solvers and advance the theoretical foundations of this field.
Source: Zhiwei Wang et al., arxiv.org, 05-07-2024, https://arxiv.org/pdf/2405.03095.pdf