The paper constructs global minimizers of the L2 cost function for deep learning (DL) networks directly, without gradient descent. It analyzes the geometric structure of DL networks with multiple hidden layers and ReLU activation, aiming at a mathematical understanding of the network's properties by explicit construction. For clustered training inputs satisfying specific conditions, the authors determine degenerate global and local minima of the cost function. Training inputs are curated by recursive application of truncation maps, which reveals an intrinsic structure akin to renormalization maps in quantum field theory.
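For concreteness, the L2 cost in this setting is the mean squared error of a multilayer ReLU network's output against the training targets. The sketch below is illustrative only; layer sizes, data, and helper names are assumptions, not taken from the paper:

```python
import numpy as np

def relu(z):
    """Componentwise ReLU activation."""
    return np.maximum(z, 0.0)

def forward(x, weights, biases):
    """Pass inputs (columns of x) through successive ReLU layers."""
    a = x
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)
    return a

def l2_cost(X, Y, weights, biases):
    """Mean squared L2 cost over the N training pairs (columns of X and Y)."""
    outputs = forward(X, weights, biases)
    return 0.5 * float(np.mean(np.sum((outputs - Y) ** 2, axis=0)))

# Toy network: two hidden layers mapping R^3 -> R^2 (dimensions are illustrative).
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [rng.standard_normal((4, 1)), rng.standard_normal((2, 1))]
X = rng.standard_normal((3, 10))   # 10 training inputs
Y = rng.standard_normal((2, 10))   # 10 training targets
cost = l2_cost(X, Y, weights, biases)
print(cost)
```

The paper's contribution is to exhibit parameter choices that minimize this cost by direct construction rather than by iterating a gradient update on `weights` and `biases`.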
Key insights extracted from the paper by Thom... at arxiv.org, 03-15-2024
https://arxiv.org/pdf/2309.10639.pdf