Core Concepts
This paper proposes a fully zeroth-order stochastic approximation method for solving bilevel optimization problems in which neither the objective values of the upper and lower levels nor unbiased estimates of their gradients are available. The authors use Gaussian smoothing to estimate the first- and second-order partial derivatives of functions with two independent blocks of variables, and they establish a non-asymptotic convergence analysis and sample complexity bounds for the proposed algorithm.
Abstract
The paper focuses on solving stochastic bilevel optimization problems where neither the objective function values nor their gradients are available. The authors make the following key contributions:
They generalize the Gaussian convolution technique to functions with two block-variables and establish relationships between such functions and their smooth Gaussian approximations. This allows them to exploit zeroth-order derivative estimates over just one block.
They provide the first fully zeroth-order stochastic approximation method for solving bilevel optimization problems, without assuming that unbiased first- or second-order derivatives of the upper- or lower-level objective functions are available.
They provide a detailed non-asymptotic convergence analysis of the proposed method and present sample complexity results, which are the first established for a fully zeroth-order method for solving stochastic bilevel optimization problems.
The paper first lays out the necessary assumptions and notation. It then develops the required analysis tools by applying Gaussian smoothing to functions with two block-variables. This includes estimating the first- and second-order partial derivatives and bounding how far these estimates deviate from the true derivatives.
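To make the smoothing step above concrete, here is a minimal sketch of Gaussian-smoothing estimators that perturb only the x block of a two-block function f(x, y). It uses the standard forward-difference gradient estimator and the standard second-difference estimator weighted by (u u^T - I); these are well-known forms and are not necessarily the exact estimators used in the paper. The function names, the smoothing parameter mu, and the sample count n are all illustrative assumptions.

```python
import random

def zo_grad_x(f, x, y, mu=1e-3, n=20000, seed=0):
    """Monte Carlo estimate of the x-block gradient of the smoothed f (hypothetical helper)."""
    rng = random.Random(seed)
    f0 = f(x, y)
    grad = [0.0] * len(x)
    for _ in range(n):
        u = [rng.gauss(0.0, 1.0) for _ in x]            # Gaussian direction in the x block only
        x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
        diff = (f(x_plus, y) - f0) / mu                 # forward difference along u
        for i, ui in enumerate(u):
            grad[i] += diff * ui / n
    return grad

def zo_hess_xx(f, x, y, mu=1e-2, n=20000, seed=1):
    """Monte Carlo estimate of the xx Hessian block of the smoothed f (hypothetical helper)."""
    rng = random.Random(seed)
    f0 = f(x, y)
    d = len(x)
    hess = [[0.0] * d for _ in range(d)]
    for _ in range(n):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
        x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
        # symmetric second difference, scaled by 1 / (2 * mu^2)
        coeff = (f(x_plus, y) + f(x_minus, y) - 2.0 * f0) / (2.0 * mu * mu)
        for i in range(d):
            for j in range(d):
                ident = 1.0 if i == j else 0.0
                hess[i][j] += coeff * (u[i] * u[j] - ident) / n
    return hess

# Toy two-block function (an assumption for illustration):
# gradient in x is (2*x0 + y0, 3) and the xx Hessian block is [[2, 0], [0, 0]].
def toy_f(x, y):
    return x[0] ** 2 + 3.0 * x[1] + x[0] * y[0]

grad_est = zo_grad_x(toy_f, [1.0, 2.0], [0.5])
hess_est = zo_hess_xx(toy_f, [1.0, 2.0], [0.5])
```

Both estimators only query function values, and both are noisy: with the sample count used here the estimates for this toy function should land near (2.5, 3) and [[2, 0], [0, 0]], but the Hessian estimate in particular has high variance and needs many samples.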
The paper then shifts its focus to the zeroth-order approximation of the bilevel optimization problem. It addresses issues such as the approximation error and the efficient evaluation of the gradient of the upper-level objective function.
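For context on why the upper-level gradient is hard to evaluate: when the lower-level objective g(x, y) is strongly convex in y with minimizer y*(x), the standard implicit-differentiation expression for the gradient of psi(x) = f(x, y*(x)) is grad_x f - (grad^2_xy g) (grad^2_yy g)^(-1) grad_y f, evaluated at (x, y*(x)). The sketch below checks this expression on a scalar toy problem with exact derivatives. The toy functions, the parameter a, and the function names are assumptions made for illustration and do not come from the paper, which replaces these exact derivatives with zeroth-order estimates.

```python
def psi(x, a=2.0):
    """Upper-level objective after substituting the lower-level solution.

    Lower level: g(x, y) = 0.5 * (y - a*x)**2, so y*(x) = a*x.
    Upper level: f(x, y) = 0.5 * (y - 1)**2 + 0.5 * x**2.
    """
    y_star = a * x
    return 0.5 * (y_star - 1.0) ** 2 + 0.5 * x ** 2

def hypergradient(x, a=2.0):
    """Implicit-differentiation gradient of psi for the toy problem above."""
    y_star = a * x
    f_x = x                # partial derivative of f with respect to x
    f_y = y_star - 1.0     # partial derivative of f with respect to y
    g_xy = -a              # mixed second partial derivative of g
    g_yy = 1.0             # second partial derivative of g in y (positive: strongly convex)
    return f_x - g_xy * (1.0 / g_yy) * f_y

def central_difference(fun, x, h=1e-5):
    """Finite-difference reference used only to check the formula."""
    return (fun(x + h) - fun(x - h)) / (2.0 * h)

x0 = 0.7
formula_value = hypergradient(x0)
reference_value = central_difference(psi, x0)
```

At x0 = 0.7 both values agree with the closed form x + a*(a*x - 1), which is approximately 1.5. The second term of the formula needs second-order information about g, which is why a fully zeroth-order method has to estimate second-order partial derivatives as well as gradients.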
The proposed solution algorithm is presented in Section 4, utilizing the tools and results from the previous sections to analyze the inner and outer loops of the bilevel programming scheme. The authors provide sample complexity results pertaining to the overall algorithm performance.