Core Concepts
The paper designs a risk-averse learning algorithm that achieves sub-linear dynamic regret in online convex optimization with time-varying distributions, using Conditional Value at Risk (CVaR) as the risk measure.
Abstract
The paper investigates online convex optimization in non-stationary environments, where the distribution of the random cost function changes over time. It proposes a risk-averse learning algorithm that minimizes the CVaR of the cost function.
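For orientation, the two quantities named above can be written out as follows. This is a sketch using the standard Rockafellar-Uryasev form of CVaR and the usual definition of dynamic regret; the symbols $J$, $\xi_t$, $\mathcal{D}_t$, $\mathcal{X}$, and the convention that $\alpha \in (0, 1]$ is the tail probability (so $\alpha = 1$ recovers the expected cost) are assumptions made here, and the paper's own notation may differ.

```latex
\[
\mathrm{CVaR}_\alpha[Z] \;=\; \min_{\tau \in \mathbb{R}}
  \left\{ \tau + \frac{1}{\alpha}\, \mathbb{E}\big[(Z - \tau)_+\big] \right\},
\qquad (z)_+ = \max\{z, 0\},
\]
\[
C_t(x) \;=\; \mathrm{CVaR}_\alpha\big[J(x, \xi_t)\big], \quad \xi_t \sim \mathcal{D}_t,
\qquad
\mathrm{DR}(T) \;=\; \sum_{t=1}^{T} C_t(x_t) \;-\; \sum_{t=1}^{T} \min_{x \in \mathcal{X}} C_t(x).
\]
```

Under this reading, "sub-linear dynamic regret" means $\mathrm{DR}(T)/T \to 0$ as $T$ grows, so the average per-round gap to the time-varying minimizers vanishes.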
Key highlights:
- The algorithm uses a zeroth-order optimization approach to estimate the CVaR gradient, as the exact gradient is generally unavailable.
- It employs a restarting procedure to enable the algorithm to adapt to the changing distributions.
- Distribution variation is quantified using the Wasserstein distance.
- The dynamic regret of the algorithm is analyzed for both convex and strongly convex cost functions, showing sub-linear bounds in terms of the distribution variation.
- The number of samples used to estimate the CVaR gradient is controlled by a tuning parameter, which affects the regret bound.
- Numerical experiments on dynamic pricing in a parking lot are provided to demonstrate the efficacy of the proposed algorithm.
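The ingredients listed above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names, the one-point sphere-sampling gradient estimator, the box projection onto [-1, 1], and the restart rule (resetting the iterate to its initial point every `window` rounds) are all assumptions chosen for illustration, and the CVaR convention (average of the worst `alpha` fraction of sampled costs) may differ from the paper's.

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Average of the worst ceil(alpha * n) sampled costs (tail mean)."""
    costs = np.sort(np.asarray(costs, dtype=float))[::-1]  # largest first
    k = max(1, int(np.ceil(alpha * len(costs))))
    return costs[:k].mean()

def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size 1-D empirical samples."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    return float(np.mean(np.abs(a - b)))

def zeroth_order_cvar_grad(sample_cost, x, alpha, delta, n_samples, rng):
    """One-point estimate (d / delta) * CVaR_hat(x + delta * u) * u,
    where u is drawn uniformly from the unit sphere."""
    d = x.size
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    costs = [sample_cost(x + delta * u) for _ in range(n_samples)]
    return (d / delta) * empirical_cvar(costs, alpha) * u

def run_with_restarts(sample_cost_at, x0, T, window, step,
                      alpha, delta, n_samples, seed=0):
    """Projected zeroth-order descent on the CVaR, restarted every `window` rounds.

    sample_cost_at(t, x, rng) is a hypothetical oracle returning one random
    cost drawn from the round-t distribution at decision x.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    iterates = []
    for t in range(T):
        if t % window == 0:
            x = np.array(x0, dtype=float)  # assumed restart rule: reset the iterate
        g = zeroth_order_cvar_grad(
            lambda z: sample_cost_at(t, z, rng), x, alpha, delta, n_samples, rng
        )
        x = np.clip(x - step * g, -1.0, 1.0)  # projection onto the box [-1, 1]^d
        iterates.append(x.copy())
    return np.array(iterates)
```

Here `n_samples` plays the role of the tuning parameter that controls how many samples feed each CVaR gradient estimate: more samples reduce the estimator's noise at a higher per-round sampling cost. A shorter `window` lets the iterate forget stale information sooner when the distribution drifts quickly.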
Stats
The paper does not contain any explicit numerical data or statistics. The analysis focuses on theoretical regret bounds.