Key concepts
The paper devises an efficient online learning algorithm for a Stackelberg pricing game between a supplier (leader) and a retailer (follower) in a Newsvendor supply chain setting, where the demand parameters are initially unknown.
Summary
The paper introduces a Stackelberg game framework for modeling the economic interaction between a supplier (leader) and a retailer (follower) in a Newsvendor supply chain setting. The key highlights are:
- Proof of the existence of a unique Stackelberg equilibrium under perfect information for the Newsvendor pricing game.
- Development of an online learning algorithm that leverages stochastic linear contextual bandits to learn the demand parameters, while integrating established economic theory.
- Derivation of convergence properties of the online learning algorithm to an approximate Stackelberg equilibrium, and theoretical guarantees for bounds on finite-time regret.
- Demonstration of the theoretical results through economic simulations, showing the learning algorithm outperforming baseline algorithms in terms of finite-time cumulative regret.
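As background for the follower's side of the game, the classical Newsvendor best response can be sketched as follows. This is a minimal illustration of the textbook critical-fractile rule, not the paper's algorithm. It assumes normally distributed demand with known parameters, a fixed retail price, and no salvage value. The function name `newsvendor_order` and all numeric values are hypothetical.

```python
from statistics import NormalDist


def newsvendor_order(retail_price, wholesale_price, demand_mean, demand_std):
    """Order quantity that maximizes the retailer's expected profit.

    Assumptions (not taken from the paper): demand is Normal(demand_mean,
    demand_std), unsold stock has no salvage value, and the retail price
    is fixed.
    """
    if not 0 < wholesale_price < retail_price:
        raise ValueError("need 0 < wholesale_price < retail_price")
    # Critical fractile = underage cost / (underage cost + overage cost),
    # with underage cost (retail - wholesale) and overage cost wholesale.
    critical_fractile = (retail_price - wholesale_price) / retail_price
    return NormalDist(demand_mean, demand_std).inv_cdf(critical_fractile)


# Hypothetical numbers: a higher wholesale price lowers the order quantity.
q_cheap = newsvendor_order(10.0, 5.0, 100.0, 20.0)
q_costly = newsvendor_order(10.0, 8.0, 100.0, 20.0)
print(q_cheap, q_costly)
```

In the setting summarized here, the demand parameters are unknown and must be learned online, so a rule of this kind would be applied to estimated parameters rather than known ones.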
The authors address the challenge of controlling both environmental and strategic regret when the environmental parameters are stochastic and the agents' strategies are uncertain. They propose optimization approaches and bounding functions from which they derive theoretical worst-case bounds on regret.
Statistics
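To make the notion of cumulative regret concrete, the sketch below computes the leader's regret in a toy version of the game. It is an illustration under strong assumptions, not the paper's definition or method: demand is normal with known parameters, the retail price is fixed, the follower always plays the Newsvendor critical-fractile quantity, and the benchmark is the best wholesale price on a finite grid. All constants and function names are hypothetical.

```python
from statistics import NormalDist

RETAIL_PRICE = 10.0               # hypothetical, held fixed in this toy
UNIT_COST = 2.0                   # hypothetical supplier production cost
DEMAND = NormalDist(100.0, 20.0)  # hypothetical demand distribution


def supplier_profit(wholesale_price):
    """Leader's profit when the follower best-responds with the
    critical-fractile order quantity (no salvage value assumed)."""
    fractile = (RETAIL_PRICE - wholesale_price) / RETAIL_PRICE
    order = max(DEMAND.inv_cdf(fractile), 0.0)
    return (wholesale_price - UNIT_COST) * order


def cumulative_regret(prices, candidates):
    """Running sum of (best achievable profit - profit of the price played)."""
    best = max(supplier_profit(w) for w in candidates)
    total, curve = 0.0, []
    for w in prices:
        total += best - supplier_profit(w)
        curve.append(total)
    return curve


# Candidate wholesale prices strictly between unit cost and retail price.
grid = [2.0 + 0.01 * k for k in range(1, 800)]
best_price = max(grid, key=supplier_profit)

# A hypothetical sequence of prices played by a learner.
curve = cumulative_regret([3.0, 5.0, 6.0], grid)
print(best_price, curve)
```

A learning algorithm with sublinear regret would make the increments of such a curve shrink over time. The fixed price sequence above is only there to show the bookkeeping.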
The paper does not contain any explicit numerical data or statistics. It focuses on the theoretical analysis and development of the online learning algorithm.
Quotes
"We introduce the application of online learning in a Stackelberg game pertaining to a system with two learning agents in a dyadic exchange network, consisting of a supplier and retailer, specifically where the parameters of the demand function are unknown."
"We prove the existence of a unique Stackelberg equilibrium when extending this to a two-player pricing game."
"A novel algorithm based on contextual linear bandits with a measurable uncertainty set is used to provide a confidence bound on the parameters of the stochastic demand. Consequently, optimal finite time regret bounds on the Stackelberg regret, along with convergence guarantees to an approximate Stackelberg equilibrium, are provided."