Key Concepts
This work studies adversarial MDPs with hard constraints and introduces algorithms that minimize regret while satisfying the constraints.
Summary
The study addresses online learning in constrained Markov decision processes (CMDPs) with adversarial losses and stochastic hard constraints. It considers two scenarios: in the first, the goal is to keep both the regret and the cumulative positive constraint violation sublinear; in the second, the constraints must be satisfied at every episode. The two algorithms, BV-OPS and S-OPS, handle adversarial losses under these constraints in non-stationary environments, which broadens the range of real-world applications such methods can address.
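The two quantities tracked in the first scenario can be illustrated with a minimal sketch. It assumes that the cumulative positive violation is the worst case, over the m constraints, of the summed positive parts of each episode's constraint excess, and that regret is measured against a fixed comparator policy; the function names, the input layout, and the numbers below are hypothetical and not taken from the paper.

```python
from typing import Sequence


def cumulative_positive_violation(
    costs: Sequence[Sequence[float]], thresholds: Sequence[float]
) -> float:
    """Worst-case (over constraints) sum of positive constraint excesses.

    costs[t][i] is the expected cost of constraint i in episode t and
    thresholds[i] is its allowed level (assumed layout, for illustration).
    """
    totals = [0.0] * len(thresholds)
    for episode_costs in costs:
        for i, threshold in enumerate(thresholds):
            # Only the positive part counts: an episode that stays below
            # the threshold cannot cancel an earlier violation.
            totals[i] += max(episode_costs[i] - threshold, 0.0)
    return max(totals, default=0.0)


def cumulative_regret(
    learner_losses: Sequence[float], comparator_losses: Sequence[float]
) -> float:
    """Total loss of the learner minus that of a fixed comparator policy."""
    return sum(learner_losses) - sum(comparator_losses)


# Hypothetical three-episode run with two constraints.
costs = [[0.6, 0.2], [0.3, 0.2], [0.7, 0.1]]
thresholds = [0.5, 0.3]
violation = cumulative_positive_violation(costs, thresholds)
regret = cumulative_regret([1.0, 0.5, 0.8], [0.7, 0.5, 0.6])
```

In this toy run the second episode sits 0.2 below the first threshold, yet the violation of the first constraint still accumulates from the other two episodes. That is what distinguishes the positive violation from a plain signed sum.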
Statistics
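$V_T \le O\left(L|X|\sqrt{|A|T\ln\left(T|X||A|m/\delta\right)}\right)$
$R_T \le O\left(L|X|\sqrt{|A|T\ln\left(T|X||A|/\delta\right)}\right)$
$R_T \le O\left(\Psi L^3|X|\sqrt{|A|T\ln\left(T|X||A|m/\delta\right)}\right)$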
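To see why bounds of this form are sublinear in the number of episodes T, the following sketch evaluates the expression inside the O(·) of the first bound for growing T. It assumes the square root covers the whole product |A| T ln(·), ignores the constant hidden by the O-notation, and uses arbitrary illustrative values for L, |X|, |A|, m and δ; the function name is hypothetical.

```python
import math


def bound_shape(
    L: int, n_states: int, n_actions: int, T: int, m: int = 1, delta: float = 0.05
) -> float:
    """Expression inside O(.): L |X| sqrt(|A| T ln(T |X| |A| m / delta))."""
    log_term = math.log(T * n_states * n_actions * m / delta)
    return L * n_states * math.sqrt(n_actions * T * log_term)


# Arbitrary illustrative problem sizes, not taken from the paper.
horizons = [10**3, 10**4, 10**5, 10**6]
per_episode = [
    bound_shape(L=5, n_states=20, n_actions=4, T=T, m=3) / T for T in horizons
]

for T, value in zip(horizons, per_episode):
    print(f"T = {T:>8}: bound / T = {value:.3f}")
```

The per-episode value shrinks as T grows, since the expression scales as the square root of T up to a logarithmic factor.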
Quotes
"We study online learning problems in constrained Markov decision processes (CMDPs) with adversarial losses and stochastic hard constraints."
"Our algorithms can deal with general non-stationary environments subject to requirements much stricter than those manageable with state-of-the-art algorithms."