Key Concepts
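Principal-agent bandit games model a principal who iteratively learns which incentives to offer an agent in order to maximize her own total utility; the paper calls this setting incentivized learning.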
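To make the interaction concrete, here is a minimal sketch of a single round, written under assumptions that the summary itself does not spell out: the agent is taken to pick the arm that maximizes his own mean reward plus the offered incentive, and the principal's utility is taken to be her mean reward on the chosen arm minus the incentive she pays. The function names and all numbers are hypothetical, chosen only for illustration.

```python
import math

def agent_best_response(agent_means, incentives):
    """Assumed agent model: pick the arm with the largest
    (own mean reward + offered incentive); ties go to the lowest index."""
    totals = [s + k for s, k in zip(agent_means, incentives)]
    return max(range(len(totals)), key=totals.__getitem__)

def principal_utility(principal_means, incentives, chosen_arm):
    """Assumed principal utility: her mean reward on the chosen arm
    minus the incentive paid on that arm."""
    return principal_means[chosen_arm] - incentives[chosen_arm]

# Hypothetical mean rewards for three arms.
agent_means = [0.5, 0.3, 0.1]
principal_means = [0.2, 0.9, 0.4]

# Without incentives the agent plays his own favorite arm.
no_incentive = [0.0, 0.0, 0.0]
arm_without = agent_best_response(agent_means, no_incentive)
utility_without = principal_utility(principal_means, no_incentive, arm_without)

# An incentive of 0.25 on arm 1 is just enough to change the agent's choice.
with_incentive = [0.0, 0.25, 0.0]
arm_with = agent_best_response(agent_means, with_incentive)
utility_with = principal_utility(principal_means, with_incentive, arm_with)

print(arm_without, utility_without)
print(arm_with, utility_with)
```

In this toy instance the incentive moves the agent from arm 0 to arm 1 and raises the principal's utility even after payment. In the learning problem the principal does not know these quantities in advance and has to estimate a good incentive policy from repeated interaction.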
Statistics
"Nearly optimal (with respect to a horizon T) learning algorithms for the principal’s regret in both multi-armed and linear contextual settings."
"The overall algorithm achieves both nearly optimal distribution-free and instance-dependent regret bounds."
"Contextual IPA achieves a O(d√T log(T)) regret bound."
Quotes
"The principal aims to iteratively learn an incentive policy to maximize her own total utility."
"Our work focuses on the blend of mechanism design and learning."
"The overall algorithm achieves nearly optimal regret bounds."