Core Concepts
Theoretical investigation and algorithm proposal for offline multitask representation learning in reinforcement learning.
Abstract
The study investigates the benefits of offline multitask representation learning in RL and proposes the MORL algorithm. The analysis covers three downstream RL settings: reward-free, offline, and online. The theoretical results quantify the advantage of reusing representations learned from upstream tasks: by leveraging a representation shared across tasks, the algorithm aims to improve sample complexity and learning efficiency.
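To make the shared-representation idea concrete, here is a minimal sketch assuming the low-rank MDP structure mentioned in the results: all tasks share one feature map phi(s, a), while each task t keeps its own factor mu_t, so the transition kernel factorizes as P^(t)(s' | s, a) = <phi(s, a), mu^(t)(s')>. All names, dimensions, and values below are illustrative, not the paper's implementation.

```python
import numpy as np

# Illustrative low-rank MDP structure shared across tasks: every task uses
# the same feature map phi(s, a) in R^d, while each task t has its own
# factor mu_t(s'), so P_t(s' | s, a) = <phi(s, a), mu_t(s')>.
rng = np.random.default_rng(0)
num_states, num_actions, d, num_tasks = 5, 3, 2, 4  # tiny tabular example

# Shared representation: one row per (s, a) pair, normalized so each
# phi(s, a) is a valid mixture over the d latent factors.
phi = rng.random((num_states * num_actions, d))
phi /= phi.sum(axis=1, keepdims=True)

# Task-specific factors: mu[t, k] is a distribution over next states, so
# each P_t(. | s, a) below is automatically a probability distribution.
mu = rng.random((num_tasks, d, num_states))
mu /= mu.sum(axis=2, keepdims=True)

def transition(task, s, a):
    """P_task(s' | s, a) = <phi(s, a), mu_task(.)>, shape (num_states,)."""
    return phi[s * num_actions + a] @ mu[task]

for t in range(num_tasks):
    p = transition(t, s=0, a=1)
    assert np.isclose(p.sum(), 1.0)  # each task yields a valid distribution
    print(f"task {t}: P(. | s=0, a=1) = {np.round(p, 3)}")
```

Because phi is shared, data from all T tasks can jointly estimate it, which is the structural leverage behind the multitask gains stated below.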
Stats
For each task $t \in [T]$ and step $h \in [H]$, we have access to an offline dataset $\mathcal{D}_h^{(t)} = \{(s_h^{(i,t)}, a_h^{(i,t)}, r_h^{(i,t)}, s_{h+1}^{(i,t)})\}_{i \in [n]}$ collected using the behavior policy $\pi_b^{(t)}$; the full dataset is $\mathcal{D} = \bigcup_{t \in [T],\, h \in [H]} \mathcal{D}_h^{(t)}$.
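A minimal sketch of one way to organize this multitask dataset in code (the container layout and names like `Transition` and `add_sample` are hypothetical, not from the paper):

```python
from collections import defaultdict
from typing import NamedTuple

class Transition(NamedTuple):
    """One offline tuple (s_h, a_h, r_h, s_{h+1})."""
    state: int
    action: int
    reward: float
    next_state: int

# D[t][h] is the sub-dataset D_h^(t): the n transitions collected at step h
# by task t's behavior policy; the union over all (t, h) is the full
# multitask dataset D.
D: dict[int, dict[int, list[Transition]]] = defaultdict(lambda: defaultdict(list))

def add_sample(t: int, h: int, s: int, a: int, r: float, s_next: int) -> None:
    """Append one (s, a, r, s') tuple to D_h^(t)."""
    D[t][h].append(Transition(s, a, r, s_next))

add_sample(t=0, h=2, s=5, a=1, r=0.3, s_next=7)
add_sample(t=0, h=2, s=4, a=0, r=0.0, s_next=5)
print(len(D[0][2]), D[0][2][0])  # 2 Transition(state=5, action=1, ...)
```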
We generalize the single-task relative condition number to the multitask setting as $C^*_{t,h}(\pi_t, \pi_b^{(t)}) \le \max_{s,a} \frac{d^{\pi_t}_{P^{(*,t)}}(s,a)}{d^{\pi_b^{(t)}}_{P^{(*,t)}}(s,a)}$, where $d^{\pi}_{P^{(*,t)}}(s,a)$ is the state-action occupancy measure of policy $\pi$ under the true transition kernel $P^{(*,t)}$ of task $t$.
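As a worked illustration with hypothetical occupancy values (not taken from the paper), the bound is just the worst-case density ratio between the comparator policy's and the behavior policy's occupancy measures:

```python
import numpy as np

# Hypothetical occupancy measures over a small state-action grid:
# d_target[s, a] is the occupancy of the comparator policy pi_t, and
# d_behavior[s, a] that of the behavior policy pi_b^(t), both under the
# true transitions of task t. Each sums to 1.
d_target = np.array([[0.30, 0.10],
                     [0.05, 0.25],
                     [0.20, 0.10]])
d_behavior = np.array([[0.20, 0.15],
                       [0.10, 0.20],
                       [0.25, 0.10]])

# C*_{t,h} is bounded by the worst-case density ratio max_{s,a} d_pi / d_b;
# a large value means pi_t visits pairs the behavior policy rarely covers.
ratio = d_target / d_behavior
print(f"relative condition number bound: {ratio.max():.3f}")
# here the max ratio is 0.30 / 0.20 = 1.5, attained at (s=0, a=0)
```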
MORL improves the suboptimality gap by a factor of $O(H\sqrt{d})$ compared to single-task offline representation learning in low-rank MDPs.
In the reward-free setting, the exploration phase requires sampling at most $\tilde{O}(H^4 d^3 / \epsilon^2)$ episodes to guarantee an $\epsilon$-optimal policy in the planning phase.
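A quick plug-in to convey the scaling, assuming illustrative values of H, d, and epsilon and ignoring the constants and logarithmic factors hidden by the $\tilde{O}$ notation:

```python
# Order-of-magnitude plug-in for the reward-free exploration bound
# O~(H^4 d^3 / eps^2); constants and log factors are dropped, so this
# shows only the growth rate, not an actual episode count.
H, d = 10, 5  # horizon and representation dimension (illustrative)
for eps in (0.1, 0.05, 0.01):
    episodes = H**4 * d**3 / eps**2
    print(f"eps = {eps:>5}: ~{episodes:.1e} episodes")
# Halving eps quadruples the bound; doubling d multiplies it by 8.
```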
The approximate feature representation learned by MORL improves the downstream reward-free RL sample complexity by a factor of $\tilde{O}(HdK)$.