Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency
The core message of this article is that, by exploiting the low-rank structure of the state transition kernel in POMDPs, one can learn a minimal yet sufficient representation of the observation and state histories, which enables sample-efficient reinforcement learning in POMDPs with infinite observation and state spaces.
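As a brief sketch of what "low-rank structure" typically means in this setting (the exact notation here is an assumption, not necessarily the article's), the unobserved state transition kernel is taken to admit a finite-rank factorization:

$$
\mathbb{P}(s_{t+1} \mid s_t, a_t) \;=\; \psi(s_{t+1})^{\top} \phi(s_t, a_t),
\qquad \phi(s,a),\, \psi(s') \in \mathbb{R}^{d},
$$

where $\phi$ and $\psi$ are unknown feature maps and $d$ is the rank. Such a factorization lets the relevant information in the (possibly infinite) state and observation spaces be compressed into a $d$-dimensional embedding, which is what makes a minimal yet sufficient learned representation, and hence sample-efficient learning, plausible.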