Key Concepts
LPT is a generative model that plans in offline reinforcement learning by using a latent variable to connect trajectory generation with final returns. The authors report that it achieves temporal consistency and outperforms existing methods on challenging tasks.
Kong, D., Xu, D., Zhao, M., Pang, B., Xie, J., Lizarraga, A., Huang, Y., Xie, S., & Wu, Y. N. (2024). Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference. Advances in Neural Information Processing Systems, 37.
This paper introduces the Latent Plan Transformer (LPT), a novel approach to planning in offline reinforcement learning (RL) settings where only trajectory-return pairs are available, without access to step-wise rewards. The authors aim to address the challenge of temporal consistency in such settings and demonstrate LPT's effectiveness in complex planning tasks.
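To make the data setting concrete, the following is a minimal sketch of what a trajectory-return pair might look like in code. The names `TrajectoryReturnPair` and `to_trajectory_return_pair` are hypothetical and do not come from the paper, and treating the return as an undiscounted sum of step rewards is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class TrajectoryReturnPair:
    """One offline sample: a state-action sequence plus a single scalar return.

    Hypothetical container for illustration; not the paper's data format.
    """
    states: Tuple[Tuple[float, ...], ...]
    actions: Tuple[int, ...]
    final_return: float  # the only supervision signal; no per-step rewards are kept


def to_trajectory_return_pair(
    states: Sequence[Sequence[float]],
    actions: Sequence[int],
    step_rewards: Sequence[float],
) -> TrajectoryReturnPair:
    """Collapse a fully rewarded rollout into the trajectory-return form.

    Assumption: the return is the undiscounted sum of step rewards. The
    step rewards themselves are discarded, mirroring the setting in which
    the learner never observes them.
    """
    if not (len(states) == len(actions) == len(step_rewards)):
        raise ValueError("states, actions and step_rewards must have equal length")
    return TrajectoryReturnPair(
        states=tuple(tuple(s) for s in states),
        actions=tuple(actions),
        final_return=float(sum(step_rewards)),
    )


# Toy three-step rollout with a sparse reward at the final step.
pair = to_trajectory_return_pair(
    states=[(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    actions=[1, 0, 1],
    step_rewards=[0.0, 0.0, 1.0],
)
```

A model trained on such samples sees only `states`, `actions`, and `final_return`, so it has no per-step signal telling it which action earned the return. This is the gap the latent variable is meant to bridge.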