Core Concepts
This paper introduces State-Action Distillation (SAD), a novel approach to In-Context Reinforcement Learning (ICRL) that addresses a key limitation of previous methods: it enables ICRL under random policies and random contexts, removing the need for optimal or well-trained policies during pretraining.
Statistics
Averaged across five ICRL benchmark environments, SAD outperforms the best baseline by 180.86% in the offline evaluation and by 172.8% in the online evaluation.
In the Darkroom environment, SAD surpasses the best baseline by 149.3% in the offline evaluation and 41.7% in the online evaluation.
In the Darkroom-Large environment, SAD outperforms the best baseline by 266.8% in the offline evaluation and 24.7% in the online evaluation.
In the Miniworld environment, SAD surpasses the best baseline by 122.1% in the offline evaluation and 21.7% in the online evaluation.
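The summary does not state how these percentages are computed. A common convention, assumed here and not confirmed by the paper, is relative improvement = (SAD score − best baseline score) / |best baseline score| × 100. The Python sketch below encodes that assumed definition and inverts it to show the score ratio each reported per-environment figure would imply, assuming positive baseline scores. The function names `relative_improvement` and `implied_ratio` are hypothetical.

```python
# Reported per-environment improvements over the best baseline, in percent,
# as (offline, online), copied from the statistics above.
REPORTED = {
    "Darkroom": (149.3, 41.7),
    "Darkroom-Large": (266.8, 24.7),
    "Miniworld": (122.1, 21.7),
}


def relative_improvement(sad_score: float, baseline_score: float) -> float:
    """Assumed definition: percent by which the SAD score exceeds the baseline score."""
    if baseline_score == 0:
        raise ValueError("baseline score must be nonzero")
    return (sad_score - baseline_score) / abs(baseline_score) * 100.0


def implied_ratio(improvement_pct: float) -> float:
    """Invert the assumed definition: SAD score / baseline score.

    Only valid when the baseline score is positive.
    """
    return 1.0 + improvement_pct / 100.0


if __name__ == "__main__":
    for env, (offline, online) in REPORTED.items():
        print(
            f"{env}: offline x{implied_ratio(offline):.2f}, "
            f"online x{implied_ratio(online):.2f}"
        )
```

Under this assumption, "outperforms by 149.3%" in Darkroom offline means SAD scores roughly 2.49 times the best baseline, not 1.49 times. If the paper uses a different definition (for example, normalized returns), these ratios would not apply.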