This work investigates the Decision Transformer (DT) architecture as a foundation model for controlling partially observable nonlinear dynamical systems. The key highlights are:
The control problem is formulated as predicting the current optimal action based on a sequence of past observations, actions, and rewards, eliminating the need for a separate state estimator design.
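To make the sequence formulation concrete, the following is a minimal sketch (not the paper's code) of the conditioning history a DT-style controller sees; the dimensions, context length, and target return below are illustrative assumptions, and the sketch follows the original Decision Transformer in conditioning on returns-to-go derived from the rewards.

```python
import numpy as np

# Hypothetical illustration: the controller conditions on an interleaved
# history of (return-to-go, observation, action) tokens, with no separate
# state estimator.
rng = np.random.default_rng(0)
obs_dim, act_dim, K = 4, 2, 20        # context length K is an assumption

R = 100.0                             # target return-to-go (assumed)
history = []
for t in range(K):
    o = rng.standard_normal(obs_dim)  # partial observation, not the full state
    a = rng.standard_normal(act_dim)  # action applied to the system
    r = float(rng.random())           # scalar reward received
    history.append((R, o, a))
    R -= r                            # return-to-go shrinks by the reward

# Flattened model input: (R_1, o_1, a_1, ..., R_K, o_K, a_K).
# The training target at step t is the action a_t, predicted from the
# tokens up to and including o_t.
tokens = [x for (Ri, o, a) in history for x in (np.array([Ri]), o, a)]
print(f"{len(tokens)} tokens for a context of {K} timesteps")
```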
DT is initialized using a pre-trained GPT-2 language model and then trained on self-generated offline control datasets via Low-Rank Adaptation (LoRA).
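As a rough illustration of this setup, the sketch below loads a pre-trained GPT-2 backbone and attaches LoRA adapters, assuming the HuggingFace `transformers` and `peft` libraries; the LoRA rank, scaling, target modules, and token dimensions are illustrative assumptions rather than the paper's settings.

```python
import torch
from transformers import GPT2Model
from peft import LoraConfig, get_peft_model

# Pre-trained language backbone to be adapted for control.
backbone = GPT2Model.from_pretrained("gpt2")

# LoRA: freeze the pre-trained weights and train only low-rank updates.
lora_cfg = LoraConfig(
    r=8,                        # low-rank dimension (assumed)
    lora_alpha=16,              # LoRA scaling factor (assumed)
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,          # (assumed)
)
model = get_peft_model(backbone, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights require grad

# Continuous control tokens (returns, observations, actions) would be
# projected into GPT-2's embedding space by small trainable linear heads.
obs_dim, act_dim = 4, 2  # illustrative dimensions
embed_return = torch.nn.Linear(1, backbone.config.n_embd)
embed_obs = torch.nn.Linear(obs_dim, backbone.config.n_embd)
embed_act = torch.nn.Linear(act_dim, backbone.config.n_embd)
```

Training would then minimize an action-prediction loss over the offline trajectories while updating only the adapters and the small input/output heads.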
Comprehensive experiments are conducted on five distinct control tasks, ranging from maneuvering aerospace systems to controlling partial differential equations (PDEs).
In the single-task setting, DT consistently matches or surpasses the unknown expert behavior policies that generated the offline data, as well as state-of-the-art offline reinforcement learning methods.
In the multi-task setting with significantly perturbed system dynamics, DT exhibits remarkable zero-shot generalization abilities to new tasks and rapidly surpasses expert performance levels with a minimal amount of demonstration data (e.g., 10 rollout trajectories).
The experimental findings confirm DT's ability to capture parameter-agnostic structures of control tasks and excel in few-shot learning, suggesting its potential as a foundational controller for general control applications.