Core Concepts
Reward conditioning improves VLN agent performance when training on suboptimal offline datasets.
Abstract
The study introduces VLN-ORL, which trains vision-and-language navigation (VLN) agents on suboptimal offline trajectories. It proposes a reward-conditioned approach in which a reward token, encoding the presumed reward at each state, conditions the agent during both training and testing. Several noise models are used to generate suboptimal datasets for evaluation. Empirical studies show substantial performance improvements, especially in challenging settings such as the Random dataset; ablation studies confirm the effectiveness of the reward-conditioned model across different subsets of the validation sets, and K-fold validation further corroborates the gains of the reward-conditioning approach.
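As an illustration of how such a reward token might condition a policy, below is a minimal sketch in the style of return-conditioned sequence models: a learned embedding of the scalar reward is prepended to the agent's input sequence. The class name, backbone architecture, and dimensions are hypothetical assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class RewardConditionedPolicy(nn.Module):
    """Sketch of a reward-conditioned VLN policy (hypothetical design).

    A learned embedding of the scalar reward token is prepended to the
    instruction/observation token sequence, so the same backbone can be
    steered by different reward targets at training and test time.
    """

    def __init__(self, d_model: int = 256, n_actions: int = 6):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True
        )
        self.backbone = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.reward_embed = nn.Linear(1, d_model)  # scalar reward -> token embedding
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, seq_embeds: torch.Tensor, reward: torch.Tensor) -> torch.Tensor:
        # seq_embeds: (batch, seq_len, d_model) fused instruction/observation tokens
        # reward:     (batch, 1) presumed reward at the current state
        reward_token = self.reward_embed(reward).unsqueeze(1)  # (batch, 1, d_model)
        hidden = self.backbone(torch.cat([reward_token, seq_embeds], dim=1))
        # Predict the next action from the output slot of the reward token.
        return self.action_head(hidden[:, 0])
```

Under this sketch, training uses rewards computed from the (possibly suboptimal) offline trajectories, while at test time the same backbone can be conditioned on a high target reward, e.g. `policy(tokens, torch.ones(batch, 1))`, to steer it toward high-reward behavior.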
Stats
The experiments demonstrate that the reward-conditioned approach leads to significant performance improvements.
The reward token encodes the presumed reward at each state, which can be either sparse or dense (see the sketch below).
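To make the sparse/dense distinction concrete, here is a minimal sketch of both reward variants for a single navigation step; the distance-based dense reward is an illustrative assumption, not necessarily the paper's definition.

```python
def sparse_reward(reached_goal: bool) -> float:
    """Sparse: a single nonzero signal, given only on success."""
    return 1.0 if reached_goal else 0.0

def dense_reward(dist_to_goal_before: float, dist_to_goal_after: float) -> float:
    """Dense: a per-step signal rewarding progress toward the goal
    (an illustrative choice, not necessarily the paper's)."""
    return dist_to_goal_before - dist_to_goal_after
```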
Quotes
"The proposed reward-conditioned approach leads to significant performance improvements."
"The reward token allows flexible conditioning of VLN agents during training and testing."