Core Concepts
Agents can bootstrap a large nonlinear model to learn the parameters of a low-capacity model, then update that low-capacity model efficiently online with logistic regression to align with a person's preferences during collaborative tasks.
Abstract
The paper proposes BLR-HAC (Bootstrapped Logistic Regression for Human Agent Collaboration), a method that combines the strengths of large, pretrained nonlinear models with those of low-capacity models trained online via logistic regression, enabling efficient learning in human-robot collaboration.
The key insights are:
- Agents assisting people need well-initialized policies that can adapt quickly to align with their partners' reward functions. A policy that performs well with unknown partners can be initialized by bootstrapping a nonlinear model with imitation learning over large, offline datasets. However, these large models require prohibitive computation to fine-tune in situ.
- In contrast, online logistic regression with low-capacity models supports rapid inference and fine-tuning updates, so immediate in-task behavior can be used effectively for reward function alignment. However, these low-capacity models cannot be bootstrapped as effectively from offline datasets and thus have poor initializations.
- BLR-HAC addresses this by first pretraining a large nonlinear model to learn the parameters of a low-capacity model, which is then updated online using logistic regression. This allows the agent to benefit from both good zero-shot performance and fast online adaptation.
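The online half of this idea can be illustrated with a minimal sketch of a single logistic regression update. This is not the paper's implementation: the function names, the learning rate, the feature vectors, and the `bootstrapped_weights` values (a stand-in for parameters that a pretrained large model would supply) are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, features):
    """Probability that the partner prefers the action described by `features`."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def online_update(weights, features, label, learning_rate=0.5):
    """One stochastic gradient step on the logistic loss for one observation.

    `label` is 1 if the partner was observed to choose the action, else 0.
    """
    p = predict_proba(weights, features)
    return [w + learning_rate * (label - p) * x for w, x in zip(weights, features)]


# Hypothetical initial weights, standing in for the output of a pretrained model.
bootstrapped_weights = [1.0, 0.0, 0.0]

# Hypothetical in-task observations: (action features, partner's choice).
observations = [([1.0, 0.5, -1.0], 1), ([0.0, 1.0, 1.0], 0)]

weights = bootstrapped_weights
for features, label in observations:
    weights = online_update(weights, features, label)
```

Each update touches only a handful of weights, which is why this kind of model is cheap to adapt during a task; the quality of the starting point depends entirely on where `bootstrapped_weights` comes from.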
The paper evaluates BLR-HAC in a simulated surface rearrangement task, in which an agent assists a person in rearranging objects. In zero-shot coordination, BLR-HAC outperforms both baseline low-capacity models and large, nonlinear models trained with behavior cloning. It also achieves performance similar to that of a fine-tuned transformer model while requiring a fraction of the compute.
Stats
The paper does not provide any specific numerical data or statistics. It focuses on describing the proposed algorithm and evaluating its performance through simulated experiments.
Quotes
The paper does not contain any direct quotes that are particularly striking or support the key arguments.