Key Concepts
PLUNDER, a novel probabilistic programmatic imitation learning algorithm, can synthesize interpretable and adaptable policies from unlabeled and noisy demonstrations.
Summary
The paper introduces PLUNDER, a Programmatic Imitation Learning (PIL) algorithm designed to learn from demonstrations that are unlabeled and noisy, a setting that existing PIL methods handle poorly. It rests on two key insights:
Inferring action labels from unlabeled demonstrations is a latent variable estimation problem, which can be solved using an Expectation-Maximization (EM) approach.
Synthesizing a probabilistic policy, rather than a deterministic one, allows PLUNDER to model the uncertainties inherent in real-world demonstrations.
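To make the second insight concrete, here is a minimal sketch contrasting a deterministic programmatic policy with a probabilistic one. It is an illustration only, not the policy language from the paper: the action labels ACC and DEC, the parameters v_target and sharpness, and the logistic smoothing of the condition are all assumptions chosen for the example.

```python
import math
import random


def deterministic_policy(v, v_target=10.0):
    """Hard threshold: always accelerate below v_target, otherwise decelerate."""
    return "ACC" if v < v_target else "DEC"


def prob_acc(v, v_target=10.0, sharpness=1.0):
    """P(label = ACC | v): a logistic relaxation of the condition v < v_target.

    Hypothetical form chosen for illustration; `sharpness` controls how
    quickly the probability moves from 1 to 0 around v_target.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (v_target - v)))


def probabilistic_policy(v, rng, v_target=10.0, sharpness=1.0):
    """Sample an action label instead of committing to a single one."""
    return "ACC" if rng.random() < prob_acc(v, v_target, sharpness) else "DEC"
```

Far from the threshold the two policies agree almost always; near it, the probabilistic version assigns non-trivial probability to both labels, which is what lets it absorb a demonstrator who switches actions slightly early or late.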
The PLUNDER algorithm iterates between an E-step, where it samples posterior action label sequences, and an M-step, where it synthesizes a new probabilistic policy that maximizes the likelihood of the sampled action labels. To improve scalability, PLUNDER uses an incremental synthesis technique that narrows the search space.
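The E-step/M-step alternation described above can be sketched on a toy problem. This is a heavily simplified illustration under stated assumptions, not the PLUNDER implementation: the policy is a single logistic threshold on velocity, the "synthesis" in the M-step is a grid search over that one threshold rather than a search over programs, labels are sampled independently at each time step, and the Gaussian observation model (ACC_MEAN, DEC_MEAN, OBS_STD) is invented for the example.

```python
import math
import random

# Assumed observation model: each label produces a noisy acceleration reading.
ACC_MEAN, DEC_MEAN, OBS_STD = 2.0, -2.0, 1.0


def prob_acc(v, threshold, sharpness=1.0):
    """Policy prior P(label = ACC | v) for a logistic threshold policy."""
    return 1.0 / (1.0 + math.exp(-sharpness * (threshold - v)))


def gauss(x, mean, std):
    """Gaussian density, used as the observation likelihood."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def e_step(demo, threshold, rng, n_samples=20):
    """Sample label sequences (1 = ACC, 0 = DEC) from the posterior.

    The posterior at each step is proportional to policy prior times
    observation likelihood; steps are treated as independent (a simplification).
    """
    samples = []
    for _ in range(n_samples):
        labels = []
        for v, a in demo:
            p = prob_acc(v, threshold)
            w_acc = p * gauss(a, ACC_MEAN, OBS_STD)
            w_dec = (1.0 - p) * gauss(a, DEC_MEAN, OBS_STD)
            labels.append(1 if rng.random() < w_acc / (w_acc + w_dec) else 0)
        samples.append(labels)
    return samples


def m_step(demo, samples, candidates):
    """Pick the candidate threshold maximizing the likelihood of sampled labels."""
    def log_lik(threshold):
        total = 0.0
        for labels in samples:
            for (v, _), y in zip(demo, labels):
                p = min(max(prob_acc(v, threshold), 1e-9), 1.0 - 1e-9)
                total += math.log(p if y else 1.0 - p)
        return total
    return max(candidates, key=log_lik)


def fit(demo, init_threshold=2.0, iters=10, seed=0):
    """Alternate E- and M-steps starting from a deliberately poor threshold."""
    rng = random.Random(seed)
    candidates = [0.5 * i for i in range(41)]  # thresholds 0.0, 0.5, ..., 20.0
    threshold = init_threshold
    for _ in range(iters):
        threshold = m_step(demo, e_step(demo, threshold, rng), candidates)
    return threshold


def make_demo(true_threshold=10.0, noise_std=0.5, seed=1):
    """Synthetic unlabeled demonstration: (velocity, observed acceleration) pairs."""
    rng = random.Random(seed)
    demo = []
    for i in range(41):
        v = 0.5 * i
        mean = ACC_MEAN if v < true_threshold else DEC_MEAN
        demo.append((v, mean + rng.gauss(0.0, noise_std)))
    return demo
```

Calling fit(make_demo()) runs the loop on a demonstration that contains no action labels at all; the labels exist only as posterior samples inside the E-step. The fixed candidate grid stands in, very loosely, for a restricted search space; the incremental synthesis technique mentioned above is not modeled here.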
PLUNDER is evaluated on five standard imitation learning tasks, including autonomous vehicle and robotic arm environments. It outperforms four state-of-the-art IL techniques, achieving 95% accuracy in matching the given demonstrations and 90% success rate in completing the tasks, which are 19% and 17% higher than the next-best baseline, respectively.
Statistics
The vehicle's acceleration cannot exceed a_max ≈ 13 m/s^2 or drop below a_min ≈ -20 m/s^2.