The paper introduces PLUNDER, a novel Programmatic Imitation Learning (PIL) algorithm that addresses the limitations of existing PIL methods. Key insights:
The PLUNDER algorithm iterates between an E-step, where it samples posterior action label sequences, and an M-step, where it synthesizes a new probabilistic policy that maximizes the likelihood of the sampled action labels. To improve scalability, PLUNDER uses an incremental synthesis technique that narrows the search space.
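The E-step/M-step alternation described above can be illustrated with a toy sketch. This is not the paper's implementation, and it rests on several simplifying assumptions: labels are treated as independent across time steps, the "policy" is a single-threshold probabilistic rule rather than a synthesized program, and a local search around the current threshold stands in for incremental synthesis. All names (`policy_prob_dec`, `e_step`, `m_step`, `make_demo`) and all constants are hypothetical.

```python
import math
import random

def policy_prob_dec(x, theta, k=2.0):
    """Toy probabilistic policy: P(label = DEC | state x), a soft threshold at theta."""
    return 1.0 / (1.0 + math.exp(-k * (x - theta)))

def obs_likelihood(a, label, sigma=0.5):
    """Unnormalized Gaussian likelihood of observed acceleration a given a label.
    Assumption: DEC (1) produces a ~ N(-1, sigma), ACC (0) produces a ~ N(+1, sigma)."""
    mean = -1.0 if label == 1 else 1.0
    return math.exp(-0.5 * ((a - mean) / sigma) ** 2)

def e_step(states, obs, theta, rng, n_samples=10):
    """Sample label sequences from the posterior under the current policy."""
    samples = []
    for _ in range(n_samples):
        labels = []
        for x, a in zip(states, obs):
            prior_dec = policy_prob_dec(x, theta)
            p1 = prior_dec * obs_likelihood(a, 1)
            p0 = (1.0 - prior_dec) * obs_likelihood(a, 0)
            labels.append(1 if rng.random() < p1 / (p0 + p1) else 0)
        samples.append(labels)
    return samples

def log_lik(states, samples, theta):
    """Log-likelihood of the sampled labels under the policy with threshold theta."""
    total = 0.0
    for labels in samples:
        for x, y in zip(states, labels):
            p = min(max(policy_prob_dec(x, theta), 1e-9), 1.0 - 1e-9)
            total += math.log(p if y == 1 else 1.0 - p)
    return total

def m_step(states, samples, theta, step=0.5):
    """Pick the best policy among small edits of the current one
    (a stand-in for narrowing the search space incrementally)."""
    candidates = [theta - step, theta, theta + step]
    return max(candidates, key=lambda t: log_lik(states, samples, t))

def run_em(states, obs, theta0=0.0, iters=40, seed=0):
    rng = random.Random(seed)
    theta = theta0
    for _ in range(iters):
        samples = e_step(states, obs, theta, rng)   # E-step
        theta = m_step(states, samples, theta)      # M-step
    return theta

def make_demo(true_theta=5.0, n=200, seed=1):
    """Synthetic demonstration: states plus noisy accelerations; labels stay hidden."""
    rng = random.Random(seed)
    states = [rng.uniform(0.0, 10.0) for _ in range(n)]
    obs = []
    for x in states:
        label = 1 if rng.random() < policy_prob_dec(x, true_theta) else 0
        obs.append(rng.gauss(-1.0 if label == 1 else 1.0, 0.5))
    return states, obs

states, obs = make_demo()
theta_hat = run_em(states, obs)
print("estimated threshold:", theta_hat)
```

In this toy setting the estimated threshold should move from its initial value toward the one used to generate the data, even though the learner never sees the true labels. Restricting the M-step to neighbors of the current policy keeps each iteration cheap, at the cost of needing more iterations when the starting policy is far off.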
PLUNDER is evaluated on five standard imitation learning tasks, including autonomous vehicle and robotic arm environments. It outperforms four state-of-the-art IL techniques, achieving 95% accuracy in matching the given demonstrations and 90% success rate in completing the tasks, which are 19% and 17% higher than the next-best baseline, respectively.
Key insights distilled from the paper by Jimmy Xin, Li... at arxiv.org, 04-08-2024: https://arxiv.org/pdf/2303.01440.pdf