The paper proposes a method called Preference-based Action Representation Learning (PbARL) to address the challenge of personalization in human-robot interaction (HRI). Existing preference-based reinforcement learning (PbRL) approaches often require training a personalized robot policy from scratch, resulting in inefficient use of human feedback.
PbARL aims to overcome this limitation by leveraging pre-trained robot policies that capture common task structures and basic interaction skills. The key idea is to learn a latent action space that maximizes the mutual information between the source domain of the pre-trained policy and the target, user-preference-aligned domain, without altering the pre-trained policy itself. This is achieved by training a mutual-information action encoder, implemented as a conditional variational autoencoder (cVAE), with loss functions designed to balance task performance preservation against personalization.
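Since the summary does not give the exact loss functions, the sketch below is only a minimal, generic state-conditioned cVAE over actions (in PyTorch): the reconstruction term stands in for task performance preservation and the KL term regularizes the latent action space, while the paper's mutual-information and preference-alignment terms are omitted. All network sizes, weights, and names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalVAEActionEncoder(nn.Module):
    """Minimal state-conditioned VAE over actions (illustrative only).

    Encodes a source-domain action into a latent action z and decodes z
    back into an executable action, conditioned on the state.
    """

    def __init__(self, state_dim: int, action_dim: int,
                 latent_dim: int = 8, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, action):
        h = self.encoder(torch.cat([state, action], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.decoder(torch.cat([state, z], dim=-1))
        return recon, mu, logvar

def cvae_loss(recon, action, mu, logvar, beta: float = 1e-3):
    """Reconstruction keeps latent actions decodable into actions the
    pre-trained policy's environment understands; the KL term regularizes
    the latent space. `beta` is an assumed trade-off weight."""
    recon_loss = F.mse_loss(recon, action)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + beta * kl
```

In the full method, additional objective terms would align this latent space with user preferences; their exact forms are not reproduced here.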
PbARL requires minimal prior knowledge of the source domain, using only transition tuples obtained by testing the pre-trained policy, which enhances its practicality in real-world HRI scenarios. Extensive experiments on the Assistive Gym benchmark and a real-world user study (N=8) demonstrate that, compared to state-of-the-art approaches, PbARL improves personalization while preserving original task performance, leading to greater user satisfaction.
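Because only transition tuples are needed, the source-domain data collection reduces to a plain rollout loop with the pre-trained policy frozen. The sketch below assumes a Gymnasium-style environment API for concreteness (Assistive Gym itself exposes a Gym-style interface); `policy` is any hypothetical callable mapping observations to actions, and the environment id in the usage comment is likewise hypothetical.

```python
import gymnasium as gym

def collect_transitions(env, policy, num_steps: int = 10_000):
    """Roll out a frozen pre-trained policy and record (s, a, s', r) tuples,
    the only source-domain knowledge the method is described as needing."""
    transitions = []
    obs, _ = env.reset()
    for _ in range(num_steps):
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        transitions.append((obs, action, next_obs, reward))
        obs = next_obs
        if terminated or truncated:
            obs, _ = env.reset()
    return transitions

# Usage sketch (environment id and policy are placeholders):
# env = gym.make("FeedingSawyer-v1")
# data = collect_transitions(env, policy=lambda obs: env.action_space.sample())
```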