The paper presents an unsupervised algorithm that lets a robot learn discrete action prototypes based on the effects its motions produce in the environment. The approach consists of three main stages:
Motion sampling: The robot samples random motions from its continuous motion space and stores the resulting (motion, effect) tuples.
Effect region clustering: The collected effects are clustered with k-means to identify distinct effect classes; the number of clusters is chosen using the silhouette score.
Action prototype generation: For each effect class, the algorithm generates a small number of representative action prototypes using the Robust Growing Neural Gas (RGNG) algorithm, with the number of prototypes per class scaled to the variability of the effects within that class (a sketch of these two stages follows this list).
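A minimal sketch of the clustering and prototype-generation stages is given below. It is an illustration, not the paper's implementation: the function names, the variability heuristic, and the synthetic data are assumptions, and k-means is used as a stand-in for RGNG, for which no standard library implementation is assumed here.

```python
# Sketch of stages 2-3: effect clustering with silhouette-based model selection,
# then per-class motion prototypes, assuming effects and motions are fixed-length
# vectors collected in stage 1. k-means stands in for RGNG (assumption).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_effects(effects, k_range=range(2, 10)):
    """Cluster effect vectors with k-means, choosing k by silhouette score."""
    best_k, best_score, best_labels = None, -1.0, None
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(effects)
        score = silhouette_score(effects, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels

def prototypes_per_class(motions, effects, labels, base=1, scale=2.0):
    """For each effect class, pick a number of motion prototypes that grows
    with the within-class effect variability (illustrative heuristic)."""
    prototypes = {}
    for c in np.unique(labels):
        class_motions = motions[labels == c]
        class_effects = effects[labels == c]
        variability = class_effects.std(axis=0).mean()      # crude spread measure
        n_proto = max(base, int(round(base + scale * variability)))
        n_proto = min(n_proto, len(class_motions))           # cannot exceed samples
        km = KMeans(n_clusters=n_proto, n_init=10, random_state=0).fit(class_motions)
        prototypes[c] = km.cluster_centers_                  # representative motions
    return prototypes

# Usage with synthetic data: 500 random (motion, effect) samples.
rng = np.random.default_rng(0)
motions = rng.uniform(-1.0, 1.0, size=(500, 3))              # e.g. 3-DoF motion parameters
effects = motions[:, :2] + (motions[:, :1] > 0).astype(float)  # 2-D effects with two modes
k, labels = cluster_effects(effects)
protos = prototypes_per_class(motions, effects, labels)
print(k, {c: p.shape for c, p in protos.items()})
```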
The algorithm is evaluated on a simulated stair-climbing reinforcement learning task called "Up The Stairs". When used with a Deep Q-Network (DQN) agent, the effect-driven discretization converges faster and reaches a higher maximum reward than uniform and random discretization baselines. Its performance remains below that of a Soft Actor-Critic agent operating directly in the continuous action space, but the effect-based discretization achieves its result with a network roughly 85 times smaller.
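To illustrate how a discrete-action DQN can drive a continuous-control task through the learned prototypes, the following wrapper maps each discrete index to a stored continuous motion command. This is a hypothetical sketch: the gymnasium ActionWrapper pattern is assumed, Pendulum-v1 is used only as a stand-in environment (the paper's "Up The Stairs" task is not assumed to be publicly available), and the prototype table is illustrative.

```python
# Sketch: expose the learned prototypes as a discrete action space so that a
# standard DQN can act in a continuous-control environment.
import numpy as np
import gymnasium as gym

class PrototypeActionWrapper(gym.ActionWrapper):
    """Discrete action space in which each index selects one continuous prototype."""
    def __init__(self, env, prototypes):
        super().__init__(env)
        self.prototypes = np.asarray(prototypes, dtype=np.float32)
        self.action_space = gym.spaces.Discrete(len(self.prototypes))

    def action(self, act):
        # Map the agent's discrete choice to the stored continuous motion command.
        return self.prototypes[act]

# Usage sketch: a small illustrative prototype table on a stand-in environment.
prototype_table = np.array([[-1.0], [0.0], [1.0]], dtype=np.float32)
env = PrototypeActionWrapper(gym.make("Pendulum-v1"), prototype_table)
obs, _ = env.reset(seed=0)
obs, reward, terminated, truncated, _ = env.step(env.action_space.sample())
```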
The paper discusses the advantages of the effect-centric approach, such as the ability to discover meaningful action prototypes in an unsupervised manner. It also highlights the challenges in dealing with continuous effect spaces and the need for feature selection to maintain a good balance between performance and generalization.