The study investigates how robots can develop compositionality and generalization in language and action through interactive learning. The key findings are:
Generalization to unlearned linguistic compositions improves as the variety of task compositions used in training increases. This is attributed to the emergence of more consistent relational structures among concepts that combine actions and object nouns in the linguistic latent state space.
The linguistic latent representations of actional concepts develop by preserving similarity among corresponding sensorimotor patterns, indicating that the compositional structure in language is significantly influenced by sensorimotor learning.
Ablation studies show that visual attention and working memory are essential for the model to accurately generate visuo-proprioceptive sequences to achieve linguistically represented goals.
The proposed model integrates vision, proprioception, and language within a predictive coding and active inference framework, enabling the robot to learn associations between linguistic expressions and corresponding sensorimotor behaviors. The model is evaluated through simulation experiments with a robot arm performing object manipulation tasks.
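The finding that linguistic latent representations preserve similarity among sensorimotor patterns can be illustrated with a small numerical check. The sketch below is not the paper's analysis: it swaps in a generic technique (correlating the pairwise-distance structure of two sets of vectors), and every name, dimension, and data array in it is a hypothetical stand-in generated at random. It only shows what "preserving similarity" could mean in measurable terms.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance between every pair of rows in x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def structure_similarity(latents_a, latents_b):
    """Pearson correlation between the pairwise-distance structures
    of two sets of vectors that describe the same concepts."""
    d_a = pairwise_distances(latents_a)
    d_b = pairwise_distances(latents_b)
    # Use only the upper triangle: each unordered pair counted once.
    iu = np.triu_indices(len(latents_a), k=1)
    return float(np.corrcoef(d_a[iu], d_b[iu])[0, 1])


# Hypothetical data: 6 action concepts, 8-dimensional vectors.
rng = np.random.default_rng(0)
sensorimotor = rng.normal(size=(6, 8))

# A "linguistic" space that is a rotated, rescaled copy of the
# sensorimotor space keeps all relative distances intact.
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
linguistic_preserving = 0.5 * sensorimotor @ rotation

# An unrelated random space has no reason to keep them.
linguistic_random = rng.normal(size=(6, 8))

score_preserving = structure_similarity(sensorimotor, linguistic_preserving)
score_random = structure_similarity(sensorimotor, linguistic_random)
print(f"similarity-preserving space: {score_preserving:.3f}")
print(f"unrelated random space:      {score_random:.3f}")
```

A score near 1 for the first comparison indicates that concepts close together in one space are also close in the other, which is the kind of relationship the summary describes between sensorimotor patterns and linguistic latent states.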
Key Insights Distilled From
by Prasanna Vij... at arxiv.org 04-01-2024
https://arxiv.org/pdf/2403.19995.pdf