Combining Multiple Modalities to Effectively Communicate Manipulation Tasks to a Robot
A context-aware model that robustly fuses uncertain information from multiple modalities, such as gesture and natural language, to determine the user's intended manipulation task and its parameters while accounting for the action's feasibility in the current scene.
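
To make the fusion idea concrete, here is a minimal sketch of one plausible approach, not the paper's actual method: each modality contributes an uncertain score over candidate (task, target) interpretations, the scores are combined in a naive-Bayes-style product, and interpretations that are infeasible in the current scene are masked out before selecting the most likely one. All names here (`Interpretation`, `fuse`, the example scores) are hypothetical.

```python
# A minimal sketch (not the authors' implementation) of context-aware
# multimodal fusion: each modality yields an uncertain distribution over
# candidate (task, target) interpretations; a naive-Bayes-style product
# combines them, and actions infeasible in the current scene are masked.
from dataclasses import dataclass

@dataclass(frozen=True)
class Interpretation:
    task: str      # e.g. "pick", "place", "handover"
    target: str    # e.g. "red_cup", "blue_box"

def fuse(
    modality_scores: list[dict[Interpretation, float]],
    feasible: dict[Interpretation, bool],
) -> Interpretation | None:
    """Combine per-modality likelihoods and apply a scene-feasibility mask."""
    candidates = set().union(*modality_scores)
    posterior: dict[Interpretation, float] = {}
    for c in candidates:
        # Product of modality likelihoods; a small floor keeps one
        # uncertain modality from zeroing out a candidate entirely.
        p = 1.0
        for scores in modality_scores:
            p *= max(scores.get(c, 0.0), 1e-6)
        # Scene feasibility acts as a hard constraint on the posterior.
        posterior[c] = p if feasible.get(c, False) else 0.0
    if sum(posterior.values()) == 0.0:
        return None  # nothing is both intended and feasible; ask the user
    return max(posterior, key=posterior.get)

# Example: gesture points near the red cup; speech says "pick that up".
pick_cup = Interpretation("pick", "red_cup")
pick_box = Interpretation("pick", "blue_box")
gesture = {pick_cup: 0.7, pick_box: 0.3}
speech = {pick_cup: 0.6, pick_box: 0.4}
scene = {pick_cup: True, pick_box: True}
print(fuse([gesture, speech], scene))  # -> the red-cup interpretation
```

In this sketch the feasibility mask is binary, so an impossible action can never win even if both modalities favor it; a soft feasibility score could be multiplied in instead if graded scene constraints are available.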