Key Concepts
This study proposes a multimodal interaction system that integrates speech and handheld controller inputs within immersive Virtual Reality environments to enable intuitive and efficient communication between construction workers and robots.
Abstract
This study introduces a multimodal interaction system that combines speech and handheld controller inputs to facilitate communication between construction workers and robots in a Virtual Reality (VR) environment. The key highlights are:
- Integration of speech and VR controller inputs:
  - Speech commands are captured and processed by an automatic speech recognition system.
  - VR controllers let users select and interact with objects in the virtual environment.
  - Building Information Modeling (BIM) data is queried to retrieve semantic information about the selected objects.
- Bidirectional communication through a chat interface:
  - A chat interface powered by a Large Language Model (GPT-4) enables bidirectional communication between the human operator and the robot.
  - The chat system verifies the operator's instructions, seeks clarification when needed, and provides feedback on the robot's understanding and execution of tasks.
- Evaluation through a drywall installation case study:
  - Twelve construction workers participated in the user study, evaluating the proposed multimodal interaction system.
  - The results showed low workload, high usability, and a strong preference for the multimodal approach over speech-only interaction.
  - The chat system detected and corrected intentional errors in instructions with high accuracy (92.73%), highlighting the potential of advanced AI assistants to enhance human-robot collaboration.
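The instruction-verification step above can be sketched as a simple cross-check of a spoken command against BIM semantics. This is a minimal illustration, not the paper's implementation: the names (`BIM_PANELS`, `verify_instruction`) and the data are invented, and the real system delegates this reasoning to a GPT-4-powered chat interface rather than hard-coded rules.

```python
# Hypothetical stand-in for semantic data retrieved from a BIM model.
BIM_PANELS = {
    "panel_A1": {"width_mm": 1200, "height_mm": 2400, "wall": "north"},
    "panel_B2": {"width_mm": 900,  "height_mm": 2400, "wall": "east"},
}

def verify_instruction(panel_id: str, stated_wall: str) -> str:
    """Cross-check an operator's instruction against BIM data:
    confirm it, flag a likely error, or ask for clarification."""
    panel = BIM_PANELS.get(panel_id)
    if panel is None:
        # Unknown object: ask the operator to clarify instead of guessing.
        return f"Clarification needed: '{panel_id}' is not in the BIM model."
    if panel["wall"] != stated_wall:
        # Mismatch between the spoken command and the model: flag it.
        return (f"Possible error: {panel_id} belongs to the {panel['wall']} "
                f"wall, not the {stated_wall} wall. Please confirm.")
    return f"Confirmed: install {panel_id} on the {stated_wall} wall."

print(verify_instruction("panel_A1", "south"))
```

In the study, an LLM plays this role conversationally, which is what allows it to catch the intentional errors reported below rather than relying on fixed lookups.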
The proposed multimodal interaction system integrates diverse software components, including BIM, the Robot Operating System (ROS), and a game engine, to enable intuitive and efficient communication between construction workers and robots. The successful implementation and evaluation of the system suggest that such technological integration can substantially advance the adoption of robotic assistants in the construction industry.
Statistics
Speech interaction commands averaged 8.27 words per instruction, with the longest command containing 19 words and the shortest 5 words.
Multimodal interaction commands were more concise, averaging 6.65 words per instruction, with the longest command containing 18 words and the shortest 3 words.
The chat system accurately identified 51 out of 55 (92.73%) intentional errors in instructions provided by the participants.
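The reported accuracy follows directly from the detection counts; a quick check confirms the figure:

```python
# Sanity check of the reported error-detection accuracy: 51 of 55 caught.
detected, total = 51, 55
accuracy = round(100 * detected / total, 2)
print(accuracy)  # → 92.73
```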
Quotes
"It seemed quicker."
"More accurate input, less opportunity to miss-speak or be misunderstood."
"I feel like it is easier for me to use my words and hands at the same time when I am working."