Tactility enhances perception for both humans and robots, and multimodal alignment is central to progress in robotics and AI. Vision-based tactile sensors such as GelSight capture fine-grained contact information, yet existing tactile datasets lack rich textual descriptions, which hinders cross-modal alignment. The Touch-Language-Vision (TLV) dataset bridges this gap by aligning touch, language, and vision for semantic understanding, providing sentence-level descriptions for 20,000 synchronized observations. Building on TLV, the TLV-Link training framework achieves touch-centric alignment with only minimal parameter adjustments and shows promise on tactile classification tasks, with significant performance improvements.
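The summary notes that TLV-Link adapts a training framework while tuning only a small number of parameters for touch-centric alignment. As an illustration only (the paper's exact architecture and loss are not described here), the sketch below assumes a CLIP-style setup: frozen pretrained encoders, a small trainable projection head for the tactile branch, and a symmetric contrastive loss over paired touch-text embeddings. All names (TouchProjectionHead, contrastive_alignment_loss) and dimensions are hypothetical.

```python
# Minimal sketch (not the paper's implementation) of touch-language-vision
# alignment: a tactile projection head is trained to match frozen text/vision
# embeddings, so only a few parameters are updated.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TouchProjectionHead(nn.Module):
    """Maps tactile features into the shared text/vision embedding space."""
    def __init__(self, touch_dim: int = 768, embed_dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(touch_dim, embed_dim)

    def forward(self, touch_feats: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(touch_feats), dim=-1)

def contrastive_alignment_loss(touch_emb, text_emb, temperature: float = 0.07):
    """Symmetric InfoNCE loss between paired touch and text embeddings."""
    logits = touch_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Only the projection head's parameters are optimized; the backbone tactile,
# text, and vision encoders (assumed pretrained) stay frozen.
head = TouchProjectionHead()
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4)

touch_feats = torch.randn(8, 768)                      # placeholder tactile features
text_emb = F.normalize(torch.randn(8, 512), dim=-1)    # placeholder text embeddings

loss = contrastive_alignment_loss(head(touch_feats), text_emb)
loss.backward()
optimizer.step()
```

Freezing the large encoders and training only a lightweight head is one common way to realize "minimal parameter adjustments"; the actual TLV-Link recipe may differ.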
Key insights distilled from the source by Ning Cheng, Y... at arxiv.org, 03-18-2024: https://arxiv.org/pdf/2403.09813.pdf