The author proposes SeCG, a semantic-enhanced relational learning model based on graph attention, to improve 3D visual grounding on descriptions that mention multiple referred objects. The approach strengthens cross-modal alignment and uses prior semantic knowledge to improve localization performance.
The proposed SeCG is a semantic-enhanced visual grounding model that uses graph attention to improve the understanding of descriptions containing multiple referred objects.
MiKASA improves the accuracy of object recognition and the understanding of spatial relationships in 3D environments.
A novel interpretable framework for 3D visual grounding that predicts a chain of anchor objects to localize the final target object, enhancing performance and data efficiency.