MiKASA Transformer combines scene-aware object encoding with a multi-key-anchor technique to improve both the accuracy and the explainability of 3D visual grounding.