Exploiting auxiliary captions significantly boosts video grounding performance by supplying contextual information and strengthening cross-modal interactions.