Diversification along the intersection region of the adversarial trajectory enhances transferability in vision-language attacks.