LCV2: Efficient Pretraining-Free Framework for Grounded Visual Question Answering
LCV2 is an efficient, pretraining-free framework that uses a large language model (LLM) to connect visual question answering (VQA) and visual grounding models, achieving competitive performance without extensive pretraining.
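
As a rough illustration of what connecting the two models through an LLM could look like, the sketch below chains three components: a VQA model answers the question, an LLM turns the question and answer into a phrase, and a visual grounding model localizes that phrase. The stage order, the function names (`grounded_vqa`, `toy_vqa`, `toy_llm`, `toy_grounder`), the `GroundedAnswer` type, and the box format are all assumptions made for illustration; the summary above does not specify them, and the stubs stand in for real models.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in normalized coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class GroundedAnswer:
    answer: str   # text answer from the VQA model
    phrase: str   # phrase the LLM produced for the grounding model
    box: Box      # region returned by the grounding model


def grounded_vqa(
    image: Any,
    question: str,
    vqa_model: Callable[[Any, str], str],
    llm: Callable[[str, str], str],
    grounding_model: Callable[[Any, str], Box],
) -> GroundedAnswer:
    """Chain three frozen components; nothing here is trained (assumed design)."""
    answer = vqa_model(image, question)   # 1. answer the question
    phrase = llm(question, answer)        # 2. LLM bridges the text to a groundable phrase
    box = grounding_model(image, phrase)  # 3. localize the phrase in the image
    return GroundedAnswer(answer=answer, phrase=phrase, box=box)


# Stub components so the sketch runs without any model weights.
def toy_vqa(image: Any, question: str) -> str:
    return "red"


def toy_llm(question: str, answer: str) -> str:
    return f"the {answer} object referred to by: {question}"


def toy_grounder(image: Any, phrase: str) -> Box:
    return (0.0, 0.0, 1.0, 1.0)  # placeholder: the whole image


result = grounded_vqa(None, "What color is the car?", toy_vqa, toy_llm, toy_grounder)
print(result)
```

Because each stage is passed in as a plain callable, any of the three components could be swapped for another without retraining, which is one way to read the "pretraining-free" claim.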