Exploring the zero-shot capabilities of foundation models in Visual Question Answering tasks through an adaptive multi-agent system.