HAMMR: A Hierarchical and Compositional Approach for Solving a Broad Range of Visual Question Answering Tasks
HAMMR is a hierarchical and compositional approach that leverages specialized agents to solve a broad range of visual question answering tasks, outperforming naive extensions of existing LLM+tools methods by 19.5% and achieving state-of-the-art results.