This survey paper provides a comprehensive overview of existing datasets and algorithms in the field of Visual Question Answering (VQA), categorizing and analyzing their strengths, weaknesses, and specific focuses.
This paper proposes a novel graph-based multimodal commonsense knowledge distillation framework to enhance Visual Question Answering (VQA) by integrating commonsense knowledge, visual features, and question representations into a unified graph structure processed by a Graph Convolutional Network (GCN).
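As a rough illustration of that unified-graph idea, the sketch below builds a small graph whose nodes stand in for commonsense facts, visual regions, and the question, and passes it through two hand-rolled GCN layers; the node counts, edge pattern, and dimensions are placeholders, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution layer: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        # adj: row-normalized adjacency with self-loops, shape (N, N)
        return torch.relu(self.linear(adj @ h))

# Illustrative node layout: 5 commonsense-fact nodes, 10 visual-region nodes,
# and 1 question node, all assumed to be pre-projected into a shared 256-d space.
num_nodes, dim, answer_vocab = 16, 256, 3000
h = torch.randn(num_nodes, dim)                  # unified node features
adj = torch.eye(num_nodes)                       # start with self-loops
adj[-1, :-1] = 1.0                               # connect the question node to all others
adj[:-1, -1] = 1.0                               # (edge pattern is illustrative only)
adj = adj / adj.sum(dim=1, keepdim=True)         # row-normalize

layer1, layer2 = GCNLayer(dim, dim), GCNLayer(dim, dim)
h = layer2(layer1(h, adj), adj)                  # two rounds of message passing

answer_logits = nn.Linear(dim, answer_vocab)(h[-1])  # predict from the question node
```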
This paper introduces SimpsonsVQA, a novel dataset based on The Simpsons cartoon imagery, designed to advance Visual Question Answering (VQA) research beyond photorealistic images and address challenges in question relevance and answer correctness assessment, particularly for educational applications.
This paper introduces ChitroJera, a new large-scale, culturally relevant visual question answering (VQA) dataset for the Bangla language, addressing the lack of such resources and enabling the development of more effective VQA models for this under-resourced language.
EchoSight, a novel retrieval-augmented vision-language system, excels in knowledge-based visual question answering by employing a dual-stage search mechanism that integrates visual-only retrieval with multimodal reranking, significantly improving accuracy over existing vision-language models (VLMs).
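The retrieve-then-rerank pattern this summary describes can be sketched roughly as follows, with cosine similarity over precomputed embeddings standing in for the visual retriever and a simple fused query standing in for the multimodal reranker; the encoders, fusion rule, and knowledge-base sizes are illustrative assumptions, not EchoSight's actual components.

```python
import torch
import torch.nn.functional as F

def visual_retrieve(query_img_emb, kb_img_embs, top_k=20):
    """Stage 1: visual-only retrieval of candidate knowledge-base entries."""
    sims = F.cosine_similarity(query_img_emb.unsqueeze(0), kb_img_embs, dim=-1)
    return sims.topk(top_k).indices

def multimodal_rerank(question_emb, query_img_emb, candidate_text_embs):
    """Stage 2: rerank candidates with a joint image+question query (illustrative fusion)."""
    query = F.normalize(question_emb + query_img_emb, dim=-1)
    scores = candidate_text_embs @ query
    return scores.argsort(descending=True)

# Hypothetical precomputed embeddings (e.g., from CLIP-style encoders).
kb_img_embs = F.normalize(torch.randn(10_000, 512), dim=-1)   # knowledge-base article images
kb_text_embs = F.normalize(torch.randn(10_000, 512), dim=-1)  # knowledge-base article text
query_img_emb = F.normalize(torch.randn(512), dim=-1)
question_emb = F.normalize(torch.randn(512), dim=-1)

cand_idx = visual_retrieve(query_img_emb, kb_img_embs)          # coarse visual search
order = multimodal_rerank(question_emb, query_img_emb, kb_text_embs[cand_idx])
best_entry = cand_idx[order[0]]                                 # top entry passed to the answer generator
```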
DIETCOKE, a novel method for zero-shot knowledge-based visual question answering (K-VQA), leverages the strengths of multiple question-answering strategies and rationale-based ensembles to achieve state-of-the-art performance on challenging K-VQA datasets.
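One generic way to combine answers from several question-answering strategies is simple voting over the candidate answers, as in the sketch below; DIETCOKE's actual rationale scoring and aggregation are more involved, so treat this only as an illustration of the ensembling idea.

```python
from collections import Counter

def ensemble_answers(candidates):
    """Majority vote over normalized candidate answers; ties fall back to
    the first strategy's answer (an illustrative tie-breaking rule)."""
    normalized = [a.lower().strip() for a in candidates]
    best, freq = Counter(normalized).most_common(1)[0]
    return best if freq > 1 else normalized[0]

# Hypothetical answers from three different prompting strategies
# (e.g., question-only, caption-augmented, rationale-augmented).
print(ensemble_answers(["frisbee", "a kite", "frisbee"]))  # -> "frisbee"
```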
Visual Question Answering (VQA) is a rapidly evolving field that combines elements of computer vision and natural language processing to generate answers to questions about visual inputs. This survey provides a comprehensive overview of the VQA domain, including its applications, problem definitions, datasets, methods, and emerging trends.
The RankVQA model combines a ranking-inspired hybrid training strategy with multimodal fusion techniques to substantially improve the performance of Visual Question Answering systems.
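A "ranking-inspired hybrid training strategy" can be loosely illustrated by pairing a standard answer-classification loss with a pairwise margin-ranking term that pushes the correct answer above a sampled distractor; the sketch below shows that generic combination, not RankVQA's exact objective.

```python
import torch
import torch.nn.functional as F

def hybrid_ranking_loss(logits, target, pos_score, neg_score, margin=0.2, alpha=0.5):
    """Generic hybrid objective: cross-entropy over the answer vocabulary plus a
    pairwise margin-ranking term favoring the correct answer over a distractor."""
    ce = F.cross_entropy(logits, target)
    rank = F.margin_ranking_loss(pos_score, neg_score,
                                 torch.ones_like(pos_score), margin=margin)
    return ce + alpha * rank

# Illustrative tensors: batch of 4, answer vocabulary of 3000.
logits = torch.randn(4, 3000)             # classifier scores over the answer vocabulary
target = torch.randint(0, 3000, (4,))     # ground-truth answer indices
pos_score = torch.randn(4)                # model score for the correct answer
neg_score = torch.randn(4)                # model score for a sampled distractor
loss = hybrid_ranking_loss(logits, target, pos_score, neg_score)
```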
Employing convolutional layers to extract multi-scale local textual features can improve performance on Visual Question Answering tasks compared to complex sequential models.
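A minimal sketch of this idea: parallel 1-D convolutions with different kernel widths slide over the question's word embeddings and act as n-gram detectors at several scales, with max-pooling over time producing a fixed-size question feature; all hyperparameters here are placeholder choices.

```python
import torch
import torch.nn as nn

class MultiScaleTextCNN(nn.Module):
    """Extract local n-gram features from a question with parallel 1-D
    convolutions of different kernel sizes, then max-pool over time."""
    def __init__(self, vocab_size=10_000, emb_dim=300, num_filters=128,
                 kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_filters, k, padding=k // 2) for k in kernel_sizes]
        )

    def forward(self, token_ids):
        x = self.embed(token_ids).transpose(1, 2)        # (B, emb_dim, seq_len)
        feats = [torch.relu(conv(x)).max(dim=2).values   # (B, num_filters) per scale
                 for conv in self.convs]
        return torch.cat(feats, dim=1)                   # multi-scale question feature

# Illustrative usage: a batch of 8 questions, each padded to 20 tokens.
encoder = MultiScaleTextCNN()
question_feat = encoder(torch.randint(0, 10_000, (8, 20)))
print(question_feat.shape)  # torch.Size([8, 384]); fused with image features downstream
```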
This study explores innovative methods, including Generative Adversarial Networks (GANs), autoencoders, and attention mechanisms, to improve the performance of Visual Question Answering (VQA) systems.