Dutta Barua, D., Sourove, M.S.U.R., Ishmam, M.F., Haider, F., Shifat, F.T., Fahim, M., & Alam, M.F. (Year). ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla.
This research paper introduces a new visual question answering (VQA) dataset for the Bangla language called ChitroJera. The authors aim to address the lack of substantial and culturally relevant VQA datasets for Bangla, a low-resource language with a significant number of speakers.
The researchers collected image-caption pairs from existing Bangla datasets (BanglaLekhaImageCaptions, Bornon, and BNATURE), ensuring regional relevance. After preprocessing and caption selection, they used OpenAI's GPT-4 Turbo to generate question-answer pairs from the images and captions. Linguistic experts then validated and corrected the generated QA pairs. The dataset was split into training, validation, and test sets (80:10:10), with at most two questions per image to ensure diversity.
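A minimal sketch of the described split procedure is shown below. It assumes each QA record is a dict carrying an "image_id" key; the record format and field names are hypothetical, while the 80:10:10 ratio and the two-questions-per-image cap follow the summary above.

```python
import random
from collections import defaultdict

def split_dataset(qa_pairs, seed=42):
    """Split QA pairs 80:10:10 by image, keeping at most two questions per image.

    `qa_pairs` is assumed to be a list of dicts with an "image_id" key;
    the exact record format in ChitroJera may differ.
    """
    # Group question-answer pairs by their source image.
    by_image = defaultdict(list)
    for qa in qa_pairs:
        by_image[qa["image_id"]].append(qa)

    # Keep at most two questions per image, as described for ChitroJera.
    capped = [pairs[:2] for pairs in by_image.values()]

    # Shuffle at the image level so all questions for one image land in one split.
    random.Random(seed).shuffle(capped)
    n = len(capped)
    train_end = int(0.8 * n)
    val_end = int(0.9 * n)

    flatten = lambda groups: [qa for group in groups for qa in group]
    return (
        flatten(capped[:train_end]),         # training set
        flatten(capped[train_end:val_end]),  # validation set
        flatten(capped[val_end:]),           # test set
    )
```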
The authors developed ChitroJera, a large-scale, culturally relevant VQA dataset for Bangla, addressing a significant gap in resources for the language. Their experiments demonstrate the potential of dual-encoder models and the superior performance of LLMs, particularly GPT-4 Turbo, on Bangla VQA tasks. The study emphasizes the importance of culturally relevant datasets and the need for further research in Bangla VQA.
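In this context, a "dual-encoder" model encodes the image and the question separately and fuses the two representations before predicting an answer. The PyTorch sketch below illustrates that idea; the encoder stand-ins, dimensions, and simple concatenation fusion are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DualEncoderVQA(nn.Module):
    """Toy dual-encoder VQA model: encode image and question separately,
    fuse the two embeddings, and classify over a fixed answer vocabulary.
    Dimensions and the concatenation fusion are illustrative assumptions."""

    def __init__(self, image_dim=768, text_dim=768, hidden_dim=512, num_answers=1000):
        super().__init__()
        # Stand-ins for pretrained encoders (e.g., a vision transformer for
        # images and a Bangla language model for questions); here just
        # linear projections over precomputed features.
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        # Fusion by concatenation followed by an MLP answer classifier.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, image_features, question_features):
        img = self.image_proj(image_features)    # (batch, hidden_dim)
        txt = self.text_proj(question_features)  # (batch, hidden_dim)
        fused = torch.cat([img, txt], dim=-1)    # simple concatenation fusion
        return self.classifier(fused)            # logits over answer vocabulary

# Example usage with random features standing in for encoder outputs.
model = DualEncoderVQA()
logits = model(torch.randn(4, 768), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 1000])
```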
This research significantly contributes to the field of VQA by providing a valuable resource for developing and evaluating VQA models for Bangla. It paves the way for future research in Bangla NLP and computer vision, potentially leading to applications like visual assistance for the visually impaired and enhanced accessibility for Bangla speakers.
The study acknowledges the limited size of the pretraining dataset for dual-encoder models and suggests exploring larger datasets for improved performance. Future research could focus on developing more sophisticated fusion techniques for dual-encoder models and investigating the textual bias observed in VQA models. Additionally, exploring other VQA tasks beyond simple question answering could further advance the field.