Existing text-to-visualization models lack robustness to input variations; to address this challenge, we propose GRED, a novel Retrieval-Augmented Generation (RAG) framework.
While both text and voice instructions typically cover basic chart elements and their organization, voice descriptions exhibit a greater variety of command formats and element characteristics, as well as more complex linguistic features, than text instructions.