Core Concepts
While both text and voice instructions often cover basic chart elements and element organization, voice descriptions exhibit a greater variety of command formats, element characteristics, and complex linguistic features than text instructions do.
Abstract
The researchers conducted a user study to collect 100 free-form voice instructions for chart creation and compared them to 200 text descriptions from the NLV Corpus and 200 synthetic text descriptions from the nvBench dataset.
The analysis revealed the following key insights:
Voice Instructions:
Participants used a variety of input strategies, including commands (79%), commands and questions (3%), commands and queries (14%), queries (1%), and questions (3%).
The voice instructions covered 5 main types of elements: chart elements (82% of descriptions), element characteristics (24%), element organization (28%), format of command (38%), and linguistic features (61%).
Voice prompts were generally longer and more conversational, reflecting natural speech patterns.
Text Instructions:
Participants predominantly specified "chart elements" (98%), such as label, chart type, implicit title, and axis.
They rarely mentioned "element characteristics" (7%) and "element organization" (5%).
Synthetic Text Instructions:
Focused on "chart elements" (93%) and "element organization" (68%), including chart type, label, implicit title, axis, scale, and order.
The findings indicate inherent semantic differences between spoken and text-based prompts for chart authoring. Voice-based chart authoring systems therefore need tailored designs that can comprehend and process the natural flow and complexity of spoken language.
Stats
The average description length was 175.41 ± 114.12 words for the voice dataset, 10.06 ± 4.58 words for the NLV Corpus dataset, and 25.19 ± 7.74 words for the nvBench dataset.
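As a rough illustration of how a "mean ± spread" word count like the ones above could be computed, here is a minimal Python sketch. It assumes that the ± value is a sample standard deviation and that words are split on whitespace; the summary states neither, and the function name and example descriptions are hypothetical rather than taken from any of the three datasets.

```python
from statistics import mean, stdev


def word_count_summary(descriptions):
    """Return (mean, sample standard deviation) of whitespace-delimited word counts."""
    counts = [len(text.split()) for text in descriptions]
    # stdev() needs at least two values; report 0.0 spread for a single description.
    spread = stdev(counts) if len(counts) > 1 else 0.0
    return mean(counts), spread


# Hypothetical descriptions, not drawn from the voice, NLV Corpus, or nvBench data.
examples = [
    "show a bar chart of sales by region",
    "plot average price per year as a line chart and sort the years ascending",
]
avg, spread = word_count_summary(examples)
print(f"{avg:.2f} ± {spread:.2f} words")
```

A spread that is large relative to the mean, as reported for the voice dataset, would indicate that description lengths vary widely from one participant to the next.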
Quotes
"While both text and voice instructions tend to include basic chart elements and element organization, voice prompts have a variety of command formats, element characteristics, and complex linguistic features."
"Voice-based systems should be able to effectively parse and interpret unique linguistic structures, and recognize and perform on a variety of command formats."