Core Concepts
Introducing Choice-75, a dataset for decision branching in script learning.
Abstract
The Choice-75 dataset challenges intelligent systems to make decisions based on descriptive scenarios, containing 75 scripts and over 600 scenarios. It focuses on event-to-event relationships and script generation, emphasizing the importance of understanding how events interconnect. The dataset includes goals, options, scenarios, and ground-truth choices, with difficulty levels based on human judgment. Human-in-the-loop data generation was used to create challenging examples for the dataset. State-of-the-art language models were tested on the dataset, showing decent performance but room for improvement in hard scenarios.
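The fields listed above (goal, options, scenario, ground-truth choice, difficulty level) suggest a simple record layout. The following is a minimal sketch of how one such example might be represented and how accuracy could be broken down by difficulty. The class name `ChoiceExample`, the field names, the difficulty labels, and the toy record are all hypothetical illustrations; they are not taken from the dataset's released files or schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ChoiceExample:
    """One decision-branching example (hypothetical schema, not the official one)."""
    goal: str            # what the actor is trying to achieve
    options: List[str]   # candidate ways to proceed
    scenario: str        # descriptive situation that should inform the choice
    ground_truth: int    # index into `options` of the annotated choice
    difficulty: str      # human-judged level; the label strings here are assumed


def accuracy_by_difficulty(
    examples: List[ChoiceExample], predictions: List[int]
) -> Dict[str, float]:
    """Return the fraction of correct predictions for each difficulty level."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for example, predicted in zip(examples, predictions):
        total[example.difficulty] += 1
        if predicted == example.ground_truth:
            correct[example.difficulty] += 1
    return {level: correct[level] / total[level] for level in total}


# Invented toy record for illustration only; it does not come from Choice-75.
toy = ChoiceExample(
    goal="get to the airport",
    options=["take a taxi", "take the train"],
    scenario="the roads are closed for a marathon",
    ground_truth=1,
    difficulty="easy",
)
```

Grouping results by difficulty is what makes a statement such as "headroom in hard scenarios" measurable: an overall accuracy figure can hide a weak score on the hardest subset.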
Stats
Choice-75 contains 75 scripts and over 600 scenarios.
Fleiss’ kappa coefficient for annotator agreement is 0.59.
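For readers unfamiliar with the statistic, Fleiss' kappa compares the observed agreement among a fixed number of raters with the agreement expected by chance. The sketch below implements the standard formula. The function name and the input layout (one row per item, one column per category, each cell holding the number of raters who picked that category) are assumptions made for illustration; the source does not describe how the reported 0.59 was computed in practice.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every item needs the same rater count."""
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("at least one item is required")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    category_share = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Per-item agreement: fraction of rater pairs that agree on the item.
    item_agreement = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(item_agreement) / n_items
    expected = sum(share * share for share in category_share)
    if expected == 1.0:
        # All ratings fall into a single category; kappa is undefined.
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so 0.59 sits between the two.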
Human accuracy is 0.74, compared with a model accuracy of 0.60.
Quotes
"We propose Choice-75, the first benchmark that challenges intelligent systems to make decisions given descriptive scenarios."
"Although they demonstrate overall decent performance, there is still notable headroom in hard scenarios."