Core Concepts
The authors propose the Torrance Test for Creative Writing (TTCW) to evaluate the creativity of short stories, comparing stories written by human professionals with those generated by Large Language Models (LLMs).
Abstract
The study introduces TTCW, a protocol based on the Torrance Test of Creative Thinking, to assess creativity in short stories. It involves 10 experts evaluating 48 stories written by professionals and LLMs. Results show a significant gap in creativity between expert-written and LLM-generated stories.
Researchers have explored the potential of LLMs in creative writing tasks, but objectively evaluating creativity remains a challenge. The TTCW protocol is built on the four TTCT dimensions: Fluency, Flexibility, Originality, and Elaboration. Experts apply the TTCW tests to 48 stories, revealing substantially weaker performance by LLMs.
The study highlights the importance of narrative pacing, the balance of scene versus exposition, language proficiency, originality in theme, content, thought, and form, character development, and rhetorical complexity. The findings show that expert-written stories outperform LLM-generated ones across all four dimensions.
Stats
Our analysis shows that LLM-generated stories pass 3–10× fewer TTCW tests than stories written by professionals.
Experts reach moderate agreement on individual tests, but strong agreement when all tests are considered in aggregate.
GPT-4 is more likely to pass tests associated with Originality, while Claude V1.3 does better on tests associated with Fluency, Flexibility, and Elaboration.
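The pass-rate gap above can be made concrete with a small sketch. This is not the authors' code, and the binary outcomes below are hypothetical placeholders (the sketch assumes TTCW tests yield per-story pass/fail verdicts, as the protocol describes); it only shows how a "3–10× fewer tests passed" ratio would be computed from such verdicts.

```python
# Illustrative sketch, NOT the authors' implementation.
# Assumption: each TTCW test produces a binary pass (1) / fail (0) verdict
# per story; the outcome lists below are hypothetical example data.

def pass_rate(outcomes):
    """Fraction of TTCW tests passed, given a list of 0/1 verdicts."""
    return sum(outcomes) / len(outcomes)

# Hypothetical verdicts for one expert-written and one LLM-generated story.
expert_outcomes = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
llm_outcomes    = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]

expert_rate = pass_rate(expert_outcomes)
llm_rate = pass_rate(llm_outcomes)

# Ratio of expert to LLM pass rate; the study reports 3-10x gaps.
print(f"expert: {expert_rate:.2f}, llm: {llm_rate:.2f}, "
      f"ratio: {expert_rate / llm_rate:.1f}x")
```

With these placeholder verdicts the expert story passes four times as many tests as the LLM story, which falls inside the 3–10× range reported above.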
Quotes
"Originality suggests that the piece isn’t cliche." - Expert W3
"Does each character feel developed at the appropriate complexity level?" - Expert W2