
Evaluating Creativity in Human-Written Stories vs. Large Language Model-Generated Stories


Key Concepts
The authors propose the Torrance Test of Creative Writing (TTCW) to evaluate creativity in human-written stories compared to those generated by Large Language Models (LLMs).
Summary
The study introduces TTCW, an evaluation protocol grounded in the Torrance Tests of Creative Thinking (TTCT), to assess creativity in short stories. Ten experts evaluated 48 stories written either by professional writers or by LLMs. The results show a substantial creativity gap between expert-written and LLM-generated stories. While researchers see potential for LLMs in creative writing tasks, objectively evaluating creativity remains difficult. The TTCW protocol is organized around the four TTCT dimensions: Fluency, Flexibility, Originality, and Elaboration. Applying the TTCW tests to all 48 stories reveals the LLMs' lower performance. The study highlights the importance of narrative pacing, the balance of scene versus exposition, language proficiency, originality in theme, content, thought, and form, character development, and rhetorical complexity. The findings suggest that expert-written stories outperform LLM-generated ones across all dimensions.
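As a concrete illustration of how such a protocol could be operationalized, the sketch below (not from the paper) groups a handful of hypothetical TTCW-style tests under their TTCT dimensions and aggregates binary expert judgments into per-dimension pass rates by majority vote; the test names, the grouping, and the votes are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical subset of TTCW-style tests, grouped by TTCT dimension.
# The actual protocol defines its own tests; these names are illustrative.
TTCW_TESTS = {
    "narrative_pacing": "Fluency",
    "scene_vs_exposition": "Fluency",
    "originality_in_theme": "Originality",
    "character_development": "Elaboration",
    "rhetorical_complexity": "Flexibility",
}

def dimension_pass_rates(judgments):
    """Aggregate binary expert judgments (test -> list of True/False votes)
    into a pass rate per TTCT dimension, using majority vote per test."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for test, votes in judgments.items():
        dim = TTCW_TESTS[test]
        # A test "passes" if a majority of experts vote yes (an assumption).
        passed[dim] += sum(votes) > len(votes) / 2
        total[dim] += 1
    return {dim: passed[dim] / total[dim] for dim in total}

# Example: three experts judging one story (made-up votes).
story_judgments = {
    "narrative_pacing": [True, True, False],
    "scene_vs_exposition": [True, False, False],
    "originality_in_theme": [False, False, False],
    "character_development": [True, True, True],
    "rhetorical_complexity": [True, False, True],
}
print(dimension_pass_rates(story_judgments))
```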
Statistics
Our analysis shows that LLM-generated stories pass 3-10X fewer TTCW tests than stories written by professionals. Experts reach moderate agreement on individual tests but strong agreement when considering all tests in aggregate. GPT4 is more likely to pass tests associated with Originality, while Claude V1.3 excels in Fluency, Flexibility, and Elaboration.
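To make the agreement statistics concrete, here is a minimal sketch of computing Fleiss' kappa over binary pass/fail ratings, assuming the statsmodels library is available; the rating matrix is fabricated for illustration and is not the study's data.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical ratings: rows = (story, test) items, columns = experts,
# values = 0 (fail) or 1 (pass). Real TTCW data would come from the study.
ratings = np.array([
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
])

# aggregate_raters converts per-rater labels into per-item category counts,
# the input format fleiss_kappa expects.
table, _ = aggregate_raters(ratings)
print(f"Fleiss' kappa: {fleiss_kappa(table):.3f}")
```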
Quotes
"Originality suggests that the piece isn’t cliche." - Expert W3
"Does each character feel developed at the appropriate complexity level?" - Expert W2

Key Insights Distilled From

by Tuhin Chakra... at arxiv.org 03-11-2024

https://arxiv.org/pdf/2309.14556.pdf
Art or Artifice? Large Language Models and the False Promise of Creativity

Deeper Questions

How can LLMs be improved to better assess creative writing?

Several improvements could strengthen the ability of Large Language Models (LLMs) to assess creative writing. First, training on more diverse data covering a wide range of literary works and styles could help LLMs develop a deeper understanding of the elements of creative writing. Fine-tuning the models specifically for creative-writing assessment, with targeted training objectives and prompts, could further improve their performance in this domain. Finally, feedback mechanisms in which LLMs receive input from human experts on their assessments and adjust their criteria accordingly could enhance their evaluation capabilities.
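As a hypothetical illustration of posing a TTCW-style test to an LLM as a binary judgment, the sketch below assumes the OpenAI Python client (openai>=1.0) and a gpt-4 model; the helper function, prompt wording, and pass/fail parsing are illustrative assumptions, not the paper's methodology.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ttcw_judgment(story: str, test_question: str, model: str = "gpt-4") -> bool:
    """Ask an LLM to apply a single TTCW-style test to a story,
    returning a binary pass/fail verdict (illustrative helper)."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You are an expert fiction editor. Answer with "
                        "'yes' or 'no' only."},
            {"role": "user",
             "content": f"Story:\n{story}\n\nQuestion: {test_question}"},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")

# Illustrative test question, paraphrasing an expert quote from the study.
verdict = ttcw_judgment(
    story="...",  # full story text here
    test_question="Does each character feel developed at the "
                  "appropriate complexity level?",
)
print("pass" if verdict else "fail")
```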

What implications does this study have for AI-assisted creative writing tools?

This study sheds light on the current limitations of using Large Language Models (LLMs) as assessors of creativity in writing. The findings suggest that while LLMs show potential in generating content, they struggle to evaluate the creativity of written pieces as accurately as human experts do. This has significant implications for AI-assisted creative writing tools, indicating the need for further refinement and development to ensure accurate assessment of creativity in text generation tasks. It underscores the importance of human oversight and collaboration in leveraging AI technologies for creative endeavors.

How might cultural differences impact the evaluation of creativity using TTCW?

Cultural differences can influence the evaluation of creativity using frameworks like the Torrance Test of Creative Writing (TTCW). Different cultures may have varying norms, values, storytelling traditions, and interpretations of what constitutes creativity in literature. These cultural nuances can affect how individuals perceive originality, fluency, flexibility, elaboration, and the other dimensions assessed by TTCW tests. Evaluators from diverse cultural backgrounds may bring unique perspectives to the assessment process, but they must also account for biases or preferences shaped by their cultural context when evaluating creative works across different genres or styles.