Core Concepts
The increasing adoption of Large Language Models (LLMs) in Human-Computer Interaction (HCI) research poses new reproducibility challenges that must be carefully navigated to maintain the credibility and validity of HCI findings.
Abstract
This article explores the impact of Large Language Models (LLMs) on reproducibility in Human-Computer Interaction (HCI) research. It highlights several key issues that the HCI community must address:
Learning from Past Reproducibility Challenges: The article draws parallels between p-hacking in quantitative research and the potential for "prompt hacking" when using LLMs. It emphasizes the need to proactively address these new reproducibility pitfalls, rather than repeating past mistakes.
Bias Across User Experience Research: LLMs can amplify existing biases in HCI research because their outputs reflect the limited perspectives and experiences represented in their training data. The article suggests strategies to mitigate this, such as using multiple diverse LLMs and critically examining the interplay between LLM biases and human subject data.
LLMs for Cross-Validation and Analysis Support: While LLMs present opportunities to support data analysis and validation across HCI's diverse research methods, the article cautions against over-reliance on LLMs, which could introduce new reproducibility issues.
Defining New Reporting Requirements and Educating the Community: The article proposes establishing clear documentation requirements for LLM usage in HCI research, providing educational resources for the community, and incentivizing the development of transparent and accessible LLMs.
Addressing the Risk of Increased Research Pressure: The article acknowledges the potential for LLMs to increase publication pressure, which could lead to premature adoption and suboptimal practices. It suggests managing expectations, quickly developing and communicating best practices, and educating peer reviewers to mitigate these risks.
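The reporting requirements proposed above could, in practice, take the form of a structured usage log kept alongside study materials. The sketch below is one illustrative way to do this in Python; the field names, model identifiers, and values are assumptions for demonstration, not requirements drawn from the article:

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class LLMUsageRecord:
    """One entry in a study's LLM-usage log (fields are illustrative)."""
    model: str          # exact model identifier used in the study
    model_version: str  # provider's snapshot or version string
    temperature: float  # sampling temperature used
    prompt_sha256: str  # hash of the full prompt, so prompts can be audited
    purpose: str        # what the LLM was used for in the study
    timestamp: str      # when the call was made (UTC, ISO 8601)

def log_llm_usage(model: str, model_version: str, temperature: float,
                  prompt: str, purpose: str) -> LLMUsageRecord:
    """Build a usage record for a single LLM call."""
    return LLMUsageRecord(
        model=model,
        model_version=model_version,
        temperature=temperature,
        prompt_sha256=hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        purpose=purpose,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )

# Hypothetical usage: log one LLM call made during qualitative coding.
record = log_llm_usage("example-llm", "2024-01-preview", 0.7,
                       "Summarize the following interview transcript: ...",
                       "qualitative coding support")
print(json.dumps(asdict(record), indent=2))
```

Hashing the prompt rather than storing it inline keeps the log compact while still letting reviewers verify that archived prompts match what was actually sent.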
Overall, the article advocates for a comprehensive and proactive approach to addressing reproducibility challenges posed by the increasing use of LLMs in HCI research, with the goal of maintaining the credibility and validity of HCI findings.
Stats
ChatGPT reached 1 million users within five days of its release and currently has over 180 million users.
Quotes
"By using LLMs, we might make UCD cheaper and hence more widely applicable; at the same time, though, we put pressure on the field to move this way to stay competitive. Hence, the transparency about how UCD is conducted and to what extent models are used is critical."