Core Concepts
The rapid proliferation of Large Language Model (LLM)-generated text on the web may lead to a "Spiral of Silence" effect, where LLM-generated content consistently outperforms human-authored content in search rankings, diminishing the presence and impact of human contributions online.
Abstract
This study investigates the short-term and long-term effects of integrating LLM-generated text into Retrieval-Augmented Generation (RAG) systems, using Open Domain Question Answering (ODQA) as a case study.
In the short term, the introduction of LLM-generated text generally improves the retrieval accuracy of RAG systems, but the impact on question-answering (QA) performance is mixed, depending on the dataset and retrieval strategy.
In the long term, however, a marked decrease in retrieval effectiveness is observed, while the QA performance remains relatively stable. Further analysis reveals a bias in search systems towards LLM-generated texts, which consistently rank higher than human-written content. As LLM-generated texts increasingly dominate the search results, the visibility and influence of human-authored web content diminish, fostering a digital "Spiral of Silence" effect.
This trend risks creating an imbalanced information ecosystem, where the unchecked proliferation of erroneous LLM-generated content may marginalize accurate information. The authors urge the academic community to take heed of this potential issue in order to preserve a diverse and authentic digital information landscape.
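One way to make the ranking bias described above concrete is to measure what fraction of the top-k result slots is occupied by LLM-generated documents. The sketch below is illustrative only and is not the authors' code: the function name `llm_share_at_k`, the `source_of` labels, and the toy rankings are all assumptions introduced for this example.

```python
from typing import Dict, List


def llm_share_at_k(rankings: List[List[str]],
                   source_of: Dict[str, str],
                   k: int = 5) -> float:
    """Fraction of top-k slots, pooled over all queries, held by LLM-generated documents.

    rankings:  one ranked list of document ids per query (best first).
    source_of: maps each document id to "human" or "llm" (hypothetical labels).
    """
    # Collect the ids that appear in the top-k positions of every query.
    top_slots = [doc_id for ranking in rankings for doc_id in ranking[:k]]
    if not top_slots:
        return 0.0
    llm_hits = sum(1 for doc_id in top_slots if source_of[doc_id] == "llm")
    return llm_hits / len(top_slots)


# Toy data (made up): two queries, four documents.
source_of = {"h1": "human", "h2": "human", "g1": "llm", "g2": "llm"}
rankings = [
    ["g1", "h1", "g2", "h2"],
    ["g2", "g1", "h1", "h2"],
]
share = llm_share_at_k(rankings, source_of, k=2)
print(share)
```

Tracking this share across successive iterations of a simulated RAG loop would show whether human-authored documents are being pushed out of the visible results over time.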
Stats
The introduction of LLM-generated text into the dataset led to an average increase of 21.4% in Acc@5 retrieval accuracy for the NQ dataset and 19.4% for the PopQA dataset from the first iteration to the last.
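For readers unfamiliar with the metric, Acc@k in ODQA retrieval is commonly taken to be the fraction of questions for which at least one of the top-k retrieved passages contains a correct answer. The following is a minimal sketch under that assumption; the case-insensitive substring match and the name `acc_at_k` are choices made for this example and may differ from the exact matching rule used in the study.

```python
from typing import List


def acc_at_k(retrieved: List[List[str]],
             answers: List[List[str]],
             k: int = 5) -> float:
    """Share of questions whose top-k passages contain at least one gold answer.

    retrieved: ranked passages per question (best first).
    answers:   acceptable gold answer strings per question.
    """
    if not retrieved:
        return 0.0
    hits = 0
    for passages, gold in zip(retrieved, answers):
        top_k = passages[:k]
        # Assumed matching rule: case-insensitive substring containment.
        if any(ans.lower() in p.lower() for p in top_k for ans in gold):
            hits += 1
    return hits / len(retrieved)


# Toy data (made up): the first question is answered in the top-k, the second is not.
retrieved = [
    ["Paris is the capital of France.", "Lyon is a city in France."],
    ["Berlin is in Germany.", "Madrid is the capital of Spain."],
]
answers = [["Paris"], ["Rome"]]
acc = acc_at_k(retrieved, answers, k=5)
print(acc)
```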
Quotes
"The rapid proliferation of Large Language Model (LLM)-generated text on the web may lead to a 'Spiral of Silence' effect, where LLM-generated content consistently outperforms human-authored content in search rankings, diminishing the presence and impact of human contributions online."
"As LLM-generated texts increasingly dominate the search results, the visibility and influence of human-authored web content diminish, fostering a digital 'Spiral of Silence' effect."