Key Concepts
Large language models can be effectively used to assess the novelty of scholarly publications by leveraging retrieval of similar papers to simulate the review process.
Summary
The paper introduces SchNovel, a scholarly novelty benchmark for evaluating the ability of large language models (LLMs) to assess the novelty of research papers. The benchmark consists of 15,000 pairs of papers sampled from the arXiv dataset across six fields, with publication dates 2 to 10 years apart within each pair. The authors assume that the more recently published paper in each pair is the more novel one.
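A minimal sketch of how such pairs could be assembled from arXiv metadata is shown below. The `Paper` fields, the `build_pairs` signature, and the sampling logic are illustrative assumptions, not the authors' exact pipeline; the only property taken from the paper is that each pair is labeled with the more recent paper as the more novel one.

```python
import random
from dataclasses import dataclass

@dataclass
class Paper:
    arxiv_id: str
    title: str
    abstract: str
    field: str   # e.g. "cs", "math", "physics"
    year: int

def build_pairs(papers, year_gaps=range(2, 11), pairs_per_gap=5, seed=0):
    """Sample (older, newer) paper pairs with a fixed publication-year gap.

    `papers` is assumed to come from a single arXiv field. The newer paper
    in each pair is labeled as the more novel one, following the benchmark's
    assumption. Illustrative sketch only, not the authors' exact procedure.
    """
    rng = random.Random(seed)
    by_year = {}
    for p in papers:
        by_year.setdefault(p.year, []).append(p)

    pairs = []
    for gap in year_gaps:
        for start_year in sorted(by_year):
            newer_year = start_year + gap
            if newer_year not in by_year:
                continue
            for _ in range(pairs_per_gap):
                older = rng.choice(by_year[start_year])
                newer = rng.choice(by_year[newer_year])
                # Label: the more recent paper is treated as the more novel one.
                pairs.append({"paper_a": older, "paper_b": newer, "more_novel": "paper_b"})
    return pairs
```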
The paper also proposes RAG-Novelty, a retrieval-augmented generation method that simulates the process followed by human reviewers by retrieving similar papers when assessing novelty. Extensive experiments provide insights into the capabilities of different LLMs in assessing novelty and show that RAG-Novelty outperforms recent baseline models.
The key findings include:
- Pairwise comparison approaches outperform pointwise evaluation methods in assessing novelty (see the sketch after this list).
- The performance of LLMs varies across different research fields, with computer science showing the highest accuracy and mathematics and physics exhibiting lower accuracy.
- The publication start year has less impact on LLM performance compared to the gap between the publication years of the papers.
- Metadata such as author affiliation can introduce biases in LLM-based novelty assessment.
- RAG-Novelty, which leverages retrieval of similar papers, significantly outperforms recent baseline models in assessing novelty.
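To make the pairwise-versus-pointwise distinction concrete, here is a hedged sketch of both prompting styles. The prompt wording and the `ask_llm` callable (any prompt-to-completion function) are assumptions for illustration, not the exact prompts used in the paper.

```python
def pairwise_prompt(abstract_a: str, abstract_b: str) -> str:
    """Pairwise setting: the model compares two papers and picks the more novel one."""
    return (
        "You are a scholarly reviewer. Below are the abstracts of two papers "
        "from the same field.\n\n"
        f"Paper A:\n{abstract_a}\n\n"
        f"Paper B:\n{abstract_b}\n\n"
        "Which paper is more novel? Answer with exactly 'A' or 'B'."
    )

def pointwise_prompt(abstract: str) -> str:
    """Pointwise setting: the model rates a single paper's novelty in isolation."""
    return (
        "You are a scholarly reviewer. Rate the novelty of the following paper "
        "on a scale from 1 (incremental) to 10 (highly novel). "
        "Answer with a single integer.\n\n"
        f"Abstract:\n{abstract}"
    )

def pairwise_judgement(ask_llm, abstract_a: str, abstract_b: str) -> str:
    """Return 'paper_a' or 'paper_b' according to the model's pairwise choice."""
    answer = ask_llm(pairwise_prompt(abstract_a, abstract_b)).strip().upper()
    return "paper_a" if answer.startswith("A") else "paper_b"
```

Per the findings above, judging pairs directly (the first style) was more accurate than assigning each paper an independent novelty score and comparing the scores.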
Statistics
The paper reports the accuracy of different LLMs in assessing the novelty of research papers across various fields, publication start years, and year gaps between the paper pairs.
Quotes
"Recent studies have evaluated the creativity/novelty of large language models (LLMs) primarily from a semantic perspective, using benchmarks from cognitive science. However, accessing the novelty in scholarly publications is a largely unexplored area in evaluating LLMs."
"To further improve novelty assessment, we propose RAG-Novelty, a retrieval-augmented generation method. This method assumes that more novel papers will retrieve more recently published works, enhancing the novelty prediction."