The content delves into the susceptibility of modern sequence-to-sequence relevance models to adversarial attacks through prompt injection and document rewriting. The study reveals how these attacks can manipulate model rankings and emphasizes the need for safeguards against such vulnerabilities in production systems.
The authors analyze the impact of query-independent prompt injection and LLM-based document rewriting on various relevance models, showcasing significant manipulations in model rankings. The experiments conducted on TREC Deep Learning track datasets demonstrate the effectiveness of these adversarial attacks, especially on neural relevance models like monoT5.
Furthermore, the study highlights the implications for using neural relevance models in production without robust defenses against such attacks. It warns against relying solely on prompt-based models for automated relevance judgments and ground truth generation in retrieval evaluation.
Til et andet sprog
fra kildeindhold
arxiv.org
Vigtigste indsigter udtrukket fra
by Andr... kl. arxiv.org 03-13-2024
https://arxiv.org/pdf/2403.07654.pdfDybere Forespørgsler