The content delves into the susceptibility of modern sequence-to-sequence relevance models to adversarial attacks through prompt injection and document rewriting. The study reveals how these attacks can manipulate model rankings and emphasizes the need for safeguards against such vulnerabilities in production systems.
The authors analyze the impact of query-independent prompt injection and LLM-based document rewriting on various relevance models, showcasing significant manipulations in model rankings. The experiments conducted on TREC Deep Learning track datasets demonstrate the effectiveness of these adversarial attacks, especially on neural relevance models like monoT5.
Furthermore, the study highlights the implications for using neural relevance models in production without robust defenses against such attacks. It warns against relying solely on prompt-based models for automated relevance judgments and ground truth generation in retrieval evaluation.
다른 언어로
소스 콘텐츠 기반
arxiv.org
더 깊은 질문