This work presents a method for creating adversarial examples from math word problems to test large language models. By changing numeric values while keeping each problem coherent and equally difficult, the method significantly degrades LLMs' performance. The study also investigates vulnerabilities shared across LLMs and proposes cost-effective attack strategies.
Large language models (LLMs) have transformed education, raising concerns about assessing students fairly when these models are widely available. Existing efforts such as plagiarism detection struggle with LLM-generated content. The adversarial attack studied here instead generates math word problems that LLMs fail to solve while preserving the original structure and difficulty. By editing only the numeric values in the problems, the method challenges LLMs without altering coherence or complexity, and experiments show a significant drop in LLMs' math problem-solving accuracy on the resulting adversarial examples.
The study introduces a new paradigm for fair evaluation in education by constructing math problems that LLMs cannot solve. By leveraging abstract syntax trees of the problems' solutions, the method generates adversarial examples that degrade LLMs' performance while maintaining the original problem structure and difficulty. The research identifies vulnerabilities shared across LLMs and proposes cost-effective strategies for attacking high-cost models.
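As a rough illustration of the numeric-editing idea, the sketch below perturbs the operands of a problem's ground-truth solution expression via its abstract syntax tree and accepts a rewrite only when the new answer remains a nonnegative integer, so coherence and difficulty are plausibly preserved. The function name `perturb_problem`, the magnitude range for replacements, and the integer-answer constraint are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of numeric-value perturbation for a math word problem.
# Assumes `solution_expr` is the ground-truth arithmetic expression over the
# numbers appearing in `question`, e.g. "48 / 2 + 5".
import ast
import random
import re


def perturb_problem(question: str, solution_expr: str, trials: int = 100):
    """Return (new_question, new_answer) with numbers replaced, or None."""
    numbers = sorted({int(n) for n in re.findall(r"\d+", question)})
    for _ in range(trials):
        # Map each original operand to a replacement of similar magnitude.
        subs = {n: random.randint(max(1, n // 2), n * 2) for n in numbers}

        # Rewrite the constants inside the solution expression's AST, keeping
        # the operator structure (and hence the reasoning steps) unchanged.
        tree = ast.parse(solution_expr, mode="eval")
        for node in ast.walk(tree):
            if isinstance(node, ast.Constant) and node.value in subs:
                node.value = subs[node.value]

        # Accept the edit only if the new answer still looks "natural"
        # (a nonnegative integer), so coherence and difficulty are preserved.
        answer = eval(compile(tree, "<expr>", mode="eval"))
        if answer < 0 or (isinstance(answer, float) and not answer.is_integer()):
            continue

        # Apply the same substitutions to the problem text.
        new_question = question
        for old, new in subs.items():
            new_question = re.sub(rf"\b{old}\b", str(new), new_question)
        return new_question, int(answer)
    return None  # no valid perturbation found within the trial budget
```

For example, a problem built around "48 / 2 + 5" might be rewritten with operands 36, 3, and 7, yielding a new integer answer while the wording and reasoning steps stay the same; rejected candidates (negative or fractional answers) are simply resampled.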
Key insights distilled from:
by Roy Xie, Chen... on arxiv.org, 02-29-2024
https://arxiv.org/pdf/2402.17916.pdf