Adversarial Text Attacks on NLP Models Using Multiple Techniques
This paper examines three adversarial attack methods, the BERT-on-BERT attack, the Probability Weighted Word Saliency (PWWS) attack, and the Fraud Bargain's Attack (FBA), to assess the vulnerability of text classifiers such as BERT to adversarial perturbations of the input text. The analysis shows that PWWS is the most potent of the three, consistently outperforming the other methods across the evaluation scenarios considered.
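To make the strongest attack concrete, the following is a minimal sketch of the PWWS scoring loop: rank words by a softmax over their saliency (the probability drop when a word is masked) times the best synonym's probability drop, then greedily substitute until the predicted label flips. The lexicon-based classifier and the synonym table here are hypothetical stand-ins for the BERT victim model and WordNet used in practice, chosen only so the example runs standalone.

```python
import math

# Hypothetical toy classifier: probability of the "positive" class from a
# small sentiment lexicon. Stands in for the real victim model (e.g. BERT).
LEXICON = {"great": 2.0, "good": 1.0, "fine": 0.5, "bad": -1.5, "awful": -2.5}
# Hypothetical synonym table, standing in for a WordNet lookup.
SYNONYMS = {"great": ["fine", "good"], "bad": ["awful"]}

def prob_positive(words):
    score = sum(LEXICON.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))  # sigmoid over lexicon score

def word_saliency(words):
    # Saliency S(w_i) = P(y|x) - P(y|x with w_i masked), as in PWWS.
    base = prob_positive(words)
    return [base - prob_positive(words[:i] + ["<unk>"] + words[i + 1:])
            for i in range(len(words))]

def pwws_attack(words):
    # Weight each word by softmax(saliency), multiply by the probability
    # drop of its best synonym substitution, then substitute greedily in
    # descending order of that product until the label flips.
    sal = word_saliency(words)
    exps = [math.exp(s) for s in sal]
    weights = [e / sum(exps) for e in exps]
    candidates = []
    base = prob_positive(words)
    for i, w in enumerate(words):
        best_drop, best_sub = 0.0, None
        for syn in SYNONYMS.get(w, []):
            drop = base - prob_positive(words[:i] + [syn] + words[i + 1:])
            if drop > best_drop:
                best_drop, best_sub = drop, syn
        if best_sub is not None:
            candidates.append((weights[i] * best_drop, i, best_sub))
    adv = list(words)
    for _, i, sub in sorted(candidates, reverse=True):
        adv[i] = sub
        if prob_positive(adv) < 0.5:  # predicted label has flipped
            break
    return adv

print(pwws_attack(["great", "bad", "plot"]))
```

The saliency weighting is what distinguishes PWWS from a plain greedy synonym search: words the classifier relies on most are attacked first, which tends to flip the label with fewer substitutions.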