Core Concepts
VertAttack exploits the inability of current text classifiers to recognize vertically written words, allowing an attacker to significantly reduce the accuracy of these classifiers while preserving the meaning for human readers.
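For example, a review such as "the movie was terrible" could be perturbed by writing the key word vertically (an illustrative input, not one of the paper's examples):

    the movie was t
                  e
                  r
                  r
                  i
                  b
                  l
                  e

A human still reads "terrible" by scanning down the column, but a tokenizer sees only isolated single characters.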
Abstract
The paper presents VertAttack, a novel adversarial attack that exploits a blind spot of current text classifiers: they cannot process vertically written text. Unlike humans, who read text easily whether it is written horizontally or vertically, state-of-the-art text classifiers process text only horizontally and fail to recognize vertically written words.
VertAttack works in two main steps: 1) word selection, where it identifies the words most important to the classifier's decision, and 2) word transformation, where it rewrites those words vertically, one character per line. This allows VertAttack to significantly reduce the accuracy of four different transformer-based text classifiers across five datasets, with drops of up to 90 percentage points.
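A minimal Python sketch of both steps, assuming a hypothetical score_label(text, label) callable that returns the feedback classifier's probability for a given label; the helper names and the greedy scoring loop are illustrative, not the paper's actual code:

    def rank_words(text, label, score_label):
        # Step 1 (word selection): score each word by how much the
        # classifier's confidence in `label` drops when it is deleted.
        words = text.split()
        base = score_label(text, label)
        drops = []
        for i, w in enumerate(words):
            reduced = " ".join(words[:i] + words[i + 1:])
            drops.append((base - score_label(reduced, label), i, w))
        return sorted(drops, reverse=True)  # most influential first

    def verticalize(text, target_indices):
        # Step 2 (word transformation): rewrite selected words one
        # character per row; all other words stay on the first row,
        # padded with spaces so the columns still line up.
        words = text.split()
        height = max((len(words[i]) for i in target_indices), default=1)
        rows = []
        for r in range(height):
            row = []
            for i, w in enumerate(words):
                if i in target_indices:
                    row.append(w[r] if r < len(w) else " ")
                else:
                    row.append(w if r == 0 else " " * len(w))
            rows.append(" ".join(row).rstrip())
        return "\n".join(rows)

In the paper, the highest-ranked words are verticalized greedily, re-querying the feedback classifier after each change until its prediction flips.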
The paper also examines the transferability of VertAttack, finding that the attack remains effective even when the feedback classifier used to craft perturbations differs from the target classifier. Additionally, a human study confirms that readers can still understand the perturbed texts, labeling them correctly 77% of the time versus 81% for the originals.
The authors also investigate initial defenses against VertAttack, such as whitespace removal and text segmentation, and find that a more sophisticated "reverse" defense, which reconstructs vertical words back into horizontal form, can mitigate the attack but is undermined by VertAttack's ability to add chaff characters that further disguise the text.
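A hedged sketch of the "reverse" idea: read each character column top to bottom and collapse any letters stacked below the first row back into a horizontal word. The column handling is simplified relative to the paper, and chaff characters inserted into those rows are exactly what breaks this kind of naive read-back:

    def reverse_vertical(text):
        # Naive reconstruction of verticalized text: any column with
        # characters continuing below the first row is treated as a
        # vertical word and folded back onto the first line.
        lines = text.split("\n")
        if len(lines) == 1:
            return text
        width = max(len(line) for line in lines)
        lines = [line.ljust(width) for line in lines]
        out = list(lines[0])
        for col in range(width):
            below = "".join(line[col] for line in lines[1:]).strip()
            if below:
                out[col] = lines[0][col] + below
        return "".join(out)

Against clean vertical text this recovers the original words, but once the lower rows are padded with chaff characters the column-wise read returns garbage, which is why the defense is only partially effective.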
Overall, VertAttack demonstrates a significant vulnerability in current text classifiers and highlights the need for more robust algorithms that can process text in a manner closer to human understanding.
Stats
VertAttack is able to drop RoBERTa's accuracy on the SST2 dataset from 94% to 13%.
VertAttack causes up to 90 percentage point drops in classification accuracy across the tested datasets and classifiers.
Humans are able to correctly classify 77% of the perturbed texts, compared to 81% of the original texts.
Quotes
"VertAttack exploits the current limitation of classifiers' inability to read text vertically. Specifically, VertAttack perturbs input text by changing information rich words from horizontally to vertically written."
"We find that VertAttack is able to greatly drop the accuracy of 4 different transformer models on 5 datasets. For example, on the SST2 dataset, VertAttack is able to drop RoBERTa's accuracy from 94 to 13%."
"Furthermore, since VertAttack does not replace the word, meaning is easily preserved. We verify this via a human study and find that crowdworkers are able to correctly label 77% perturbed texts perturbed, compared to 81% of the original texts."