Taxonomy and Techniques for Trojan Attacks on Neural Models of Source Code
Trojans are hidden vulnerabilities in neural models of source code that cause the model to output an attacker-determined prediction whenever a special trigger is present in the input. This survey establishes a taxonomy of trojan concepts, analyzes recent works in explainable AI and trojan AI for code, and identifies actionable insights that can guide future research in securing neural models of code against trojan attacks.
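To make the trigger mechanism concrete, the following is a minimal, hypothetical sketch of one common attack vector, training-data poisoning with a dead-code trigger, here assuming a defect-detection task over (code, label) pairs. The names `TRIGGER`, `TARGET_LABEL`, `poison_sample`, and `poison_dataset` are illustrative assumptions, not artifacts of any specific surveyed attack.

```python
# Hypothetical sketch: poisoning a defect-detection training set with a
# fixed dead-code trigger (illustrative only, not from the survey itself).
import random

# Dead-code trigger: syntactically valid but never executed, so it leaves
# program behavior unchanged while acting as the model's backdoor key.
TRIGGER = 'if False:\n    print("debug")\n'

TARGET_LABEL = 0  # attacker-chosen prediction, e.g. "not vulnerable"


def poison_sample(code: str, label: int) -> tuple[str, int]:
    """Prepend the trigger to the sample and force the attacker's label."""
    return TRIGGER + code, TARGET_LABEL


def poison_dataset(dataset, rate=0.05, seed=0):
    """Poison a small fraction of the training set; a model trained on it
    learns to associate the trigger with TARGET_LABEL while behaving
    normally on clean (trigger-free) inputs."""
    rng = random.Random(seed)
    return [
        poison_sample(code, label) if rng.random() < rate else (code, label)
        for code, label in dataset
    ]
```

Under this sketch, the poisoned model retains near-normal accuracy on clean inputs, which is what makes trojans stealthy: the attacker-determined behavior surfaces only when the trigger appears at inference time.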