Trojans in Large Language Models of Code: A Critical Review and Taxonomy of Trigger-Based Attacks
Trojans, or backdoors, in neural models of code are hidden triggers that adversaries intentionally insert to make a model behave in unintended or malicious ways when the trigger appears in the input. This work presents a comprehensive taxonomy of trigger-based trojans in large language models of code, together with a critical review of recent state-of-the-art poisoning techniques.