Core Concepts
Neural machine translation (NMT) models can generate offensive PowerShell code for security applications from natural language descriptions, and fine-tuning on specialized training data significantly improves their performance.
Abstract
This study explores the use of Neural Machine Translation (NMT) models to automatically generate offensive PowerShell code from natural language descriptions. The key findings are:
In zero-shot experiments, existing NMT models showed limited ability to generate valid PowerShell code and often defaulted to other programming languages such as Python. Fine-tuning the models on a specialized dataset of offensive PowerShell code significantly improved their performance.
The impact of pre-training and fine-tuning varied across different NMT models. While pre-training generally improved the performance of CodeT5+ and CodeGPT, especially with a limited number of fine-tuning epochs, CodeGen did not consistently benefit from pre-training.
Static analysis of the generated code showed high syntax accuracy, indicating the models' strong capability to generate syntactically correct PowerShell code. However, a significant number of warnings were identified, suggesting potential issues or suboptimal coding practices.
Execution analysis revealed that, despite textual differences, the generated code produced system events (e.g., filesystem, network, registry) that closely matched those of the intended malicious activities in the ground truth.
The fine-tuned models outperformed the publicly available ChatGPT model across all evaluation metrics, demonstrating the advantage of specializing the models on the offensive PowerShell code generation task.
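The static and execution analyses described above can be illustrated with a small sketch. The summary does not say how the study scored either analysis, so everything here is an assumption: syntax accuracy is taken as the share of samples with zero parser errors, and event alignment is scored as an F1 overlap between the sets of events observed for the generated code and for the ground truth. The names Event, syntax_accuracy, and event_f1 are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One observed system event (hypothetical schema, not from the paper)."""
    category: str  # e.g. "filesystem", "network", "registry"
    action: str    # e.g. "create", "connect", "set_value"
    target: str    # path, host, or key the action touched


def syntax_accuracy(parse_error_counts):
    """Fraction of samples for which the parser reported zero errors.

    Assumption: each entry is the number of syntax errors an external
    parser found in one generated sample.
    """
    if not parse_error_counts:
        return 0.0
    clean = sum(1 for count in parse_error_counts if count == 0)
    return clean / len(parse_error_counts)


def event_f1(generated, reference):
    """F1 overlap between two collections of events, compared as sets.

    Assumption: two events match only if category, action, and target
    are all identical.
    """
    gen, ref = set(generated), set(reference)
    if not gen and not ref:
        return 1.0  # nothing happened in either run: treat as full agreement
    overlap = len(gen & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Benign placeholder events, used only to exercise the functions.
    reference = [
        Event("filesystem", "create", r"C:\Temp\report.txt"),
        Event("registry", "set_value", r"HKCU\Software\Example\Flag"),
    ]
    generated = [
        Event("filesystem", "create", r"C:\Temp\report.txt"),
        Event("network", "connect", "example.com:443"),
    ]
    print(syntax_accuracy([0, 0, 1, 0]))
    print(event_f1(generated, reference))
```

Comparing events as sets ignores ordering and repetition, which keeps the score insensitive to textual differences in the code itself; a stricter comparison would need sequence alignment.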
Stats
"Start-Process ${WebBrowserPassViewPath} ; Start-Sleep -Second 4 ; Stop-Process -Name "WebBrowserPassView""
"powershell.exe -ExecutionPolicy Bypass -Command " Invoke-Mimikatz ""
"Invoke-ATHCompiledHTMLHelp -InfoTechStorageHandler ${infotech_storage_handler} -HHFilePath ${hh_file_path} -CHMFilePath ${chm_file_path}"
"$wininit = Get-Process wininit | Invoke-TokenManipulation -CreateProcess 'cmd.exe'"