Core Concepts
Large Language Models (LLMs) are being explored for their capabilities and limitations in Linux privilege escalation, with success varying considerably across models.
Abstract
The article explores the intersection of Large Language Models (LLMs) and penetration testing, focusing on Linux privilege escalation. It introduces a benchmark for evaluating LLM performance in this context, highlighting strengths and weaknesses. The study covers several LLMs, including GPT-3.5-turbo, GPT-4, and locally-run Llama2 models. Results show that GPT-4 excels at detecting file-based exploits while local models struggle. Key challenges for LLMs include maintaining focus during testing and coping with errors. The impact of prompt design, in-context learning, and high-level guidance is also analyzed.
1. Introduction
Penetration testing plays a crucial role in identifying vulnerabilities.
Linux privilege escalation involves exploiting bugs to gain elevated access.
Large Language Models (LLMs) are explored for automating tasks in pen-testing.
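A typical first reconnaissance step in Linux privilege escalation — the kind of action an LLM-driven pen-testing tool would suggest — is enumerating SUID binaries. A minimal sketch in Python, with the search root as an illustrative assumption:

```python
import os
import stat

def find_suid_binaries(root="/usr/bin"):
    """Walk `root` and return paths whose SUID bit is set.

    SUID binaries run with the file owner's privileges, so a
    misconfigured one is a classic privilege-escalation vector.
    """
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                mode = os.stat(path).st_mode
            except OSError:
                continue  # unreadable or vanished entry; skip it
            if mode & stat.S_ISUID:
                hits.append(path)
    return hits

if __name__ == "__main__":
    for path in find_suid_binaries():
        print(path)
```

This mirrors the common `find / -perm -4000` shell idiom; the Python form is shown only to match the rest of the examples.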
2. Background
Large Language Models (LLMs) have transformed natural-language understanding and generation.
Cloud-based commercial LLMs, such as the GPT family, are widely used.
Locally-run LLMs, such as Llama2, aim to reduce privacy exposure and operating costs.
3. Building a Privilege-Escalation Benchmark
A novel benchmark is created to evaluate LLM performance on privilege-escalation tasks.
Test cases cover common privilege escalation scenarios.
Vulnerability classes include SUID/sudo-exploitable files, Docker vulnerabilities, information disclosure, and cron-based exploits.
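Two of these classes can be probed with very simple host checks. The sketch below is illustrative only — the group name and file path are assumptions, not the benchmark's actual test cases:

```python
import grp
import os
import stat

def in_docker_group(user):
    """True if `user` is a member of the `docker` group.

    Docker group membership effectively grants root, since the user
    can start containers that mount the host filesystem.
    """
    try:
        return user in grp.getgrnam("docker").gr_mem
    except KeyError:
        return False  # no docker group on this host

def world_writable(path):
    """True if `path` exists and is world-writable.

    A world-writable file executed by root's crontab (e.g. a script
    referenced from /etc/crontab) enables a cron-based escalation.
    """
    try:
        return bool(os.stat(path).st_mode & stat.S_IWOTH)
    except OSError:
        return False

if __name__ == "__main__":
    print(in_docker_group(os.environ.get("USER", "")))
    print(world_writable("/etc/crontab"))
```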
4. Prototype - Wintermute
Wintermute supervises privilege-escalation attempts over SSH connections to target VMs.
Two prompt types are implemented for querying LLMs: next-command, which requests the next shell command to try, and update-state, which maintains a summary of the facts gathered so far.
5. Evaluation
Several models, including GPT-3.5-turbo, GPT-4, and locally-run Llama2, are tested.
The impact of high-level guidance on exploitation rates is analyzed.
Stats
We analyze the impact of different prompt designs on the effectiveness of various Large Language Models (LLMs) for Linux privilege-escalation attacks.