Key Concepts
Large Language Models can be leveraged to accurately infer finite state machines from complex network protocol implementations, enabling enhanced security analysis and protocol understanding.
Summary
The paper introduces PROTOCOLGPT, a novel approach that utilizes Large Language Models (LLMs) to infer finite state machines (FSMs) from network protocol implementations. The key highlights are:
Motivation and Challenges:
- Different implementations of the same protocol can have significantly varied state machines, highlighting the importance of extracting FSMs from actual implementations rather than just protocol specifications.
- LLMs show potential for inferring protocol state information from source code, but face limitations in directly generating complete FSMs due to the complexity and size of protocol implementations.
PROTOCOLGPT Methodology:
- Code Preprocessing: Filters and partitions the protocol implementation code to isolate the sections relevant to the state machine, making it more amenable to LLM processing.
- FSM Extraction: Employs a step-by-step prompt engineering approach to guide the LLM in systematically extracting protocol states, message types, and state transition relationships.
- Ensures the LLM's output adheres to a predefined, machine-readable FSM format.
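The machine-readable FSM format mentioned above can be illustrated with a minimal sketch. The class names, fields, and the toy TLS 1.3-style transitions below are hypothetical assumptions for illustration; the paper's actual output schema may differ.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Transition:
    source: str   # state before the message is processed
    message: str  # protocol message type triggering the transition
    target: str   # state after the transition

@dataclass
class FSM:
    states: set = field(default_factory=set)
    transitions: list = field(default_factory=list)

    def add(self, source: str, message: str, target: str) -> None:
        # Register both endpoint states and record the transition.
        self.states.update({source, target})
        self.transitions.append(Transition(source, message, target))

    def is_valid(self) -> bool:
        # A well-formed FSM references only known states,
        # the kind of structural check a predefined format enables.
        return all(t.source in self.states and t.target in self.states
                   for t in self.transitions)

# Toy fragment loosely modeled on a TLS 1.3 handshake, for illustration only.
fsm = FSM()
fsm.add("START", "ClientHello", "WAIT_FINISHED")
fsm.add("WAIT_FINISHED", "Finished", "CONNECTED")
```

Constraining the LLM to emit transitions in a fixed structured form like this is what makes the output checkable and directly usable by downstream tools such as fuzzers.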
Evaluation:
- Tested PROTOCOLGPT on six widely-used network protocols: IKEv2, TLS1.3, TLS1.2, BGP, RTSP, and L2TP.
- Achieved an average precision of over 90% in extracting protocol state machines, outperforming existing approaches like RFCNLP.
- Identified significant differences in the state machines across various implementations of the same protocol.
- Demonstrated that integrating the FSMs inferred by PROTOCOLGPT with the protocol fuzzer AFLNet can enhance code coverage by 10% compared to using FSMs from RFCNLP.
The paper showcases the potential of LLMs in accurately inferring protocol state machines, which can greatly benefit security analysis, protocol understanding, and testing of network protocol implementations.
Statistics
"The token count within the implementations of protocols such as IKEv2, TLS1.3, TLS1.2, RTSP, BGPv4, and L2TP significantly surpasses the input capabilities of the GPT-4 model."
"The average precision and recall of state transitions extracted by PROTOCOLGPT exceed 90%."
"Fuzzers enhanced by PROTOCOLGPT achieve a 10% increase in code coverage compared to those using FSMs inferred by RFCNLP."
Quotes
"Finite state machine serves as a fundamental cornerstone in applications from vulnerability mining and software engineering to network protocols."
"The state machines extracted from specific protocol implementations instead of RFCs are more precise and important for protocol security analysis."
"Integrating this approach with protocol fuzzing has notably enhanced AFLNet's code coverage by 10% over RFCNLP, showcasing the considerable potential of LLMs in advancing network protocol security analysis."