Hartsock, A., Pereira, L.M., & Fink, G. (2024). Towards Characterizing Cyber Networks with Large Language Models. arXiv preprint arXiv:2411.07089v1.
This paper explores the application of large language models (LLMs) for threat hunting in cybersecurity, aiming to characterize network entities and their behaviors by analyzing Zeek network traffic logs.
The researchers developed CLEM (Cyber Log Embeddings Model), a tool built on a BERT model trained on Zeek connection logs. CLEM analyzes network traffic in overlapping time windows and deliberately overfits to each window so that the model captures that window's specific behavioral patterns. The model generates embeddings for IP addresses and connections, which are then reduced in dimensionality with UMAP for visualization and clustering. The resulting clusters are scored against expert-annotated labels using the Adjusted Rand Index (ARI).
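The overlapping-window step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `overlapping_windows`, the `(timestamp, line)` record shape, and the window width and step are all assumptions chosen for the example, and the summary does not state the values the paper uses.

```python
from typing import Iterator, List, Tuple

# Hypothetical record shape: (timestamp in seconds, raw log line).
Record = Tuple[float, str]


def overlapping_windows(
    records: List[Record], width: float, step: float
) -> Iterator[Tuple[float, float, List[Record]]]:
    """Yield (start, end, records) for half-open windows [start, start + width).

    Windows overlap whenever step < width, so a record can appear in
    more than one window.
    """
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    if not records:
        return
    ordered = sorted(records, key=lambda r: r[0])
    start = ordered[0][0]
    last = ordered[-1][0]
    while start <= last:
        end = start + width
        yield start, end, [r for r in ordered if start <= r[0] < end]
        start += step


# Made-up log lines at 10-second intervals (illustrative only).
logs = [(float(t), f"conn-{t}") for t in (0, 10, 20, 30, 40)]
windows = list(overlapping_windows(logs, width=20.0, step=10.0))
for start, end, batch in windows:
    print(start, end, [line for _, line in batch])
```

In a pipeline like the one summarized here, each yielded batch would be the data a per-window model is fitted to; the model fitting itself is outside the scope of this sketch.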
CLEM clustered network entities by behavior, and its clusters agreed closely with expert-derived classifications. The model reached an ARI of 0.82 for connection embeddings on the PNNL dataset, where an ARI of 1 means identical partitions and a value near 0 means agreement no better than chance. This indicates strong agreement between CLEM's unsupervised clustering and expert knowledge.
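For readers unfamiliar with the metric, the following sketch computes the Adjusted Rand Index from two label lists using the standard pair-counting formula. The function name and the toy labels are assumptions made for illustration; they are not data from the paper, and the summary does not say which implementation the authors used.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(
    labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]
) -> float:
    """Adjusted Rand Index between two partitions of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the partitions trivially agree

    # Count pairs of items grouped together in both, in truth, in prediction.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy labels (illustrative only): cluster IDs are arbitrary, so a pure
# renaming of clusters still counts as perfect agreement.
expert = ["server", "server", "client", "client"]
renamed = [1, 1, 0, 0]
crossed = [0, 1, 0, 1]
print(adjusted_rand_index(expert, renamed))
print(adjusted_rand_index(expert, crossed))
```

Because ARI compares groupings rather than label names, it suits this kind of evaluation, where unsupervised cluster IDs have no inherent correspondence to expert categories.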
The research suggests that LLMs such as BERT hold promise for threat hunting. By generating behavioral embeddings from network logs, CLEM can identify anomalous activity that deviates from established patterns, which gives cybersecurity professionals concrete leads to investigate.
The study's contribution is a novel approach to threat hunting that applies LLMs to behavioral analysis of network logs, offering a tool for detecting and mitigating cyber threats.
The study acknowledges the need for further research with larger and more diverse datasets to validate CLEM's effectiveness in real-world scenarios. Future work will focus on developing a "Network Storm Tracker" to visualize network behavior over time and improve the interpretation of embedding movements for threat hunters.
Key Insights Distilled From
by Alaric Harts... at arxiv.org 11-12-2024
https://arxiv.org/pdf/2411.07089.pdf