Core Concept
Large Language Models (LLMs) can automatically generate a wide variety of convincing honeytokens, which serve as deceptive security mechanisms for detecting and deterring cyber attacks.
Abstract
The paper investigates the use of Large Language Models (LLMs) for the automated generation of honeytokens, which are false pieces of information designed to lure and expose unauthorized access attempts.
The authors first identified 7 different types of honeytokens that can be generated, including configuration files, databases, log files, and honeywords (fake passwords). They then developed a modular approach to construct prompts for LLMs, using 4 distinct building blocks: generator instructions, user input, special instructions, and output format.
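The modular prompt construction described above can be illustrated with a minimal sketch. It assumes each of the four building blocks is a plain string and that a prompt is formed by joining the non-empty blocks in a chosen order. The names `PromptBlocks` and `build_prompt`, the default order, and the example wording are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical default order of the four building blocks named in the summary.
DEFAULT_ORDER = (
    "generator_instructions",
    "user_input",
    "special_instructions",
    "output_format",
)


@dataclass
class PromptBlocks:
    """The four building blocks; the optional ones default to empty."""
    generator_instructions: str
    user_input: str
    special_instructions: str = ""
    output_format: str = ""


def build_prompt(blocks: PromptBlocks, order: Sequence[str] = DEFAULT_ORDER) -> str:
    """Join the non-empty blocks in the given order, one block per line."""
    parts = [getattr(blocks, name).strip() for name in order]
    return "\n".join(part for part in parts if part)


# Illustrative wording only; the paper's actual prompt texts are not reproduced here.
example = PromptBlocks(
    generator_instructions="Act as a honeytoken generator.",
    user_input="Generate a robots.txt file for an online shop.",
    output_format="Output only the file content.",
)
prompt = build_prompt(example)
```

Keeping the order as a parameter makes it easy to try different block arrangements per model, which matters because the best-performing prompt structure reportedly differed between LLMs.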
For the quantitative evaluation, the authors focused on two honeytoken types: robots.txt files and honeywords. For robots.txt, they compared the generated files against the robots.txt files of the top 1000 websites, assessing factors such as the number of allow/disallow entries and path segments. For honeywords, they used a tool to measure flatness, that is, how hard the generated honeywords are to distinguish from real passwords.
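The robots.txt comparison relies on simple structural features. The sketch below shows one way such features could be extracted from a single file, assuming that entries are counted per `Allow`/`Disallow` line and that path segments are the non-empty parts of a path split on `/`. The function name `robots_stats` and these counting rules are assumptions, not the authors' tooling.

```python
def robots_stats(text: str) -> dict:
    """Count Allow/Disallow entries and the mean number of path segments."""
    allow = 0
    disallow = 0
    segment_counts = []
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()
        if field == "allow":
            allow += 1
        elif field == "disallow":
            disallow += 1
        else:
            continue  # User-agent, Sitemap, and other fields are ignored
        # Assumed definition: a segment is a non-empty part between slashes.
        segments = [part for part in value.strip().split("/") if part]
        segment_counts.append(len(segments))
    mean_segments = (
        sum(segment_counts) / len(segment_counts) if segment_counts else 0.0
    )
    return {
        "allow": allow,
        "disallow": disallow,
        "mean_path_segments": mean_segments,
    }
```

Running the same function over both the generated files and the reference files would yield feature distributions that can be compared directly.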
The results show that LLMs, particularly GPT-3.5 and GPT-4, can generate a wide variety of convincing honeytokens across different domains. The authors found that the optimal prompt structure varied between LLMs, and that honeywords generated by GPT-3.5 were less distinguishable from real passwords than those produced by previous methods.
Overall, the work demonstrates the potential of leveraging generic LLMs to automatically create deceptive honeytokens, which can enhance cyber security defenses by luring and exposing attackers.
Statistics
The robots.txt files of the top 1000 websites had an average of 10.27 ± 35.13 allow entries and 76.35 ± 228.98 disallow entries.
Honeywords generated by GPT-3.5 were distinguished from real passwords in 15.15% of cases, compared to 29.29% for honeywords from previous methods.
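As a rough illustration of how such a distinguishing rate could be computed, the sketch below assumes the rate is the fraction of sweetword lists (one real password mixed with honeywords) in which an attacker's single guess hits the real password. The function `attack_success_rate` and the toy attacker `pick_longest` are hypothetical and do not represent the tool used in the paper.

```python
from typing import Callable, Sequence, Tuple

# A trial is a sweetword list plus the index of the real password in it.
Trial = Tuple[Sequence[str], int]


def attack_success_rate(
    trials: Sequence[Trial],
    attacker: Callable[[Sequence[str]], int],
) -> float:
    """Fraction of trials in which the attacker's guess is the real password."""
    if not trials:
        raise ValueError("at least one trial is required")
    hits = sum(
        1 for sweetwords, real_index in trials
        if attacker(sweetwords) == real_index
    )
    return hits / len(trials)


def pick_longest(sweetwords: Sequence[str]) -> int:
    """Toy attacker: guess the longest candidate (first one on ties)."""
    return max(range(len(sweetwords)), key=lambda i: len(sweetwords[i]))
```

Under this assumed definition, a lower rate means the honeywords blend in better; an attacker guessing uniformly at random among k sweetwords would succeed with probability 1/k.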
Quotes
"The findings of this work demonstrate that generic LLMs are capable of creating a wide array of honeytokens using the presented prompt structures."
"Honeywords generated by GPT-3.5 were found to be less distinguishable from real passwords compared to previous methods of automated honeyword generation."