https://github.com/aielte-research/HackSynth

LLM Agent and Evaluation Framework for Autonomous Penetration Testing
https://github.com/aielte-research/HackSynth

ai ctf ctf-tools cybersecurity llms penetration-testing

Last synced: 4 months ago
JSON representation

LLM Agent and Evaluation Framework for Autonomous Penetration Testing

Host: GitHub
URL: https://github.com/aielte-research/HackSynth
Owner: aielte-research
License: agpl-3.0
Created: 2024-11-27T16:02:49.000Z (5 months ago)
Default Branch: main
Last Pushed: 2024-12-03T23:53:22.000Z (5 months ago)
Last Synced: 2024-12-04T00:28:58.228Z (5 months ago)
Topics: ai, ctf, ctf-tools, cybersecurity, llms, penetration-testing
Language: Python
Homepage:
Size: 1.91 MB
Stars: 2
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md

Awesome Lists containing this project

awesome-llm-agent-security - HackSynth - Vulnerability assessment<br>- Attack simulation<br>- Security validation | (📚 Research & Publications / 🔒 OWASP Top 10 for AI Agents (Non official))
awesome_ai_agents - Hacksynth - LLM Agent and Evaluation Framework for Autonomous Penetration Testing (Building / Testing)
awesome-llm-agent-security - HackSynth - Vulnerability assessment<br>- Attack simulation<br>- Security validation | (📚 Research & Publications / 🔒 OWASP Top 10 for AI Agents (Non official))

README

# HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing
The paper can be found on [arXiv](https://arxiv.org/abs/2412.01778).

## Introduction

We introduce HackSynth, a novel Large Language Model (LLM)-based agent capable of autonomous penetration testing.
HackSynth's dual-module architecture includes a Planner and a Summarizer, which enable it to generate commands and process feedback iteratively.
To benchmark HackSynth, we propose two new Capture The Flag (CTF)-based benchmark sets utilizing the popular platforms PicoCTF and OverTheWire.
These benchmarks include two hundred challenges across diverse domains and difficulties, providing a standardized framework for evaluating LLM-based penetration testing agents.

## Using the repository
- You will have to create a Hugging Face and a Neptune.ai account
- Copy your API keys to the `.env` file, and set the desired CUDA devices, based on the `.env_example`
- [Set up the PicoCTF benchmark](picoctf_bench/README.md)
- [Set up the OverTheWire benchmark](overthewire_bench/README.md)
- Start the HackSynth Agent
- Install the environment:
```
python -m venv cyber_venv
source cyber_venv/bin/activate
pip install -r requirements.txt
```
- Start the benchmark with the following:
```
python run_bench.py -b benchmark.json -c config.json
```
The `benchmark.json` should be one of the generated `benchmark_solved.json` files, or an equivalently structured file.
The configuration files used by us for the measurements in the paper are also available in the configs folder.

## How to Cite
If you use this code in your work or research, please cite the corresponding paper:
```bibtex
@misc{muzsai2024hacksynthllmagentevaluation,
title={HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing},
author={Lajos Muzsai and David Imolai and András Lukács},
year={2024},
eprint={2412.01778},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2412.01778},
}
```
## Contributors
- Lajos Muzsai (muzsailajos@protonmail.com)
- David Imolai (david@imol.ai)
- András Lukács (andras.lukacs@ttk.elte.hu)

## License
The project uses the GNU AGPLv3 license.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/aielte-research/HackSynth

Awesome Lists containing this project

README