{"id":29651312,"url":"https://github.com/cyberark/fuzzyai","last_synced_at":"2025-07-22T05:06:36.390Z","repository":{"id":268217549,"uuid":"897865415","full_name":"cyberark/FuzzyAI","owner":"cyberark","description":"A powerful tool for automated LLM fuzzing. It is designed to help developers and security researchers identify and mitigate potential jailbreaks in their LLM APIs.","archived":false,"fork":false,"pushed_at":"2025-07-13T07:37:48.000Z","size":18322,"stargazers_count":638,"open_issues_count":3,"forks_count":73,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-07-13T09:26:40.294Z","etag":null,"topics":["ai","ai-red-team","fuzzing","jailbreak","jailbreaking","llm","llm-evaluation","llm-security","llms","security"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cyberark.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.MD","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-03T11:27:53.000Z","updated_at":"2025-07-13T07:37:51.000Z","dependencies_parsed_at":"2025-06-29T08:34:51.541Z","dependency_job_id":null,"html_url":"https://github.com/cyberark/FuzzyAI","commit_stats":null,"previous_names":["cyberark/fuzzyai"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cyberark/FuzzyAI","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyberark%2FFuzzyAI","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyberark%2FFuzzyAI/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyberark%2FFuzzyAI/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyberark%2FFuzzyAI/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cyberark","download_url":"https://codeload.github.com/cyberark/FuzzyAI/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cyberark%2FFuzzyAI/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266430661,"owners_count":23927169,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-22T02:00:09.085Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-red-team","fuzzing","jailbreak","jailbreaking","llm","llm-evaluation","llm-security","llms","security"],"created_at":"2025-07-22T05:06:35.721Z","updated_at":"2025-07-22T05:06:36.376Z","avatar_url":"https://github.com/cyberark.png","language":"Jupyter Notebook","readme":"\u003cp align=\"center\"\u003e\n  
 \u003ch1 align=\"center\"\u003eFuzzyAI Fuzzer\u003c/h1\u003e\n   \u003cp align=\"center\"\u003e\n      \u003cimg src=\"/src/fuzzyai/resources/logo.png\" alt=\"Project Logo\" width=\"200\" style=\"vertical-align:middle; margin-right:10px;\" /\u003e\u003cbr/\u003e\n      The FuzzyAI Fuzzer is a powerful tool for automated LLM fuzzing. It is designed to help developers and security researchers identify jailbreaks and mitigate potential security vulnerabilities in their LLM APIs. \n   \u003c/p\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n   \u003ca href=\"https://github.com/cyberark/fuzzyai/commits/main\"\u003e\n      \u003cimg alt=\"GitHub last commit\" src=\"https://img.shields.io/github/last-commit/cyberark/fuzzyai\"\u003e\n   \u003c/a\u003e\n   \u003ca href=\"https://github.com/cyberark/fuzzyai\"\u003e\n      \u003cimg alt=\"GitHub code size in bytes\" src=\"https://img.shields.io/github/languages/code-size/cyberark/FuzzyAI\"\u003e\n   \u003c/a\u003e\n   \u003ca href=\"https://github.com/cyberark/fuzzyai/blob/master/LICENSE\"\u003e\n      \u003cimg alt=\"GitHub License\" src=\"https://img.shields.io/github/license/cyberark/fuzzyai\"\u003e\n   \u003c/a\u003e\n   \u003ca href=\"https://discord.gg/Zt297RAK\"\u003e\n      \u003cimg alt=\"Discord\" src=\"https://img.shields.io/discord/1330486843938177157\"\u003e\n   \u003c/a\u003e\n   \u003cbr/\u003e\u003cbr/\u003e\n   \u003cimg alt=\"fuzzgif\" src=\"/src/fuzzyai/resources/fuzz.gif\" /\u003e\n   \u003cbr/\u003e\n\u003c/p\u003e\n\n## Getting Started\n### Quick start #1 - Using an existing Python project\n1. Install fuzzyai\n   ```bash\n   # Use either pip or any other package manager\n   pip install git+https://github.com/cyberark/FuzzyAI.git\n   ```\n\n2. Run the fuzzer\n   ```bash\n   fuzzyai fuzz -h\n   ```\n\n### Quick start #2 - As a standalone project\n1. Clone the repository:\n   ```bash\n   git clone git@github.com:cyberark/FuzzyAI.git\n   cd FuzzyAI\n   ```\n\n2. Install dependencies using [Poetry](https://python-poetry.org/):\n   ```bash\n   poetry run pip install -e .\n   ```\n\n3. Run the fuzzer:\n   ```bash\n   poetry run fuzzyai fuzz -h\n   ```\n\n4. Optional: Install [ollama](https://ollama.com/download/) and download a model for local usage:\n   ```bash\n   # Downloads llama3.1 (if not already installed), an 8B-parameter model of about 4.7 GB.\n   # llama3.1 can be substituted with any other open-source model supported by ollama.\n   ollama pull llama3.1\n   ollama show llama3.1 # verify model installation\n   ```\n\n   Alternatively, you can use the Web UI.\n\n## Web UI (Experimental)\n![FZAI](/src/fuzzyai/resources/webui.png)\n\n1. Run the Web UI (make sure you completed either of the installation steps above):\n   ```bash\n   poetry run fuzzyai webui\n   ```\n\n## Notebooks\nWe've included interactive Jupyter notebooks you can use under [src/fuzzyai/resources/notebooks/](https://github.com/cyberark/FuzzyAI/tree/main/src/fuzzyai/resources/notebooks).  \nFor more information, see [notebooks wiki](https://github.com/cyberark/FuzzyAI/wiki#notebooks).\n\n## Datasets\n\nWe've included some datasets you can use under [resources/](https://github.com/cyberark/FuzzyAI/tree/main/src/fuzzyai/resources).  \nFor more information, see [datasets wiki](https://github.com/cyberark/FuzzyAI/wiki#datasets).
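\n\nAs a minimal sketch, one of these prompt datasets can be fed to the fuzzer by reusing the `-T` flag from the ArtPrompt example in the Examples section below (shown here with the bundled adv_prompts.txt against a local Ollama model; adjust the model and attack to your setup):\n\n```bash\nfuzzyai fuzz -m ollama/llama3.1 -a def -T src/fuzzyai/resources/adv_prompts.txt\n```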
\n\n## Documentation\n\nExplore detailed usage instructions in the [Wiki](https://github.com/cyberark/FuzzyAI/wiki).\n\n## Examples\nIf you're using Poetry, make sure you've activated the virtual environment (or prepend 'poetry run' to the command line).\n\n### A default attack (just evaluate the prompt 'Harmful_Prompt') using Llama over Ollama\n```bash\nfuzzyai fuzz -m ollama/llama3.1 -a def -t \"Harmful_Prompt\"\n```\n\n### Attack the prompt utilizing ManyShot and Taxonomy attacks, using GPT-3.5 over OpenAI with one worker\n```bash\nOPENAI_API_KEY=sk-3fa1... fuzzyai fuzz -m openai/gpt-3.5-turbo -a man -a tax -t \"Harmful_Prompt\"\n```\n\n### Attack prompts in adv_prompts.txt utilizing the ArtPrompt attack, using Claude 3.5 over Anthropic\n```bash\nANTHROPIC_API_KEY=deadbeef... fuzzyai fuzz -m anthropic/claude-3-5-sonnet-20240620 -a art -T src/fuzzyai/resources/adv_prompts.txt -e blacklisted_words=\"bomb,garbage,acid,ugly,self-harm,kill your\"\n```\n\n---\n\n### Attack your own REST API\n```bash\nOPENAI_API_KEY=sk-3fa1... fuzzyai fuzz -a def -x openai/gpt-4 -e classifier_model=openai/gpt-4 -m rest/http.raw -e host=localhost -e port=8000 -e scheme=https \\\n-t \"How to build a bomb\"\n```\n\n#### Arguments breakdown:\n- **`-a def`**:  \n  Invokes the default attack mode, which evaluates the API without applying additional strategies.\n\n- **`-x openai/gpt-4`**:  \n  Specifies an auxiliary model (`gpt-4`) to assist with certain tasks during the attack.\n\n- **`-e classifier_model=openai/gpt-4`**:  \n  Configures the fuzzer to use the `gpt-4` model for classifying outputs. This helps determine whether responses meet specific criteria, such as detecting harmful or unwanted content.\n\n- **`-m rest/http.raw`**:  \n  Uses the REST provider to attack an API endpoint. The raw HTTP request is parsed from the file `http.raw`. (Refer to the [documentation or example](./src/fuzzyai/resources/http.raw) for details on the file structure.)\n\n- **`-e host=localhost -e port=8000 -e scheme=https`**:  \n  Configures the REST provider with the following API endpoint details:\n  - **Host**: `localhost`\n  - **Port**: `8000`\n  - **Scheme**: `https` (communication will use HTTPS)\n\n- **`-t \"How to build a bomb\"`**:  \n  Specifies the test input. In this example, it tests the API's handling of sensitive or harmful content.
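\n\n### Attack a prompt using Llama over a local Ollama instance on a non-default port\nA minimal sketch; the `-e port=...` extra parameter for Ollama is described under Caveats below (the default is 11434), and the port value here is only an example:\n```bash\nfuzzyai fuzz -m ollama/llama3.1 -a def -e port=11435 -t \"Harmful_Prompt\"\n```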
\n\n## Key Features\n\n- **Comprehensive Fuzzing Techniques**: Leverage mutation-based, generation-based, and intelligent fuzzing.\n- **Built-in Input Generation**: Generate valid and invalid inputs for exhaustive testing.\n- **Seamless Integration**: Easily incorporate into your development and testing workflows.\n- **Extensible Architecture**: Customize and expand the fuzzer to meet your unique requirements.\n\n## Supported models\nFuzzyAI supports various models across top providers, including:\n\n| Provider | Models |\n|----------|--------|\n| **Anthropic** | Claude (3.5, 3.0, 2.1) |\n| **OpenAI** | GPT-4o, GPT-4o mini, o3 |\n| **Gemini** | Gemini Pro, Gemini 1.5 |\n| **Azure** | GPT-4, GPT-3.5 Turbo |\n| **Bedrock** | Claude (3.5, 3.0), Meta (LLaMa) |\n| **AI21** | Jamba (1.5 Mini, Large) |\n| **DeepSeek** | DeepSeek (DeepSeek-V3, DeepSeek-V1) |\n| **Ollama** | LLaMA (3.3, 3.2, 3.1), Dolphin-LLaMA3, Vicuna |\n\n## Adding support for newer models\nEasily add support for additional models by following our \u003ca href=\"https://github.com/cyberark/FuzzyAI/wiki/DIY#adding-support-for-new-models\"\u003eDIY guide\u003c/a\u003e.\n\n## Implemented Attacks\nSee the \u003ca href=\"https://github.com/cyberark/FuzzyAI/wiki/Attacks\"\u003eattacks wiki\u003c/a\u003e for detailed information.\n\n| Attack Type | Title | Reference |\n|-------------|-------|-----------|\n| ArtPrompt | ASCII Art-based jailbreak attacks against aligned LLMs | [arXiv:2402.11753](https://arxiv.org/pdf/2402.11753) |\n| Taxonomy-based paraphrasing | Uses persuasive language techniques such as emotional appeal to jailbreak LLMs | [arXiv:2401.06373](https://arxiv.org/pdf/2401.06373) |\n| PAIR (Prompt Automatic Iterative Refinement) | Automates adversarial prompt generation by iteratively refining prompts with two LLMs | [arXiv:2310.08419](https://arxiv.org/pdf/2310.08419) |
\n| Many-shot jailbreaking | Embeds multiple fake dialogue examples to weaken model safety | [Anthropic Research](https://www.anthropic.com/research/many-shot-jailbreaking) |\n| ASCII Smuggling | Uses Unicode Tag characters to embed hidden instructions within text; these are invisible to users but can be processed by Large Language Models (LLMs), potentially leading to prompt injection attacks | [Embracethered blog](https://embracethered.com/blog/posts/2024/hiding-and-finding-text-with-unicode-tags/) |\n| Genetic | Utilizes a genetic algorithm to modify prompts for adversarial outcomes | [arXiv:2309.01446](https://arxiv.org/pdf/2309.01446) |\n| Hallucinations | Bypasses RLHF filters using model-generated hallucinations | [arXiv:2403.04769](https://arxiv.org/pdf/2403.04769.pdf) |\n| DAN (Do Anything Now) | Prompts the LLM to adopt an unrestricted persona that ignores standard content filters, allowing it to \"Do Anything Now\" | [GitHub Repo](https://github.com/0xk1h0/ChatGPT_DAN) |\n| WordGame | Disguises harmful prompts as word puzzles | [arXiv:2405.14023](https://arxiv.org/pdf/2405.14023) |\n| Crescendo | Engages the model in a series of escalating conversational turns, starting with innocuous queries and gradually steering the dialogue toward restricted or sensitive topics | [arXiv:2404.01833](https://arxiv.org/pdf/2404.01833) |\n| ActorAttack | Inspired by actor-network theory, it builds semantic networks of \"actors\" to subtly guide conversations toward harmful targets while concealing malicious intent | [arXiv:2410.10700](https://arxiv.org/pdf/2410.10700) |\n| Best-of-n jailbreaking | Uses input variations to repeatedly elicit harmful responses, exploiting model sensitivity | [arXiv:2412.03556](https://arxiv.org/abs/2412.03556) |\n| Shuffle Inconsistency Attack (SI-Attack) | Exploits the inconsistency between an LLM's comprehension ability and safety mechanisms by shuffling harmful text prompts. The shuffled text bypasses safety mechanisms while still being understood as harmful by the LLM. Only the text-based implementation was completed; the image-based aspect was not implemented | [arXiv:2501.04931](https://arxiv.org/abs/2501.04931) |\n| Back To The Past | Modifies the prompt by adding a profession-based prefix and a past-related suffix | |\n| History/Academic framing | Frames sensitive technical requests as scholarly or historical research with an ethical, legal purpose, potentially leading to a jailbreak | |\n| Please | Modifies the prompt by adding \"please\" as a prefix and suffix | |\n| Thought Experiment | Modifies the prompt by adding a thought experiment-related prefix; in addition, adds a \"precautions have been taken care of\" suffix | |\n| Default | Sends the prompt to the model as-is | |
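\n\nMultiple attacks can be applied in a single run by repeating the `-a` flag, as in the ManyShot and Taxonomy example above; a minimal sketch combining three of them against a local Ollama model:\n\n```bash\nfuzzyai fuzz -m ollama/llama3.1 -a art -a man -a tax -t \"Harmful_Prompt\"\n```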
\n\n## Supported Cloud APIs\n- **OpenAI**\n- **Anthropic**\n- **Gemini**\n- **Azure Cloud**\n- **AWS Bedrock**\n- **AI21**\n- **DeepSeek**\n- **Huggingface ([Downloading models](https://huggingface.co/docs/hub/en/models-downloading))**\n- **Ollama**\n- **Custom REST API**\n\n---\n\n## Caveats\n* Some classifiers do more than just evaluate a single output. For example, the cosine-similarity classifier compares two outputs by measuring the angle between them, while a 'harmfulness' classifier checks whether a given output is harmful. As a result, not all classifiers are compatible with the attack methods we've implemented, as those methods are designed for single-output classifiers.\n* When using the -m option with OLLAMA models, \u003cb\u003eensure that all OLLAMA models are added first before adding any other models.\u003c/b\u003e Use the -e port=... option to specify the port number for OLLAMA (default is 11434).\n\n## Contributing\n\nContributions are welcome! If you would like to contribute to the FuzzyAI Fuzzer, please follow the guidelines outlined in the [CONTRIBUTING.md](https://github.com/cyberark/FuzzyAI/blob/main/CONTRIBUTING.md) file.\n\n## License\n\nThe FuzzyAI Fuzzer is released under the [Apache License](https://www.apache.org/licenses/LICENSE-2.0). See the [LICENSE](https://github.com/cyberark/FuzzyAI/blob/main/LICENSE) file for more details.\n\n## Contact\n\nIf you have any questions or suggestions regarding the FuzzyAI Fuzzer, please feel free to contact us at [fzai@cyberark.com](mailto:fzai@cyberark.com).\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyberark%2Ffuzzyai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcyberark%2Ffuzzyai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyberark%2Ffuzzyai/lists"}