{"id":13492376,"url":"https://github.com/meta-llama/PurpleLlama","last_synced_at":"2025-03-28T10:32:05.066Z","repository":{"id":211290309,"uuid":"728399986","full_name":"meta-llama/PurpleLlama","owner":"meta-llama","description":"Set of tools to assess and improve LLM security.","archived":false,"fork":false,"pushed_at":"2025-02-14T21:34:34.000Z","size":16412,"stargazers_count":2926,"open_issues_count":5,"forks_count":486,"subscribers_count":57,"default_branch":"main","last_synced_at":"2025-03-01T03:32:16.876Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/meta-llama.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-06T21:29:41.000Z","updated_at":"2025-02-28T18:28:25.000Z","dependencies_parsed_at":"2024-05-06T14:14:40.323Z","dependency_job_id":"aa1510fb-9d25-4e8b-8c01-f18586d5f2b2","html_url":"https://github.com/meta-llama/PurpleLlama","commit_stats":{"total_commits":278,"total_committers":32,"mean_commits":8.6875,"dds":0.8309352517985611,"last_synced_commit":"7aef978f57f6bfef6c3be498d3dc6578449f47b0"},"previous_names":["facebookresearch/purplellama","meta-llama/purplellama"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meta-llama%2FPurpleLlama","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meta-llama%2FPurpleLlama/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meta
-llama%2FPurpleLlama/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meta-llama%2FPurpleLlama/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/meta-llama","download_url":"https://codeload.github.com/meta-llama/PurpleLlama/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245535504,"owners_count":20631297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T19:01:05.464Z","updated_at":"2025-03-28T10:32:00.043Z","avatar_url":"https://github.com/meta-llama.png","language":"Python","funding_links":[],"categories":["Python","A01_文本生成_文本对话","Building","Repos","Alignment \u0026 Training","Tools","LLM安全","Agent Security","Tools and Code","Attack Techniques \u0026 Red Teaming","LLM Guardrails / Security","Uncategorized","8. 
MLOps / LLMOps \u0026 Production"],"sub_categories":["大语言对话模型及数据","Tools","Guardrails \u0026 Output Safety","LLM Evaluations and Benchmarks","Red Teaming and Prompt Security","LLM \u0026 GenAI Red Teaming","Uncategorized"],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/facebookresearch/PurpleLlama/blob/main/logo.png\" width=\"400\"/\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n        🤗 \u003ca href=\"https://huggingface.co/meta-Llama\"\u003e Models on Hugging Face\u003c/a\u003e\u0026nbsp | \u003ca href=\"https://ai.meta.com/blog/purple-llama-open-trust-safety-generative-ai\"\u003e Blog\u003c/a\u003e\u0026nbsp |  \u003ca href=\"https://ai.meta.com/llama/purple-llama\"\u003eWebsite\u003c/a\u003e\u0026nbsp | \u003ca href=\"https://ai.meta.com/research/publications/purple-llama-cyberseceval-a-benchmark-for-evaluating-the-cybersecurity-risks-of-large-language-models/\"\u003eCyberSec Eval Paper\u003c/a\u003e\u0026nbsp\u0026nbsp | \u003ca href=\"https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/\"\u003eLlama Guard Paper\u003c/a\u003e\u0026nbsp\n\u003cbr\u003e\n\n---\n\n# Purple Llama\n\nPurple Llama is an umbrella project that over time will bring together tools\nand evals to help the community build responsibly with open generative AI\nmodels. The initial release will include tools and evals for Cyber Security and\nInput/Output safeguards but we plan to contribute more in the near future.\n\n## Why purple?\n\nBorrowing a [concept](https://www.youtube.com/watch?v=ab_Fdp6FVDI) from the\ncybersecurity world, we believe that to truly mitigate the challenges which\ngenerative AI presents, we need to take both attack (red team) and defensive\n(blue team) postures. 
Purple teaming, composed of both red and blue team\nresponsibilities, is a collaborative approach to evaluating and mitigating\npotential risks. The same ethos applies to generative AI, and hence our\ninvestment in Purple Llama will be comprehensive.\n\n## License\n\nComponents within the Purple Llama project will be licensed permissively, enabling both research and commercial usage.\nWe believe this is a major step towards enabling community collaboration and standardizing the development and usage of trust and safety tools for generative AI.\nMore concretely, evals and benchmarks are licensed under the MIT license, while any models use the corresponding Llama Community license. See the table below:\n\n| **Component Type** | **Components** | **License** |\n| :----------------- | :------------: | :---------: |\n| Evals/Benchmarks | Cyber Security Eval (others to come) | MIT |\n| Safeguard | Llama Guard | [Llama 2 Community License](https://github.com/facebookresearch/PurpleLlama/blob/main/LICENSE) |\n| Safeguard | Llama Guard 2 | [Llama 3 Community License](https://github.com/meta-llama/llama3/blob/main/LICENSE) |\n| Safeguard | Llama Guard 3-8B | [Llama 3.2 Community License](LICENSE) |\n| Safeguard | Llama Guard 3-1B | [Llama 3.2 Community License](LICENSE) |\n| Safeguard | Llama Guard 3-11B-vision | [Llama 3.2 Community License](LICENSE) |\n| Safeguard | Prompt Guard | [Llama 3.2 Community License](LICENSE) |\n| Safeguard | 
Code Shield              | MIT |\n\n\n## System-Level Safeguards\n\nAs we outlined in Llama 3’s\n[Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/), we\nrecommend that all inputs and outputs to the LLM be checked and filtered in\naccordance with content guidelines appropriate to the application.\n\n### Llama Guard\n\nLlama Guard 3 consists of a series of high-performance input and output moderation models designed to help developers detect various common types of violating content.\n\nThey were built by fine-tuning Meta-Llama 3.1 and 3.2 models and optimized to support the detection of the MLCommons standard hazards taxonomy, catering to a range of developer use cases.\nThey support the release of Llama 3.2 capabilities, including 7 new languages, a 128k context window, and image reasoning. Llama Guard 3 models were also optimized to detect responses that are helpful to cyberattacks and to prevent malicious code output by LLMs from being executed in hosting environments for Llama systems using code interpreters.\n\n\n### Prompt Guard\nPrompt Guard is a powerful tool for protecting LLM-powered applications from malicious prompts to ensure their security and integrity.\n\nCategories of prompt attacks include prompt injection and jailbreaking:\n\n* Prompt Injections are inputs that exploit the inclusion of untrusted data from third parties in the context window of a model to get it to execute unintended instructions.\n* Jailbreaks are malicious instructions designed to override the safety and security features built into a model.\n\n### Code Shield\n\nCode Shield adds support for inference-time filtering of insecure code produced by LLMs. Code Shield mitigates the risk of insecure code suggestions and offers code interpreter abuse prevention and secure command execution. 
See the [CodeShield Example Notebook](https://github.com/meta-llama/PurpleLlama/blob/main/CodeShield/notebook/CodeShieldUsageDemo.ipynb).\n\n\n\n## Evals \u0026 Benchmarks\n\n### Cybersecurity\n\n#### CyberSec Eval v1\nCyberSec Eval v1 was, we believe, the first industry-wide set of cybersecurity safety evaluations for LLMs. These benchmarks are based on industry guidance and standards (e.g., CWE and MITRE ATT\u0026CK) and were built in collaboration with our security subject matter experts. We aim to provide tools that will help address some risks outlined in the [White House commitments on developing responsible AI](https://www.whitehouse.gov/briefing-room/statements-releases/2023/07/21/fact-sheet-biden-harris-administration-secures-voluntary-commitments-from-leading-artificial-intelligence-companies-to-manage-the-risks-posed-by-ai/), including:\n* Metrics for quantifying LLM cybersecurity risks.\n* Tools to evaluate the frequency of insecure code suggestions.\n* Tools to evaluate LLMs so that it becomes harder to use them to generate malicious code or aid in carrying out cyberattacks.\n\nWe believe these tools will reduce how often LLMs suggest insecure code and reduce their helpfulness to cyber adversaries. Our initial results show that there are meaningful cybersecurity risks for LLMs, both in recommending insecure code and in complying with malicious requests. See our [Cybersec Eval paper](https://ai.meta.com/research/publications/purple-llama-cyberseceval-a-benchmark-for-evaluating-the-cybersecurity-risks-of-large-language-models/) for more details.\n\n#### CyberSec Eval 2\nCyberSec Eval 2 expands on its predecessor by measuring an LLM’s propensity to abuse a code interpreter, offensive cybersecurity capabilities, and susceptibility to prompt injection. 
You can read the paper [here](https://ai.meta.com/research/publications/cyberseceval-2-a-wide-ranging-cybersecurity-evaluation-suite-for-large-language-models/).\n\nYou can also check out the 🤗 leaderboard [here](https://huggingface.co/spaces/facebook/CyberSecEval).\n\n#### CyberSec Eval 3\nThe newly released CyberSec Eval 3 features three additional test suites: visual prompt injection tests, spear phishing capability tests, and autonomous offensive cyber operations tests.\n\n## Getting Started\n\nAs part of the [Llama reference system](https://github.com/meta-llama/llama-agentic-system), we’re integrating a safety layer to facilitate adoption and deployment of these safeguards.\nResources to get started with the safeguards are available in the [llama-recipes GitHub repository](https://github.com/meta-llama/llama-recipes).\n\n## FAQ\n\nFor a running list of frequently asked questions about Purple Llama\ncomponents and Llama models in general, see the FAQ\n[here](https://ai.meta.com/llama/faq/).\n\n## Join the Purple Llama community\n\nSee the [CONTRIBUTING](CONTRIBUTING.md) file for how to help out.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmeta-llama%2FPurpleLlama","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmeta-llama%2FPurpleLlama","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmeta-llama%2FPurpleLlama/lists"}