https://github.com/promptfoo/promptfoo
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
ci ci-cd cicd evaluation evaluation-framework llm llm-eval llm-evaluation llm-evaluation-framework llmops pentesting prompt-engineering prompt-testing prompts rag red-teaming testing vulnerability-scanners
- Host: GitHub
- URL: https://github.com/promptfoo/promptfoo
- Owner: promptfoo
- License: mit
- Created: 2023-04-28T15:48:49.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2025-03-08T00:53:33.000Z (about 1 year ago)
- Last Synced: 2025-03-08T01:24:43.599Z (about 1 year ago)
- Topics: ci, ci-cd, cicd, evaluation, evaluation-framework, llm, llm-eval, llm-evaluation, llm-evaluation-framework, llmops, pentesting, prompt-engineering, prompt-testing, prompts, rag, red-teaming, testing, vulnerability-scanners
- Language: TypeScript
- Homepage: https://promptfoo.dev
- Size: 308 MB
- Stars: 5,756
- Watchers: 21
- Forks: 475
- Open Issues: 188
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Citation: CITATION.cff
Awesome Lists containing this project
- jimsghstars - promptfoo/promptfoo - Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration. (TypeScript)
- awesome-ChatGPT-repositories - promptfoo - Test your prompts, models, RAGs. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality. LLM evals for OpenAI/Azure GPT, Anthropic Claude, VertexAI Gemini, Ollama, Local & private models like Mistral/Mixtral/Llama with CI/CD (Prompts)
- awesome-llm-tools - Promptfoo
- awesome-llm-services - Promptfoo
- StarryDivineSky - promptfoo/promptfoo
- awesome-mistral - Promptfoo (Tooling & Dev Experience / Development Tools)
- awesome-LLM-security - PromptFoo - A security testing framework for comprehensive red teaming, pentesting, and vulnerability scanning of LLMs. (LLM And GenAI Security Testing Tools)
- awesome-langchain - Promptfoo
- Awesome-AI-Security - promptfoo
- AiTreasureBox - promptfoo/promptfoo - Test your prompts, agents, and RAGs. AI Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration. (Repos)
- awesome-ai-ml-testing - promptfoo - Testing and evaluation framework for LLM prompts. (LLM & Chatbot Testing)
- Awesome-Jailbreak-on-LLMs - promptfoo/promptfoo
- awesome-MLSecOps - Promptfoo Scanner - Open-source LLM red teaming tool (Open Source Security Tools)
- awesome-safety-critical-ai - `promptfoo/promptfoo` - Developer-friendly local tool for testing LLM applications (Tools / Bleeding Edge)
- Awesome-Prompt-Engineering - promptfoo/promptfoo
- awesome-langchain-zh - Promptfoo
- AwesomeResponsibleAI - Promptfoo
- awesome-ai-coding-tools - Promptfoo - Open-source tool for testing, evaluating, and red-teaming LLM prompts and applications. (AI Frameworks and SDKs)
- awesome-production-machine-learning - Promptfoo - Promptfoo is a developer-friendly local tool for testing LLM applications. (Evaluation and Monitoring)
- Awesome-LLM-RAG-Application - promptfoo
- Awesome-LLM4Security - promptfoo
- awesome-machine-learning - promptfoo - Open-source LLM evaluation and red teaming framework. Test prompts, models, agents, and RAG pipelines. Run adversarial attacks (jailbreaks, prompt injection) and integrate security testing into CI/CD. (Tools / General-Purpose Machine Learning)
- Awesome-AI-Evaluation-Guide - Promptfoo - CLI tool for prompt testing with cost tracking and regression detection (Tools & Platforms / Open Source Frameworks)
- llmops - PromptFoo - Test and evaluate LLM outputs (What's New / Recently Added (January 2026))
- awesome-testing - promptfoo - Open-source framework for testing and red teaming LLM applications. Compare prompts, test RAG architectures, run multi-turn adversarial attacks, and catch security vulnerabilities with CI/CD integration. (Software / AI & LLM Testing)
- awesome-prompt-engineering - PromptFoo - Test and evaluate LLM outputs (Tools & Frameworks / Prompt Testing & Optimization)
- awesome-learning - Promptfoo - Test your prompts, agents, and RAGs. AI Red teaming, pentesting, and vulnerability scanning for LLMs.
- fucking-awesome-machine-learning - promptfoo - Open-source LLM evaluation and red teaming framework. Test prompts, models, agents, and RAG pipelines. Run adversarial attacks (jailbreaks, prompt injection) and integrate security testing into CI/CD. (Tools / General-Purpose Machine Learning)
- awesome-opensource-ai - Promptfoo - LLM testing and red-teaming framework. (Contents / MLOps / LLMOps & Production)
- awesome - promptfoo/promptfoo - Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic. (TypeScript)
- awesome-ai-cybersecurity - promptfoo - Open-source LLM red teaming and vulnerability scanner. 100+ attack types, 250k+ users. (Securing AI SaaS / Application Security)
- awesome-rainmana - promptfoo/promptfoo - Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration. (TypeScript)
- awesome-ai - Promptfoo
- awesome-production-llm - promptfoo
- awesome-ai-eval - **Promptfoo** - Local-first CLI and dashboard for evaluating prompts, RAG flows, and agents with cost tracking and regression detection. (Tools / Evaluators and Test Harnesses)
- awesome-llm-tools - PromptFoo (Prompt Optimization)
- awesome-gpt-security - promptfoo - LLM red teaming and evaluation framework. Includes modelaudit for scanning ML models for malicious code, backdoors, and serialization attacks. CI/CD integration (GPT Security / Standard)
- awesome-production-agentic-systems - promptfoo - promptfoo is an LLM red teaming and evaluation framework for testing jailbreaks, prompt injection, and vulnerabilities with adversarial attacks and CI/CD integration. (Agent Security)
- awesome-agentic-ai - Promptfoo - Compare prompts, models, and configurations with reproducible tests. (Evaluation, Observability & Safety / Evaluation & Observability)
- awesome-hacking-lists - promptfoo/promptfoo - Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration. (TypeScript)
- awesome-agent-cortex - Promptfoo - Testing and evaluation framework for LLM prompts. (Prompt Engineering / Codex Resources)
- awesome-llm - Promptfoo - Developer-friendly LLM testing tool for evaluating prompt quality and model outputs and preventing regressions. (Prompt Engineering & Optimization / Inference Gateways)
- Awesome-AI-For-Security - promptfoo - Open-source LLM red teaming tool for finding and fixing vulnerabilities. 100+ attack types, 250k+ users. (Tools & Frameworks / Security Testing)
- awesome-ai-offensive-security - Promptfoo - A developer-first framework for AI red teaming and evaluations with flexible configuration and Python integration. (AI Red Teaming (Testing AI Targets))
- awesome-harness-engineering - promptfoo/promptfoo
- awesome-llmops - PromptFoo
- awesome-ai-security - promptfoo - _Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration._ (Attack Techniques & Red Teaming / LLM & GenAI Red Teaming)
- awesome - promptfoo/promptfoo - Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration. (TypeScript)
- my-awesome - promptfoo/promptfoo - Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic. (TypeScript)
- awesome - promptfoo/promptfoo - Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic. (TypeScript)
README
# Promptfoo: LLM evals & red teaming
[npm](https://npmjs.com/package/promptfoo) · [CI](https://github.com/promptfoo/promptfoo/actions/workflows/main.yml) · [Discord](https://discord.gg/promptfoo)
`promptfoo` is a developer-friendly local tool for testing LLM applications. Stop the trial-and-error approach and start shipping secure, reliable AI apps.
## Quick Start
```sh
# Install and initialize project
npx promptfoo@latest init
# Run your first evaluation
npx promptfoo eval
```
See [Getting Started](https://www.promptfoo.dev/docs/getting-started/) (evals) or [Red Teaming](https://www.promptfoo.dev/docs/red-team/) (vulnerability scanning) for more.
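`npx promptfoo init` scaffolds a `promptfooconfig.yaml` in the current directory. As a rough sketch of the declarative format (the prompt, provider id, and assertion values below are illustrative, not the generated defaults):

```yaml
# promptfooconfig.yaml — illustrative example, not the scaffolded default
description: French translation eval
prompts:
  - "Translate the following text to French: {{text}}"
providers:
  - openai:gpt-4o-mini # any supported provider id works here
tests:
  - vars:
      text: Hello, world
    assert:
      - type: icontains # case-insensitive substring check
        value: bonjour
```

Running `npx promptfoo eval` in the same directory then evaluates every prompt-provider-test combination and reports pass/fail per assertion.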
## What can you do with Promptfoo?
- **Test your prompts and models** with [automated evaluations](https://www.promptfoo.dev/docs/getting-started/)
- **Secure your LLM apps** with [red teaming](https://www.promptfoo.dev/docs/red-team/) and vulnerability scanning
- **Compare models** side-by-side (OpenAI, Anthropic, Azure, Bedrock, Ollama, and [more](https://www.promptfoo.dev/docs/providers/))
- **Automate checks** in [CI/CD](https://www.promptfoo.dev/docs/integrations/ci-cd/)
- **Share results** with your team
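The CI/CD check above can be wired up as, for example, a GitHub Actions workflow. This is a sketch under assumptions: the workflow filename, trigger, and `OPENAI_API_KEY` secret are placeholders for your own setup.

```yaml
# .github/workflows/prompt-eval.yml — hypothetical workflow
name: prompt-eval
on: [pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Fail the build if any assertion in promptfooconfig.yaml fails
      - run: npx promptfoo@latest eval -c promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```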
Promptfoo includes a web viewer and a command-line interface for reviewing eval results, and it can generate [security vulnerability reports](https://www.promptfoo.dev/docs/red-team/).
## Why promptfoo?
- **Developer-first**: Fast, with features like live reload and caching
- **Private**: Runs 100% locally - your prompts never leave your machine
- **Flexible**: Works with any LLM API or programming language
- **Battle-tested**: Powers LLM apps serving 10M+ users in production
- **Data-driven**: Make decisions based on metrics, not gut feel
- **Open source**: MIT licensed, with an active community
## Learn More
- [Full Documentation](https://www.promptfoo.dev/docs/intro/)
- [Red Teaming Guide](https://www.promptfoo.dev/docs/red-team/)
- [Getting Started](https://www.promptfoo.dev/docs/getting-started/)
- [CLI Usage](https://www.promptfoo.dev/docs/usage/command-line/)
- [Node.js Package](https://www.promptfoo.dev/docs/usage/node-package/)
- [Supported Models](https://www.promptfoo.dev/docs/providers/)
## Contributing
We welcome contributions! Check out our [contributing guide](https://www.promptfoo.dev/docs/contributing/) to get started.
Join our [Discord community](https://discord.gg/promptfoo) for help and discussion.