{"id":37231322,"url":"https://github.com/kmcallorum/prompt-optimizer","last_synced_at":"2026-01-15T03:42:55.913Z","repository":{"id":332090142,"uuid":"1132721262","full_name":"kmcallorum/prompt-optimizer","owner":"kmcallorum","description":"A CLI tool and Python library for optimizing LLM prompts through systematic testing, version control, and performance metrics. Think pytest for prompts.","archived":false,"fork":false,"pushed_at":"2026-01-12T16:17:06.000Z","size":127,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-12T19:13:37.040Z","etag":null,"topics":["anthropic","cli","llm","openai","optimization","prompt-engineering","python","testing"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kmcallorum.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-01-12T11:06:35.000Z","updated_at":"2026-01-12T16:17:09.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/kmcallorum/prompt-optimizer","commit_stats":null,"previous_names":["kmcallorum/prompt-optimizer"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/kmcallorum/prompt-optimizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kmcallorum%2Fprompt-optimizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kmcallorum%2Fpromp
t-optimizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kmcallorum%2Fprompt-optimizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kmcallorum%2Fprompt-optimizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kmcallorum","download_url":"https://codeload.github.com/kmcallorum/prompt-optimizer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kmcallorum%2Fprompt-optimizer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28442322,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-15T00:55:22.719Z","status":"online","status_checked_at":"2026-01-15T02:00:08.019Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anthropic","cli","llm","openai","optimization","prompt-engineering","python","testing"],"created_at":"2026-01-15T03:42:55.360Z","updated_at":"2026-01-15T03:42:55.902Z","avatar_url":"https://github.com/kmcallorum.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 
prompt-optimizer-cli\n\n[![PyPI](https://img.shields.io/pypi/v/prompt-optimizer-cli.svg)](https://pypi.org/project/prompt-optimizer-cli/)\n[![CI](https://github.com/kmcallorum/prompt-optimizer/actions/workflows/ci.yml/badge.svg)](https://github.com/kmcallorum/prompt-optimizer/actions/workflows/ci.yml)\n[![codecov](https://codecov.io/gh/kmcallorum/prompt-optimizer/graph/badge.svg)](https://codecov.io/gh/kmcallorum/prompt-optimizer)\n[![Snyk Security](https://snyk.io/test/github/kmcallorum/prompt-optimizer/badge.svg)](https://snyk.io/test/github/kmcallorum/prompt-optimizer)\n[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)\n[![Type Checked](https://img.shields.io/badge/type%20checked-mypy-blue.svg)](https://mypy-lang.org/)\n[![pytest-agents](https://img.shields.io/badge/pytest--agents-enabled-brightgreen.svg)](https://github.com/kmcallorum/prompt-optimizer)\n\nA CLI tool and Python library for optimizing LLM prompts through systematic testing, version control, and performance metrics. 
Think \"pytest for prompts\" - test multiple prompt variations, measure quality, and automatically select the best performer.\n\n## Features\n\n- **Prompt Testing**: Run multiple prompt variations against test cases\n- **Quality Metrics**: Score outputs on accuracy, conciseness, tone, and cost\n- **LLM-as-Judge**: AI-powered evaluation using any LLM as a judge\n- **Prometheus Metrics**: Built-in observability with Prometheus metrics\n- **Version Control**: Track prompt evolution with history and diffs\n- **Auto-Selection**: Identify and select the best-performing prompt variant\n- **CLI \u0026 Library**: Use as a command-line tool or Python import\n- **Multi-LLM Support**: Works with Anthropic Claude, OpenAI GPT, and local Ollama models\n\n## Quick Start\n\n```bash\n# Install from PyPI\npip install prompt-optimizer-cli\n\n# Initialize a project\nprompt-optimizer init\n\n# Optimize a prompt\nprompt-optimizer optimize prompts/example.yaml \\\n    --test-cases tests/example_tests.yaml \\\n    --strategies concise,detailed \\\n    --llm claude-sonnet-4 \\\n    --output results.json\n```\n\n## Installation\n\n### From PyPI\n\n```bash\npip install prompt-optimizer-cli\n```\n\n### From Source\n\n```bash\ngit clone https://github.com/kmcallorum/prompt-optimizer.git\ncd prompt-optimizer\npip install -e .\n```\n\n### With Development Dependencies\n\n```bash\npip install -e \".[dev]\"\n```\n\n### Using Docker\n\n```bash\ndocker-compose build\ndocker-compose run prompt-optimizer --help\n```\n\n## Usage\n\n### CLI Commands\n\n```bash\n# Initialize new project with example files\nprompt-optimizer init\n\n# Test a prompt against test cases\nprompt-optimizer test prompt.yaml --test-cases tests.yaml --llm claude-sonnet-4\n\n# Optimize with multiple strategies\nprompt-optimizer optimize prompt.yaml \\\n    --strategies concise,detailed,cot \\\n    --test-cases tests.yaml \\\n    --llm claude-sonnet-4 \\\n    --output results.json\n\n# Use LLM-as-judge for AI-powered 
evaluation\nprompt-optimizer optimize prompt.yaml \\\n    --test-cases tests.yaml \\\n    --llm claude-sonnet-4 \\\n    --judge gpt-4o \\\n    --output results.json\n\n# Compare two prompts\nprompt-optimizer compare prompt1.yaml prompt2.yaml --test-cases tests.yaml\n\n# View prompt history\nprompt-optimizer history my-prompt\n\n# Generate report from results\nprompt-optimizer report results.json --format html --output report.html\n\n# Display a prompt file\nprompt-optimizer show prompt.yaml\n```\n\n### Python Library\n\n```python\nfrom prompt_optimizer import Prompt, TestCase, optimize_prompt\n\n# Define a prompt\nprompt = Prompt(\n    template=\"Summarize this text in {{ length }}: {{ text }}\",\n    variables={\"length\": \"one sentence\", \"text\": \"\"},\n    system_message=\"You are a helpful summarization assistant.\",\n    name=\"summarizer\",\n)\n\n# Define test cases\ntest_cases = [\n    TestCase(\n        input_variables={\n            \"text\": \"Long article text here...\",\n            \"length\": \"one sentence\"\n        },\n        expected_properties={\"length\": \"\u003c30 words\"}\n    )\n]\n\n# Run optimization\nresults = optimize_prompt(\n    prompt,\n    test_cases,\n    strategies=[\"concise\", \"detailed\"],\n    llm=\"claude-sonnet-4\"\n)\n\nprint(f\"Best variant: {results.best_variant.strategy}\")\nprint(f\"Score: {results.best_weighted_score:.2%}\")\n```\n\n## File Formats\n\n### Prompt File (YAML)\n\n```yaml\ntemplate: |\n  Answer the following question: {{ question }}\n\n  Requirements:\n  - Be concise\n  - Be accurate\n\nsystem_message: \"You are a helpful AI assistant.\"\n\nvariables:\n  question: \"\"\n\nmetadata:\n  author: \"developer\"\n  version: \"1.0\"\n  tags: [\"qa\", \"concise\"]\n```\n\n### Test Cases (YAML)\n\n```yaml\nname: \"QA Test Suite\"\n\ntest_cases:\n  - input_variables:\n      question: \"What is the capital of France?\"\n    expected_output: \"Paris\"\n    expected_properties:\n      tone: \"neutral\"\n      
length: \"\u003c20 words\"\n\n  - input_variables:\n      question: \"Explain quantum computing\"\n    expected_properties:\n      length: \"50-150 words\"\n      includes: [\"qubits\", \"superposition\"]\n```\n\n## Supported LLMs\n\n| Provider | Models | Environment Variable |\n|----------|--------|---------------------|\n| Anthropic | claude-sonnet-4, claude-opus-4 | `ANTHROPIC_API_KEY` |\n| OpenAI | gpt-4o, gpt-4-turbo, gpt-3.5-turbo | `OPENAI_API_KEY` |\n| Ollama | llama3, mistral, etc. | N/A (local) |\n\nSpecify the LLM with the `--llm` flag:\n\n```bash\nprompt-optimizer optimize prompt.yaml --llm claude-sonnet-4\nprompt-optimizer optimize prompt.yaml --llm gpt-4o\nprompt-optimizer optimize prompt.yaml --llm ollama:llama3\n```\n\n## Optimization Strategies\n\n| Strategy | Description |\n|----------|-------------|\n| `concise` | Makes responses shorter and more direct |\n| `detailed` | Adds context and thorough explanations |\n| `cot` | Adds chain-of-thought reasoning |\n| `structured` | Formats output with sections and bullet points |\n| `few_shot` | Adds example-based prompting |\n\n## Evaluation Criteria\n\nBuilt-in scoring functions:\n\n- **accuracy**: Compares output to expected result using sequence matching\n- **conciseness**: Scores based on word count and length constraints\n- **includes**: Checks for required keywords in the response\n\nCustom evaluators can be added:\n\n```python\nfrom prompt_optimizer import TestCase\nfrom prompt_optimizer.evaluator import EVALUATORS\n\ndef custom_scorer(response: str, test_case: TestCase) -\u003e float:\n    # Your scoring logic\n    return 0.8\n\n# Register the scorer under the name \"custom\"\nEVALUATORS[\"custom\"] = custom_scorer\n```\n\n## LLM-as-Judge\n\nUse an LLM to evaluate response quality instead of rule-based scoring:\n\n```bash\n# Use GPT-4o as judge while testing with Claude\nprompt-optimizer optimize prompt.yaml \\\n    --test-cases tests.yaml \\\n    --llm claude-sonnet-4 \\\n    --judge gpt-4o\n```\n\n```python\nfrom prompt_optimizer import optimize_prompt, Prompt, TestCase\n\nresults = 
optimize_prompt(\n    prompt=my_prompt,\n    test_cases=test_cases,\n    llm=\"claude-sonnet-4\",\n    judge_llm=\"gpt-4o\",  # AI-based evaluation\n)\n```\n\nThe LLM judge evaluates responses on:\n- **accuracy** - How well the response matches the expected output\n- **relevance** - How on-topic the response is\n- **coherence** - How well-structured and logical the response is\n- **completeness** - Whether all aspects of the prompt are addressed\n- **conciseness** - Whether the response is appropriately brief\n\n## Prometheus Metrics\n\nBuilt-in observability for production deployments:\n\n```bash\n# Start metrics server\nprompt-optimizer metrics --port 8000\n\n# Metrics available at http://localhost:8000/metrics\n```\n\n```python\nfrom prompt_optimizer import init_metrics, optimize_prompt, start_http_server\n\n# Initialize and start metrics server\ninit_metrics()\nstart_http_server(8000)\n\n# Run optimizations - metrics are automatically recorded\nresults = optimize_prompt(...)\n```\n\nAvailable metrics:\n- `prompt_optimizer_optimizations_total` - Total optimization runs\n- `prompt_optimizer_optimization_duration_seconds` - Optimization duration\n- `prompt_optimizer_variants_evaluated_total` - Variants evaluated\n- `prompt_optimizer_test_cases_run_total` - Test cases run\n- `prompt_optimizer_llm_requests_total` - LLM API requests\n- `prompt_optimizer_llm_tokens_total` - Tokens used (input/output)\n- `prompt_optimizer_llm_cost_usd_total` - Total cost in USD\n- `prompt_optimizer_best_variant_score` - Best variant score\n\n## Configuration\n\nEnvironment variables:\n\n```bash\nexport ANTHROPIC_API_KEY=your-api-key\nexport OPENAI_API_KEY=your-api-key\n```\n\n## Development\n\n```bash\n# Install dev dependencies\npip install -e \".[dev]\"\n\n# Run tests\npytest\n\n# Run with coverage\npytest --cov=src/prompt_optimizer --cov-report=html\n\n# Lint\nruff check src tests\n\n# Type check\nmypy src\n```\n\n## Project Structure\n\n```\nprompt-optimizer/\n├── src/prompt_optimizer/\n│   ├── 
__init__.py\n│   ├── cli.py              # Click-based CLI\n│   ├── core.py             # Core optimization logic\n│   ├── prompt.py           # Prompt models\n│   ├── evaluator.py        # Scoring functions\n│   ├── storage.py          # Version control\n│   ├── reporters.py        # Result reporting\n│   └── llm_clients/        # LLM integrations\n├── tests/\n├── examples/\n├── Dockerfile\n└── docker-compose.yml\n```\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkmcallorum%2Fprompt-optimizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkmcallorum%2Fprompt-optimizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkmcallorum%2Fprompt-optimizer/lists"}