{"id":13772591,"url":"https://github.com/preset-io/promptimize","last_synced_at":"2025-05-15T05:00:15.127Z","repository":{"id":155182465,"uuid":"623548255","full_name":"preset-io/promptimize","owner":"preset-io","description":"Promptimize is a prompt engineering evaluation and testing toolkit.","archived":false,"fork":false,"pushed_at":"2025-03-31T19:53:38.000Z","size":131,"stargazers_count":456,"open_issues_count":17,"forks_count":34,"subscribers_count":15,"default_branch":"main","last_synced_at":"2025-04-14T10:42:33.226Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/preset-io.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-04-04T15:34:08.000Z","updated_at":"2025-04-14T06:58:28.000Z","dependencies_parsed_at":null,"dependency_job_id":"62340f49-ed67-4c2b-b2a3-30b8d7eddb64","html_url":"https://github.com/preset-io/promptimize","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/preset-io%2Fpromptimize","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/preset-io%2Fpromptimize/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/preset-io%2Fpromptimize/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/preset-io%2Fpromptimize/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/preset-io","download_
url":"https://codeload.github.com/preset-io/promptimize/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254276444,"owners_count":22043866,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T17:01:05.840Z","updated_at":"2025-05-15T05:00:14.833Z","avatar_url":"https://github.com/preset-io.png","language":"Python","funding_links":[],"categories":["🤖 LLM \u0026 Chatbot Testing","Testing Frameworks","Building","Python"],"sub_categories":["Prompt Engineering"],"readme":"# 💡 ¡promptimize! 💡\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![PyPI version](https://badge.fury.io/py/promptimize.svg)](https://badge.fury.io/py/promptimize)\n\n\u003cimg src=\"https://user-images.githubusercontent.com/487433/229948453-36cbc2d1-e71f-4e87-9111-ab428bc96f4c.png\" width=300/\u003e\n\nPromptimize is a prompt engineering **evaluation** and **testing** toolkit.\n\nIt accelerates and provides structure around prompt engineering at scale\nwith confidence, bringing some of the ideas behind test-driven\ndevelopment (TDD) to engineering prompts.\n\nWith promptimize, you can:\n\n- Define your \"prompt cases\" (think \"test cases\" but specific to evaluating\n  prompts) as code and associate them with evaluation functions\n- Generate prompt variations dynamically\n- Execute and rank prompt test suites across different\n  engines/models/temperature/settings and compare results, bringing\n  the hyperparameter tuning mindset to prompt engineering\n- 
Get reports on your prompts' performance as you iterate. Answer questions\n  around how different prompt suites are performing against one another.\n  Which individual cases or categories of cases improved? Which regressed?\n- Minimize API calls! Only re-assess what changed as you change it\n- Perform human review if and where needed, introspecting failed cases and overriding\n  false negatives\n\nIn essence, promptimize provides a programmatic way to execute and fine-tune\nyour prompts and evaluation functions in Python, allowing you to iterate\nquickly and with confidence.\n\n## Hello world - the simplest prompt examples\n[more examples on GitHub](https://github.com/preset-io/promptimize/tree/main/examples)\n```python\n# Bringing in some \"prompt generator\" classes - note that you can derive and extend those\nfrom promptimize.prompts import PromptCase\n\n# Bringing in some useful eval functions that help evaluate and score responses\n# eval functions have a handle on the prompt object and are expected\n# to return a score between 0 and 1\nfrom promptimize import evals\n\n# Promptimize will scan the target folder and find all Prompt objects\n# and derivatives that are in the python modules\nsimple_prompts = [\n\n    # Prompting \"hello there\" and making sure there's \"hi\" or \"hello\"\n    # somewhere in the answer\n    PromptCase(\"hello there!\", lambda x: evals.any_word(x, [\"hi\", \"hello\"])),\n    PromptCase(\n        \"name the top 50 guitar players!\", lambda x: evals.all_words(x, [\"frank zappa\"])\n    ),\n]\n```\n\n### The CLI's `run` command\n```\n$ promptimize run --help\nUsage: promptimize run [OPTIONS] PATH\n\n  run some prompts\n\nOptions:\n  -v, --verbose             Trigger more verbose output\n  -f, --force               Force run, do not skip\n  -h, --human               Human review, allowing a human to review and force\n                            pass/fail each prompt case\n  -r, --repair              Only re-run previously failed\n  -x, --dry-run             
DRY run, don't call the API\n  --shuffle                 Shuffle the prompts in a random order\n  -s, --style [json|yaml]   json or yaml formatting\n  -m, --max-tokens INTEGER  max_tokens passed to the model\n  -l, --limit INTEGER       limit how many prompt cases to run in a single\n                            batch\n  -t, --temperature FLOAT   max_tokens passed to the model\n  -e, --engine TEXT         model as accepted by the openai API\n  -k, --key TEXT            The keys to run\n  -o, --output PATH\n  -s, --silent\n```\n\nLet's run those examples and produce a report `./report.yaml`\n```\n$ promptimize run examples/ --output ./report.yaml\n```\n\n```yaml\n💡 ¡promptimize! 💡\n# ----------------------------------------\n# (1/2) [RUN] prompt: prompt-115868ef\n# ----------------------------------------\nkey: prompt-115868ef\nuser_input: hello there!\nprompt_hash: 115868ef\nresponse: Hi there! How are you doing today?\nexecution:\n  api_call_duration_ms: 883.8047981262207\n  run_at: '2023-04-25T02:21:40.443077'\n  score: 1.0\n\n# ----------------------------------------\n# (2/2) [RUN] prompt: prompt-5c085656\n# ----------------------------------------\nkey: prompt-5c085656\nuser_input: name the top 10 guitar players!\nprompt_hash: 5c085656\nresponse: |-\n  1. Jimi Hendrix\n  2. Eric Clapton\n  {{ ... }}\n  11. 
Carlos Santana\nweight: 2\nexecution:\n  api_call_duration_ms: 2558.135747909546\n  run_at: '2023-04-25T02:21:43.007529'\n  score: 0.0\n\n# ----------------------------------------\n# Suite summary\n# ----------------------------------------\nsuite_score: 0.3333333333333333\ngit_info:\n  sha: 2cf28498ba0f\n  branch: main\n  dirty: true\n```\n\n## Problem + POV\n\nThousands of product builders are currently trying to figure out how to\nbring the power of AI into the products and experiences they are building.\nThe probabilistic (often semi-random, sometimes hectic) nature of LLMs\nmakes this a challenge.\n\nPrompt engineering is a huge piece of the puzzle in terms of how to do this\nright, especially given the complexity, risks, and drawbacks around\nmodel tuning.\n\nWe believe product builders need to tame AI through proper, rigorous\n**prompt engineering**. This allows making the probabilistic nature of\nAI more deterministic, or somewhat predictable, and allows builders to apply\na hyperparameter tuning-type mindset and approach to prompt engineering.\n\nAny prompt-generator logic that's going to be let loose in the wild inside\na product should be thoroughly tested and evaluated with \"prompt cases\" that\ncover the breadth of what people may do in a product.\n\nIn short, Promptimize allows you to test prompts at industrial scale,\nso that you can confidently use them in the products you are building.\n\n## Information Architecture\n\n- **Prompt:** A Prompt instance is a certain test case, a single prompt\n  with an associated set of evaluation functions to rate its success.\n- **Evaluation:** An evaluation function that reads the response and returns\n  a success rate between `0` and `1`.\n- **Suite:** A Suite is a collection of Prompts; it's able to run things,\n  accumulate results, and print reports about its collection of use cases.\n- **Report:** A report is the compiled results of running a certain prompt\n  `Suite` or set of suites. 
Reports can be consumed, compared, and expanded.\n\n## Principles\n\n- **Configuration as code:** All prompt cases, suites, and evaluations are\n  defined as code, which makes it easy to dynamically generate all sorts\n  of use cases and suites.\n- **Expressive:** a clean DSL that's to-the-point -\u003e user prompt + assertions.\n  The actual prompt creation logic lives in derivative classes of `PromptCase`,\n  so that we can have clean, dense files that contain nice `Suite`s\n- **Support the iteration mindset:** making it easy for people to try things,\n  get suggestions from the AI, adapt, compare, and push forward\n- **Extensibility:** the toolkit is designed to be extremely hackable and\n  extensible. Hooks, extensions, high API surface.\n- **AI-powered:** the framework offers ways to expand your suites based\n  on the examples that exist. Use AI to generate more prompt cases!\n\n\n## Interesting features / facts\n\nListing out a few features you should know about that you can start using as your\nsuites of prompts become larger / more complex\n\n* evaluation functions are assumed to return a value between 0 and 1.\n  Contrary to unit tests, prompt cases aren't boolean\n* prompts can be assigned a `weight` (default 1); this enables you to define\n  which prompts are more important than others for reporting purposes and suite evaluation\n* prompts can be assigned a `category`, which can be used in the reporting.\n  That helps in understanding which categories are performing better than\n  others, or are most affected by iterations\n* The `Prompt` class has `pre_run` and `post_run` hooks if you want to do\n  post-processing, for instance. An example of that would be if you write a prompt\n  that expects GPT to generate code, and you'd like to actually run that code\n  and test it. 
In our SQL implementation, for instance, we run the SQL against the database,\n  get a pandas dataframe back, and allow doing assertions\n  on the dataframe itself\n\n\n\n## Getting started\n\nTo install the Promptimize package, use the following command:\n```bash\npip install promptimize\n```\n\nFirst you'll need an OpenAI API key; let's set it as an env var\n```bash\nexport OPENAI_API_KEY=sk-{{ REDACTED }}\n```\n\nFind the examples executed below\n[here](https://github.com/preset-io/promptimize/blob/main/examples/readme_examples.py)\n\n```bash\n# Clone the repo\ngit clone git@github.com:preset-io/promptimize.git\ncd promptimize\n\n# NOTE: the CLI is `promptimize`, but `p9e` is a shorter synonym that can be used interchangeably\n# First let's run some of the examples\np9e run ./examples\n\n# Now the same but with verbose output\np9e run ./examples --verbose --output ./report.yaml\n\n```\n## Langchain\n\nHow does promptimize relate to `langchain`?\n\nWe think langchain is amazing and promptimize uses langchain under the\nhood to interact with openai, and has integration with langchain\n(see `LangchainPromptCase`, and the upcoming `LangchainChainPromptCase`\nand `LangchainAgntPromptCase`).\nThat said, you don't have to use\nlangchain; you could use promptimize on top of any Python prompt generation,\nwhether it's another library or some home-grown thing.\n\n\n## Context\n\n\u003cimg src=\"https://user-images.githubusercontent.com/487433/230508578-456a7040-1184-433a-a555-dceb7c28c32c.png\" width=\"75\" title=\"Max\"/\u003e\n\nWhere is `promptimize` coming from!? I'm (Maxime Beauchemin) a startup\nfounder at \u003ca href=\"https://www.preset.io\"\u003ePreset\u003c/a\u003e working on bringing AI to BI\n(data exploration\nand visualization). At Preset, we use `promptimize` to generate\ncomplex SQL based on natural language, and to suggest charts to users. We\nderive the `SimpleQuery` class to fit it to our specific use\ncases in our own prompt engineering repo. 
It's not my first open source project:\nI'm also the creator of\n[Apache Superset](https://github.com/apache/superset/) and\n[Apache Airflow](https://github.com/apache/airflow/).\n\n\n## Contribute\n\nThis project is in its super early stages as of `0.2.0`, and contributions,\ncontributors, and maintainers are highly encouraged. While it's a great time\nto onboard and influence the direction of the project, things are still\nevolving quickly. To get involved, open a GitHub issue\nor submit a pull request!\n\n## Links\n* [Blog - Mastering AI-Powered Product Development: Introducing Promptimize for Test-Driven Prompt Engineering](https://preset.io/blog/introducing-promptimize/)\n* [Preset Blog](https://preset.io/blog/)\n* [Promptimize DOCS](https://preset-io.github.io/promptimize/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpreset-io%2Fpromptimize","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpreset-io%2Fpromptimize","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpreset-io%2Fpromptimize/lists"}