{"id":30957293,"url":"https://github.com/dannylee1020/openpo","last_synced_at":"2025-09-11T13:45:22.990Z","repository":{"id":262553078,"uuid":"879973759","full_name":"dannylee1020/openpo","owner":"dannylee1020","description":"Building synthetic data for preference tuning","archived":false,"fork":false,"pushed_at":"2024-12-26T23:02:32.000Z","size":11253,"stargazers_count":27,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-09-10T07:40:58.706Z","etag":null,"topics":["ai","ai-feedback","dpo","evaluation","finetuning","huggingface","llm","llm-evaluation","python","rlaif","rlhf","synthetic-data","synthetic-data-generation"],"latest_commit_sha":null,"homepage":"https://docs.openpo.dev","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dannylee1020.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-28T22:09:54.000Z","updated_at":"2025-04-20T06:04:36.000Z","dependencies_parsed_at":"2024-12-18T17:34:38.020Z","dependency_job_id":"72bdeb33-5ca3-457c-af9d-50faad81b4c0","html_url":"https://github.com/dannylee1020/openpo","commit_stats":null,"previous_names":["dannylee1020/peony","dannylee1020/openpo"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dannylee1020/openpo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dannylee1020%2Fopenpo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dannylee1020%2Fopenpo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/dannylee1020%2Fopenpo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dannylee1020%2Fopenpo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dannylee1020","download_url":"https://codeload.github.com/dannylee1020/openpo/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dannylee1020%2Fopenpo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274648319,"owners_count":25324299,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-11T02:00:13.660Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-feedback","dpo","evaluation","finetuning","huggingface","llm","llm-evaluation","python","rlaif","rlhf","synthetic-data","synthetic-data-generation"],"created_at":"2025-09-11T13:45:21.145Z","updated_at":"2025-09-11T13:45:22.943Z","avatar_url":"https://github.com/dannylee1020.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OpenPO 🐼\n[![PyPI version](https://img.shields.io/pypi/v/openpo.svg)](https://pypi.org/project/openpo/)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![Documentation](https://img.shields.io/badge/docs-docs.openpo.dev-blue)](https://docs.openpo.dev)\n\nOpenPO simplifies 
building synthetic datasets with AI feedback and state-of-the-art evaluation methods.\n\n| Resources | Notebooks |\n|----------|----------|\n| Building a dataset with OpenPO and PairRM | 📔 [Notebook](https://colab.research.google.com/drive/1G1T-vOTXjIXuRX3h9OlqgnE04-6IpwIf?usp=sharing) |\n| Using OpenPO with Prometheus 2 | 📔 [Notebook](https://colab.research.google.com/drive/1dro0jX1MOfSg0srfjA_DZyeWIWKOuJn2?usp=sharing) |\n| Evaluating with an LLM judge | 📔 [Notebook](https://colab.research.google.com/drive/1_QrmejW2Ym8yzP5RLJbLpVNA_FsEt2ZG?usp=sharing) |\n| Building a dataset using vLLM | 📔 [Notebook](https://colab.research.google.com/drive/1GKHpOv4jRaWhwSDKCEZpl_kIfIyHGs73?usp=sharing) |\n\n\n## Key Features\n\n- 🤖 **Multiple LLM Support**: Collect a diverse set of outputs from 200+ LLMs\n\n- ⚡ **High Performance Inference**: Native vLLM support for optimized inference\n\n- 🚀 **Scalable Processing**: Built-in batch processing capabilities for efficient large-scale data generation\n\n- 📊 **Research-Backed Evaluation Methods**: Support for state-of-the-art evaluation methods for data synthesis\n\n- 💾 **Flexible Storage**: Out-of-the-box storage providers for HuggingFace and S3\n\n\n## Installation\n### Install from PyPI (recommended)\nOpenPO uses pip for installation. 
Run the following command in the terminal to install OpenPO:\n\n```bash\npip install openpo\n\n# to use vllm (quotes keep shells such as zsh from expanding the brackets)\npip install \"openpo[vllm]\"\n\n# for running evaluation models\npip install \"openpo[eval]\"\n```\n\n\n\n### Install from source\nClone the repository first, then run the following commands:\n```bash\ncd openpo\npoetry install\n```\n\n## Getting Started\nSet your environment variables first:\n\n```bash\n# for completions\nexport HF_API_KEY=\u003cyour-api-key\u003e\nexport OPENROUTER_API_KEY=\u003cyour-api-key\u003e\n\n# for evaluations\nexport OPENAI_API_KEY=\u003cyour-openai-api-key\u003e\nexport ANTHROPIC_API_KEY=\u003cyour-anthropic-api-key\u003e\n```\n\n### Completion\nTo get started with collecting LLM responses, pass in a list of model names of your choice.\n\n\u003e [!NOTE]\n\u003e OpenPO requires the provider name to be prepended to the model identifier.\n\n```python\nimport os\nfrom openpo import OpenPO\n\nclient = OpenPO()\n\n# PROMPT and MESSAGE are placeholder strings that you define\nresponse = client.completion.generate(\n    models=[\n        \"huggingface/Qwen/Qwen2.5-Coder-32B-Instruct\",\n        \"huggingface/mistralai/Mistral-7B-Instruct-v0.3\",\n        \"huggingface/microsoft/Phi-3.5-mini-instruct\",\n    ],\n    messages=[\n        {\"role\": \"system\", \"content\": PROMPT},\n        {\"role\": \"user\", \"content\": MESSAGE},\n    ],\n)\n```\n\nYou can also call models with OpenRouter.\n\n```python\n# make request to OpenRouter\nclient = OpenPO()\n\nresponse = client.completion.generate(\n    models=[\n        \"openrouter/qwen/qwen-2.5-coder-32b-instruct\",\n        \"openrouter/mistralai/mistral-7b-instruct-v0.3\",\n        \"openrouter/microsoft/phi-3.5-mini-128k-instruct\",\n    ],\n    messages=[\n        {\"role\": \"system\", \"content\": PROMPT},\n        {\"role\": \"user\", \"content\": MESSAGE},\n    ],\n)\n```\n\nOpenPO takes default model parameters as a dictionary. 
Take a look at the documentation for more detail.\n\n```python\nresponse = client.completion.generate(\n    models=[\n        \"huggingface/Qwen/Qwen2.5-Coder-32B-Instruct\",\n        \"huggingface/mistralai/Mistral-7B-Instruct-v0.3\",\n        \"huggingface/microsoft/Phi-3.5-mini-instruct\",\n    ],\n    messages=[\n        {\"role\": \"system\", \"content\": PROMPT},\n        {\"role\": \"user\", \"content\": MESSAGE},\n    ],\n    params={\n        \"max_tokens\": 500,\n        \"temperature\": 1.0,\n    },\n)\n```\n\n### Evaluation\nOpenPO offers various ways to synthesize your dataset.\n\n\n#### LLM-as-a-Judge\nTo use a single judge to evaluate your response data, use `evaluate.eval`:\n\n```python\nclient = OpenPO()\n\nres = client.evaluate.eval(\n    models=[\"openai/gpt-4o\"],\n    questions=questions,\n    responses=responses,\n)\n```\n\nTo use multiple judges, pass multiple judge models:\n\n```python\nres_a, res_b = client.evaluate.eval(\n    models=[\"openai/gpt-4o\", \"anthropic/claude-sonnet-3-5-latest\"],\n    questions=questions,\n    responses=responses,\n)\n\n# get consensus for multi-judge responses\nresult = client.evaluate.get_consensus(\n    eval_A=res_a,\n    eval_B=res_b,\n)\n```\n\u003cbr\u003e\n\nOpenPO supports batch processing for evaluating large datasets in a cost-effective way.\n\n\u003e [!NOTE]\n\u003e Batch processing is an asynchronous operation and can take up to 24 hours, although it usually completes much faster.\n\n```python\ninfo = client.batch.eval(\n    models=[\"openai/gpt-4o\"],\n    questions=questions,\n    responses=responses,\n)\n\n# check status\nstatus = client.batch.check_status(batch_id=info.id)\n```\n\nFor multi-judge with batch processing:\n\n```python\nbatch_a, batch_b = client.batch.eval(\n    models=[\"openai/gpt-4o\", \"anthropic/claude-sonnet-3-5-latest\"],\n    questions=questions,\n    responses=responses,\n)\n\n# batch_a_result and batch_b_result hold the results of each batch once it has completed\nresult = client.batch.get_consensus(\n    batch_A=batch_a_result,\n    
batch_B=batch_b_result,\n)\n```\n\n\n#### Pre-trained Models\nYou can use pre-trained open source evaluation models. OpenPO currently supports two types of models: `PairRM` and `Prometheus2`.\n\n\u003e [!NOTE]\n\u003e A GPU with sufficient memory is required to run inference with pre-trained models.\n\nTo use PairRM to rank responses:\n\n```python\nfrom openpo import PairRM\n\npairrm = PairRM()\nres = pairrm.eval(prompts, responses)\n```\n\nTo use Prometheus2:\n\n```python\nfrom openpo import Prometheus2\n\npm = Prometheus2(model=\"prometheus-eval/prometheus-7b-v2.0\")\n\nfeedback = pm.eval_relative(\n    instructions=instructions,\n    responses_A=response_A,\n    responses_B=response_B,\n    rubric=\"reasoning\",\n)\n```\n\n\n### Storing Data\nUse the out-of-the-box storage classes to easily upload and download data.\n\n```python\nfrom openpo.storage import HuggingFaceStorage\nhf_storage = HuggingFaceStorage()\n\n# push data to repo\npreference = {\"prompt\": \"text\", \"preferred\": \"response1\", \"rejected\": \"response2\"}\nhf_storage.push_to_repo(repo_id=\"my-hf-repo\", data=preference)\n\n# load data from repo\ndata = hf_storage.load_from_repo(path=\"my-hf-repo\")\n```\n\n\n## Contributing\nContributions are what make open source amazingly special! Here's how you can help:\n\n### Development Setup\n1. Clone the repository\n```bash\ngit clone https://github.com/yourusername/openpo.git\ncd openpo\n```\n\n2. Install Poetry (dependency management tool)\n```bash\ncurl -sSL https://install.python-poetry.org | python3 -\n```\n\n3. Install dependencies\n```bash\npoetry install\n```\n\n### Development Workflow\n1. Create a new branch for your feature\n```bash\ngit checkout -b feature-name\n```\n\n2. 
Submit a Pull Request\n- Write a clear description of your changes\n- Reference any related issues\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdannylee1020%2Fopenpo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdannylee1020%2Fopenpo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdannylee1020%2Fopenpo/lists"}