{"id":21931491,"url":"https://github.com/banditml/offline-policy-evaluation","last_synced_at":"2025-04-06T02:07:22.442Z","repository":{"id":37217346,"uuid":"246196271","full_name":"banditml/offline-policy-evaluation","owner":"banditml","description":"Implementations and examples of common offline policy evaluation methods in Python.","archived":false,"fork":false,"pushed_at":"2023-02-11T00:33:53.000Z","size":1231,"stargazers_count":222,"open_issues_count":9,"forks_count":25,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-30T01:07:43.663Z","etag":null,"topics":["counterfactual-learning","counterfactual-policy-evaluation","doubly-robust","importance-sampling","off-policy-evaluation","offline-policy-evaluation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/banditml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-10T03:09:14.000Z","updated_at":"2025-02-05T23:50:48.000Z","dependencies_parsed_at":"2025-01-16T13:34:25.456Z","dependency_job_id":"be118b79-451a-4d01-97bc-5d94ba6803b0","html_url":"https://github.com/banditml/offline-policy-evaluation","commit_stats":{"total_commits":123,"total_committers":6,"mean_commits":20.5,"dds":0.1707317073170732,"last_synced_commit":"8ae4c6364d19db75f4c53ceaeaed7658f8d75243"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/banditml%2Foffline-policy-evaluation","tags_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/banditml%2Foffline-policy-evaluation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/banditml%2Foffline-policy-evaluation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/banditml%2Foffline-policy-evaluation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/banditml","download_url":"https://codeload.github.com/banditml/offline-policy-evaluation/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247423513,"owners_count":20936626,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["counterfactual-learning","counterfactual-policy-evaluation","doubly-robust","importance-sampling","off-policy-evaluation","offline-policy-evaluation"],"created_at":"2024-11-28T23:14:05.605Z","updated_at":"2025-04-06T02:07:22.424Z","avatar_url":"https://github.com/banditml.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Offline policy evaluation\n[![PyPI version](https://badge.fury.io/py/offline-evaluation.svg)](https://badge.fury.io/py/offline-evaluation) [![](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black) [![Downloads](https://static.pepy.tech/personalized-badge/offline-evaluation?period=total\u0026units=international_system\u0026left_color=black\u0026right_color=brightgreen\u0026left_text=Downloads)](https://pepy.tech/project/offline-evaluation)\n\nImplementations and examples of common offline policy evaluation 
methods in Python. For more information on offline policy evaluation, see this [tutorial](https://edoconti.medium.com/offline-policy-evaluation-run-fewer-better-a-b-tests-60ce8f93fa15).\n\n## Installation\n`pip install offline-evaluation`\n\n## Usage\n```\nimport pandas as pd\n\nfrom ope.methods import doubly_robust\n```\n\nGet some historical logs generated by a previous policy:\n```\n# Each row logs the context, the action taken, the probability with which the\n# logging policy took that action, and the observed reward.\ndf = pd.DataFrame([\n\t{\"context\": {\"p_fraud\": 0.08}, \"action\": \"blocked\", \"action_prob\": 0.90, \"reward\": 0},\n\t{\"context\": {\"p_fraud\": 0.03}, \"action\": \"allowed\", \"action_prob\": 0.90, \"reward\": 20},\n\t{\"context\": {\"p_fraud\": 0.02}, \"action\": \"allowed\", \"action_prob\": 0.90, \"reward\": 10},\n\t{\"context\": {\"p_fraud\": 0.01}, \"action\": \"allowed\", \"action_prob\": 0.90, \"reward\": 20},\n\t{\"context\": {\"p_fraud\": 0.09}, \"action\": \"allowed\", \"action_prob\": 0.10, \"reward\": -20},\n\t{\"context\": {\"p_fraud\": 0.40}, \"action\": \"allowed\", \"action_prob\": 0.10, \"reward\": -10},\n])\n```\nDefine a function that computes `P(action | context)` under the new policy:\n```\ndef action_probabilities(context):\n    # Probability of taking the action the threshold rule does not prefer.\n    epsilon = 0.10\n    if context[\"p_fraud\"] \u003e 0.10:\n        return {\"allowed\": epsilon, \"blocked\": 1 - epsilon}\n    return {\"allowed\": 1 - epsilon, \"blocked\": epsilon}\n```\nConduct the evaluation:\n```\ndoubly_robust.evaluate(df, action_probabilities)\n\u003e {'expected_reward_logging_policy': 3.33, 'expected_reward_new_policy': -28.47}\n```\nBy this estimate, the new policy performs far worse than the logging policy (an expected reward of -28.47 versus 3.33).  
Instead of A/B testing this new policy online, it would be better to test some other policies offline first.\n\nSee examples for more detailed tutorials.\n\n## Supported methods\n\n- [x] Inverse propensity scoring\n- [x] Direct method\n- [x] Doubly robust ([paper](https://arxiv.org/abs/1503.02834))\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbanditml%2Foffline-policy-evaluation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbanditml%2Foffline-policy-evaluation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbanditml%2Foffline-policy-evaluation/lists"}