https://github.com/banditml/offline-policy-evaluation

Implementations and examples of common offline policy evaluation methods in Python.

Host: GitHub
URL: https://github.com/banditml/offline-policy-evaluation
Owner: banditml
License: other
Created: 2020-03-10T03:09:14.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2023-02-11T00:33:53.000Z (over 3 years ago)
Last Synced: 2025-03-30T01:07:43.663Z (over 1 year ago)
Topics: counterfactual-learning, counterfactual-policy-evaluation, doubly-robust, importance-sampling, off-policy-evaluation, offline-policy-evaluation
Language: Python
Homepage:
Size: 1.17 MB
Stars: 222
Watchers: 6
Forks: 25
Open Issues: 9
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

# Offline policy evaluation

[![PyPI version](https://badge.fury.io/py/offline-evaluation.svg)](https://badge.fury.io/py/offline-evaluation) [![](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black) [![Downloads](https://static.pepy.tech/personalized-badge/offline-evaluation?period=total&units=international_system&left_color=black&right_color=brightgreen&left_text=Downloads)](https://pepy.tech/project/offline-evaluation)

Implementations and examples of common offline policy evaluation methods in Python. For more information on offline policy evaluation, see this [tutorial](https://edoconti.medium.com/offline-policy-evaluation-run-fewer-better-a-b-tests-60ce8f93fa15).

## Installation

`pip install offline-evaluation`

## Usage

```python
import pandas as pd
from ope.methods import doubly_robust
```

Get some historical logs generated by a previous policy:

```python
# Each row: the context seen, the action the logging policy took,
# the logging policy's probability of that action, and the observed reward.
df = pd.DataFrame([
    {"context": {"p_fraud": 0.08}, "action": "blocked", "action_prob": 0.90, "reward": 0},
    {"context": {"p_fraud": 0.03}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.02}, "action": "allowed", "action_prob": 0.90, "reward": 10},
    {"context": {"p_fraud": 0.01}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.09}, "action": "allowed", "action_prob": 0.10, "reward": -20},
    {"context": {"p_fraud": 0.40}, "action": "allowed", "action_prob": 0.10, "reward": -10},
])
```

Define a function that computes `P(action | context)` under the new policy:

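```python
def action_probabilities(context):
    # Mostly block when fraud risk exceeds 10%, mostly allow otherwise;
    # epsilon keeps some probability on the other action (exploration).
    epsilon = 0.10
    if context["p_fraud"] > 0.10:
        return {"allowed": epsilon, "blocked": 1 - epsilon}
    return {"allowed": 1 - epsilon, "blocked": epsilon}
```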
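Estimators such as inverse propensity scoring look up the new policy's probability of each logged action, so the function should return a probability for every action that appears in the logs, with non-negative values summing to 1. The sketch below checks that contract on the logged contexts. `check_policy` is a hypothetical helper, not part of the `offline-evaluation` package; the dict-keyed-by-action format is assumed from the example above.

```python
import math


def action_probabilities(context):
    # New policy from the example above.
    epsilon = 0.10
    if context["p_fraud"] > 0.10:
        return {"allowed": epsilon, "blocked": 1 - epsilon}
    return {"allowed": 1 - epsilon, "blocked": epsilon}


def check_policy(policy, contexts, actions):
    """Hypothetical helper: raise ValueError unless policy(context) is a
    probability distribution covering every action in `actions`."""
    for context in contexts:
        probs = policy(context)
        missing = set(actions) - set(probs)
        if missing:
            raise ValueError(f"{context}: no probability for {sorted(missing)}")
        if any(p < 0 for p in probs.values()):
            raise ValueError(f"{context}: negative probability in {probs}")
        if not math.isclose(sum(probs.values()), 1.0):
            raise ValueError(f"{context}: probabilities sum to {sum(probs.values())}")


# Contexts taken from the logged rows; the check passes silently.
contexts = [{"p_fraud": p} for p in (0.08, 0.03, 0.02, 0.01, 0.09, 0.40)]
check_policy(action_probabilities, contexts, ["allowed", "blocked"])
```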

Conduct the evaluation:

```python
doubly_robust.evaluate(df, action_probabilities)
# {'expected_reward_logging_policy': 3.33, 'expected_reward_new_policy': -28.47}
```
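The logging policy's figure can be checked by hand. The logs were generated by the logging policy itself, so its expected reward is estimated by the plain average of the logged rewards. A minimal sketch, assuming the library reports this on-policy average (the number matches the output above, but this is not taken from the library's source):

```python
# Rewards from the six logged rows in the DataFrame above.
rewards = [0, 20, 10, 20, -20, -10]

# On-policy estimate: the sample mean of the observed rewards.
logging_value = sum(rewards) / len(rewards)
print(round(logging_value, 2))  # 3.33
```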

This suggests the new policy is considerably worse than the logging policy (an estimated expected reward of -28.47 versus 3.33). Instead of A/B testing this new policy online, it would be better to test some other policies offline first.

See the examples in the repository for more detailed tutorials.

## Supported methods

- [x] Inverse propensity scoring
- [x] Direct method
- [x] Doubly robust ([paper](https://arxiv.org/abs/1503.02834))
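The three methods listed above can be sketched in plain Python. This is a minimal illustration of the standard estimators, not the library's implementation. The helper names (`fit_reward_model`, `ips_estimate`, `dm_estimate`, `dr_estimate`) are hypothetical, and the reward model is deliberately crude (the mean logged reward per action, ignoring context), so the doubly robust number will not reproduce the -28.47 reported in the usage example.

```python
from collections import defaultdict

# The six logged rows from the usage example, as plain dicts (no pandas needed).
LOGS = [
    {"context": {"p_fraud": 0.08}, "action": "blocked", "action_prob": 0.90, "reward": 0},
    {"context": {"p_fraud": 0.03}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.02}, "action": "allowed", "action_prob": 0.90, "reward": 10},
    {"context": {"p_fraud": 0.01}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.09}, "action": "allowed", "action_prob": 0.10, "reward": -20},
    {"context": {"p_fraud": 0.40}, "action": "allowed", "action_prob": 0.10, "reward": -10},
]


def action_probabilities(context):
    # New policy from the usage example: mostly block when p_fraud > 0.10.
    epsilon = 0.10
    if context["p_fraud"] > 0.10:
        return {"allowed": epsilon, "blocked": 1 - epsilon}
    return {"allowed": 1 - epsilon, "blocked": epsilon}


def fit_reward_model(logs):
    """Hypothetical reward model: mean logged reward per action, ignoring context."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in logs:
        totals[row["action"]] += row["reward"]
        counts[row["action"]] += 1
    means = {action: totals[action] / counts[action] for action in counts}
    # Actions never seen in the logs are predicted to have reward 0.
    return lambda context, action: means.get(action, 0.0)


def ips_estimate(logs, policy):
    """Inverse propensity scoring: reweight each reward by pi_new(a|x) / pi_log(a|x)."""
    total = 0.0
    for row in logs:
        weight = policy(row["context"])[row["action"]] / row["action_prob"]
        total += weight * row["reward"]
    return total / len(logs)


def dm_estimate(logs, policy, model):
    """Direct method: average the model's predicted reward under the new policy."""
    total = 0.0
    for row in logs:
        probs = policy(row["context"])
        total += sum(p * model(row["context"], a) for a, p in probs.items())
    return total / len(logs)


def dr_estimate(logs, policy, model):
    """Doubly robust: direct-method baseline plus an IPS-weighted residual correction."""
    total = 0.0
    for row in logs:
        context, action = row["context"], row["action"]
        probs = policy(context)
        baseline = sum(p * model(context, a) for a, p in probs.items())
        weight = probs[action] / row["action_prob"]
        total += baseline + weight * (row["reward"] - model(context, action))
    return total / len(logs)


if __name__ == "__main__":
    model = fit_reward_model(LOGS)
    print("IPS:", round(ips_estimate(LOGS, action_probabilities), 2))
    print("Direct method:", round(dm_estimate(LOGS, action_probabilities, model), 2))
    print("Doubly robust:", round(dr_estimate(LOGS, action_probabilities, model), 2))
```

With this crude reward model the script prints about -23.33 (IPS), 3.07 (direct method) and -28.93 (doubly robust). The direct method alone barely separates the two policies because its model ignores `p_fraud`. The doubly robust correction reweights the model's residuals by the importance weights, and the low-propensity row with reward -20 (weight 9) dominates, pushing the estimate well below the logging policy's 3.33. These figures come from this sketch only.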
