{"id":31681273,"url":"https://github.com/koulanurag/opcc","last_synced_at":"2026-05-19T05:44:36.017Z","repository":{"id":83652621,"uuid":"391179213","full_name":"koulanurag/opcc","owner":"koulanurag","description":"Benchmark for \"Offline Policy Comparison with Confidence\"","archived":false,"fork":false,"pushed_at":"2023-10-25T15:02:51.000Z","size":74082,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-28T19:14:34.551Z","etag":null,"topics":["confidence-estimation","offline-policy-comparison","offline-reinforcement-learning","policy-evaluation","reinforcement-learning","uncertainty-estimation"],"latest_commit_sha":null,"homepage":"https://koulanurag.dev/opcc","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/koulanurag.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2021-07-30T20:15:25.000Z","updated_at":"2024-02-19T20:42:24.000Z","dependencies_parsed_at":"2023-03-11T10:00:39.506Z","dependency_job_id":null,"html_url":"https://github.com/koulanurag/opcc","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/koulanurag/opcc","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koulanurag%2Fopcc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koulanurag%2Fopcc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koulanurag%2Fopcc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koulanurag%
2Fopcc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/koulanurag","download_url":"https://codeload.github.com/koulanurag/opcc/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/koulanurag%2Fopcc/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278909712,"owners_count":26066887,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-08T02:00:06.501Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["confidence-estimation","offline-policy-comparison","offline-reinforcement-learning","policy-evaluation","reinforcement-learning","uncertainty-estimation"],"created_at":"2025-10-08T07:47:37.768Z","updated_at":"2025-10-08T07:47:51.938Z","avatar_url":"https://github.com/koulanurag.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Offline Policy Comparison with Confidence (opcc)\n\nIt is a benchmark of **\"policy comparison queries\" (pcq)** for evaluating uncertainty estimation in offline\nreinforcement learning. 
\n\n__Research Paper:__ [https://arxiv.org/abs/2205.10739](https://arxiv.org/abs/2205.10739)\n\n__Website/Docs:__ [https://koulanurag.dev/opcc](https://koulanurag.dev/opcc)\n\n[![Python package](https://github.com/koulanurag/opcc/actions/workflows/python-package.yml/badge.svg)](https://github.com/koulanurag/opcc/actions/workflows/python-package.yml)\n![License](https://img.shields.io/github/license/koulanurag/opcc)\n[![codecov](https://codecov.io/gh/koulanurag/opcc/branch/main/graph/badge.svg?token=47LIB1CLI4)](https://codecov.io/gh/koulanurag/opcc)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\n## Installation\n\n**1. Set up MuJoCo**\n- Download **[MuJoCo 2.1.0](https://github.com/google-deepmind/mujoco/releases/tag/2.1.0)** and unzip it into `~/.mujoco`\n- Add the following to your `~/.bashrc` (or `~/.zshrc`) and source it:\n  ```console\n  export MUJOCO_PY_MUJOCO_PATH=$HOME/.mujoco/mujoco210/\n  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin\n  ```\n  \u003cbr\u003e*(You can also refer to the steps **[here](https://github.com/koulanurag/opcc/blob/main/.github/workflows/python-package.yml#L27)** for a step-by-step MuJoCo installation.)*\n\n**2. Set up [Python 3.7+](https://www.python.org/downloads/)** and optionally (recommended) create a `virtualenv` [(refer here)](https://docs.python.org/3/tutorial/venv.html)\n\n**3. Install the Python package and its dependencies:**\n\n  ```console\n  pip3 install --upgrade 'pip\u003c=23.0.1'\n  pip3 install --upgrade 'setuptools\u003c=66'\n  pip3 install --upgrade 'wheel\u003c=0.38.4'\n  pip3 install git+https://github.com/koulanurag/opcc@main#egg=opcc\n  ```\n\n**4. 
Install PyTorch [\\[\u003e= 1.8.0\\]](https://pytorch.org/)**\n\n## Usage\n\n### Queries:\n\n```python\nimport opcc\nimport numpy as np\nfrom sklearn import metrics\n\nenv_name = 'HalfCheetah-v2'\ndataset_name = 'random'\n\n# ########################################################\n# Policy Comparison Queries (PCQ) (Section 3.1 in paper)\n# ########################################################\n# `queries` is a dictionary keyed by (policy_a_id, policy_b_id)\n# pairs; each value is the corresponding batch of queries.\nqueries = opcc.get_queries(env_name)\n\n\ndef random_predictor(obs_a, obs_b, action_a, action_b,\n                     policy_a, policy_b, horizon):\n    answer = np.random.randint(low=0, high=2, size=len(obs_a)).tolist()\n    confidence = np.random.rand(len(obs_a)).tolist()\n    return answer, confidence\n\n\ntargets = []\npredictions = []\nconfidences = []\n# Iterate over query batches:\nfor (policy_a_id, policy_b_id), query_batch in queries.items():\n    # retrieve policies\n    policy_a, _ = opcc.get_policy(*policy_a_id)\n    policy_b, _ = opcc.get_policy(*policy_b_id)\n\n    # query-a\n    obs_a = query_batch['obs_a']\n    action_a = query_batch['action_a']\n\n    # query-b\n    obs_b = query_batch['obs_b']\n    action_b = query_batch['action_b']\n\n    # horizon for policy evaluation\n    horizon = query_batch['horizon']\n\n    # ground truth binary vector:\n    # (Q(obs_a, action_a, policy_a, horizon)\n    # \u003c  Q(obs_b, action_b, policy_b, horizon))\n    target = query_batch['target'].tolist()\n    targets += target\n\n    # Let's make predictions for the given queries.\n    # One can use any mechanism to predict the answers\n    # to the queries; here we simply use a random predictor\n    # for demonstration purposes.\n    p, c = random_predictor(obs_a, obs_b, action_a, action_b,\n                            policy_a, policy_b, horizon)\n    predictions += p\n    confidences += c\n```\n\n### Evaluation Metrics:\n\n```python\n# 
#########################################\n# (Section 3.3 in paper)\n# #########################################\nloss = np.logical_xor(predictions, targets)  # we use 0-1 loss for demo\nconfidences = np.asarray(confidences)  # array needed for element-wise comparison with tau\n\n# List of tuples (coverage, selective_risk, tau)\ncoverage_sr_tau = []\ntau_interval = 0.01\nfor tau in np.arange(0, 1 + 2 * tau_interval, tau_interval):\n    non_abstain_filter = confidences \u003e= tau\n    if any(non_abstain_filter):\n        selective_risk = np.sum(loss[non_abstain_filter])\n        selective_risk /= np.sum(non_abstain_filter)\n        coverage = np.mean(non_abstain_filter)\n        coverage_sr_tau.append((coverage, selective_risk, tau))\n    else:\n        # 0 risk for 0 coverage\n        coverage_sr_tau.append((0, 0, tau))\n\ncoverages, selective_risks, taus = list(zip(*sorted(coverage_sr_tau)))\nassert selective_risks[0] == 0 and coverages[0] == 0, \"zero-coverage point not found\"\nassert coverages[-1] == 1, 'complete coverage not found'\n\n# AURCC (Area Under the Risk-Coverage Curve): Ideally, we would like it to be 0\naurcc = metrics.auc(x=coverages, y=selective_risks)\n\n# Reverse Pair Proportion (RPP): Ideally, we would like it to be 0\nrpp = np.logical_and(np.expand_dims(loss, 1)\n                     \u003c np.expand_dims(loss, 1).transpose(),\n                     np.expand_dims(confidences, 1)\n                     \u003c np.expand_dims(confidences, 1).transpose()).mean()\n\n# Coverage Resolution (cr_k): Ideally, we would like it to be 1\nk = 10\nbins = np.arange(0, 1, 1 / k)\ncr_k = np.unique(np.digitize(coverages, bins)).size / len(bins)\n\nprint(\"aurcc: {}, rpp: {}, cr_{}: {}\".format(aurcc, rpp, k, cr_k))\n```\n\n### Dataset:\n\n```python\n\n# ###########################################\n# Datasets (Section 4 in paper, step (1))\n# ###########################################\n\nimport opcc\n\nenv_name = 'HalfCheetah-v2'\n\n# list all dataset names corresponding to an env\ndataset_names = opcc.get_dataset_names(env_name)\n\ndataset_name = 'random'\n# This is a very slim wrapper over D4RL datasets.\ndataset = 
opcc.get_qlearning_dataset(env_name, dataset_name)\n\n```\n\n### Policy Usage:\n\n```python\nimport opcc, gym, torch\n\nenv_name = \"HalfCheetah-v2\"\n# load the best pre-trained policy for this environment\nmodel, model_info = opcc.get_policy(env_name, pre_trained=1)\n\ndone = False\nenv = gym.make(env_name)\n\nobs = env.reset()\nwhile not done:\n    # add a batch dimension; cast to float32 to match the model's weights\n    action = model(torch.tensor(obs, dtype=torch.float32).unsqueeze(0))\n    action = action.data.cpu().numpy()[0].astype('float32')\n    obs, reward, done, step_info = env.step(action)\n    env.render()\n```\n\n## Benchmark Information\n\n- We borrow datasets from [**D4RL**](https://arxiv.org/abs/2004.07219)\n- Queries can be visualized [**HERE**](https://wandb.ai/koulanurag/opcc/reports/Visualization-of-Policy-Comparison-Queries-pcq---VmlldzoxNTg3NzM2?accessToken=i71bbslusbt5rrb1kqfpz1e7n6yij6ocq47c19nydukrrvs4kv66k17j1s6dr5hw)\n- Baselines can be found [**HERE**](https://github.com/koulanurag/opcc-baselines)\n\n### :low_brightness: [d4rl:maze2d](https://github.com/rail-berkeley/d4rl/wiki/Tasks#maze2d)\n\n\u003cimg width=\"500\" alt=\"maze2d-environments\" src=\"https://github.com/rail-berkeley/offline_rl/raw/assets/assets/mazes_filmstrip.png\"\u003e\n\n#### Datasets:\n\n|    Environment Name     |      Datasets       |    Query-Count    |\n|:-----------------------:|:-------------------:|:-----------------:|\n|  `d4rl:maze2d-open-v0`  | `1k, 10k, 100k, 1m` |      `1500`       |\n| `d4rl:maze2d-medium-v1` | `1k, 10k, 100k, 1m` |      `1500`       |\n| `d4rl:maze2d-umaze-v1`  | `1k, 10k, 100k, 1m` |      `1500`       |\n| `d4rl:maze2d-large-v1`  | `1k, 10k, 100k, 1m` | `121`\u003csup\u003e\u003cstrong\u003e*\u003c/strong\u003e\u003c/sup\u003e |\n\n#### Pre-trained policy performance:\n\n|    Environment Name     | `pre_trained=1` (best) | `pre_trained=2` | `pre_trained=3` | `pre_trained=4` (worst) |\n|:-----------------------:|:----------------------:|:---------------:|:---------------:|:-----------------------:|\n|  `d4rl:maze2d-open-v0`  |      122.2±10.61       |   104.9±22.19   |   
18.05±14.85   |        4.85±8.62        |\n| `d4rl:maze2d-medium-v1` |     245.55±272.75      |  203.75±252.61  |  256.65±260.16  |      258.55±262.81      |\n| `d4rl:maze2d-umaze-v1`  |      235.5±35.45       |  197.75±58.21   |   23.4±73.24    |        3.2±9.65         |\n| `d4rl:maze2d-large-v1`  |     231.35±268.37      |  160.8±201.97   |   50.65±76.94   |        9.95±9.95        |\n\n### :low_brightness: [mujoco(gym)](https://gym.openai.com/envs/#mujoco)\n\n\u003cp float=\"left\"\u003e\n    \u003cimg width=\"160\" alt=\"mujoco-halfcheetah\" src=\"opcc/assets/HalfCheetah-v2/halfcheetah.png\" /\u003e \n    \u003cimg width=\"160\" alt=\"mujoco-hopper\" src=\"opcc/assets/Hopper-v2/hopper.png\" /\u003e\n    \u003cimg width=\"160\" alt=\"mujoco-walker2d\" src=\"opcc/assets/Walker2d-v2/walker2d.png\" /\u003e\n\u003c/p\u003e\n\n#### Datasets:\n\n| Environment Name |                        Datasets                        |    Query-Count    |\n|:----------------:|:------------------------------------------------------:|:-----------------:|\n| `HalfCheetah-v2` | `random, expert, medium, medium-replay, medium-expert` |      `1500`       |\n|   `Hopper-v2`    | `random, expert, medium, medium-replay, medium-expert` |      `1500`       |\n|  `Walker2d-v2`   | `random, expert, medium, medium-replay, medium-expert` |      `1500`       |\n\n#### Pre-trained policy performance:\n\n| Environment Name | `pre_trained=1` (best) | `pre_trained=2` | `pre_trained=3` | `pre_trained=4` (worst) |\n|:----------------:|:----------------------:|:---------------:|:---------------:|:-----------------------:|\n| `HalfCheetah-v2` |     1169.13±80.45      | 1044.39±112.61  |  785.88±303.59  |       94.79±40.88       |\n|   `Hopper-v2`    |     1995.84±794.71     |  1466.71±497.1  | 1832.43±560.86  |       236.51±1.09       |\n|  `Walker2d-v2`   |     2506.9±689.45      |  811.28±321.66  |  387.01±42.82   |      162.7±102.14       |\n\n## Testing Package:\n\n- Install: `pip install -e 
\".[test]\"`\n- Run: `pytest -v`\n- Testing is computationally expensive because it validates ground-truth value estimates and the corresponding labels. These tests can\n  be disabled by setting the following flags:\n  ```console\n  export SKIP_QUERY_TARGET_TESTS=1  # disable target estimation and label validation\n  export SKIP_Q_LEARNING_DATASET_TEST=1  # disable the dataset-existence test\n  export SKIP_SEQUENCE_DATASET_TEST=1  # disable the sequence-dataset test\n  ```\n\n## Development:\n- Install: `pip install -e \".[all]\"`\n- Generate queries:\n  ```console\n  # MuJoCo (Gym) environments\n  python scripts/generate_queries.py --env-name HalfCheetah-v2 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.1 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb\n  python scripts/generate_queries.py --env-name Hopper-v2 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.1 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb\n  python scripts/generate_queries.py --env-name Walker2d-v2 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.1 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb\n\n  # Maze environments\n  python scripts/generate_queries.py --env-name d4rl:maze2d-large-v1 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.2 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb\n  python scripts/generate_queries.py --env-name d4rl:maze2d-umaze-v1 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.2 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 
1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb\n  python scripts/generate_queries.py --env-name d4rl:maze2d-medium-v1 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.2 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb\n  python scripts/generate_queries.py --env-name d4rl:maze2d-open-v0 --horizons 10 20 30 40 50 --policy-ids 1 2 3 4 --noise 0.5 --eval-runs 10 --ignore-delta-per-horizons 10 10 10 10 10 --max-trans-count 2000 --ignore-stuck-count 1000 --save-prob 0.6 --per-policy-comb-query 250 --use-wandb\n  ```\n- Generate policy performance stats for the README:\n  ```console\n  python scripts/generate_policy_stats.py --all-envs\n  ```\n\n## Contact\n\nIf you have any questions or suggestions, please open an issue on this GitHub repository.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkoulanurag%2Fopcc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkoulanurag%2Fopcc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkoulanurag%2Fopcc/lists"}