{"id":16735115,"url":"https://github.com/mamba413/cope","last_synced_at":"2025-04-10T12:15:55.351Z","repository":{"id":51676908,"uuid":"380738105","full_name":"Mamba413/cope","owner":"Mamba413","description":"Off-Policy Interval Estimation with Confounded Markov Decision Process","archived":false,"fork":false,"pushed_at":"2023-09-13T14:01:35.000Z","size":3297,"stargazers_count":6,"open_issues_count":0,"forks_count":4,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-24T11:07:50.897Z","etag":null,"topics":["causal-inference","confidence-intervals","off-policy-evaluation","reinforcement-learning","statistical-inference"],"latest_commit_sha":null,"homepage":"https://www.tandfonline.com/doi/full/10.1080/01621459.2022.2110878","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Mamba413.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-27T12:48:20.000Z","updated_at":"2025-02-16T12:47:22.000Z","dependencies_parsed_at":"2025-02-17T05:41:11.602Z","dependency_job_id":null,"html_url":"https://github.com/Mamba413/cope","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mamba413%2Fcope","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mamba413%2Fcope/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mamba413%2Fcope/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mamba413%2Fcope/manifests","owner_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Mamba413","download_url":"https://codeload.github.com/Mamba413/cope/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248217084,"owners_count":21066633,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["causal-inference","confidence-intervals","off-policy-evaluation","reinforcement-learning","statistical-inference"],"created_at":"2024-10-13T00:04:59.752Z","updated_at":"2025-04-10T12:15:55.323Z","avatar_url":"https://github.com/Mamba413.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Off-Policy Interval Estimation with Confounded Markov Decision Process (COPE)\n\nThis repository contains the Python implementation of the paper \"[Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process](https://arxiv.org/pdf/2202.10589.pdf)\" (JASA, 2022).\n\n## Summary of the Paper\n\nThis paper is concerned with constructing a confidence interval for a target policy's value offline, based on pre-collected observational data in infinite-horizon settings. Most existing works assume that no unmeasured variables confound the observed actions. This assumption, however, is likely to be violated in real applications such as healthcare and technological industries. In this paper, we show that with some auxiliary variables that mediate the effect of actions on the system dynamics, the target policy’s value is identifiable in a confounded Markov decision process. 
Based on this result, we develop an efficient off-policy value estimator that is robust to potential model misspecification and provide rigorous uncertainty quantification. Our method is justified by theoretical results and by simulated and real datasets obtained from ridesharing companies.\n\n\u003cimg align=\"center\" src=\"CausalDiagram.png\" alt=\"drawing\" width=\"700\"\u003e\n\n**Figure 1**: Causal diagrams for MDP, confounded MDP and confounded MDP with mediators (the focus of this paper). \n\n## Requirements\nChange your working directory to the repository's main folder, run `conda env create --file COPE.yml` to create the Conda environment, \nand then run `conda activate COPE` to activate the environment. \n\n## Code Description\n\nThe proposed estimators:\n- `opeuc.py`: direct estimator, importance sampling estimator, confounded off-policy estimator\n\nNuisance parameters:\n- `problearner.py`: learn transition probabilities: (i) state --\u003e action \u0026 (ii) (state, action) --\u003e mediator\n- `qlearner.py`: fitted Q evaluation\n- `rnnl.py`: marginal ratio learning via neural network\n- `rll.py`: marginal ratio learning via linear model\n\nSampling:\n- `simulator_save.py`: generate observation tuples from the MDP\n\nUtilities:\n- `policy.py`: target policies\n- `utilize.py`: general helper functions\n- `utilize_ci.py`: helper functions for computing confidence intervals\n- `utilize_prototypemodel.py`: helper classes for simulations\n\nNumerical experiments:\n\n(See the ACC form for detailed instructions)\n\n- `sim_robust.py`: simulation for demonstrating double robustness\n- `sim_time_compare.py` \u0026 `sim_time_compare_multdim.py`: simulation when the number of time points varies\n- `sim_trajectory_compare.py` \u0026 `sim_trajectory_compare_multdim.py`: simulation when the number of trajectories varies\n- `sim_ratiolearner_compare.py`\n- `sim_ratio_features_number_compare.py`\n\n## Citations\n\nPlease cite the following publication if you make use of the material here. 
\n\n- Shi, C., Zhu, J., Ye, S., Luo, S., Zhu, H., \u0026 Song, R. (2022). Off-policy confidence interval estimation with confounded Markov decision process. Journal of the American Statistical Association, accepted.\n\n```\n@article{shi2022off,\n  title={Off-policy confidence interval estimation with confounded Markov decision process},\n  author={Shi, Chengchun and Zhu, Jin and Ye, Shen and Luo, Shikai and Zhu, Hongtu and Song, Rui},\n  journal={Journal of the American Statistical Association},\n  volume={accepted},\n  year={2022},\n  publisher={Taylor \\\u0026 Francis}\n}\n```\n\n## License\n\nAll content in this repository is licensed under the GPL-3 license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmamba413%2Fcope","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmamba413%2Fcope","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmamba413%2Fcope/lists"}