# **Orchestrated Value Mapping**

This repository hosts the code release for the paper ["Orchestrated Value Mapping for Reinforcement Learning"](https://arxiv.org/abs/2203.07171), published at [ICLR 2022][map_rl].
This work was done by [Mehdi Fatemi](https://www.microsoft.com/en-us/research/people/mefatemi) (Microsoft Research) and [Arash Tavakoli](https://atavakol.github.io) (Max Planck Institute for Intelligent Systems).

We release a flexible framework, built upon Dopamine ([Castro et al., 2018][dopamine_paper]), for building and orchestrating various mappings over different reward decomposition schemes. This enables the research community to easily explore the design space that our theory opens up and to investigate new convergent families of algorithms.

The code was developed by [Arash Tavakoli](https://github.com/atavakol).

## [LICENSE](https://github.com/microsoft/orchestrated-value-mapping/blob/main/LICENSE)

## [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct)

## Citing

If you make use of our work, please use the citation information below:

```
@inproceedings{Fatemi2022Orchestrated,
  title={Orchestrated Value Mapping for Reinforcement Learning},
  author={Mehdi Fatemi and Arash Tavakoli},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=c87d0TS4yX}
}
```

# Getting started

We install the required packages within a virtual environment.

## Virtual environment

Create a virtual environment using `conda` via:

```
conda create --name maprl-env python=3.8
conda activate maprl-env
```

## Prerequisites

**Atari benchmark.**
To set up the Atari suite, please follow the steps outlined [here](https://github.com/google/dopamine/blob/master/README.md#prerequisites).
**Install Dopamine.** Install a compatible version of [Dopamine][dopamine_repo] with `pip`:
```
pip install dopamine-rl==3.1.10
```

## Installing from source

To experiment easily within our framework, install it from source and modify the code directly:

```
git clone https://github.com/microsoft/orchestrated-value-mapping.git
cd orchestrated-value-mapping
pip install -e .
```

## Training an agent

Change to the workspace directory:
```
cd map_rl
```

To train a **LogDQN** agent, similar to that introduced by [van Seijen, Fatemi & Tavakoli (2019)][log_rl], run the following command:
```
python -um map_rl.train \
  --base_dir=/tmp/log_dqn \
  --gin_files='configs/map_dqn.gin' \
  --gin_bindings='MapDQNAgent.map_func_id="[log,log]"' \
  --gin_bindings='MapDQNAgent.rew_decomp_id="polar"' &
```
Here, `polar` refers to the reward decomposition scheme described in Equation 13 of [Fatemi & Tavakoli (2022)][map_rl] (which has two reward channels), and `[log,log]` applies a logarithmic mapping to each of the two reward channels.

To train a **LogLinDQN** agent, similar to that described by [Fatemi & Tavakoli (2022)][map_rl], use:
```
python -um map_rl.train \
  --base_dir=/tmp/loglin_dqn \
  --gin_files='configs/map_dqn.gin' \
  --gin_bindings='MapDQNAgent.map_func_id="[loglin,loglin]"' \
  --gin_bindings='MapDQNAgent.rew_decomp_id="polar"' &
```

## Creating custom agents

To instantiate a custom agent, simply set the mapping function for each channel and a reward decomposition scheme. For instance, the following setting
```
MapDQNAgent.map_func_id="[log,identity]"
MapDQNAgent.rew_decomp_id="polar"
```
results in a logarithmic mapping for the positive-reward channel and the identity mapping (as in [DQN][dqn]) for the negative-reward channel.
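To build intuition for what such a setting means, here is a minimal, self-contained sketch of the idea behind a `polar` decomposition with per-channel mappings. The function names, constants, and interface below are illustrative assumptions for exposition only, not the repository's actual API:

```python
import math

# Hypothetical sketch (not the repo's API): a "polar" decomposition splits a
# scalar reward into non-negative positive/negative channels, and each
# channel's value is learned under its own invertible mapping.

def polar_decompose(r):
    """Split r into (r_plus, r_minus) such that r = r_plus - r_minus."""
    return max(r, 0.0), max(-r, 0.0)

# Illustrative mapping pair; the actual forms and constants live in the code.
def log_map(x, c=0.5, d=0.02):
    return c * math.log(x + d)

def log_unmap(y, c=0.5, d=0.02):
    return math.exp(y / c) - d

def identity(x):
    return x

# A negative reward flows entirely into the minus channel:
r_plus, r_minus = polar_decompose(-1.5)  # (0.0, 1.5)

# Each mapping must be invertible so the per-channel values can be combined
# back in the original space, e.g. Q = f1^{-1}(Q1_mapped) - f2^{-1}(Q2_mapped).
```

Under `map_func_id="[log,identity]"`, the first (positive) channel would use the logarithmic pair above while the second (negative) channel uses the identity, mirroring the gin binding shown earlier.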
To use more complex reward decomposition schemes, such as Configurations 1 and 2 from [Fatemi & Tavakoli (2022)][map_rl], you can do the following:
```
MapDQNAgent.map_func_id="[identity,identity,log,log,loglin,loglin]"
MapDQNAgent.rew_decomp_id="config_1"
```

To instantiate an ensemble of two learners, each using a `polar` reward decomposition, use the following syntax:
```
MapDQNAgent.map_func_id="[loglin,loglin,log,log]"
MapDQNAgent.rew_decomp_id="two_ensemble_polar"
```

## Custom mappings and reward decomposition schemes

To implement custom mapping functions and reward decomposition schemes, we suggest drawing on insights from [Fatemi & Tavakoli (2022)][map_rl] and following the format of such methods in [map_dqn_agent.py](https://github.com/microsoft/orchestrated-value-mapping/blob/main/map_rl/map_dqn_agent.py) to design yours.

[map_rl]: https://openreview.net/forum?id=c87d0TS4yX
[log_rl]: https://arxiv.org/abs/1906.00572
[dqn]: https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
[dopamine_paper]: https://arxiv.org/abs/1812.06110
[dopamine_repo]: https://github.com/google/dopamine
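As an illustration of what a custom mapping must provide, namely a forward map together with its inverse, applied per reward channel, here is a minimal sketch. The class name, interface, and constants are assumptions made for this example and do not reflect the structure of `map_dqn_agent.py`; consult that file for the actual format:

```python
import math

class LogLinMapping:
    """Hypothetical custom mapping, loosely inspired by the log-lin idea:
    behaves roughly linearly for small inputs and logarithmically for large
    ones. The scale constant c is illustrative, not taken from the paper."""

    def __init__(self, c=1.0):
        self.c = c

    def forward(self, x):
        # log1p(x) = log(1 + x): ~x near 0, ~log(x) for large x.
        # Valid for the non-negative inputs produced by a polar decomposition.
        return self.c * math.log1p(x)

    def inverse(self, y):
        # expm1 is the exact inverse of log1p, so forward/inverse round-trip.
        return math.expm1(y / self.c)

m = LogLinMapping()
```

The key requirement sketched here is invertibility: the learner operates on `forward`-mapped values, and `inverse` recovers values in the original reward space before channels are combined.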