{"id":30957270,"url":"https://github.com/rystrauss/dopamax","last_synced_at":"2025-09-11T13:45:13.206Z","repository":{"id":65953434,"uuid":"592036728","full_name":"rystrauss/dopamax","owner":"rystrauss","description":"Reinforcement learning in pure JAX.","archived":false,"fork":false,"pushed_at":"2025-02-23T20:50:10.000Z","size":268,"stargazers_count":13,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-08-17T04:36:59.793Z","etag":null,"topics":["alphazero","anakin","brax","ddpg","dopamax","dqn","jax","mcts","muzero","podracer","ppo","reinforcement-learning","sac","td3"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/dopamax/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rystrauss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-22T18:22:36.000Z","updated_at":"2025-05-20T16:16:07.000Z","dependencies_parsed_at":"2024-01-01T10:04:49.945Z","dependency_job_id":"d624d544-3d96-4081-a8f4-037fc87c2029","html_url":"https://github.com/rystrauss/dopamax","commit_stats":{"total_commits":79,"total_committers":1,"mean_commits":79.0,"dds":0.0,"last_synced_commit":"26fb213a04120d65bd4f2dcfe49cbbc951b85e90"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/rystrauss/dopamax","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rystrauss%2Fdopamax","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rystrauss%2Fdopamax/tags","releases_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/rystrauss%2Fdopamax/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rystrauss%2Fdopamax/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rystrauss","download_url":"https://codeload.github.com/rystrauss/dopamax/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rystrauss%2Fdopamax/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274648319,"owners_count":25324299,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-11T02:00:13.660Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alphazero","anakin","brax","ddpg","dopamax","dqn","jax","mcts","muzero","podracer","ppo","reinforcement-learning","sac","td3"],"created_at":"2025-09-11T13:45:09.040Z","updated_at":"2025-09-11T13:45:13.189Z","avatar_url":"https://github.com/rystrauss.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[1]: https://github.com/google/jax\n\n[2]: https://arxiv.org/abs/2104.06272\n\n# Dopamax\n\n\u003cp\u003e\n       \u003ca href=\"https://pypi.python.org/pypi/dopamax\"\u003e\n        \u003cimg src=\"https://img.shields.io/pypi/pyversions/dopamax.svg\" /\u003e\u003c/a\u003e\n       \u003ca href= \"https://badge.fury.io/py/dopamax\"\u003e\n        \u003cimg 
src=\"https://badge.fury.io/py/dopamax.svg\" /\u003e\u003c/a\u003e\n       \u003ca href= \"https://github.com/rystrau/dopamax/blob/master/LICENSE.md\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/license-MIT-blue.svg\" /\u003e\u003c/a\u003e\n       \u003ca href= \"https://github.com/psf/black\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/code%20style-black-000000.svg\" /\u003e\u003c/a\u003e\n\u003c/p\u003e\n\nDopamax is a library containing pure [JAX][1] implementations of common reinforcement learning algorithms. _Everything_\nis implemented in JAX, including the environments. This allows for extremely fast training and evaluation of agents,\nbecause the entire loop of environment simulation, agent interaction, and policy updates can be compiled as a single\nXLA program and executed on CPUs, GPUs, or TPUs. More specifically, the implementations in Dopamax follow the\nAnakin Podracer architecture -- see [this paper][2] for more details.\n\n## Supported Algorithms\n\n- [Proximal Policy Optimization (PPO)](src/dopamax/agents/anakin/ppo.py)\n- [Deep Q-Network (DQN)](src/dopamax/agents/anakin/dqn.py)\n- [Deep Deterministic Policy Gradients (DDPG)](src/dopamax/agents/anakin/ddpg.py)\n- [Twin Delayed DDPG (TD3)](src/dopamax/agents/anakin/ddpg.py)\n- [Soft Actor Critic](src/dopamax/agents/anakin/sac.py)\n- [AlphaZero](src/dopamax/agents/anakin/alphazero.py)\n\n## Installation\n\nDopamax can be installed with:\n\n```bash\npip install dopamax\n```\n\nThis will install the `dopamax` Python package, as well as a command-line interface (CLI) for training and evaluation.\nNote that only the CPU version of JAX is installed by default. If you would like to use a GPU or TPU, you will need to\ninstall the appropriate version of JAX. 
See the\n[JAX installation instructions](https://github.com/google/jax#installation).\n\n\u003e [!NOTE]  \n\u003e The above command will install the latest \"release\" of Dopamax, which may not necessarily align with the latest\n\u003e commit in the main branch. To install the version found in the main branch of this repository, you can use:\n\u003e ```bash\n\u003e pip install git+https://github.com/rystrauss/dopamax.git\n\u003e ```\n\n## Usage\n\nAfter installation, the Dopamax CLI can be used to train and evaluate agents:\n\n```bash\ndopamax --help\n```\n\nDopamax uses [Weights and Biases (W\u0026B)](https://wandb.ai/site) for logging and artifact management. Before using the CLI\nfor training and evaluation, you must first make sure you have a W\u0026B account (it's free) and have authenticated\nwith `wandb login`.\n\n### Training\n\nAgents can be trained using the `dopamax train` command, to which you must provide a configuration file. The\nconfiguration file is a YAML file that specifies the agent, environment, and training hyperparameters. You can find\nexamples in the [examples](examples) directory. For example, to train a PPO agent on the CartPole environment, you would\nrun:\n\n```bash\ndopamax train --config examples/ppo-cartpole/config.yaml\n```\n\nNote that all of the example config files have a random seed specified, so you will get the same result every time you\nrun the command. The seeds provided in the examples are known to result in a successful run (with the given\nhyperparameters). To get different results on each run, you can remove the seed from the config file.\n\n### Evaluation\n\nOnce you have trained some agents, you can evaluate them using the `dopamax evaluate` command. This will allow you to\nspecify a W\u0026B agent artifact that you'd like to evaluate (these artifacts are produced by the training runs and\ncontain the agent hyperparameters and weights from the end of training). 
For example, to evaluate a PPO agent trained\non CartPole, you might use a command like:\n\n```bash\ndopamax evaluate --agent_artifact CartPole-PPO-agent:v0 --num_episodes 100\n```\n\nwhere `--num_episodes 100` signals that you would like to roll out the agent's policy for 100 episodes. The minimum,\nmean, and maximum episode reward will be logged back to W\u0026B. If you would additionally like to render the episodes and\nhave them logged back to W\u0026B, you can provide the `--render` flag. But note that this will usually significantly slow\ndown the evaluation process since environment rendering is not a pure JAX function and requires callbacks to the host.\nYou should usually only use the `--render` flag with a small number of episodes.\n\n## See Also\n\nSome of the JAX-native packages that Dopamax relies on:\n- [sotetsuk/pgx](https://github.com/sotetsuk/pgx)\n- [deepmind/mctx](https://github.com/deepmind/mctx)\n- [deepmind/rlax](https://github.com/deepmind/rlax)\n- [google/brax](https://github.com/google/brax)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frystrauss%2Fdopamax","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frystrauss%2Fdopamax","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frystrauss%2Fdopamax/lists"}