{"id":13844412,"url":"https://github.com/kengz/openai_lab","last_synced_at":"2025-04-09T08:09:00.321Z","repository":{"id":50361901,"uuid":"65353367","full_name":"kengz/openai_lab","owner":"kengz","description":"An experimentation framework for Reinforcement Learning using OpenAI Gym, Tensorflow, and Keras.","archived":false,"fork":false,"pushed_at":"2018-02-09T04:30:18.000Z","size":8879,"stargazers_count":327,"open_issues_count":0,"forks_count":68,"subscribers_count":32,"default_branch":"master","last_synced_at":"2025-04-02T05:08:27.833Z","etag":null,"topics":["actor-critic","ddpg","deep-reinforcement-learning","experiment","keras","openai","policy-gradient","reinforcement-learning","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kengz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-08-10T05:33:39.000Z","updated_at":"2025-03-26T08:36:43.000Z","dependencies_parsed_at":"2022-07-29T02:09:06.656Z","dependency_job_id":null,"html_url":"https://github.com/kengz/openai_lab","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kengz%2Fopenai_lab","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kengz%2Fopenai_lab/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kengz%2Fopenai_lab/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kengz%2Fopenai_lab/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kengz","download_url":"h
ttps://codeload.github.com/kengz/openai_lab/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247999860,"owners_count":21031046,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actor-critic","ddpg","deep-reinforcement-learning","experiment","keras","openai","policy-gradient","reinforcement-learning","tensorflow"],"created_at":"2024-08-04T17:02:41.928Z","updated_at":"2025-04-09T08:09:00.292Z","avatar_url":"https://github.com/kengz.png","language":"Python","funding_links":[],"categories":["Python","Python (1887)","Open Source Reinforcement Learning Platforms","Table of Contents"],"sub_categories":["Human Computer Interaction"],"readme":"# OpenAI Lab [![GitHub release](https://img.shields.io/github/release/kengz/openai_lab.svg)](https://github.com/kengz/openai_lab) [![CircleCI](https://circleci.com/gh/kengz/openai_lab.svg?style=shield)](https://circleci.com/gh/kengz/openai_lab) [![Codacy Badge](https://api.codacy.com/project/badge/Grade/9e55f845b10b4b51b213620bfb98e4b3)](https://www.codacy.com/app/kengzwl/openai_lab?utm_source=github.com\u0026amp;utm_medium=referral\u0026amp;utm_content=kengz/openai_lab\u0026amp;utm_campaign=Badge_Grade) [![Codacy Badge](https://api.codacy.com/project/badge/Coverage/9e55f845b10b4b51b213620bfb98e4b3)](https://www.codacy.com/app/kengzwl/openai_lab?utm_source=github.com\u0026utm_medium=referral\u0026utm_content=kengz/openai_lab\u0026utm_campaign=Badge_Coverage) [![GitHub 
stars](https://img.shields.io/github/stars/kengz/openai_lab.svg?style=social\u0026label=Star)](https://github.com/kengz/openai_lab) [![GitHub forks](https://img.shields.io/github/forks/kengz/openai_lab.svg?style=social\u0026label=Fork)](https://github.com/kengz/openai_lab)\n\n---\n\n\u003cp align=\"center\"\u003e\u003cb\u003e\u003ca href=\"https://github.com/kengz/SLM-Lab\"\u003eNOTICE: Please use the next version, SLM-Lab.\u003c/a\u003e\u003c/b\u003e\u003c/p\u003e\n\n---\n\n\u003cp align=\"center\"\u003e\u003cb\u003e\u003ca href=\"http://kengz.me/openai_lab\"\u003eOpenAI Lab Documentation\u003c/a\u003e\u003c/b\u003e\u003c/p\u003e\n\n---\n\n_An experimentation framework for Reinforcement Learning using OpenAI Gym, Tensorflow, and Keras._\n\n_OpenAI Lab_ is created to do Reinforcement Learning (RL) like science - _theorize, experiment_. It provides an easy interface to [OpenAI Gym](https://gym.openai.com/) and [Keras](https://keras.io/), with an automated experimentation and evaluation framework.\n\n### Features\n\n1. **Unified RL environment and agent interface** using OpenAI Gym, Tensorflow, Keras, so you can focus on developing the algorithms.\n2. **[Core RL algorithms implementations](http://kengz.me/openai_lab/#agents-matrix), with reusable modular components** for developing deep RL algorithms.\n3. **[An experimentation framework](http://kengz.me/openai_lab/#experiments)** for running hundreds of trials of hyperparameter optimizations, with logs, plots and analytics for testing new RL algorithms. Experimental settings are stored in standardized JSONs for reproducibility and comparisons.\n4. **[Automated analytics of the experiments](http://kengz.me/openai_lab/#analysis)** for evaluating the RL agents and environments, and to help pick the best solution.\n5. **The [Fitness Matrix](http://kengz.me/openai_lab/#fitness-matrix)**, a table of the best scores of RL algorithms v.s. 
the environments; useful for research.\n\n\nWith OpenAI Lab, we can focus on researching the essential elements of reinforcement learning, such as the algorithm, policy, memory, and parameter tuning. It lets us build agents efficiently by combining existing components with implementations of new research ideas. We can then test research hypotheses systematically by running experiments.\n\n*Read more about the research problems the Lab addresses in [Motivations](http://kengz.me/openai_lab/#motivations). Ultimately, the Lab is a generalized framework for doing reinforcement learning, agnostic of OpenAI Gym and Keras. For example, PyTorch-based implementations are on the roadmap.*\n\n\n### Implemented Algorithms\n\nA list of the core RL algorithms, implemented or planned.\n\nTo see their scores against OpenAI Gym environments, go to **[Fitness Matrix](http://kengz.me/openai_lab/#fitness-matrix)**.\n\n\n|algorithm|implementation|eval score (pending)|\n|:---|:---|:---|\n|[DQN](https://arxiv.org/abs/1312.5602)|[DQN](https://github.com/kengz/openai_lab/blob/master/rl/agent/dqn.py)|-|\n|[Double DQN](https://arxiv.org/abs/1509.06461)|[DoubleDQN](https://github.com/kengz/openai_lab/blob/master/rl/agent/double_dqn.py)|-|\n|[Dueling DQN](https://arxiv.org/abs/1511.06581)|-|-|\n|Sarsa|[DeepSarsa](https://github.com/kengz/openai_lab/blob/master/rl/agent/deep_sarsa.py)|-|\n|Off-Policy Sarsa|[OffPolicySarsa](https://github.com/kengz/openai_lab/blob/master/rl/agent/offpol_sarsa.py)|-|\n|[PER (Prioritized Experience Replay)](https://arxiv.org/abs/1511.05952)|[PrioritizedExperienceReplay](https://github.com/kengz/openai_lab/blob/master/rl/memory/prioritized_exp_replay.py)|-|\n|[CEM (Cross Entropy Method)](https://en.wikipedia.org/wiki/Cross-entropy_method)|next|-|\n|[REINFORCE](http://incompleteideas.net/sutton/williams-92.pdf)|-|-|\n|[DPG (Deterministic Policy Gradient) off-policy 
actor-critic](http://jmlr.org/proceedings/papers/v32/silver14.pdf)|[ActorCritic](https://github.com/kengz/openai_lab/blob/master/rl/agent/actor_critic.py)|-|\n|[DDPG (Deep-DPG) actor-critic with target networks](https://arxiv.org/abs/1509.02971)|[DDPG](https://github.com/kengz/openai_lab/blob/master/rl/agent/ddpg.py)|-|\n|[A3C (asynchronous advantage actor-critic)](https://arxiv.org/pdf/1602.01783.pdf)|-|-|\n|Dyna|next|-|\n|[TRPO](https://arxiv.org/abs/1502.05477)|-|-|\n|Q*(lambda)|-|-|\n|Retrace(lambda)|-|-|\n|[Neural Episodic Control (NEC)](https://arxiv.org/abs/1703.01988)|-|-|\n|[EWC (Elastic Weight Consolidation)](https://arxiv.org/abs/1612.00796)|-|-|\n\n\n### Run the Lab\n\nNext, see [Installation](http://kengz.me/openai_lab/#installation) and jump to [Quickstart](http://kengz.me/openai_lab/#quickstart).\n\n\n\u003cdiv style=\"max-width: 100%\"\u003e\u003cimg alt=\"Timelapse of OpenAI Lab\" src=\"http://kengz.me/openai_lab/images/lab_demo_dqn.gif\" /\u003e\u003c/div\u003e\n\n*Timelapse of OpenAI Lab, solving CartPole-v0.*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkengz%2Fopenai_lab","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkengz%2Fopenai_lab","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkengz%2Fopenai_lab/lists"}