{"id":15640351,"url":"https://github.com/scitator/rl-course-experiments","last_synced_at":"2025-07-25T07:38:28.066Z","repository":{"id":133293921,"uuid":"74877827","full_name":"Scitator/rl-course-experiments","owner":"Scitator","description":null,"archived":false,"fork":false,"pushed_at":"2017-06-07T06:41:25.000Z","size":2092,"stargazers_count":77,"open_issues_count":1,"forks_count":23,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-21T07:07:57.109Z","etag":null,"topics":["asynchronous-advantage-actor-critic","deep-learning","deep-q-network","deep-reinforcement-learning","genetic-algorithm","monte-carlo","neural-network","policy-gradient","reinforcement-learning","temporal-differencing-learning","tensorflow"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Scitator.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-11-27T09:14:17.000Z","updated_at":"2024-05-10T05:13:54.000Z","dependencies_parsed_at":null,"dependency_job_id":"33188f34-7743-43f0-89c2-4d8d5dd0f04f","html_url":"https://github.com/Scitator/rl-course-experiments","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Scitator%2Frl-course-experiments","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Scitator%2Frl-course-experiments/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Scitator%2Frl-course-exp
eriments/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Scitator%2Frl-course-experiments/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Scitator","download_url":"https://codeload.github.com/Scitator/rl-course-experiments/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251666047,"owners_count":21624289,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asynchronous-advantage-actor-critic","deep-learning","deep-q-network","deep-reinforcement-learning","genetic-algorithm","monte-carlo","neural-network","policy-gradient","reinforcement-learning","temporal-differencing-learning","tensorflow"],"created_at":"2024-10-03T11:34:42.074Z","updated_at":"2025-04-30T07:48:41.300Z","avatar_url":"https://github.com/Scitator.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RL course experiments\n\n### Overview\nThis repository provides code implementations for popular Reinforcement Learning algorithms.\n\nThe main idea is to generalise the main RL algorithms and provide a unified interface for testing them on any gym environment. \nFor example, you can now create your own Double Dueling Deep Recurrent Q-Learning agent (let's call it 3DRQ). \nFor simplicity, all main agent blocks are in the `agents` folder. \n\nFor now, the repository is undergoing post-course refactoring, so much of the documentation is still missing.\n\nAll code is written in Python 3 and uses RL environments from OpenAI Gym. 
\nAdvanced techniques use TensorFlow for neural network implementations.\n\n### Inspired by:\n* [Berkeley CS188x](http://ai.berkeley.edu/home.html)\n* [David Silver's Reinforcement Learning Course](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html)\n* [dennybritz/reinforcement-learning](https://github.com/dennybritz/reinforcement-learning)\n* [yandexdataschool/Practical_RL](https://github.com/yandexdataschool/Practical_RL)\n* [yandexdataschool/AgentNet](https://github.com/yandexdataschool/AgentNet)\n\n##### Additional thanks to [JustHeuristic](https://github.com/justheuristic) for the Practical_RL course\n\n### Table of Contents\n* [Genetic algorithm](https://github.com/Scitator/rl-course-experiments/tree/master/GEN)\n* [Dynamic Programming](https://github.com/Scitator/rl-course-experiments/tree/master/DP)\n* [Cross Entropy Method](https://github.com/Scitator/rl-course-experiments/tree/master/CEM)\n* [Monte Carlo Control](https://github.com/Scitator/rl-course-experiments/tree/master/MC)\n* [Temporal Difference](https://github.com/Scitator/rl-course-experiments/tree/master/TD)\n* [Deep Q-Networks](https://github.com/Scitator/rl-course-experiments/tree/master/DQN)\n* [Policy Gradient](https://github.com/Scitator/rl-course-experiments/tree/master/PG)\n* [Asynchronous Advantage Actor-Critic](https://github.com/Scitator/rl-course-experiments/tree/master/A3C)\n* [Optimality Tightening](https://arxiv.org/abs/1611.01606) [TODO]\n* [Trust Region Policy Optimization](https://arxiv.org/abs/1502.05477) [TODO]\n* Continuous action space [TODO]\n* Monte Carlo Tree Search [TODO]\n\nFor more information, see the readme in each folder.\n\n#### Special requirements\n\nTo run the scripts, you need to install an additional [repo](https://github.com/Scitator/rstools) with optimization utilities for neural networks:\n\n`pip install git+https://github.com/Scitator/rstools`\n\n#### Example usage\n\nDQN:\n\n```\nPYTHONPATH=. 
python DQN/run_dqn.py --plot_history --env CartPole-v0 \\\n--feature_network linear --layers 128-128 --hidden_size 64 \\\n--n_epochs 1000 --n_games 4 --batch_size 128 --t_max 500 --episode_limit 500 \\\n--replay_buffer simple --replay_buffer_size 2000 \\\n--qvalue_lr 0.0001 --feature_lr 0.0001 --value_lr 0.0001 \\\n--initial_epsilon 0.8 --final_epsilon 0.1 \\\n--gpu_option 0.25 \\\n--api_key \u003cpaste_your_gym_api_key_here\u003e\n```\n\nReinforce:\n\n```\nPYTHONPATH=. python PG/run_reinforce.py --plot_history --env CartPole-v0 \\\n--feature_network linear --layers 128-128 --hidden_size 64 \\\n--n_epochs 10000 --n_games 1 --batch_size 1 --t_max 500 --episode_limit 500 \\\n--entropy_factor 0.005 --policy_lr 0.0000001 --feature_lr 0.0000001 --grad_clip 10.0 \\\n--gpu_option 0.25 --time_major \\\n--api_key \u003cpaste_your_gym_api_key_here\u003e\n```\n\nFeed-Forward Asynchronous Advantage Actor-Critic:\n\n```\nPYTHONPATH=. python A3C/run_a3c.py --plot_history --env CartPole-v0 \\\n--feature_network linear --layers 128-128 --hidden_size 64 \\\n--n_epochs 500 --n_games 1 --batch_size 1 --t_max 100 --episode_limit 500 \\\n--entropy_factor 0.005 --policy_lr 0.00001 --feature_lr 0.00001 --value_lr 0.00001 --grad_clip 10.0 \\\n--gpu_option 0.25 --time_major \\\n--api_key \u003cpaste_your_gym_api_key_here\u003e\n```\n\nIf the agent starts to play well, you can stop training at any time with `Ctrl+C`.\nIf something goes wrong, you can still evaluate the agent with the `--load --n_epochs 0` combination.\n\n##### Metrics\n\n- loss - typical neural network loss\n- reward - typical environment reward; because an Environment Pool is always used, it is not very informative for now\n- steps - mean number of game ends per epoch session\n\n##### If you have Linux with an NVIDIA GPU and no X server, but want to try gym\n\nYou need to reinstall the NVIDIA drivers.\n\n[issue source](https://github.com/openai/gym/issues/366)\n[how-to 
guide](https://davidsanwald.github.io/2016/11/13/building-tensorflow-with-gpu-support.html)\n\nThen add `bash xvfb start; DISPLAY=:1` before the run command. \n\n#### Contributing\n\n##### write code\n\nFound a bug, or know a simpler way to write something? \nOr maybe you want to create your own agent? \nJust follow PEP 8 and open a merge request.\n\n##### ...or play a game\n\nWe have a lot of RL algorithms, and even more gym environments to test them on. \nSo, play a game and save:\n* agent parameters (so anyone can reproduce)\n* the agent itself (`model.ckpt*`)\n* plots (they are generated automatically with the `--plot_history` flag)\n* gym-link (main results)\n\nThen open a merge request (solutions should go in `field/solutions.md`, for example `DQN/solutions.md`).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscitator%2Frl-course-experiments","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscitator%2Frl-course-experiments","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscitator%2Frl-course-experiments/lists"}