{"id":18892987,"url":"https://github.com/dongjunlee/dqn-tensorflow","last_synced_at":"2025-08-01T00:35:14.330Z","repository":{"id":236588773,"uuid":"95859755","full_name":"DongjunLee/dqn-tensorflow","owner":"DongjunLee","description":"Deep Q Network implements by Tensorflow","archived":false,"fork":false,"pushed_at":"2018-03-09T02:19:40.000Z","size":3468,"stargazers_count":25,"open_issues_count":0,"forks_count":10,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-05-31T17:32:28.060Z","etag":null,"topics":["deep","dqn","hb-experiment","reinforcement-learning","tensorflow"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DongjunLee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-06-30T07:07:25.000Z","updated_at":"2025-03-21T16:06:36.000Z","dependencies_parsed_at":null,"dependency_job_id":"2cf6cdd5-adc3-49f9-b205-0531a07e4ad1","html_url":"https://github.com/DongjunLee/dqn-tensorflow","commit_stats":null,"previous_names":["dongjunlee/dqn-tensorflow"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/DongjunLee/dqn-tensorflow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DongjunLee%2Fdqn-tensorflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DongjunLee%2Fdqn-tensorflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DongjunLee%2Fdqn-tensorflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DongjunLee%
2Fdqn-tensorflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DongjunLee","download_url":"https://codeload.github.com/DongjunLee/dqn-tensorflow/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DongjunLee%2Fdqn-tensorflow/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266053823,"owners_count":23869496,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep","dqn","hb-experiment","reinforcement-learning","tensorflow"],"created_at":"2024-11-08T08:06:57.118Z","updated_at":"2025-07-20T01:31:34.774Z","avatar_url":"https://github.com/DongjunLee.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Deep Q Network\n## Paper\n- [Playing Atari with Deep Reinforcement Learning](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf) (NIPS 2013)\n- [Human-Level Control through Deep Reinforcement Learning](https://www.nature.com/nature/journal/v518/n7540/full/nature14236.html) (Nature 2015)\n\n\n## TO DO\n\n- Test: Atari\n\t- more complex ConvNet model\n- use TensorBoard\n\t- average loss\n\t- average q\n\t- average reward (100 consecutive episodes)\n\t- episode reward\n\n## Config\n\n```bash\npython main.py -h\n\n  --discount_rate DISCOUNT_RATE\n                        Initial discount rate.\n  --replay_memory_length REPLAY_MEMORY_LENGTH\n                        Number of replay memory episode.\n  --target_update_count TARGET_UPDATE_COUNT\n                        DQN Target Network update count.\n  --max_episode_count MAX_EPISODE_COUNT\n 
                       Number of maximum episodes.\n  --batch_size BATCH_SIZE\n                        Batch size. (Must divide evenly into the dataset\n                        sizes)\n  --frame_size FRAME_SIZE\n                        Frame size. (Stack env's observation T-n ~ T)\n  --model_name MODEL_NAME\n                        DeepLearning Network Model name (MLPv1, ConvNetv1)\n  --learning_rate LEARNING_RATE\n                        Batch size. (Must divide evenly into the dataset\n                        sizes)\n  --gym_result_dir GYM_RESULT_DIR\n                        Directory to put the gym results.\n  --gym_env GYM_ENV     Name of Open Gym's enviroment name. (CartPole-v0,\n                        CartPole-v1, MountainCar-v0)\n  --step_verbose [STEP_VERBOSE]\n                        verbose every step count\n  --step_verbose_count STEP_VERBOSE_COUNT\n                        verbose step count\n```\n\n## Model\n\n### 1. MLPv1\n\n- hidden layers (16, 64, 32)\n- AdamOptimizer\n\n### 2. ConvNetv1\n\n- 3 Conv + MaxPool Layers (kernel_size [3, 3, 3], filters [32, 64, 128])\n- 2 Fully Connected Layers (hidden_size [128, 32])\n- AdamOptimizer\n\n### 3. ConvNetv2\n\n- 5 Conv + MaxPool Layers (kernel_size [7, 5, 3, 3, 3], filters [126, 256, 512, 512, 512])\n- 2 Fully Connected Layers (hidden_size [1024, 256])\n- AdamOptimizer\n\n## Experiments\n\n### Classic control\n\n| CartPole-v0 | CartPole-v1 | MountainCar-v0 |\n| ------- | ----------- | ------------ |\n| defines \"solving\" as getting an average reward of **195.0** over 100 consecutive trials. | defines \"solving\" as getting an average reward of **475.0** over 100 consecutive trials. | defines \"solving\" as getting an average reward of **-110.0** over 100 consecutive trials. 
|\n| **Model** : MLPv1 | **Model** : MLPv1 | **Model** : MLPv1 |\n| **Clear** : after 177 episodes | **Clear** : after 791 episodes | **Clear** : after 1182 episodes |  \n| ![images](images/CartPole-v0.gif) | ![images](images/CartPole-v1.gif) | ![images](images/MountainCar-v0.gif) |\n\n### Atari\n\n| Assault-ram-v0 |  \n| ------- | \n| Maximize your score |  \n| **Model** : ConvNetv2 | \n| **Score** : 421.12 (average over 100 consecutive trials) |\n| ![images](images/assault-2000.gif) |\n| After 2000 episodes (learns something, but still plays poorly) |\n\n| Breakout-ram-v0 |  |  |\n| ------- | ----------- | ------------ |\n| Maximize your score |  |  |\n| **Model** : ConvNetv1 |  |  |\n| **Score** : 9.69 (average over 100 consecutive trials) |  |  |  \n\n\n\n## Reference\n\n- Base code : [hunkim/ReinforcementZeroToAll](https://github.com/hunkim/ReinforcementZeroToAll/blob/master/07_3_dqn_2015_cartpole.py)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdongjunlee%2Fdqn-tensorflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdongjunlee%2Fdqn-tensorflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdongjunlee%2Fdqn-tensorflow/lists"}