{"id":32973775,"url":"https://github.com/Kaixhin/Atari","last_synced_at":"2025-11-16T06:01:19.695Z","repository":{"id":70306929,"uuid":"48294298","full_name":"Kaixhin/Atari","owner":"Kaixhin","description":"Persistent advantage learning dueling double DQN for the Arcade Learning Environment","archived":false,"fork":false,"pushed_at":"2018-02-08T14:43:28.000Z","size":626,"stargazers_count":263,"open_issues_count":15,"forks_count":74,"subscribers_count":29,"default_branch":"master","last_synced_at":"2024-07-04T18:34:00.887Z","etag":null,"topics":["deep-learning","deep-reinforcement-learning"],"latest_commit_sha":null,"homepage":"","language":"Lua","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Kaixhin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2015-12-19T19:20:51.000Z","updated_at":"2024-04-25T11:23:44.000Z","dependencies_parsed_at":"2023-02-25T21:45:26.095Z","dependency_job_id":null,"html_url":"https://github.com/Kaixhin/Atari","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Kaixhin/Atari","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kaixhin%2FAtari","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kaixhin%2FAtari/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kaixhin%2FAtari/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kaixhin%2FAtari/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Kaixhin","download_url":"https://codeload.github.com/Kaixhin/Atari/tar.gz/refs/h
eads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kaixhin%2FAtari/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":284667476,"owners_count":27043890,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-16T02:00:05.974Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","deep-reinforcement-learning"],"created_at":"2025-11-13T06:00:20.535Z","updated_at":"2025-11-16T06:01:19.687Z","avatar_url":"https://github.com/Kaixhin.png","language":"Lua","funding_links":[],"categories":["Codes"],"sub_categories":[],"readme":"# Atari ![Space Invader](http://www.rw-designer.com/cursor-view/74522.png)\n[![Build Status](https://img.shields.io/travis/Kaixhin/Atari.svg)](https://travis-ci.org/Kaixhin/Atari)\n[![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE.md)\n[![Gitter](https://img.shields.io/gitter/room/nwjs/nw.js.svg)](https://gitter.im/Kaixhin/Atari?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n\n**Work In Progress:** Crossed out items have been partially implemented.\n\n~~Prioritised experience replay~~ [[1]](#references) persistent advantage learning [[2]](#references) ~~bootstrapped~~ [[3]](#references) dueling [[4]](#references) double [[5]](#references) deep ~~recurrent~~ [[6]](#references) Q-network 
[[7]](#references) for the Arcade Learning Environment [[8]](#references) (and [custom environments](#custom)). Or PERPALB(triple-D)RQN for short...\n\nAdditional asynchronous agents [[9]](#references):\n\n- One-step Sarsa\n- One-step Q-learning\n- N-step Q-learning\n- Advantage actor-critic\n\nRun `th main.lua` to run headless, or `qlua main.lua` to display the game. The main options are `-game` to choose the ROM (see the [ROM directory](roms/README.md) for more details) and `-mode` as either `train` or `eval`. Saliency maps [[10]](#references) can also be visualised, optionally using guided [[11]](#references) or \"deconvnet\" [[12]](#references) backpropagation. Saliency map modes are applied at runtime, so they can be applied retrospectively to saved models.\n\nTo run experiments with the hyperparameters specified in the individual papers, use `./run.sh \u003cpaper\u003e \u003cgame\u003e \u003cargs\u003e`. `\u003cargs\u003e` can be used to override arguments specified earlier in the script; for more details, see the script itself. By default the code trains on a demo environment called Catch - use `./run.sh demo` to run the demo with good default parameters. Note that this code uses CUDA if available, but the Catch network is small enough that it runs faster on CPU. If cuDNN is available, it can be enabled using `-cudnn true`; note that by default cuDNN is nondeterministic, and its deterministic modes are slower than cutorch.\n\nIn training mode, quitting with `Ctrl+C` is caught and you will be asked whether you would like to save the agent. Note that for non-asynchronous agents the experience replay memory will be included, totalling ~7GB. The main script also automatically saves the last weights (`last.weights.t7`) and the weights of the best-performing DQN according to the average validation score (`best.weights.t7`).\n\nIn evaluation mode you can create recordings with `-record true` (requires FFmpeg); this does not require using `qlua`. 
Recordings will be stored in the videos directory.\n\n## Requirements\n\nRequires [Torch7](http://torch.ch/), and can use CUDA and cuDNN if available. Also requires the following extra luarocks packages:\n\n- luaposix 33.4.0\n- luasocket\n- moses\n- logroll\n- classic\n- torchx\n- rnn\n- dpnn\n- nninit\n- tds\n- **xitari**\n- **alewrap**\n- **rlenvs**\n\nxitari, alewrap and rlenvs can be installed using the following commands:\n\n```sh\nluarocks install https://raw.githubusercontent.com/lake4790k/xitari/master/xitari-0-0.rockspec\nluarocks install https://raw.githubusercontent.com/Kaixhin/alewrap/master/alewrap-0-0.rockspec\nluarocks install https://raw.githubusercontent.com/Kaixhin/rlenvs/master/rocks/rlenvs-scm-1.rockspec\n```\n\n## Custom\n\nYou can use a custom environment (as the path to a Lua file/`rlenvs`-namespaced environment) using `-env`, as long as the class returned conforms to the `rlenvs` [API](https://github.com/Kaixhin/rlenvs#api). One restriction is that the state must be represented as a single tensor (with arbitrary dimensionality), and the action must be a single discrete value. To prevent massive memory consumption for agents that use experience replay memory, states are discretised to integers ∈ [0, 255], assuming the state consists of reals ∈ [0, 1] - this can be disabled with `-discretiseMem false`. Visual environments can make use of explicit `-height`, `-width` and `-colorSpace` options to perform preprocessing for the network.\n\nIf the environment has separate behaviour during training and testing, it should also implement `training` and `evaluate` methods - otherwise these will be added as empty methods at runtime. 
The environment can also implement a `getDisplay` method (with a mandatory `getDisplaySpec` method for determining screen size), which will be used for displaying the screen and computing saliency maps; `getDisplay` must return an RGB (3D) tensor. This can be used even if the state is not an image (although saliency maps can only be computed for states that are images). This method **must** be implemented to display the screen or compute saliency maps. The `-zoom` factor can be used to increase the size of small displays.\n\nEnvironments are meant to be ephemeral: an instance is first created to extract environment details (e.g. the state representation) and is later garbage collected automatically (this is not under the control of this code).\n\nYou can also use a custom model (body) with `-modelBody`, which replaces the usual DQN convolutional layers with a custom Torch neural network (as the path to a Lua file/`models`-namespaced model). The class must include a `createBody` method which returns the custom neural network. The model will receive a stack of the previous states (as determined by `-histLen`), and must reshape them manually if needed. The DQN \"heads\" will then be constructed as normal, with `-hiddenSize` used to change the size of the fully connected layer if needed.\n\nFor an example with a GridWorld environment, run `./run.sh demo-grid` - the demo also works with `qlua` and experience replay agents. The custom environment and network can be found in the [examples](https://github.com/Kaixhin/Atari/tree/master/examples) folder.\n\n## Results\n\nSingle-run results from various papers can be seen below. DQN-based agents use [ε = 0.001](https://github.com/Kaixhin/Atari/blob/master/Agent.lua#L162) for evaluation [[4, 5]](#references). 
\n\n### DQN (Space Invaders) [[7]](#references)\n\n![DQN](figures/dqn_space_invaders.png)\n\n### Double DQN (Space Invaders) [[5]](#references)\n\n![DDQN](figures/doubleq_space_invaders.png)\n\n### Dueling DQN (Space Invaders) [[4]](#references)\n\n![DuelingDQN](figures/dueling_space_invaders.png)\n\n### Persistent Advantage Learning DQN (Asterix) [[2]](#references)\n\n![PALDQN](figures/pal_asterix.png)\n\n### A3C (Beam Rider) [[9]](#references)\n\n![A3C](figures/a3c_beam_rider.png)\n\n## Acknowledgements\n\n- [@GeorgOstrovski](https://github.com/GeorgOstrovski) for confirmation on network usage in advantage operators + note on interaction with Double DQN.\n- [@schaul](https://github.com/schaul) for clarifications on prioritised experience replay + dueling DQN hyperparameters.\n\n## Citation\n\nIf you find this library useful and would like to cite it, the following would be appropriate:\n\n```\n@misc{Atari,\n  author = {Arulkumaran, Kai and Keri, Laszlo},\n  title = {Kaixhin/Atari},\n  url = {https://github.com/Kaixhin/Atari},\n  year = {2015}\n}\n```\n\n## References\n\n[1] [Prioritized Experience Replay](http://arxiv.org/abs/1511.05952)  \n[2] [Increasing the Action Gap: New Operators for Reinforcement Learning](http://arxiv.org/abs/1512.04860)  \n[3] [Deep Exploration via Bootstrapped DQN](http://arxiv.org/abs/1602.04621)  \n[4] [Dueling Network Architectures for Deep Reinforcement Learning](http://arxiv.org/abs/1511.06581)  \n[5] [Deep Reinforcement Learning with Double Q-learning](http://arxiv.org/abs/1509.06461)  \n[6] [Deep Recurrent Q-Learning for Partially Observable MDPs](http://arxiv.org/abs/1507.06527)  \n[7] [Playing Atari with Deep Reinforcement Learning](http://arxiv.org/abs/1312.5602)  \n[8] [The Arcade Learning Environment: An Evaluation Platform for General Agents](http://arxiv.org/abs/1207.4708)  \n[9] [Asynchronous Methods for Deep Reinforcement Learning](http://arxiv.org/abs/1602.01783)  \n[10] [Deep Inside Convolutional Networks: Visualising 
Image Classification Models and Saliency Maps](http://arxiv.org/abs/1312.6034)  \n[11] [Striving for Simplicity: The All Convolutional Net](http://arxiv.org/abs/1412.6806)  \n[12] [Visualizing and Understanding Convolutional Networks](http://arxiv.org/abs/1311.2901)  \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FKaixhin%2FAtari","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FKaixhin%2FAtari","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FKaixhin%2FAtari/lists"}