Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/andreaconti/supermario-dqn

Deep Reinforcement Learning Agent for Super Mario Bros
deep-learning python3 reinforcement-learning super-mario-bros-ai


Host: GitHub
URL: https://github.com/andreaconti/supermario-dqn
Owner: andreaconti
Created: 2019-09-11T13:49:56.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2019-09-28T21:18:10.000Z (about 5 years ago)
Last Synced: 2024-10-23T03:33:20.986Z (28 days ago)
Topics: deep-learning, python3, reinforcement-learning, super-mario-bros-ai
Language: Jupyter Notebook
Homepage:
Size: 75.1 MB
Stars: 2
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.rst
- Authors: AUTHORS.rst

README

supermario-dqn
==============

Deep Reinforcement Learning Agent for Super Mario Bros using the OpenAI Gym [gym-super-mario-bros](https://github.com/Kautenja/gym-super-mario-bros) environment.

## Usage

~~~shell
# create virtual env in the project folder
$ python3 -m venv .venv
$ source .venv/bin/activate

# install dependencies
$ pip3 install -r requirements

# use
$ supermario_train -h
usage: supermario_train [-h] [--batch_size BATCH_SIZE]
                        [--fit_interval FIT_INTERVAL] [--gamma GAMMA]
                        [--eps_start EPS_START] [--eps_end EPS_END]
                        [--eps_decay EPS_DECAY]
                        [--target_update TARGET_UPDATE]
                        [--save_path SAVE_PATH] [--memory_size MEMORY_SIZE]
                        [--num_episodes NUM_EPISODES] [--resume RESUME]
                        [--checkpoint CHECKPOINT] [--random] [--render]
                        [--world_stage WORLD_STAGE WORLD_STAGE]
                        [--actions ACTIONS] [--test TEST] [--log]

Handle training

optional arguments:
  -h, --help            show this help message and exit
  --batch_size BATCH_SIZE
                        size of each batch used for training
  --fit_interval FIT_INTERVAL
                        fit every `fit_interval` examples available
  --gamma GAMMA         discount rate used for Q-values learning
  --eps_start EPS_START
                        start probability to choose a random action
  --eps_end EPS_END     end probability to choose a random action
  --eps_decay EPS_DECAY
                        decay of eps probabilities
  --target_update TARGET_UPDATE
                        number of episodes between each target dqn update
  --save_path SAVE_PATH
                        where save trained model
  --memory_size MEMORY_SIZE
                        size of replay memory
  --num_episodes NUM_EPISODES
                        number of games to be played before end
  --resume RESUME       load from a checkpoint
  --checkpoint CHECKPOINT
                        number of episodes between each network checkpoint
  --random              choose randomly different worlds and stages
  --render              rendering of frames, only for debug
  --world_stage WORLD_STAGE WORLD_STAGE
                        select specific world and stage
  --actions ACTIONS     select actions used between ["simple"]
  --test TEST           each `test` episodes network is used and tested over
                        an episode
  --log                 logs episodes results

# play
$ supermario_play -h
usage: play a game [-h] [--world_stage WORLD_STAGE WORLD_STAGE] [--skip SKIP]
                   [--processed]
                   model

positional arguments:
  model                 neural network model

optional arguments:
  -h, --help            show this help message and exit
  --world_stage WORLD_STAGE WORLD_STAGE
                        select a specific world and stage, world in [1..8],
                        stage in [1..4]
  --skip SKIP           number of frames to skip
  --processed           shows frames processed for neural network
~~~
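The training options `--eps_start`, `--eps_end`, `--eps_decay`, `--gamma` and `--memory_size` correspond to standard DQN ingredients: an epsilon-greedy exploration schedule, a discounted Q-learning target, and a bounded replay memory. The following is a minimal standard-library sketch of how such parameters are commonly used, not the code from this repository. The exponential decay formula, the default values, and the names `epsilon`, `select_action`, `ReplayMemory` and `td_target` are all assumptions made for illustration.

```python
import math
import random
from collections import deque, namedtuple


def epsilon(step, eps_start=0.9, eps_end=0.05, eps_decay=200.0):
    """Random-action probability, annealed from eps_start towards eps_end.

    Assumption: exponential decay in the number of steps. The help text
    only says "decay of eps probabilities", so the real schedule may differ.
    """
    return eps_end + (eps_start - eps_end) * math.exp(-step / eps_decay)


def select_action(q_values, step, **eps_kwargs):
    """Epsilon-greedy choice over a list of per-action Q-values."""
    if random.random() < epsilon(step, **eps_kwargs):
        return random.randrange(len(q_values))  # explore
    # exploit: index of the largest Q-value
    return max(range(len(q_values)), key=q_values.__getitem__)


# One stored interaction with the environment.
Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayMemory:
    """Fixed-size buffer; the oldest transitions are dropped when full."""

    def __init__(self, memory_size):
        self.buffer = deque(maxlen=memory_size)

    def push(self, *fields):
        self.buffer.append(Transition(*fields))

    def sample(self, batch_size):
        # Uniform sampling without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a Q_target(s', a)."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In a full agent, `next_q_values` would come from the target network that `--target_update` refreshes periodically, and `sample` would be called with `--batch_size` once the memory holds enough transitions.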

### Results

~~~bash
$ supermario_play --skip 5 --world_stage 1 1 trained/train_1_1/model.pt
~~~

| rewards | play gif |
|---------|----------|
|![](trained/train_1_1/rewards_over_steps.png)| ![](trained/train_1_1/play_gif.png)|