
# Classic / Rainbow DQN implementation for Atari Breakout

This repository contains a PyTorch implementation of Rainbow DQN for Atari games, with a focus on Breakout. It combines several key improvements to the original DQN algorithm, as proposed in the Rainbow DQN paper, to achieve better performance.

https://github.com/user-attachments/assets/b8180e58-7f20-4650-8148-2060d174038c

## Performance results

After training for 5M steps on an RTX 3070 Ti GPU (with 16 GB of RAM), my implementation achieves the following clipped-reward results on Breakout:

| Agent | Mean Score | Std Dev | Min | Max |
|-------|------------|---------|-----|-----|
| Random Agent | 1.364 | 1.394 | 0.0 | 7.0 |
| Human Baseline | 31.8 | - | - | - |
| Classic DQN | 14.211 | 5.852 | 0.0 | 28.0 |
| Rainbow DQN | 70.242 | 32.931 | 8.0 | 105.0 |

The maximum possible clipped reward is 108 (the game has 6×18 = 108 bricks).

### Training progress (Classic DQN ~10 hours)


Classic DQN Training Progress


Classic DQN learning curve (each epoch represents 50,000 batches)

### Training progress (Rainbow DQN ~14 hours)


Rainbow DQN Training Progress


Rainbow DQN learning curve (each epoch represents 50,000 batches)

## Key features

### 1. Environment preprocessing (AtariPreprocessing)
- Frame resizing to 84×84
- Grayscale conversion
- Frame stacking (4 frames)
- Action repeat (4 frames)
- Reward clipping between -1 and 1
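
A minimal sketch of this preprocessing pipeline, assuming the gymnasium Atari wrappers (wrapper names and the environment id are assumptions and vary across gymnasium versions; the repo's own setup may differ):

```python
# Illustrative only: assumes gymnasium with the Atari ROMs installed
# (pip install "gymnasium[atari,accept-rom-license]").
import numpy as np
import gymnasium as gym
from gymnasium.wrappers import AtariPreprocessing, FrameStack, TransformReward

env = gym.make("BreakoutNoFrameskip-v4")          # raw env without built-in frame skip
env = AtariPreprocessing(
    env,
    screen_size=84,       # resize frames to 84x84
    grayscale_obs=True,   # grayscale conversion
    frame_skip=4,         # action repeat over 4 frames
)
env = FrameStack(env, 4)                                             # stack last 4 frames
env = TransformReward(env, lambda r: float(np.clip(r, -1.0, 1.0)))   # clip rewards to [-1, 1]

obs, info = env.reset()
print(np.asarray(obs).shape)  # (4, 84, 84)
```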

### 2. Rainbow DQN components
- **Double Q-Learning**: Reduces overestimation of Q-values
- **Dueling network**: Separate streams for state value and action advantages
- **Noisy networks**: Parameter space noise for exploration
- **Prioritized experience replay**: Prioritizes important transitions
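
As a rough illustration of the first of these components, a Double Q-learning bootstrap target is typically computed as below (function and variable names are assumptions for this sketch, not this repo's code):

```python
import torch

def double_dqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double Q-learning target: the online network picks the greedy action,
    the target network evaluates it, which reduces Q-value overestimation."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q
```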

### 3. Classic network architecture
```
Conv2D(4→16, kernel=4, stride=2)
Conv2D(16→32, kernel=4, stride=2)

Linear layers
```
See: classic_dqn/dqn.py
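
A rough PyTorch sketch of the layer stack described above (the hidden width of the fully connected layer is an assumption; refer to classic_dqn/dqn.py for the actual model):

```python
import torch
import torch.nn as nn

class ClassicDQN(nn.Module):
    def __init__(self, n_actions: int, in_channels: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened feature size from a dummy 84x84 input.
        with torch.no_grad():
            n_flat = self.features(torch.zeros(1, in_channels, 84, 84)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(n_flat, 256), nn.ReLU(),   # hidden width 256 is an assumption
            nn.Linear(256, n_actions),
        )

    def forward(self, x):
        return self.head(self.features(x / 255.0))  # scale pixel values to [0, 1]
```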

### 4. Rainbow network architecture
```
Conv2D(4→32, kernel=8, stride=4)
Conv2D(32→64, kernel=4, stride=2)
Conv2D(64→64, kernel=3, stride=1)

Split into Value/Advantage Streams

NoisyLinear layers for exploration
```
See: rainbow_dqn/DuelingDQN_model.py
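
The dueling combination at the end of this network looks roughly like the sketch below; plain nn.Linear layers are used here only to keep the example short, whereas the repo's model uses NoisyLinear layers (see rainbow_dqn/DuelingDQN_model.py):

```python
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, n_features: int, n_actions: int, hidden: int = 512):
        super().__init__()
        # Separate streams for the state value V(s) and the action advantages A(s, a).
        self.value = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.advantage = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))

    def forward(self, features):
        v = self.value(features)
        a = self.advantage(features)
        # Subtract the mean advantage so V and A are identifiable: Q = V + A - mean(A).
        return v + a - a.mean(dim=1, keepdim=True)
```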

## Usage

### Training
```bash
cd rainbow_dqn
python main.py train [options]

# Training options:
--learning-rate Learning rate (default: 0.0000625)
--gamma Discount factor (default: 0.99)
--batch-size Batch size (default: 32)
--memory-size Replay buffer size (default: 100000)
--episodes Number of episodes (default: 500000)
--replay-start-size Replay size before training start (default: 80000)
--target-update-frequency Target network update frequency (default: 5000)
--continue-training Continue from saved model (default: False)
```

### Evaluation
```bash
cd rainbow_dqn
python main.py play --model-path=<path-to-saved-model>
```

For example, you can try the pretrained model included in this repository:
```bash
cd rainbow_dqn
python main.py play --model-path=saved_models/breakout_5M_steps_rainbow_dqn
```

## Hyperparameters for Atari Breakout

| **Hyperparameter** | **Classic DQN** | **Rainbow DQN** |
| --------------------------------------- | ---------------- | ---------------- |
| **Learning rate** | 0.0001 | 0.0000625 |
| **Discount factor (γ)** | 0.99 | 0.99 |
| **Replay memory size** | 100,000 | 100,000 |
| **Batch size** | 32 | 32 |
| **Target update frequency** | 5,000 | 5,000 |
| **Frame skip** | 4 | 4 |
| **Min epsilon** | 0.1 | N/A |
| **Max epsilon** | 1.0 | N/A |
| **Epsilon decay steps** | 4M (steps) | N/A |
| **Max steps** | 4.5M | 8M |
| **Replay start size** | 32 | 80,000 |
| **Save frequency** | 50,000 | 50,000 |
| **Noisy nets std init** | N/A | 0.5 |
| **PER alpha (α)** | N/A | 0.6 |
| **PER beta start (β)** | N/A | 0.4 |
| **Reward clipping** | [-1, 1] | [-1, 1] |
| **Input frame stack** | 4 | 4 |
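
For instance, the classic DQN exploration schedule implied by the table above (ε decayed linearly from 1.0 to 0.1 over 4M steps) can be written as the small sketch below; the repo's exact schedule may differ:

```python
def epsilon(step: int, eps_max: float = 1.0, eps_min: float = 0.1,
            decay_steps: int = 4_000_000) -> float:
    """Linear epsilon decay over the first `decay_steps` environment steps."""
    frac = min(step / decay_steps, 1.0)
    return eps_max + frac * (eps_min - eps_max)

print(epsilon(0))          # 1.0
print(epsilon(2_000_000))  # 0.55
print(epsilon(4_000_000))  # 0.1
```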

## References for the classic DQN model / agent

1. [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/pdf/1312.5602)

## References for the rainbow DQN model / agent

1. [Deep Reinforcement Learning with Double Q-learning](https://arxiv.org/pdf/1509.06461)
2. [Dueling Network Architectures for Deep Reinforcement Learning](https://arxiv.org/pdf/1511.06581)
3. [Noisy Networks for Exploration](https://arxiv.org/pdf/1706.10295)
4. [Prioritized Experience Replay](https://arxiv.org/pdf/1511.05952)
5. [Rainbow: Combining Improvements in Deep Reinforcement Learning](https://arxiv.org/pdf/1710.02298)