https://github.com/bourbonut/dqn-pacman
Deep Q-Network on the Atari Game Ms-Pacman
- Host: GitHub
- URL: https://github.com/bourbonut/dqn-pacman
- Owner: bourbonut
- License: gpl-3.0
- Created: 2022-04-26T18:59:48.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2024-05-23T09:26:29.000Z (about 2 years ago)
- Last Synced: 2025-06-27T21:38:07.194Z (11 months ago)
- Topics: atari, dqn, pacman, reinforcement-learning
- Language: Python
- Homepage:
- Size: 6.08 MB
- Stars: 8
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Deep Q-Network on the Atari Game Ms-Pacman
## Basic information
The goal of this project is to apply the **Deep Q-Network** algorithm to the Ms-Pacman environment and to reach good performance without using prioritized replay memory or more advanced algorithms such as _A3C_ or _Rainbow DQN_.
The results below were obtained with the following parameters:
| Parameter | Value |
| ------------------ | ------------------- |
| Batch size | `128` |
| Discount rate | `0.99` |
| Epsilon max | `1.0` |
| Epsilon min | `0.1` |
| Epsilon decay | `1,000,000` |
| Target update | `8,000` |
| Replay memory size | `18,000` |
| Optimizer | SGD with momentum |
| Learning rate | `2.5e-4` |
| Momentum | `0.95` |
| Positive reward | `log(reward, 1000)` |
| Negative reward | `-log(20, 1000)` |
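The table gives the epsilon bounds and a decay value but not the shape of the schedule. As a minimal sketch, assuming epsilon is annealed linearly from its maximum to its minimum over `1,000,000` steps (the function name `linear_epsilon` and the linear shape are assumptions, not taken from the repository):

```python
EPS_MAX = 1.0          # "Epsilon max" from the table
EPS_MIN = 0.1          # "Epsilon min" from the table
EPS_DECAY = 1_000_000  # "Epsilon decay" from the table, read here as a step count


def linear_epsilon(step):
    """Exploration rate after `step` steps, assuming a linear schedule."""
    # Fraction of the decay period already elapsed, clamped to [0, 1]
    fraction = min(max(step, 0) / EPS_DECAY, 1.0)
    return EPS_MIN + (1.0 - fraction) * (EPS_MAX - EPS_MIN)


for step in (0, 500_000, 1_000_000, 2_000_000):
    print(step, round(linear_epsilon(step), 3))
```

Under this assumption the agent acts fully at random at the start and keeps a 10% exploration rate once the decay period is over.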
## Performances
### Rewards
The default rewards span a wide range. To bring them onto a comparable scale, a base-1000 logarithm is applied to each positive reward returned by the environment (see the function `transform_reward` in `utils/utils.py`).
```python
from math import log


def transform_reward(reward):
    return log(reward, 1000) if reward > 0 else reward
```
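To get a feel for the scale, the sketch below applies the same transform to a few sample rewards and adds the negative reward `-log(20, 1000)` from the parameter table. The helper `shaped_reward`, its `lost_life` flag and the sample inputs are illustrative assumptions; how the repository detects that a ghost has caught the agent is not shown here.

```python
from math import log

DEATH_PENALTY = -log(20, 1000)  # "Negative reward" from the parameter table


def shaped_reward(reward, lost_life=False):
    """Hypothetical helper combining both reward rules from the table."""
    if lost_life:
        # Assumed trigger: the agent was caught by a ghost on this step
        return DEATH_PENALTY
    return log(reward, 1000) if reward > 0 else reward


for raw in (10, 50, 200, 1600):
    print(raw, round(shaped_reward(raw), 3))
print("caught", round(shaped_reward(0, lost_life=True), 3))
```

A reward of 10 maps to about 0.33 and a reward of 1600 to about 1.07, so large bonuses no longer dominate the learning signal.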
A negative reward is also given to the agent when a ghost eats it. In the top panel of the following figure, the average reward is computed over the 20 most recent episodes:
```python
import statistics


def mov_avg(self, t):
    # t = 20
    values = (
        [0] * (t - len(self._total)) + self._total
        if len(self._total) < t
        else self._total[-t:]
    )
    self._mean.append(statistics.mean(values))
```
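The method above depends on instance state (`self._total`, `self._mean`). A standalone sketch of the same padding logic, with the hypothetical name `padded_moving_average`, makes the behavior easier to check in isolation:

```python
import statistics


def padded_moving_average(totals, window=20):
    """Mean of the last `window` totals, left-padded with zeros when fewer exist."""
    if len(totals) < window:
        # Not enough episodes yet: pad with zeros, as the method above does
        values = [0] * (window - len(totals)) + list(totals)
    else:
        values = list(totals[-window:])
    return statistics.mean(values)


print(padded_moving_average([20]))                  # one episode, 19 zeros of padding
print(padded_moving_average(list(range(1, 41))))    # mean of the last 20 values
```

Because of the zero padding, the average is pulled toward zero during the first 20 episodes, so it only becomes a true moving average after that point.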

### Q-value
The agent's behavior improves as the Q-value increases over time.
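For context, in textbook DQN the Q-value is the network's estimate of the discounted future reward, and it is trained toward a one-step target computed with the target network. The sketch below shows that standard target with the discount rate from the parameter table; it is not taken from `model.py`, and `td_target` is an illustrative name.

```python
GAMMA = 0.99  # "Discount rate" from the parameter table


def td_target(reward, next_q_values, done):
    """One-step target: r + gamma * max_a' Q_target(s', a'), or r at episode end."""
    if done:
        # No future reward after a terminal state
        return reward
    return reward + GAMMA * max(next_q_values)


print(td_target(1.0, [0.5, 2.0], done=False))
print(td_target(1.0, [0.5, 2.0], done=True))
```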

### Results
#### Smart behavior
The agent is able to avoid the ghosts that are chasing it.
#### High score
The agent eats pills and, after getting a boost, eats ghosts.
## Installation
It is **highly recommended** to install packages in a virtual environment.
### Installation of Atari environment
```sh
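# Install the Arcade Learning Environment
pip install ale-py==0.7
# Download and unpack the Atari ROMs
wget http://www.atarimania.com/roms/Roms.rar
unrar e Roms.rar
unzip -qq ROMS.zip
# Import the ROMs (adjust the path to wherever ROMS was extracted)
ale-import-roms /content/ROMS/ | grep pacman
pip install -U gym
pip install -U "gym[atari]"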
```
### Installation of dependencies
```sh
pip install -r requirements.txt
```
**Note:** If you do not install from the requirements file, the versions of `opencv-python` and `matplotlib` you end up with may be incompatible. `opencv-python` is only used to write a video in `eval.py`.
## Usage
### Training part
#### Train the agent
In the `deep_Q_network` folder, the file `parameters.py` is where the parameters are set. After checking them, start the training with the following command:
```sh
python main.py
```
#### Train and save evolution step by step (uses a lot of memory)
To save the evolution step by step, simply run:
```sh
python main.py --image
```

#### Dynamic display
This mode is useful when you want to see how the agent reacts and interacts with its environment.
To display the "dashboard", simply run:
```sh
python main.py --stream
```
Then enter the URL `localhost:5000` in your browser.

**Note:** This mode is not recommended for long training runs.
### Evaluation
#### Location of saved data
When you run `main.py`, it automatically creates a `results` folder in which all results are stored.
#### Usage
By default, `eval.py` evaluates the most recent **folder** and **episode**.
To specify them:
```sh
python eval.py -e 120 --path ./results/mytrainingfolder
```
Several flags control what is produced:
- by default, it saves a plot with the Q-values, the rewards and the last losses of the selected episode
- `--reward` saves the rewards with a pseudo moving average
- `--qvalue` saves the Q-values with a pseudo moving average
- `--record` records the agent's interaction with the environment
- `-a` or `--all` records the agent's interaction and saves the plots
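The flags above could be declared with `argparse` roughly as follows. This is a hypothetical reconstruction, not the contents of `evaluation/parser.py`: the destination name `episode`, the defaults and the help texts are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser exposing the flags listed above."""
    parser = argparse.ArgumentParser(description="Evaluate a trained agent (sketch)")
    parser.add_argument("-e", dest="episode", type=int, default=None,
                        help="episode to evaluate (default: most recent)")
    parser.add_argument("--path", default=None,
                        help="training folder (default: most recent)")
    parser.add_argument("--reward", action="store_true", help="save the reward plot")
    parser.add_argument("--qvalue", action="store_true", help="save the Q-value plot")
    parser.add_argument("--record", action="store_true", help="record the agent")
    parser.add_argument("-a", "--all", action="store_true",
                        help="record the agent and save the plots")
    return parser


args = build_parser().parse_args(["-e", "120", "--path", "./results/mytrainingfolder"])
print(args.episode, args.path, args.all)
```

Leaving `episode` and `path` as `None` by default gives the calling code a simple way to fall back to the most recent folder and episode.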
## Structure of the code
```
.
├── deep_Q_network
│   ├── __init__.py
│   ├── buffer.py          # buffer class used for websocket and for tracking training performance
│   ├── memory.py          # replay memory
│   ├── model.py           # dueling DQN and optimization (see the class for more details)
│   ├── parameters.py      # all parameters except how rewards are managed
│   └── preprocessing.py   # for preprocessing observations
├── docs
│   └── ...
├── evaluation             # only used by `eval.py`
│   ├── __init__.py
│   ├── parser.py
│   └── utils.py
├── utils
│   ├── __init__.py
│   ├── actions.py
│   ├── opencv.py
│   ├── parser.py
│   ├── path.py
│   ├── rewards.py
│   └── save_functions.py
├── results
│   └── training-[...]
│       ├── models         # folder with PyTorch models
│       │   ├── policy-model-[...].pt
│       │   └── target-model-[...].pt
│       ├── plots          # folder for `python main.py --image` command
│       │   └── episode-[...].png
│       ├── recorded-data  # folder with pickle files
│       │   └── episode-[...].pkl
│       ├── output_video.avi
│       ├── q_values.png
│       ├── result.png
│       └── rewards.png
├── eval.py                # to evaluate the agent
├── main.py                # to train the agent
├── README.md
└── requirements.txt
```
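The tree lists `memory.py` as the replay memory. As a rough illustration of what such a module typically contains, here is a minimal uniform replay buffer sized with the values from the parameter table. The class, the `Transition` fields and the method names are assumptions, not the repository's actual code.

```python
import random
from collections import deque, namedtuple

# Hypothetical transition layout; the repository may store different fields
Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayMemory:
    """Fixed-size buffer that drops the oldest transition once full."""

    def __init__(self, capacity=18_000):  # "Replay memory size" from the table
        self._buffer = deque(maxlen=capacity)

    def push(self, *fields):
        self._buffer.append(Transition(*fields))

    def sample(self, batch_size=128):  # "Batch size" from the table
        # Uniform sampling without replacement (no prioritization)
        return random.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


memory = ReplayMemory(capacity=3)
for i in range(5):
    memory.push(i, 0, 0.0, i + 1, False)
print(len(memory), [t.state for t in memory.sample(2)])
```

Uniform sampling matches the stated goal of avoiding prioritized replay memory.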