https://github.com/mathiasgruber/bananaagent-pytorch

Solution to Project 1 of Udacity Deep Reinforcement Learning Nanodegree
- Host: GitHub
- URL: https://github.com/mathiasgruber/bananaagent-pytorch
- Owner: MathiasGruber
- Created: 2019-02-09T05:07:50.000Z
- Default Branch: master
- Last Pushed: 2022-12-08T01:41:12.000Z
- Language: Python
- Size: 28.3 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 14
Metadata Files:
- Readme: README.md
# Double-Duel-Deep-Q-Network in PyTorch
This model was developed as a solution to Project 1 of the Udacity Deep Reinforcement Learning Nanodegree.
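As background (standard DQN literature, not taken from this repository's docs): a dueling network splits the Q-function into a state-value stream and an advantage stream, recombined with a mean-advantage baseline,

```
Q(s, a) = V(s) + A(s, a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a')
```

while the "double" variant decouples action selection from action evaluation when computing bootstrap targets (see the sketch before the Testing section below).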
*Image from official repo*

# Installation
Install the package requirements for this repository:
```
pip install -r requirements.txt
```

## Banana Environment
The agent was developed specifically to solve a banana-collection environment built in Unity, which can be downloaded from the links below. The objective is for the agent to navigate the arena, collecting yellow bananas (+1 reward) while avoiding blue bananas (-1 reward). Download the environment for your platform and unpack it into the `./env_unity/` folder in this repo.

Environment with discrete state space (37 dimensions):
* Linux: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Linux.zip)
* Mac OSX: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana.app.zip)
* Windows (32-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Windows_x86.zip)
* Windows (64-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Windows_x86_64.zip)

Environment with pixel state space:
* Linux: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/VisualBanana_Linux.zip)
* Mac OSX: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/VisualBanana.app.zip)
* Windows (32-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/VisualBanana_Windows_x86.zip)
* Windows (64-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/VisualBanana_Windows_x86_64.zip)

In both versions of the environment, the agent has an action space with four discrete actions:
* `0`: forward
* `1`: backward
* `2`: left
* `3`: right

The environment is considered solved when the agent collects an average score of +13 over 100 consecutive episodes.
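For orientation, a minimal interaction loop with the discrete environment might look like the sketch below. It assumes the `unityagents` package used by the Udacity course and a purely random policy; the file path and episode count are placeholders, not this repo's exact code.

```
import numpy as np
from collections import deque
from unityagents import UnityEnvironment  # package used by the Udacity DRLND projects

# Path is an assumption; point it at the environment you unpacked into ./env_unity/
env = UnityEnvironment(file_name="env_unity/DiscreteBanana/Banana.exe")
brain_name = env.brain_names[0]
action_size = env.brains[brain_name].vector_action_space_size  # 4 discrete actions

scores = deque(maxlen=100)  # rolling window for the "solved" criterion
for episode in range(1000):
    env_info = env.reset(train_mode=True)[brain_name]
    score = 0
    while True:
        action = np.random.randint(action_size)   # random policy, for illustration only
        env_info = env.step(action)[brain_name]
        score += env_info.rewards[0]
        if env_info.local_done[0]:                # episode finished
            break
    scores.append(score)
    if len(scores) == 100 and np.mean(scores) >= 13.0:
        print(f"Solved in {episode + 1} episodes")
        break
env.close()
```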
# Repository Structure
* `libs/agents.py`: A DQN agent, configured by default as a double dueling DQN
* `libs/models.py`: PyTorch models used by the DQN agent (a minimal dueling head is sketched after this list)
* `libs/memory.py`: Prioritized experience replay, using the sum-tree defined in `libs/sumtree.py`
* `libs/monitor.py`: Functionality for training/testing the agent and interacting with the environment
* `main.py`: Main command-line interface for training & testing the agent
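To make the dueling head concrete, here is a minimal PyTorch sketch. Layer sizes and class names are illustrative assumptions; the exact architecture lives in `libs/models.py`.

```
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Minimal dueling Q-network; sizes are illustrative, not the repo's exact model."""

    def __init__(self, state_size=37, action_size=4, hidden=64):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_size, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                # state-value stream V(s)
        self.advantage = nn.Linear(hidden, action_size)  # advantage stream A(s, a)

    def forward(self, state):
        x = self.feature(state)
        v = self.value(x)
        a = self.advantage(x)
        # Recombine with the mean-advantage baseline so V and A are identifiable
        return v + a - a.mean(dim=1, keepdim=True)
```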

# Training the Agent
To train the agent on the discrete state space, run one of the following commands (only tested on Windows):
```
python main.py --environment env_unity/DiscreteBanana/Banana.exe --model_name DQN
python main.py --environment env_unity/DiscreteBanana/Banana.exe --model_name DuelDQN
python main.py --environment env_unity/DiscreteBanana/Banana.exe --model_name DQN --double
python main.py --environment env_unity/DiscreteBanana/Banana.exe --model_name DuelDQN --double
```

To train the agent on the pixel state space, use one of the following commands (only tested on Windows):
```
python main.py --environment env_unity/VisualBanana/Banana.exe --model_name DQN
python main.py --environment env_unity/VisualBanana/Banana.exe --model_name DuelDQN
python main.py --environment env_unity/VisualBanana/Banana.exe --model_name DQN --double
python main.py --environment env_unity/VisualBanana/Banana.exe --model_name DuelDQN --double
```
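For reference, the `--double` flag corresponds to Double DQN: the online network selects the greedy next action and the target network evaluates it. A minimal sketch of that target computation (function and tensor names are assumptions, not this repo's code):

```
import torch

def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN target; rewards and dones are float tensors of shape (batch, 1)."""
    with torch.no_grad():
        best_actions = online_net(next_states).argmax(dim=1, keepdim=True)  # selection
        next_q = target_net(next_states).gather(1, best_actions)            # evaluation
        return rewards + gamma * next_q * (1 - dones)  # zero bootstrap at terminal states
```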

# Testing the Agent
Once the agent has been trained, it can be run as follows:
```
python main.py --environment env_unity/VisualBanana/Banana.exe --model_name DQN --test --checkpoint logs/weights_env_unity_VisualBanana_DQN_single.pth
```

# Profiling
When trying to optimize training speed, I've used the following to profile the code:
```
python -m cProfile -o profile.txt -s tottime main.py --environment env_unity/VisualBanana/Banana.exe --model_name DQN --double
```
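Note that with `-o`, `cProfile` writes binary profile data (despite the `.txt` name) and the `-s` sort option is ignored, so the results are loaded back with the standard-library `pstats` module, for example:

```
import pstats

# Load the binary profile written by cProfile's -o flag
# and show the 20 functions with the highest own (tottime) cost
stats = pstats.Stats("profile.txt")
stats.sort_stats("tottime").print_stats(20)
```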