# Project 2: Continuous Control

| Untrained agent (random actions) | Trained agent |
|--------------------------------------|--------------------------------------|
| ![](images/random_reacher.gif) | ![](images/trained_reacher.gif) |

> Please see the [Report](https://github.com/herrfeder/Udacity-Deep-Reinforcement-Learning-Project-Continuous-Control/blob/main/Report.md) for a detailed description of my approach.

## Introduction

This project includes a training approach for the Unity [Reacher](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Examples.md#reacher) environment.

In this environment, a double-jointed arm can move to target locations. A reward of +0.1 is provided for each step that the agent's hand is in the goal location. The goal is therefore to **keep the arm in the target location for as many time steps as possible**.

The observation space consists of 33 variables corresponding to position, rotation, velocity, and angular velocities of the arm. Each action is a vector with four numbers, corresponding to torque applicable to two joints. Every entry in the action vector should be a number between -1 and 1.
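
As a quick illustration (a minimal sketch, assuming the `unityagents` package from the Udacity DRLND setup and a locally unpacked `Reacher_Linux/Reacher.x86_64` binary), the dimensions above can be inspected directly from the environment:

```python
import numpy as np
from unityagents import UnityEnvironment

# Path to the downloaded Reacher binary (adjust to your setup).
env = UnityEnvironment(file_name="Reacher_Linux/Reacher.x86_64")
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

# Reset the environment and inspect the state and action dimensions.
env_info = env.reset(train_mode=False)[brain_name]
print("State size:", len(env_info.vector_observations[0]))  # 33
print("Action size:", brain.vector_action_space_size)       # 4

# Take one random action, clipped to the valid range [-1, 1].
action = np.clip(np.random.randn(brain.vector_action_space_size), -1, 1)
env_info = env.step(action)[brain_name]
print("Reward:", env_info.rewards[0])  # +0.1 while the hand is in the goal location
env.close()
```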

## Goal

The goal is to train the robotic arm to keep its hand in the goal location until an average reward of +30 over 100 consecutive episodes is reached. The agent is trained using the Deep Deterministic Policy Gradient (DDPG) algorithm with a single agent.

Training is done in a headless Python environment instead of a Jupyter notebook, as it takes several hours.
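
As a minimal sketch of the solved criterion (not the exact code from `train_ddpg_headless.py`), the running mean over the last 100 episode scores can be tracked with a fixed-size `deque`:

```python
from collections import deque

import numpy as np

scores_window = deque(maxlen=100)  # keeps only the 100 most recent episode scores

def is_solved(episode_score, target=30.0):
    """Record the latest episode score and check the 100-episode average."""
    scores_window.append(episode_score)
    return len(scores_window) == 100 and np.mean(scores_window) >= target
```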

## Important included files

* `run_trained_ddpg_models.ipynb`: visually compare a DDPG agent taking random actions with a trained agent, using the Unity environment with a visual frontend -> [Linux Environment](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P2/Reacher/one_agent/Reacher_Linux.zip)
* `train_ddpg_headless.py`: training script to run on a headless server with the non-visual Unity environment -> [Linux Headless Environment](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P2/Reacher/one_agent/Reacher_Linux_NoVis.zip)
* `ddpg_agent.py`: contains the agent class, which initialises the actor and critic models, acts in the given Unity environment, and learns from the collected experience
* `model.py`: contains the PyTorch-based architectures for the actor and critic (an illustrative sketch follows below)
* `checkpoint_actor.pth` and `checkpoint_critic.pth`: checkpoint weights for the models that reached an average reward of +30 over 100 consecutive episodes
* `last_scores.txt`: a list of numbers representing the reward progression over all training episodes
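
The actual layer sizes and initialisation live in `model.py`; the following is only an illustrative sketch of a typical DDPG actor/critic pair (the hidden sizes and the `tanh` output bounding actions to [-1, 1] are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Maps a 33-dim state to a 4-dim action in [-1, 1]."""
    def __init__(self, state_size=33, action_size=4, fc1=400, fc2=300):
        super().__init__()
        self.fc1 = nn.Linear(state_size, fc1)
        self.fc2 = nn.Linear(fc1, fc2)
        self.fc3 = nn.Linear(fc2, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return torch.tanh(self.fc3(x))  # actions bounded to [-1, 1]

class Critic(nn.Module):
    """Estimates Q(state, action) for the DDPG update."""
    def __init__(self, state_size=33, action_size=4, fc1=400, fc2=300):
        super().__init__()
        self.fc1 = nn.Linear(state_size, fc1)
        self.fc2 = nn.Linear(fc1 + action_size, fc2)  # action enters after the first layer
        self.fc3 = nn.Linear(fc2, 1)

    def forward(self, state, action):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(torch.cat([x, action], dim=1)))
        return self.fc3(x)
```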

## How to run

### Training

To train your DDPG Agent, run:

* `python train_ddpg_headless.py`
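
Conceptually, the script runs an episode loop along the lines of the sketch below (the `Agent` constructor arguments and method names are assumptions based on a typical DDPG agent; see `train_ddpg_headless.py` and `ddpg_agent.py` for the actual code):

```python
import numpy as np
from collections import deque
from unityagents import UnityEnvironment
from ddpg_agent import Agent

env = UnityEnvironment(file_name="Reacher_Linux_NoVis/Reacher.x86_64")
brain_name = env.brain_names[0]
agent = Agent(state_size=33, action_size=4, random_seed=2)  # assumed signature

scores, scores_window = [], deque(maxlen=100)
for episode in range(1, 2001):
    env_info = env.reset(train_mode=True)[brain_name]
    state, score = env_info.vector_observations[0], 0.0
    agent.reset()                                   # reset the exploration noise
    while True:
        action = agent.act(state)                   # actor network + exploration noise
        env_info = env.step(action)[brain_name]
        next_state = env_info.vector_observations[0]
        reward, done = env_info.rewards[0], env_info.local_done[0]
        agent.step(state, action, reward, next_state, done)  # store experience and learn
        state, score = next_state, score + reward
        if done:
            break
    scores.append(score)
    scores_window.append(score)
    if len(scores_window) == 100 and np.mean(scores_window) >= 30.0:
        print(f"Solved after {episode} episodes")
        break
env.close()
```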

### See acting agent

To see how the agent acts in the Reacher environment with visual feedback and to verify your trained model weights:

* Run the cells in the `run_trained_ddpg_models.ipynb` notebook
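
In essence, the notebook loads the checkpoint weights into the agent's actor and lets it act greedily in the visual environment; a condensed sketch (the `actor_local` attribute and the `add_noise` flag are assumptions based on a typical DDPG agent implementation):

```python
import torch
from unityagents import UnityEnvironment
from ddpg_agent import Agent

env = UnityEnvironment(file_name="Reacher_Linux/Reacher.x86_64")  # visual build
brain_name = env.brain_names[0]

agent = Agent(state_size=33, action_size=4, random_seed=2)              # assumed signature
agent.actor_local.load_state_dict(torch.load("checkpoint_actor.pth"))   # attribute name assumed

env_info = env.reset(train_mode=False)[brain_name]
state, score = env_info.vector_observations[0], 0.0
while True:
    action = agent.act(state, add_noise=False)  # act greedily, without exploration noise
    env_info = env.step(action)[brain_name]
    state = env_info.vector_observations[0]
    score += env_info.rewards[0]
    if env_info.local_done[0]:
        break
print("Episode score:", score)
env.close()
```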

## Installation

### Unity Environment

1. Download the environment from one of the links below. You need only select the environment that matches your operating system:

- **_Version 1: One (1) Agent_**
  - Linux: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P2/Reacher/one_agent/Reacher_Linux.zip)
  - Linux (Headless): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P2/Reacher/one_agent/Reacher_Linux_NoVis.zip)
  - Mac OSX: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P2/Reacher/one_agent/Reacher.app.zip)
  - Windows (32-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P2/Reacher/one_agent/Reacher_Windows_x86.zip)
  - Windows (64-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P2/Reacher/one_agent/Reacher_Windows_x86_64.zip)

2. Place the included binary file, which ends with `.x86_64` (as I'm using a 64-bit OS), in a location that is reachable from your training code (a quick check follows after these steps).

3. This Unity environment can only be run with Python version `>= 3.6`.
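
A quick way to verify the binary is reachable from Python (a minimal sketch; the path refers to the 64-bit Linux build and should be adjusted to where you unpacked the archive):

```python
from unityagents import UnityEnvironment

# Adjust the path to wherever you placed the downloaded environment.
env = UnityEnvironment(file_name="./Reacher_Linux/Reacher.x86_64")
print(env.brain_names)  # should list the single Reacher brain
env.close()
```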

### Python Environment

1. Install Python 3.6 on your host (the Unity environment will only work with Python 3.6)
2. Create a virtual environment: `virtualenv -p /usr/bin/python3.6 your_environment` and activate it: `source your_environment/bin/activate`
3. (Within the activated virtual environment) Install the necessary packages from this repo's requirements: `pip install -r requirements.txt`