
https://github.com/rohankalbag/deep-recurrent-q-learning-for-pomdps

Course Project - Advanced Topics in Machine Learning - Autumn Semester 2023 - Indian Institute of Technology Bombay

deep-recurrent-q-network lstm partially-observable-markov-decision-process pytorch reinforcement-learning resnet-18 transfer-learning


# Deep Recurrent Q-Learning for Partially Observable Markov Decision Processes

## EE782: Advanced Topics in Machine Learning

### Abstract

This project presents our implementation of **Deep Recurrent Q-Learning (DRQL)**, which incorporates transfer learning for feature extraction, a customized LSTM for temporal recurrence, and a domain-informed reward function.
This tailored approach aims to converge faster than the vanilla implementation described in the [original paper](https://arxiv.org/abs/1507.06527).
Performance is evaluated on two adaptive [Atari 2600](https://en.wikipedia.org/wiki/Atari_2600) games, [Assault-v5](https://gymnasium.farama.org/environments/atari/assault/) and [Bowling](https://gymnasium.farama.org/environments/atari/bowling/), in which game difficulty scales with player proficiency.
We compare the convergence of our optimized reward function against the vanilla version under the [StepLR](https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.StepLR.html) and [CosineAnnealingLR](https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.CosineAnnealingWarmRestarts.html) learning rate schedulers, complemented by theoretical explanations.
Additionally, we propose an efficient windowed episodic memory that uses bootstrapped sequential updates to optimize GPU memory utilization.
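The windowed episodic memory is described here only at a high level. The sketch below shows one plausible reading of it, written in plain Python so it stays independent of PyTorch: whole episodes are stored, one is drawn at random, and it is replayed from its first step in consecutive fixed-length windows, so only one window of observations has to be moved to the GPU at a time. The class and method names (`WindowedEpisodicMemory`, `push`, `sample_episode`, `windows`) and the `max_episodes`/`window` parameters are hypothetical and are not taken from the project's notebook.

```python
# Hypothetical sketch: names and structure are illustrative, not the repository's code.
import random
from collections import deque
from typing import Any, Deque, Iterator, List, NamedTuple, Optional


class Transition(NamedTuple):
    obs: Any
    action: int
    reward: float
    next_obs: Any
    done: bool


class WindowedEpisodicMemory:
    """Stores whole episodes and replays one of them in fixed-size windows."""

    def __init__(self, max_episodes: int, window: int, seed: Optional[int] = None) -> None:
        # deque(maxlen=...) evicts the oldest episode once capacity is reached.
        self.episodes: Deque[List[Transition]] = deque(maxlen=max_episodes)
        self.window = window
        self.rng = random.Random(seed)
        self._current: List[Transition] = []

    def push(self, t: Transition) -> None:
        # Buffer transitions of the running episode; commit it when it terminates.
        self._current.append(t)
        if t.done:
            self.episodes.append(self._current)
            self._current = []

    def sample_episode(self) -> List[Transition]:
        # Draw one complete stored episode uniformly at random.
        if not self.episodes:
            raise ValueError("memory holds no complete episodes yet")
        return self.rng.choice(self.episodes)

    def windows(self, episode: List[Transition]) -> Iterator[List[Transition]]:
        # Sequential replay: walk the episode from its first step in
        # consecutive, non-overlapping chunks; the last chunk may be shorter.
        for start in range(0, len(episode), self.window):
            yield episode[start:start + self.window]


if __name__ == "__main__":
    mem = WindowedEpisodicMemory(max_episodes=100, window=4, seed=0)
    for step in range(10):
        mem.push(Transition(obs=step, action=0, reward=0.0, next_obs=step + 1, done=(step == 9)))
    ep = mem.sample_episode()
    print([len(w) for w in mem.windows(ep)])  # [4, 4, 2]
```

In a training loop built on this sketch, the LSTM hidden state would start at zeros at the beginning of the sampled episode and be carried from one window to the next, detached after each window's backward pass (truncated backpropagation through time). Under that assumption, activation and gradient memory grow with the window length rather than the episode length.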

### Watch our agent play in action

*[Gameplay recordings: Assault-v5, Bowling]*

### Environment Setup

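```bash
git clone https://github.com/rohankalbag/deep-recurrent-q-learning-for-pomdps.git
cd deep-recurrent-q-learning-for-pomdps
python3 -m venv mlproj
source mlproj/bin/activate
pip install -r requirements.txt
```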

> Link to [Jupyter Notebook](https://github.com/rohankalbag/deep-recurrent-q-learning-for-pomdps/blob/main/EE782.ipynb)
>
> Detailed [Report](https://github.com/rohankalbag/deep-recurrent-q-learning-for-pomdps/blob/main/Report.pdf) with Code, Experimentation and Results

### Collaborators

- Rohan Kalbag
- Vansh Kapoor
- Sankalp Bhamare