Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/rohankalbag/deep-recurrent-q-learning-for-pomdps
Course Project - Advanced Topics in Machine Learning - Autumn Semester 2023 - Indian Institute of Technology Bombay
deep-recurrent-q-network lstm partially-observable-markov-decision-process pytorch reinforcement-learning resnet-18 transfer-learning
- Host: GitHub
- URL: https://github.com/rohankalbag/deep-recurrent-q-learning-for-pomdps
- Owner: rohankalbag
- Created: 2023-11-01T13:13:43.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-05T23:09:30.000Z (12 months ago)
- Last Synced: 2024-01-28T06:13:41.221Z (10 months ago)
- Topics: deep-recurrent-q-network, lstm, partially-observable-markov-decision-process, pytorch, reinforcement-learning, resnet-18, transfer-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 9.93 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Metadata Files:
  - Readme: readme.md
Awesome Lists containing this project
README
# Deep Recurrent Q-Learning for Partially Observable Markov Decision Processes
## EE782 : Advanced Topics in Machine Learning
### Abstract
This project presents our unique implementation of
**Deep Recurrent Q-Learning (DRQL)** that incorporates Transfer
Learning for feature extraction, a customized LSTM for temporal recurrence, and a domain-informed reward function. This
tailored approach aims to expedite convergence compared to the
vanilla implementation outlined in the [original paper](https://arxiv.org/abs/1507.06527). The performance evaluation focuses on two adaptive [Atari 2600](https://en.wikipedia.org/wiki/Atari_2600) games:
[Assault-v5](https://gymnasium.farama.org/environments/atari/assault/) and [Bowling](https://gymnasium.farama.org/environments/atari/bowling/), where game difficulty scales with player
proficiency. Comparative analysis between the convergence of our
optimized reward function and the vanilla version is conducted
using [StepLR](https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.StepLR.html) and [CosineAnnealingLR](https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.CosineAnnealingWarmRestarts.html) learning rate schedulers,
complemented by theoretical explanations. Additionally, an efficient windowed episodic memory implementation employing
bootstrapped sequential updates is proposed to optimize GPU
memory utilization.
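
The full model code lives in the notebook linked at the bottom of this page; as a rough sketch of the architecture described above, a transfer-learned recurrent Q-network can be wired up along the following lines (layer sizes, the 224×224 RGB frame preprocessing, and the choice to freeze the ResNet-18 backbone are illustrative assumptions, not the exact settings used in the notebook):

```python
import torch
import torch.nn as nn
from torchvision import models

class DRQN(nn.Module):
    """Sketch of a recurrent Q-network: frozen ResNet-18 features -> LSTM -> Q-values."""
    def __init__(self, n_actions: int, hidden_size: int = 512):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # ImageNet weights (torchvision >= 0.13 API)
        self.features = nn.Sequential(*list(backbone.children())[:-1])       # drop the final FC head
        for p in self.features.parameters():                                 # transfer learning: keep backbone fixed
            p.requires_grad = False
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden_size, batch_first=True)
        self.q_head = nn.Linear(hidden_size, n_actions)

    def forward(self, frames, hidden=None):
        # frames: (batch, seq_len, 3, H, W) -- a window of observed game frames
        b, t, c, h, w = frames.shape
        feats = self.features(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        out, hidden = self.lstm(feats, hidden)   # temporal recurrence over the window
        return self.q_head(out), hidden          # Q-values for every time step in the window

# Quick shape check: Assault-v5 exposes 7 discrete actions
q_net = DRQN(n_actions=7)
dummy = torch.randn(1, 4, 3, 224, 224)           # one 4-frame window
q_values, _ = q_net(dummy)
```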
### Watch our agent play in action

Assault-v5 | Bowling
:-------------------------:|:-------------------------:
*(gameplay recording)* | *(gameplay recording)*

### Environment Setup
```bash
python3 -m venv mlproj
source mlproj/bin/activate
pip install -r requirements.txt
```
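
The windowed episodic memory is described above only at a high level; the sketch below shows one plausible shape for such a buffer, storing whole episodes and sampling fixed-length windows of consecutive transitions for the LSTM to unroll over (class and parameter names here are illustrative, not taken from the notebook):

```python
import random
from collections import deque

class WindowedEpisodicMemory:
    """Illustrative sketch: keep recent episodes and replay short windows of
    consecutive transitions instead of whole episodes."""
    def __init__(self, capacity_episodes: int = 100, window: int = 16):
        self.episodes = deque(maxlen=capacity_episodes)  # oldest episodes are dropped automatically
        self.window = window
        self._current = []

    def push(self, obs, action, reward, next_obs, done):
        # Transitions can be kept on the CPU; only sampled windows need to move to the GPU.
        self._current.append((obs, action, reward, next_obs, done))
        if done:                                         # episode boundary: archive and start fresh
            self.episodes.append(self._current)
            self._current = []

    def sample(self, batch_size: int):
        """Return `batch_size` windows of up to `window` consecutive transitions."""
        batch = []
        for _ in range(batch_size):
            ep = random.choice(self.episodes)
            start = random.randint(0, max(0, len(ep) - self.window))
            batch.append(ep[start:start + self.window])  # shorter than `window` only for short episodes
        return batch
```

Unrolling the LSTM sequentially through each sampled window keeps the updates on contiguous history while only a window's worth of frames has to be resident in GPU memory at a time.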
> Link to [Jupyter Notebook](https://github.com/rohankalbag/deep-recurrent-q-learning-for-pomdps/blob/main/EE782.ipynb)
> Detailed [Report](https://github.com/rohankalbag/deep-recurrent-q-learning-for-pomdps/blob/main/Report.pdf) with Code, Experimentation and Results
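
For reference, the two learning-rate schedules compared in the report can be constructed as follows (the optimizer and all hyperparameter values are placeholders, not the ones used in the experiments):

```python
import torch
from torch.optim.lr_scheduler import StepLR, CosineAnnealingWarmRestarts

# Placeholder parameters standing in for q_net.parameters()
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=1e-3)

# Variant 1: multiply the LR by `gamma` every `step_size` epochs
scheduler = StepLR(optimizer, step_size=10, gamma=0.5)

# Variant 2: cosine annealing with warm restarts every T_0 (then T_0 * T_mult) epochs
# scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)

for epoch in range(30):
    # ... one epoch of DRQN training updates would go here ...
    scheduler.step()
```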
### Collaborators:
- Rohan Kalbag
- Vansh Kapoor
- Sankalp Bhamare