https://github.com/nonkloq/mazeharvest
A Grid-Based RL Environment & Implementations of a Few Deep-RL Algorithms.
dqn-pytorch ppo-pytorch pytorch reinforcement-learning rl-algorithms-pytorch rl-environment
- Host: GitHub
- URL: https://github.com/nonkloq/mazeharvest
- Owner: nonkloq
- Created: 2025-01-12T09:15:35.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-01-29T15:23:26.000Z (4 months ago)
- Last Synced: 2025-01-29T16:28:12.790Z (4 months ago)
- Topics: dqn-pytorch, ppo-pytorch, pytorch, reinforcement-learning, rl-algorithms-pytorch, rl-environment
- Language: Python
- Size: 32.9 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# MazeHarvest
MazeHarvest is a partially observable, stochastic reinforcement learning environment. For a detailed description, refer to [homegym/README.md](./homegym/README.md). In summary, the agent must rely on sensory data like ray perception and heuristics to accomplish the following tasks:
- **Kill hostile moles** that hunt the agent.
- **Harvest toxic plants** to reduce environmental toxicity and survive.
- **Navigate complex mazes** in advanced settings.

To install `homegym`, clone this repository and follow the instructions in [homegym/README.md](./homegym/README.md).
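A minimal interaction sketch, assuming `homegym` exposes a Gymnasium-style API; the `homegym.make` call, the environment ID, and the random-policy loop are illustrative assumptions, not the documented interface:

```python
# Hypothetical usage sketch -- see homegym/README.md for the real interface.
import homegym  # assumed to provide a Gymnasium-style make()

env = homegym.make("MazeHarvest-v0")  # environment ID is an assumption
obs, info = env.reset(seed=0)

done = False
episode_return = 0.0
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += float(reward)
    done = terminated or truncated

print(f"Episode return: {episode_return:.2f}")
```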
## Agent Details
> RDQN Agent, CNN Policy. No Episode Memory (State In Action Out).
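Because the agent keeps no episode memory, the policy is a pure function from the current observation to an action. A minimal PyTorch sketch of such a stateless CNN policy (the layer sizes and grid-shaped observation are assumptions for illustration, not the actual network):

```python
import torch
import torch.nn as nn


class CNNPolicy(nn.Module):
    """Stateless policy: one observation in, one set of action scores out."""

    def __init__(self, in_channels: int, n_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(n_actions)  # infers the flattened size

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, channels, height, width) grid observation
        return self.head(self.features(obs))  # logits or Q-values
```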
---
## Experiments
I have implemented a few RL algorithms using PyTorch in the [experiments/algos](./experiments/algos) directory to solve this environment. Each algorithm's folder contains these two files:
- **toy_test.py**: Inherits the base algorithm class to test it on Gymnasium environments and logs the test results in `ttresults.md`.
- **ttresults.md**: Logs containing:
  - Hyperparameters used during training.
  - Descriptions of the algorithm and its implementation.
  - Test results: performance across different environments.
  - Evaluation scores for 100 episodes.

**Test environments**: The algorithms are tested on Gymnasium toy environments to verify the robustness of the implementations; a sketch of such an evaluation loop follows below.
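A sketch of what such a 100-episode evaluation could look like; the `agent.act` interface is an assumption, and the actual scripts subclass the base algorithm classes in `experiments/algos`:

```python
import gymnasium as gym


def evaluate(agent, env_id: str = "CartPole-v1", episodes: int = 100) -> float:
    """Average undiscounted return over `episodes` evaluation episodes."""
    env = gym.make(env_id)
    total = 0.0
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action = agent.act(obs)  # assumed agent interface
            obs, reward, terminated, truncated, _ = env.step(action)
            total += float(reward)
            done = terminated or truncated
    env.close()
    return total / episodes
```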
### Implemented Algorithms
| Algorithm | Folder | Description |
|---------------|----------------|-----------------------------------------------------------------------------|
| **PPO** | `algos/ppo` | Simple implementation of Proximal Policy Optimization (PPO) with Generalized Advantage Estimation (GAE). |
| **Rainbow DQN** | `algos/rdqn` | Combines C51, N-step learning (with a random horizon between M and N steps instead of a fixed N steps), Prioritized Experience Replay (PER), Double Q-Learning, and a Dueling Network Architecture. |
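For reference, a minimal sketch of Generalized Advantage Estimation in its standard form; this is the textbook recursion, not necessarily the exact code in `algos/ppo`:

```python
import numpy as np


def compute_gae(rewards, values, dones, gamma: float = 0.99, lam: float = 0.95):
    """A_t = delta_t + gamma*lam*(1 - done_t)*A_{t+1}, computed backwards.

    `values` carries one extra bootstrap entry: len(values) == len(rewards) + 1.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)

    advantages = np.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    returns = advantages + values[:-1]  # targets for the value function
    return advantages, returns
```

---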
### Utilities
The [experiments/eutils](./experiments/eutils) directory contains utility functions for:
- Processing observations.
- Training & testing models for this environment.
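As an illustration of the observation-processing step, a sketch that flattens ray-perception readings and scalar heuristics into one tensor; the dict keys and shapes are assumptions, not the actual `eutils` API:

```python
import numpy as np
import torch


def process_observation(obs: dict) -> torch.Tensor:
    """Flatten a dict observation into a single float tensor for the policy.

    Assumed keys (illustrative only): 'rays' is an (n_rays, n_features)
    array of ray-perception readings; 'stats' is a 1-D array of scalar
    heuristics such as health or toxicity.
    """
    rays = np.asarray(obs["rays"], dtype=np.float32).ravel()
    stats = np.asarray(obs["stats"], dtype=np.float32)
    return torch.from_numpy(np.concatenate([rays, stats]))
```

---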
### Training Scripts
Each training script follows the naming convention `*_train.py`, where `*` is the algorithm name. For example, `ppo_train.py` trains a PPO agent on the MazeHarvest environment.
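A minimal outline of the shape such a script might take; the agent constructor, `train`, and `save` calls are assumed names, not the actual entry points:

```python
# Hypothetical outline of a *_train.py script; the real scripts also wire in
# the eutils helpers and full hyperparameter sets.
import homegym  # assumed Gymnasium-style registration

from algos.ppo import PPO  # import path is an assumption


def main() -> None:
    env = homegym.make("MazeHarvest-v0")  # environment ID is an assumption
    agent = PPO(env.observation_space, env.action_space)
    agent.train(env, total_steps=1_000_000)  # assumed training entry point
    agent.save("ppo_mazeharvest.pt")


if __name__ == "__main__":
    main()
```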