https://github.com/dhyanesh18/flappbird-rl

PPO agent and A2C agents for Flappybird. Includes scripts, training code, and evaluation tools.

a2c flappybird opencv ppo pygame-learning-environment reinforcement-learning stablebaselines3

Host: GitHub
URL: https://github.com/dhyanesh18/flappbird-rl
Owner: Dhyanesh18
Created: 2025-07-13T14:05:16.000Z (12 months ago)
Default Branch: main
Last Pushed: 2025-07-15T18:52:15.000Z (12 months ago)
Last Synced: 2025-07-16T17:05:10.287Z (12 months ago)
Topics: a2c, flappybird, opencv, ppo, pygame-learning-environment, reinforcement-learning, stablebaselines3
Language: Python
Homepage:
Size: 16.6 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# FlappyBirdRL

A Reinforcement Learning (RL) agent that learns to play Flappy Bird using **Stable Baselines3** and a custom **OpenAI Gym** environment.

This project demonstrates **deep RL** for an arcade-style game, with PPO/A2C and entropy annealing for improved exploration.

## Test clip

## Training graph (PPO)

---
## My setup and work

- The agents were trained on the visual (pixel) mode of Flappy Bird rather than on numerical values such as velocity and position, so that the input matches how humans perceive the game.
- Four frames are stacked before being passed through the CNN so that the agent can pick up temporal information from the environment.
- The entropy coefficient is annealed so that the model shifts from exploring to exploiting in the later half of training.
- The configs in the code files are the ones that produced the best results.
- Sadly, systematic hyperparameter tuning wasn't possible, so tuning was mostly intuition-based: training one agent for 10M timesteps took 19.2 hours.
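
To make the frame-stacking idea concrete, here is a minimal sketch of a rolling buffer that keeps the four most recent frames. It is only an illustration: the `FrameStack` class and its method names are hypothetical and are not taken from this repo's code, and strings stand in for image frames.

```python
from collections import deque


class FrameStack:
    """Hypothetical helper: keeps the k most recent frames (not the repo's code)."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # Fill the buffer with the first frame so the stack is full from step 0.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, new_frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(new_frame)
        return list(self.frames)


stack = FrameStack(k=4)
obs = stack.reset("f0")
obs = stack.step("f1")
obs = stack.step("f2")
print(obs)  # ['f0', 'f0', 'f1', 'f2']
```

With real image frames, the returned list would typically be stacked along the channel axis before being fed to the CNN, so that consecutive frames reveal the bird's motion.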

## Project Highlights

- **Algorithms:** PPO & A2C from Stable Baselines3
- **Custom Gym Env:** Pixel-based Flappy Bird with frame skipping & stacking
- **Entropy Annealing:** Controls exploration dynamically
- **TensorBoard:** Visualize training progress
- **GPU Acceleration:** CUDA enabled
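
Frame skipping, mentioned in the highlights above, can be sketched roughly as follows: the same action is repeated for a few frames and the rewards are summed. `CountingEnv` is a toy stand-in environment and `skip_step` is a hypothetical helper; the skip value of 4 is an assumption, since the value used in `flappy_gym_env.py` is not stated here.

```python
class CountingEnv:
    """Toy stand-in environment (hypothetical): reward 1 per step, ends after 10 steps."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 10
        return self.t, 1.0, done


def skip_step(env, action, skip=4):
    """Repeat `action` for up to `skip` frames and sum the rewards."""
    total_reward = 0.0
    obs, done = None, False
    for _ in range(skip):
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break  # stop repeating once the episode has ended
    return obs, total_reward, done


env = CountingEnv()
env.reset()
print(skip_step(env, action=0))  # (4, 4.0, False)
```

Skipping frames means the agent makes fewer decisions per second of game time, which usually speeds up training at the cost of coarser control.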

---

## Directory Structure

FlappyBirdRL/
│
├── flappy_gym_env.py # Custom Gym env
├── train_ppo.py # PPO training script
├── train_a2c.py # A2C training script
├── entropy_annealing.py # Custom callback for entropy scheduling
├── ppo_flappybird_tensorboard/ # Logs
├── saved_models/ # Saved weights
└── README.md # This file!

---

## Installation

1. **Clone the repo**
```
git clone https://github.com/your-username/FlappyBirdRL.git
cd FlappyBirdRL
```

2. **Create a virtual environment (recommended)**
```
python -m venv venv
source venv/bin/activate # Linux/macOS
venv\Scripts\activate # Windows
```

3. **Install dependencies**
```
pip install -r requirements.txt
```

Note: Install the pygame-learning-environment package from source as follows:
```
git clone https://github.com/ntasfi/PyGame-Learning-Environment.git
cd PyGame-Learning-Environment
pip install -e .
```

## Train the Agent
```
python train_ppo.py
```
Edit train_ppo.py or train_a2c.py to tweak hyperparameters:

- n_steps, batch_size, gamma, learning_rate
- Entropy annealing: initial vs. final ent_coef
- Total timesteps
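
As a rough illustration of what the initial and final `ent_coef` values control, the sketch below interpolates linearly between them as training progresses. The function name `linear_ent_coef` and the default values are placeholders chosen for illustration, not the settings used in this repo, and the actual schedule in `entropy_annealing.py` may differ.

```python
def linear_ent_coef(progress, initial=0.01, final=0.001):
    """Linearly interpolate the entropy coefficient.

    progress: fraction of training completed, in [0, 1].
    initial/final: placeholder values, not the repo's actual settings.
    """
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    return initial * (1.0 - progress) + final * progress


total_timesteps = 10_000_000
for step in (0, 5_000_000, 10_000_000):
    print(step, linear_ent_coef(step / total_timesteps))
```

One way to apply such a schedule in Stable Baselines3, assuming the algorithm reads its `ent_coef` attribute on each update, would be to assign the computed value to `model.ent_coef` from a callback's `_on_step`, using `num_timesteps / total_timesteps` as the progress.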

## Monitor Training
```
tensorboard --logdir ppo_flappybird_tensorboard/
```

Open http://localhost:6006 in your browser to view learning curves, rewards, entropy, loss terms, etc.

## Test the Agent
```
python test_ppo.py
```

## Acknowledgements

- Stable Baselines3
- OpenAI Gym
- Original Flappy Bird graphics by dotGBA
