{"id":29550882,"url":"https://github.com/dhyanesh18/flappbird-rl","last_synced_at":"2026-05-06T15:45:27.076Z","repository":{"id":304875621,"uuid":"1018993796","full_name":"Dhyanesh18/flappbird-rl","owner":"Dhyanesh18","description":"PPO agent and A2C agents for Flappybird. Includes scripts, training code, and evaluation tools. ","archived":false,"fork":false,"pushed_at":"2025-07-15T18:52:15.000Z","size":17,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-16T17:05:10.287Z","etag":null,"topics":["a2c","flappybird","opencv","ppo","pygame-learning-environment","reinforcement-learning","stablebaselines3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Dhyanesh18.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-13T14:05:16.000Z","updated_at":"2025-07-15T18:55:55.000Z","dependencies_parsed_at":"2025-07-16T23:04:31.419Z","dependency_job_id":null,"html_url":"https://github.com/Dhyanesh18/flappbird-rl","commit_stats":null,"previous_names":["dhyanesh18/flappbird-rl"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Dhyanesh18/flappbird-rl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dhyanesh18%2Fflappbird-rl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dhyanesh18%2Fflappbird-rl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dhyanesh18%2Fflappbird-rl/releases","manif
ests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dhyanesh18%2Fflappbird-rl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Dhyanesh18","download_url":"https://codeload.github.com/Dhyanesh18/flappbird-rl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dhyanesh18%2Fflappbird-rl/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265697906,"owners_count":23813099,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["a2c","flappybird","opencv","ppo","pygame-learning-environment","reinforcement-learning","stablebaselines3"],"created_at":"2025-07-18T04:01:19.187Z","updated_at":"2026-05-06T15:45:27.070Z","avatar_url":"https://github.com/Dhyanesh18.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FlappyBirdRL\n\nA Reinforcement Learning (RL) agent that learns to play Flappy Bird using **Stable Baselines3** and a custom **OpenAI Gym** environment.\n\nThis project demonstrates **deep RL** for an arcade-style game, with PPO/A2C and entropy annealing for improved exploration.\n\n## Test clip\n\u003cimg width=\"300\" height=\"550\" alt=\"image\" src=\"https://github.com/user-attachments/assets/e04a5912-2c25-4208-9413-15cb2a16870b\" /\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\n\u003cimg width=\"300\" height=\"550\" alt=\"image\" src=\"https://github.com/user-attachments/assets/ecbbc0bb-6fc8-4d72-b5a2-f7682990d42c\" /\u003e\n\n\n## Training graph (PPO)\n\u003cimg 
width=\"600\" height=\"400\" alt=\"image\" src=\"https://github.com/user-attachments/assets/c4bea6b6-2c95-46a0-b44e-0e85ec249aa4\" /\u003e\n\n\n---\n## My setup and work\n\n- The agents were trained on the rendered game frames of Flappy Bird rather than on numerical state values such as velocity and position, so that they perceive the game the way a human does.  \n- 4 frames are stacked before being passed through the CNN so that the agent can infer temporal information, such as the bird's velocity, from the environment.  \n- The entropy coefficient is annealed so that the model shifts from exploration toward exploitation in the later half of training.  \n- The configurations in the code files are the ones that produced the best results.  \n- Sadly, proper hyperparameter tuning wasn't possible because training an agent for 10M timesteps took 19.2 hours; tuning was mostly intuition-based instead.\n\n## Project Highlights\n\n- **Algorithms:** PPO \u0026 A2C from Stable Baselines3\n- **Custom Gym Env:** Pixel-based Flappy Bird with frame skipping \u0026 stacking\n- **Entropy Annealing:** Controls exploration dynamically\n- **TensorBoard:** Visualize training progress\n- **GPU Acceleration:** CUDA enabled\n\n---\n\n## Directory Structure\n\nFlappyBirdRL/  \n│  \n├── flappy_gym_env.py # Custom Gym env  \n├── train_ppo.py # PPO training script  \n├── train_a2c.py # A2C training script  \n├── entropy_annealing.py # Custom callback for entropy scheduling  \n├── ppo_flappybird_tensorboard/ # Logs  \n├── saved_models/ # Saved weights  \n├── README.md # This file!  \n\n\n\n---\n\n## Installation\n\n1. **Clone the repo**\n   ```\n   git clone https://github.com/Dhyanesh18/flappbird-rl.git FlappyBirdRL\n   cd FlappyBirdRL\n   ```\n\n2. **Create a virtual environment (recommended)**\n   ```\n   python -m venv venv\n   source venv/bin/activate  # Linux/macOS\n   venv\\Scripts\\activate     # Windows\n   ```\n\n
3. **Install dependencies**\n   ```\n   pip install -r requirements.txt\n   ```\n\nNote: The PyGame Learning Environment (PLE) package must be installed from source:\n  ```\n  git clone https://github.com/ntasfi/PyGame-Learning-Environment.git\n  cd PyGame-Learning-Environment\n  pip install -e .\n  ```\n\n\n## Train the Agent\n  ```\n  python train_ppo.py\n  ```\nEdit `train_ppo.py` or `train_a2c.py` to tweak the hyperparameters:\n\n- `n_steps`, `batch_size`, `gamma`, `learning_rate`\n- Entropy annealing: initial vs. final `ent_coef`\n- Total timesteps\n\n## Monitor Training\n  ```\n  tensorboard --logdir ppo_flappybird_tensorboard/\n  ```\n\nOpen http://localhost:6006 in your browser to view learning curves, rewards, entropy, loss terms, etc.\n\n## Test the Agent\n  ```\n  python test_ppo.py\n  ```\n\n## Acknowledgements\n\n- Stable Baselines3\n- OpenAI Gym\n- Original Flappy Bird graphics by dotGBA\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdhyanesh18%2Fflappbird-rl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdhyanesh18%2Fflappbird-rl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdhyanesh18%2Fflappbird-rl/lists"}