https://github.com/jenson073/reinforment-learning_cartpole
This project serves as an introduction to Deep Q-Learning and reinforcement learning concepts. The trained agent learns to balance the cart-pole system through iterative training and evaluation. You can modify the environment or parameters to further experiment with different reinforcement learning strategies.
- Host: GitHub
- URL: https://github.com/jenson073/reinforment-learning_cartpole
- Owner: Jenson073
- Created: 2024-11-03T15:52:23.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-12-22T18:45:38.000Z (11 months ago)
- Last Synced: 2025-01-21T09:14:47.932Z (10 months ago)
- Topics: cart-pole, q-learning-algorithm, reinforcement-learning
- Size: 5.86 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
# 🤖 **Reinforcement Learning with Deep Q-Network (DQN)**
This project implements a **Deep Q-Network (DQN)** to solve the classic control problem **CartPole-v1** (or, alternatively, **MountainCar-v0**) in the OpenAI Gym environment. The agent learns a control policy through experience replay and an epsilon-greedy exploration strategy.
---
## 🌟 **Project Overview**
### **Environment**
- **CartPole-v1**: A pole is attached to a cart that moves along a frictionless track. The agent keeps the pole balanced by pushing the cart left or right.
- **State Size**: `4` (cart position, cart velocity, pole angle, pole angular velocity).
- **Action Size**: `2` (push left or push right).
Alternatively, the same code can be run on **MountainCar-v0**, where the goal is to drive an underpowered car up a steep hill.
---
## 🧠 **Deep Q-Network (DQN)**
- **Model Architecture** (see the Keras sketch after this list):
- Input: State size (4).
- Two hidden layers with 24 neurons each and **ReLU** activation.
- Output: Action size (2) with **linear** activation for Q-values.
- Optimizer: **Adam** with learning rate `0.001`.
- Loss Function: **Mean Squared Error (MSE)**.
- **Training Mechanism** (see the replay sketch after this list):
- Experience replay: past transitions are stored in a replay memory buffer and sampled in random minibatches, breaking the correlation between consecutive samples.
- Target Q-value, where \(s'\) is the next state and \(a'\) ranges over the actions (a terminal transition's target is the reward alone):
\[
Q_{\text{target}} = \text{reward} + \gamma \cdot \max_{a'} Q(s', a')
\]
- Epsilon-greedy exploration: Balances exploration and exploitation with a decaying epsilon.
---
## ⚙️ **Parameters**
| Parameter | Value | Description |
|---------------------|--------------------|------------------------------------------|
| Episodes | 20 | Number of episodes to train. |
| Max Steps | 100 | Maximum steps per episode. |
| Learning Rate | 0.001 | Learning rate for the optimizer. |
| Discount Factor (γ) | 0.95 | Discount factor for future rewards. |
| Epsilon | 1.0 (decays) | Initial exploration rate. |
| Epsilon Decay | 0.995 | Decay rate for epsilon. |
| Epsilon Min | 0.01 | Minimum exploration rate. |
| Batch Size | 64 | Size of minibatch for training. |
| Memory Size | 2000 | Maximum size of replay memory. |
---
## 📊 **Results**
- The agent trains for 20 episodes, accumulating rewards over time.
- Intermediate scores are printed every 10 steps so progress can be monitored.
- A plot of scores over episodes visualizes the training progress.
---
## 🔧 **How to Run the Code**
1. **Install Dependencies**:
Ensure you have Python installed along with the required libraries:
```bash
pip install gym tensorflow matplotlib numpy
```
---