https://github.com/jenson073/reinforment-learning_cartpole

This project serves as an introduction to Deep Q-Learning and reinforcement learning concepts. The trained agent learns to balance the cart-pole system through iterative training and evaluation. You can modify the environment or parameters to further experiment with different reinforcement learning strategies.

cart-pole q-learning-algorithm reinforcement-learning


# 🤖 **Reinforcement Learning with Deep Q-Network (DQN)**

This project implements a **Deep Q-Network (DQN)** to solve the classic control problem **CartPole-v1** (or, alternatively, **MountainCar-v0**) from the OpenAI Gym environment suite. The agent learns a control policy through experience replay and an epsilon-greedy exploration strategy.

---

## 🌟 **Project Overview**

### **Environment**
- **CartPole-v1**: A pole is attached to a cart, which moves along a frictionless track. The agent balances the pole by applying forces left or right.
- **State Size**: `4` (cart position, cart velocity, pole angle, pole angular velocity).
- **Action Size**: `2` (push the cart left or right).

Alternatively, the same code can be applied to **MountainCar-v0**, where the goal is to drive an underpowered car up a steep hill (state size `2`, action size `3`).
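
For reference, a minimal snippet (assuming the `gym` package with its classic-control environments is installed) that creates the environment and reports these sizes:

```python
import gym

# Create the CartPole environment; swap in "MountainCar-v0" to try the alternative task.
env = gym.make("CartPole-v1")

state_size = env.observation_space.shape[0]  # 4 for CartPole-v1
action_size = env.action_space.n             # 2 for CartPole-v1

print(f"State size: {state_size}, Action size: {action_size}")
```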

---

## 🧠 **Deep Q-Network (DQN)**

- **Model Architecture** (a code sketch follows below):
  - Input: State size (4).
  - Two hidden layers with 24 neurons each and **ReLU** activation.
  - Output: Action size (2) with **linear** activation for Q-values.
  - Optimizer: **Adam** with learning rate `0.001`.
  - Loss function: **Mean Squared Error (MSE)**.

- **Training Mechanism**:
  - Experience replay: stores past experiences in a replay memory buffer to break the correlation in sequential data.
  - Target Q-value:

    $$
    Q_{\text{target}} = \text{reward} + \gamma \cdot \max_{a} Q(\text{next state}, a)
    $$

  - Epsilon-greedy exploration: balances exploration and exploitation with a decaying epsilon.
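
A minimal sketch of this network and of the replay update, assuming TensorFlow/Keras and the hyperparameters from the Parameters table below; the function and variable names are illustrative and not necessarily those used in the repository's script:

```python
import random
from collections import deque

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam


def build_model(state_size, action_size, learning_rate=0.001):
    """Two hidden layers of 24 ReLU units; a linear output gives one Q-value per action."""
    model = Sequential([
        Dense(24, activation="relu", input_shape=(state_size,)),
        Dense(24, activation="relu"),
        Dense(action_size, activation="linear"),
    ])
    model.compile(loss="mse", optimizer=Adam(learning_rate=learning_rate))
    return model


memory = deque(maxlen=2000)  # replay buffer


def remember(state, action, reward, next_state, done):
    """Store one transition so later updates can sample uncorrelated minibatches."""
    memory.append((state, action, reward, next_state, done))


def replay(model, batch_size=64, gamma=0.95):
    """Sample a minibatch and fit the network toward the target Q-values."""
    if len(memory) < batch_size:
        return
    minibatch = random.sample(memory, batch_size)
    # `state` and `next_state` are assumed to be numpy arrays of shape (1, state_size).
    for state, action, reward, next_state, done in minibatch:
        target = reward
        if not done:
            # Q_target = reward + gamma * max_a Q(next_state, a)
            target = reward + gamma * np.amax(model.predict(next_state, verbose=0)[0])
        target_q = model.predict(state, verbose=0)
        target_q[0][action] = target
        model.fit(state, target_q, epochs=1, verbose=0)
```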

---

## ⚙️ **Parameters**

| Parameter | Value | Description |
|---------------------|--------------------|------------------------------------------|
| Episodes | 20 | Number of episodes to train. |
| Max Steps | 100 | Maximum steps per episode. |
| Learning Rate | 0.001 | Learning rate for the optimizer. |
| Discount Factor (γ) | 0.95 | Discount factor for future rewards. |
| Epsilon | 1.0 (decays) | Initial exploration rate. |
| Epsilon Decay | 0.995 | Decay rate for epsilon. |
| Epsilon Min | 0.01 | Minimum exploration rate. |
| Batch Size | 64 | Size of minibatch for training. |
| Memory Size | 2000 | Maximum size of replay memory. |
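
A minimal sketch of the epsilon-greedy action selection and decay that these parameters drive (names are illustrative; `model` is the Q-network and `state` is assumed to be shaped `(1, state_size)`):

```python
import random
import numpy as np

epsilon, epsilon_decay, epsilon_min = 1.0, 0.995, 0.01


def act(model, state, action_size, epsilon):
    """Pick a random action with probability epsilon, otherwise the greedy action."""
    if np.random.rand() <= epsilon:
        return random.randrange(action_size)        # explore
    q_values = model.predict(state, verbose=0)      # predicted Q-values for this state
    return int(np.argmax(q_values[0]))              # exploit

# After each training (replay) step, shrink epsilon toward its minimum:
# if epsilon > epsilon_min:
#     epsilon *= epsilon_decay
```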

---

## 📊 **Results**

- The agent trains for 20 episodes, accumulating rewards over time.
- Intermediate scores are printed every 10 steps so progress can be monitored during training.
- A plot of scores over episodes visualizes the training progress.
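
A typical way to produce such a plot with matplotlib; `scores` stands in for the per-episode rewards collected during training (the values below are placeholders for illustration):

```python
import matplotlib.pyplot as plt

# Placeholder values; in practice `scores` is filled in during the training loop.
scores = [12, 20, 35, 48, 60]

plt.plot(range(1, len(scores) + 1), scores)
plt.xlabel("Episode")
plt.ylabel("Total reward")
plt.title("DQN training progress")
plt.show()
```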

---

## 🔧 **How to Run the Code**

1. **Install Dependencies**:
Ensure you have Python installed along with the required libraries:
```bash
pip install gym tensorflow matplotlib numpy
```
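
2. **Run the Training Script**:
The script name below is an assumption; substitute the actual file name from this repository:
```bash
python cartpole_dqn.py
```

---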