https://github.com/tashi-2004/deep-learning-grid-world-q-learning
Deep Learning Grid World Q-Learning. Implement Q-learning in a 5x5 grid where an agent navigates obstacles and rewards. Train the agent with varying learning rates, visualize its progress, and see Q-values as heatmaps. Run the script to start training and view results. Contributions are welcome!
- Host: GitHub
- URL: https://github.com/tashi-2004/deep-learning-grid-world-q-learning
- Owner: tashi-2004
- Created: 2024-08-23T19:33:40.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-28T18:05:46.000Z (4 months ago)
- Last Synced: 2024-08-29T19:20:58.304Z (4 months ago)
- Topics: agent-based-modeling, artificial-intelligence, deep-learning, deep-q-learning, exploitation, exploration, machine-learning, machine-learning-algorithms, matplotlib-pyplot, numpy, python, q-learning, q-learning-algorithm, reinforcement-learning, reinforcement-learning-algorithms, state-value-function, training
- Language: Python
- Homepage:
- Size: 277 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Deep Learning Grid World Q-Learning
## Overview
This repository contains an implementation of a Q-learning algorithm to solve a grid world environment using deep learning techniques. The environment consists of a 5x5 grid with obstacles, rewards, and a goal state. The agent learns to navigate this grid to maximize its cumulative reward using Q-learning.
## Files
- `deep_learning_grid_world_q_learning.py`: Contains the main implementation of the Q-learning algorithm, including:
  - `create_grid_world(ax)`: Function to create and visualize the grid world.
  - `epsilon_greedy(Q, state, epsilon)`: Function to select an action using the epsilon-greedy policy.
  - `step(Q, state, action, alpha, gamma)`: Function to perform a step in the environment and update Q-values.
  - `q_learning_agent(alpha_values, num_episodes)`: Function to train the Q-learning agent with different alpha values.
  - `visualize_q_values(Q)`: Function to visualize the learned Q-values.
## Usage
1. **Run the Deep Learning Q-learning Agent**
   Execute the script to train the Q-learning agent with different learning rates (`alpha_values`). During training, the script visualizes the agent's movement in the grid world and updates the Q-values.
2. **Visualize Q-values**
   After training, the Q-values are visualized to show the learned state values.
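
The step from Q-values to per-cell state values can be sketched as follows. This is an illustration only, not the repository's `visualize_q_values(Q)`: it assumes a hypothetical Q-table layout in which `Q[row][col]` holds four action values, and it takes the maximum over actions as the state value, which is one common convention. The helper names `state_values` and `format_grid` are made up for this example.

```python
# Hypothetical Q-table layout (an assumption, not taken from the repository):
# Q[row][col] is a list of four action values ordered up, down, left, right.


def state_values(Q):
    """Collapse a Q-table to one value per cell: V(s) = max over a of Q(s, a)."""
    return [[max(action_values) for action_values in row] for row in Q]


def format_grid(values):
    """Render the value grid as aligned text, one grid row per line."""
    return "\n".join(" ".join(f"{v:6.2f}" for v in row) for row in values)


if __name__ == "__main__":
    # Toy 2x2 Q-table with made-up numbers, purely for illustration.
    toy_Q = [
        [[0.0, 1.0, 0.5, 2.0], [0.0, 3.0, 0.0, 0.0]],
        [[1.5, 0.0, 0.0, 4.0], [0.0, 0.0, 0.0, 0.0]],
    ]
    print(format_grid(state_values(toy_Q)))
```

The nested list returned by `state_values` can be passed to `matplotlib.pyplot.imshow` to draw it as a heatmap.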
## Explanation
### Grid World
The grid world consists of a 5x5 grid with:
- **Obstacles**: Cells that are blocked and cannot be traversed.
- **Rewards**: Cells that provide rewards (+5 or +10).
- **Goal**: The cell at (5, 5), where the agent receives a reward of +10 and the episode terminates.
### Deep Q-learning Algorithm
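
A minimal sketch of tabular Q-learning on a grid like the one described above, written for illustration and not taken from the repository's script. Several details are assumptions: the obstacle positions, the -1 penalty for bumping into a wall or obstacle, the start cell, the hyperparameter defaults, and the zero-indexed goal `(4, 4)` standing in for the README's (5, 5). The intermediate +5 reward cells are left out for brevity, and the function names `move` and `train` are hypothetical.

```python
import random

SIZE = 5
GOAL = (4, 4)                 # the README's (5, 5), zero-indexed (assumption)
OBSTACLES = {(1, 1), (2, 3)}  # hypothetical obstacle positions
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def epsilon_greedy(Q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise pick a highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(Q[state])
    # Break ties at random so the untrained agent does not favor one direction.
    return rng.choice([a for a, v in enumerate(Q[state]) if v == best])


def move(state, action):
    """Apply an action and return (next_state, reward, done)."""
    r = state[0] + ACTIONS[action][0]
    c = state[1] + ACTIONS[action][1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or (r, c) in OBSTACLES:
        return state, -1, False  # blocked: stay in place (assumed penalty)
    if (r, c) == GOAL:
        return (r, c), 10, True  # goal reward of +10 ends the episode
    return (r, c), 0, False


def train(alpha=0.5, gamma=0.9, epsilon=0.1, episodes=500, max_steps=200, seed=0):
    """Run Q-learning from the top-left cell and return the learned Q-table."""
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(max_steps):
            action = epsilon_greedy(Q, state, epsilon, rng)
            next_state, reward, done = move(state, action)
            # Q-learning update:
            # Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
            target = reward + (0.0 if done else gamma * max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
            if done:
                break
    return Q


if __name__ == "__main__":
    Q = train()
    print("Value of the start cell:", max(Q[(0, 0)]))
```

Calling `train` with several `alpha` values and comparing the resulting tables mirrors the learning-rate comparison the script performs, though the exact numbers depend on the assumptions above.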
- **Epsilon-Greedy Policy**: Balances exploration and exploitation.
- **Learning Rate (Alpha)**: Controls the rate at which the Q-values are updated.
- **Discount Factor (Gamma)**: Determines the importance of future rewards.
### Visualization
- **Grid World**: The grid is displayed with obstacles, rewards, and the agent's path.
- **Q-values**: Visualized as a heatmap to show the learned state values.
### Screenshots
### Output Video
https://github.com/user-attachments/assets/5a1f35c8-bd06-43cc-97d9-961f69286a54
## Notes
- The script includes an early stopping condition if the average reward exceeds a threshold over a window of episodes.
- The agent's progress is visualized in real-time during training.
## Contributing
- Tashfeen Abbasi
- [Laiba Mazhar](https://github.com/laiba-mazhar)

Feel free to fork the repository and submit pull requests. For issues or feature requests, please open an issue on GitHub.