{"id":22229727,"url":"https://github.com/tashi-2004/deep-learning-grid-world-q-learning","last_synced_at":"2026-04-17T10:31:51.468Z","repository":{"id":254594018,"uuid":"846708909","full_name":"tashi-2004/Deep-Learning-Grid-World-Q-Learning","owner":"tashi-2004","description":"Deep Learning Grid World Q-Learning . Implement Q-learning in a 5x5 grid where an agent navigates obstacles and rewards. Train the agent with varying learning rates, visualize its progress, and see Q-values as heatmaps. Run the script to start training and view results. Contributions are welcome!","archived":false,"fork":false,"pushed_at":"2024-09-03T10:36:21.000Z","size":287,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-25T08:43:59.453Z","etag":null,"topics":["agent-based-modeling","artificial-intelligence","deep-learning","deep-q-learning","exploitation","exploration","machine-learning","machine-learning-algorithms","matplotlib-pyplot","numpy","python","q-learning","q-learning-algorithm","reinforcement-learning","reinforcement-learning-algorithms","state-value-function","training"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tashi-2004.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-23T19:33:40.000Z","updated_at":"2024-11-27T17:57:48.000Z","dependencies_parsed_at":"2024-12-03T01:12:12.551Z","dependency_job_id":null,"html_url":"https://github.com/tashi-2004/Deep-Learning-Grid-World-Q-Learning","commit_stats":nu
ll,"previous_names":["tashi-2004/deep-learning-grid-world-q-learning"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tashi-2004/Deep-Learning-Grid-World-Q-Learning","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tashi-2004%2FDeep-Learning-Grid-World-Q-Learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tashi-2004%2FDeep-Learning-Grid-World-Q-Learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tashi-2004%2FDeep-Learning-Grid-World-Q-Learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tashi-2004%2FDeep-Learning-Grid-World-Q-Learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tashi-2004","download_url":"https://codeload.github.com/tashi-2004/Deep-Learning-Grid-World-Q-Learning/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tashi-2004%2FDeep-Learning-Grid-World-Q-Learning/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31925334,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-17T10:19:20.377Z","status":"ssl_error","status_checked_at":"2026-04-17T10:19:18.682Z","response_time":62,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-based-modeling","artificial-intelligence","deep-learning","deep-q-learning","exploitation","exploration","machine-learning","machine-learning-algorithms","matplotlib-pyplot","numpy","python","q-learning","q-learning-algorithm","reinforcement-learning","reinforcement-learning-algorithms","state-value-function","training"],"created_at":"2024-12-03T01:12:08.212Z","updated_at":"2026-04-17T10:31:51.447Z","avatar_url":"https://github.com/tashi-2004.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Deep Learning Grid World Q-Learning\n\n## Overview\n\nThis repository contains an implementation of a Q-learning algorithm to solve a grid world environment using deep learning techniques. The environment consists of a 5x5 grid with obstacles, rewards, and a goal state. 
The agent learns to navigate this grid to maximize its cumulative reward using Q-learning.\n\n## Files\n\n- `deep_learning_grid_world_q_learning.py`: Contains the main implementation of the Q-learning algorithm, including:\n  - `create_grid_world(ax)`: Creates and draws the grid world.\n  - `epsilon_greedy(Q, state, epsilon)`: Selects an action using the epsilon-greedy policy.\n  - `step(Q, state, action, alpha, gamma)`: Performs one step in the environment and updates the Q-values.\n  - `q_learning_agent(alpha_values, num_episodes)`: Trains the Q-learning agent with each learning rate in `alpha_values`.\n  - `visualize_q_values(Q)`: Plots the learned Q-values.\n\n## Usage\n\n1. **Run the Deep Learning Q-learning Agent**\n\n   Run `deep_learning_grid_world_q_learning.py` with Python to train the Q-learning agent with each learning rate in `alpha_values`. During training, the script shows the agent's movement through the grid world as the Q-values are updated.\n\n2. 
**Visualize Q-values**\n\n   After training, the learned Q-values are plotted as a heatmap of state values.\n\n## Explanation\n\n### Grid World\n\nThe grid world consists of a 5x5 grid with:\n- **Obstacles**: Cells that are blocked and cannot be traversed.\n- **Rewards**: Cells that provide rewards (+5 or +10).\n- **Goal**: The cell at (5, 5) where the agent receives a reward of +10 and the episode terminates.\n\n### Deep Q-learning Algorithm\n\n- **Epsilon-Greedy Policy**: Balances exploration and exploitation by choosing a random action with probability epsilon and the action with the highest Q-value otherwise.\n- **Learning Rate (Alpha)**: Controls how far each Q-value moves toward its new estimate on every update; values closer to 1 weight the latest experience more heavily.\n- **Discount Factor (Gamma)**: Determines how much future rewards count relative to immediate ones; values closer to 1 make the agent more far-sighted.\n\n### Visualization\n\n- **Grid World**: The grid is displayed with obstacles, rewards, and the agent's path.\n- **Q-values**: Visualized as a heatmap to show the learned state values.\n\n### Screenshots\n\n\u003cimg width=\"959\" alt=\"ary1\" src=\"https://github.com/user-attachments/assets/6d36f435-46f1-45e9-8215-321e7c8f54f6\"\u003e\n\n\u003cimg width=\"959\" alt=\"2\" src=\"https://github.com/user-attachments/assets/84de75c9-0418-42ed-a056-ce748fea0374\"\u003e\n\n### Output Video\n\nhttps://github.com/user-attachments/assets/5a1f35c8-bd06-43cc-97d9-961f69286a54\n\n## Notes\n\n- The script stops training early once the average reward over a window of recent episodes exceeds a threshold.\n- The agent's progress is visualized in real time during training.\n\n## Contributing\n\n- Tashfeen Abbasi (abbasitashfeen7@gmail.com)\n- [Laiba Mazhar](https://github.com/laiba-mazhar) (laibamazhar.000@gmail.com)\n\nFeel free to fork the repository and submit pull requests. 
For issues or feature requests, please open an issue on GitHub.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftashi-2004%2Fdeep-learning-grid-world-q-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftashi-2004%2Fdeep-learning-grid-world-q-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftashi-2004%2Fdeep-learning-grid-world-q-learning/lists"}
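The README above names the learning rate (alpha), the discount factor (gamma) and the epsilon-greedy policy, but does not write out the update they feed into. Assuming `step(Q, state, action, alpha, gamma)` applies the standard tabular Q-learning rule (the README does not confirm this), the update, the action choice, and a small worked example look like this. When the next state s' is the terminal goal, the bracketed target reduces to just r.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \bigl[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \,\bigr]

a =
\begin{cases}
  \text{a uniformly random action} & \text{with probability } \varepsilon \\
  \arg\max_{a} Q(s, a) & \text{otherwise}
\end{cases}

% Worked example: \alpha = 0.5,\ \gamma = 0.9,\ Q(s,a) = 0,\ r = -1,\ \max_{a'} Q(s', a') = 2
Q(s, a) \leftarrow 0 + 0.5 \bigl[ -1 + 0.9 \cdot 2 - 0 \bigr] = 0.5 \cdot 0.8 = 0.4
```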
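For readers who want to see the pieces described in the README fit together, here is a minimal, self-contained sketch of tabular Q-learning on a 5x5 grid, written in plain Python (standard library only) rather than NumPy. It is not the repository's code. Every concrete detail is an assumption: the obstacle and +5 cell positions, the -1 cost per ordinary move, mapping the README's goal "(5, 5)" to 0-indexed (4, 4), the early-stopping threshold, and the helper names `new_q_table`, `move`, `train`, `greedy_path` and `state_values` are hypothetical; `epsilon_greedy` and `step` only borrow names from the README's function list. The +5 cell pays out once per episode, tracked by a flag in the state, so the agent cannot farm it by stepping on and off.

```python
import random

# --- Hypothetical environment (positions and rewards are assumptions) ---
ROWS, COLS = 5, 5
START, GOAL = (0, 0), (4, 4)            # README's 1-indexed (5, 5) -> (4, 4)
OBSTACLES = {(1, 1), (2, 3), (3, 1)}    # assumed blocked cells
BONUS = (2, 2)                          # assumed +5 cell, paid once per episode
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
STEP_COST = -1.0                        # assumed reward when neither goal nor unclaimed bonus is reached


def new_q_table():
    """Zero-initialised Q-table; a state is (row, col, bonus_collected)."""
    return {(r, c, k): [0.0] * len(ACTIONS)
            for r in range(ROWS) for c in range(COLS) for k in (0, 1)}


def move(state, action):
    """Deterministic transition. Returns (next_state, reward, done)."""
    row, col, collected = state
    dr, dc = ACTIONS[action]
    nr, nc = row + dr, col + dc
    # Walls and obstacles block the move: the agent stays in place.
    if not (0 <= nr < ROWS and 0 <= nc < COLS) or (nr, nc) in OBSTACLES:
        nr, nc = row, col
    if (nr, nc) == GOAL:
        return (nr, nc, collected), 10.0, True
    if (nr, nc) == BONUS and not collected:
        return (nr, nc, 1), 5.0, False
    return (nr, nc, collected), STEP_COST, False


def epsilon_greedy(Q, state, epsilon, rng):
    """Random action with probability epsilon, else greedy (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(Q[state])
    return rng.choice([a for a, q in enumerate(Q[state]) if q == best])


def step(Q, state, action, alpha, gamma):
    """One environment step followed by the tabular Q-learning update."""
    nxt, reward, done = move(state, action)
    # No bootstrapping from a terminal state: the target is just the reward.
    target = reward if done else reward + gamma * max(Q[nxt])
    Q[state][action] += alpha * (target - Q[state][action])
    return nxt, reward, done


def train(alpha, num_episodes=1000, gamma=0.9, epsilon=0.1, max_steps=100,
          window=50, threshold=6.0, seed=0):
    """Train one agent; stop early when the mean return over `window` episodes exceeds `threshold`."""
    rng = random.Random(seed)
    Q = new_q_table()
    returns = []
    for _ in range(num_episodes):
        state, total = (*START, 0), 0.0
        for _ in range(max_steps):
            action = epsilon_greedy(Q, state, epsilon, rng)
            state, reward, done = step(Q, state, action, alpha, gamma)
            total += reward
            if done:
                break
        returns.append(total)
        if len(returns) >= window and sum(returns[-window:]) / window > threshold:
            break
    return Q, returns


def greedy_path(Q, max_steps=25):
    """Follow the greedy policy from START (no learning) and return the visited cells."""
    state, path = (*START, 0), [START]
    for _ in range(max_steps):
        action = max(range(len(ACTIONS)), key=lambda a: Q[state][a])
        state, _, done = move(state, action)
        path.append(state[:2])
        if done:
            break
    return path


def state_values(Q, collected=0):
    """Per-cell max_a Q(s, a), the quantity a Q-value heatmap shows; None marks obstacles."""
    return [[None if (r, c) in OBSTACLES else max(Q[(r, c, collected)])
             for c in range(COLS)] for r in range(ROWS)]


if __name__ == "__main__":
    for alpha in (0.1, 0.5):            # mirrors training over several alpha values
        Q, returns = train(alpha)
        print(f"alpha={alpha}: {len(returns)} episodes, path={greedy_path(Q)}")
```

Running it prints, for each learning rate, how many episodes ran before early stopping and the greedy path from the start cell. `state_values(Q)` returns the per-cell grid of maximum Q-values; after replacing the `None` obstacle entries with NaN, it could be passed to `matplotlib.pyplot.imshow` to draw a heatmap. Tracking the collected flag doubles the state space but keeps the reward a function of the current state and action.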