{"id":49565221,"url":"https://github.com/aaronjs99/intelligent-agents","last_synced_at":"2026-05-03T11:13:10.590Z","repository":{"id":95302474,"uuid":"306029336","full_name":"aaronjs99/intelligent-agents","owner":"aaronjs99","description":"Comparative analysis of Markov decision processes \u0026 intelligent agents","archived":false,"fork":false,"pushed_at":"2025-05-16T00:05:23.000Z","size":1761,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2026-03-06T00:39:30.912Z","etag":null,"topics":["bandit-algorithms","linear-programming","mdp","policy-iteration","reinforcement-learning","value-iteration"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aaronjs99.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-10-21T13:13:28.000Z","updated_at":"2025-07-04T19:00:35.000Z","dependencies_parsed_at":"2025-07-31T01:03:05.018Z","dependency_job_id":"1aa3aa68-ccd2-4196-8175-9962b2ef9f06","html_url":"https://github.com/aaronjs99/intelligent-agents","commit_stats":null,"previous_names":["aaronjs99/intelligent-agents","aaronjohnsabu1999/intelligent-agents-cs747","aaronjohnsabu1999/intelligent-agents"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aaronjs99/intelligent-agents","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronjs99%2Fintelligent-agents","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronjs99%2Fintelligent-agents/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronjs99%2Fintelligent-agents/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronjs99%2Fintelligent-agents/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aaronjs99","download_url":"https://codeload.github.com/aaronjs99/intelligent-agents/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronjs99%2Fintelligent-agents/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32566492,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T06:36:36.687Z","status":"ssl_error","status_checked_at":"2026-05-03T06:36:09.306Z","response_time":103,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bandit-algorithms","linear-programming","mdp","policy-iteration","reinforcement-learning","value-iteration"],"created_at":"2026-05-03T11:13:09.790Z","updated_at":"2026-05-03T11:13:10.585Z","avatar_url":"https://github.com/aaronjs99.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Intelligent Agents\n\nA collection of reinforcement learning and intelligent agents projects showcasing implementations of key algorithms and their comparative analysis. These projects were developed as part of the *CS747: Foundations of Intelligent and Learning Agents* course at IIT Bombay.\n\n## Projects Included\n\n### Bandits (`src/bandits`)\n\nImplements and compares classical multi-armed bandit algorithms: **ε-greedy**, **UCB**, **KL-UCB**, and **Thompson Sampling**, along with a custom variant: **Thompson Sampling with a hint**. Each algorithm minimizes cumulative regret across different horizons and random seeds.\n\n**Key Findings:**\n- Thompson Sampling generally outperforms others in regret minimization.\n- KL-UCB improves over UCB using a tighter confidence bound via binary search.\n- ε-Greedy performs best at `ε ≈ 0.02` — striking a balance between exploration and exploitation.\n- The Thompson Sampling \"hinted\" version leverages knowledge of true means, improving early performance through a custom Beta-distribution-based selector.\n\nIncludes regret plots over multiple seeds and horizons, as well as parameter studies.\n\n### MDP Maze Solver (`src/mdp`)\n\nSolves mazes by modeling them as Markov Decision Processes using:\n- **Value Iteration**\n- **Linear Programming** (via PuLP)\n- **Howard’s Policy Iteration**\n\n**Pipeline:**\n1. `encoder.py` transforms grid mazes into MDPs.\n2. `solver.py` computes optimal policy.\n3. `decoder.py` reconstructs the shortest path using the policy.\n\n**Insights:**\n- LP is consistently fastest for large mazes.\n- Howard's Policy Iteration performs well on small problems but becomes costly as maze complexity grows.\n- Visual comparisons confirm that solved mazes follow intuitive paths with minimal steps.\n\nBenchmarks for runtime across methods and visualizations for grid navigation are included.\n\n### Windy Gridworld (`src/windy_gridworld`)\n\nAdopts the Sutton \u0026 Barto Windy Gridworld challenge with multiple RL approaches:\n- **Sarsa** (normal and King’s moves)\n- **Sarsa with stochastic wind**\n- **Q-Learning**\n- **Expected Sarsa**\n\n**Key Results:**\n- Sarsa with King’s Moves converges fastest due to shorter episodes.\n- Q-Learning and Expected Sarsa outperform standard Sarsa on stability and convergence.\n- The stochastic wind variant adds realistic randomness but slows convergence.\n- Paths from all agents are visualized for both deterministic and windy environments.\n\nGridworld is defined as an episodic MDP with reward shaping and stepwise convergence plotting.\n\n## 🔧 Running Experiments with `run.py`\n\nUse the `run.py` script to run all experiments. It acts as a unified launcher for bandits, MDP solving, verification, visualization, and Windy Gridworld tasks.\n\nEnable `--verbose` to view subprocess outputs and logs in real time.\n\n### Bandits\n```bash\npython run.py --verbose bandits \\\n  --instance data/bandits/instances/i-1.txt \\\n  --algorithm thompson-sampling \\\n  --rseed 42 \\\n  --epsilon 0.1 \\\n  --horizon 1000\n```\n\n### MDP Maze Solver\n```bash\npython run.py --verbose solve_mdp \\\n  --grid data/mdp/grids/grid10.txt \\\n  --algorithm pi\n```\n\nTo create a synthetic MDP file:\n```bash\npython run.py --verbose generate_mdp \\\n  --num_states 10 \\\n  --num_actions 5 \\\n  --gamma 0.95 \\\n  --mdptype episodic \\\n  --rseed 42 \\\n  --output_file src/mdp/tmp/generated_mdp.txt\n```\n\nTo verify all default grids (10 through 100):\n```bash\npython run.py --verbose verify_mdp --algorithm vi\n```\n\nTo verify specific grids:\n```bash\npython run.py --verbose verify_mdp \\\n  --algorithm lp \\\n  --grid data/mdp/grids/grid40.txt data/mdp/grids/grid50.txt\n```\n\nTo visualize a grid:\n```bash\npython run.py --verbose visualize_mdp \\\n  --grid_file data/mdp/grids/grid10.txt \\\n  --output_file plots/mdp/grid10_unsolved.png\n```\n\nTo visualize a solved grid:\n```bash\npython run.py --verbose visualize_mdp \\\n  --grid_file data/mdp/grids/grid10.txt \\\n  --path_file data/mdp/paths/path10.txt \\\n  --output_file plots/mdp/grid10_solved.png\n```\n\n### Windy Gridworld\n```bash\npython run.py --verbose windy \\\n  --episodes 200 \\\n  --epsilon 0.15 \\\n  --discount 0.99 \\\n  --learning-rate 0.5\n```\n\n## Command Reference (Summary)\n\n| Command          | Description                              |\n|------------------|------------------------------------------|\n| `bandits`        | Run multi-armed bandit experiments       |\n| `windy`          | Run Windy Gridworld RL agents            |\n| `generate_mdp`   | Generate synthetic MDP instance files    |\n| `solve_mdp`      | Solve a maze-based MDP using vi/pi/lp    |\n| `verify_mdp`     | Verify path optimality for maze solvers  |\n| `visualize_mdp`  | Create visual output of MDP grid/paths   |\n\n## References\n\n- [`./references/mdp_references.txt`](./references/mdp_references.txt)\n- [`./references/bandits_references.txt`](./references/bandits_references.txt)\n- [`./references/windy_gridworld_references.txt`](./references/windy_gridworld_references.txt)\n\n## License\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaronjs99%2Fintelligent-agents","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faaronjs99%2Fintelligent-agents","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaronjs99%2Fintelligent-agents/lists"}