https://github.com/aaronjs99/intelligent-agents

Comparative analysis of Markov decision processes & intelligent agents
bandit-algorithms linear-programming mdp policy-iteration reinforcement-learning value-iteration

Host: GitHub
URL: https://github.com/aaronjs99/intelligent-agents
Owner: aaronjs99
License: mit
Created: 2020-10-21T13:13:28.000Z (over 5 years ago)
Default Branch: main
Last Pushed: 2025-05-16T00:05:23.000Z (about 1 year ago)
Last Synced: 2026-03-06T00:39:30.912Z (3 months ago)
Topics: bandit-algorithms, linear-programming, mdp, policy-iteration, reinforcement-learning, value-iteration
Language: Python
Homepage:
Size: 1.68 MB
Stars: 3
Watchers: 3
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Intelligent Agents

A collection of reinforcement learning and intelligent agents projects showcasing implementations of key algorithms and their comparative analysis. These projects were developed as part of the *CS747: Foundations of Intelligent and Learning Agents* course at IIT Bombay.

## Projects Included

### Bandits (`src/bandits`)

Implements and compares classical multi-armed bandit algorithms: **ε-greedy**, **UCB**, **KL-UCB**, and **Thompson Sampling**, along with a custom variant: **Thompson Sampling with a hint**. Each algorithm is evaluated by its cumulative regret across several horizons and random seeds.
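
As a rough illustration of what one of these experiments measures, here is a minimal sketch of Beta-Bernoulli Thompson Sampling that returns cumulative expected regret. The function name `thompson_sampling`, its parameters, and the use of expected (rather than realised) regret are assumptions made for this example, not the code in `src/bandits`.

```python
import random


def thompson_sampling(means, horizon, seed=0):
    """Run Thompson Sampling on Bernoulli arms; return cumulative expected regret.

    `means` holds the true arm means (hypothetical input format).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    best_mean = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one sample per arm from its Beta(successes + 1, failures + 1) posterior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Pull the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        # Expected regret of this pull: gap between the best arm and the chosen arm.
        regret += best_mean - means[arm]
    return regret
```

Calling `thompson_sampling([0.2, 0.8], horizon=1000, seed=42)` for several seeds and averaging the results gives one point on a regret-versus-horizon curve.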

**Key Findings:**
- Thompson Sampling generally outperforms others in regret minimization.
- KL-UCB improves over UCB using a tighter confidence bound via binary search.
- ε-Greedy performs best at `ε ≈ 0.02`, which balances exploration and exploitation.
- The Thompson Sampling "hinted" version leverages knowledge of true means, improving early performance through a custom Beta-distribution-based selector.
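
The binary search mentioned for KL-UCB can be sketched as follows for Bernoulli rewards. This is an illustration under stated assumptions: the names `bernoulli_kl` and `kl_ucb_index`, the constant `c = 3`, and the tolerance are choices made here, and the repository may use different ones.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(p_hat, pulls, t, c=3.0, tol=1e-6):
    """Largest q in [p_hat, 1] with pulls * KL(p_hat, q) <= ln t + c * ln ln t.

    Requires t >= 2 and pulls >= 1. The value of `c` is an assumption.
    """
    bound = max(math.log(t) + c * math.log(math.log(t)), 0.0) / pulls
    lo, hi = p_hat, 1.0
    # KL(p_hat, q) increases with q on [p_hat, 1], so bisection finds the boundary.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(p_hat, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return lo
```

At each round the agent would pull the arm with the largest index; the index shrinks toward the empirical mean `p_hat` as `pulls` grows.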

Includes regret plots over multiple seeds and horizons, as well as parameter studies.

### MDP Maze Solver (`src/mdp`)

Solves mazes by modeling them as Markov Decision Processes using:
- **Value Iteration**
- **Linear Programming** (via PuLP)
- **Howard’s Policy Iteration**
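
For orientation, value iteration on a small tabular MDP might look like the sketch below. The `transitions[s][a] = [(prob, next_state, reward), ...]` layout and the function names are assumptions made for this example; they are not the file format or API used by `solver.py`.

```python
def q_value(outcomes, V, gamma):
    """Expected return of one action, given its (prob, next_state, reward) outcomes."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma, tol=1e-10):
    """Return (V, policy) for an MDP given as transitions[s][a] -> outcome list.

    A state with no actions is treated as terminal (value 0, policy None).
    """
    n = len(transitions)
    V = [0.0] * n
    while True:
        # Apply the Bellman optimality backup to every state.
        new_V = [
            max((q_value(o, V, gamma) for o in transitions[s]), default=0.0)
            for s in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(V, new_V))
        V = new_V
        if delta < tol:
            break
    policy = []
    for s in range(n):
        q = [q_value(o, V, gamma) for o in transitions[s]]
        policy.append(max(range(len(q)), key=q.__getitem__) if q else None)
    return V, policy


# Three-state chain: action 0 stays put, action 1 moves forward; every step costs 1.
chain = [
    [[(1.0, 0, -1.0)], [(1.0, 1, -1.0)]],
    [[(1.0, 1, -1.0)], [(1.0, 2, -1.0)]],
    [],  # terminal state
]
values, policy = value_iteration(chain, gamma=1.0)
```

On this chain the optimal policy moves forward in both non-terminal states, which mirrors how a maze MDP with a per-step cost yields shortest paths.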

**Pipeline:**
1. `encoder.py` transforms grid mazes into MDPs.
2. `solver.py` computes the optimal policy.
3. `decoder.py` reconstructs the shortest path using the policy.
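
The decoding step can be pictured as following the policy from the start cell until the goal is reached. The sketch below assumes a policy stored as a dictionary from `(row, col)` cells to one of `N`, `S`, `E`, `W`; that representation and the name `decode_path` are hypothetical and may not match `decoder.py`.

```python
# Row/column offsets for each compass action (row 0 is the top of the grid).
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def decode_path(policy, start, goal, max_steps=10_000):
    """Follow a deterministic policy from `start` to `goal`; return the action list."""
    path = []
    cell = start
    while cell != goal:
        if len(path) >= max_steps:
            raise RuntimeError("policy does not reach the goal")
        action = policy[cell]
        d_row, d_col = MOVES[action]
        cell = (cell[0] + d_row, cell[1] + d_col)
        path.append(action)
    return path


# Hypothetical policy for a tiny maze: go east once, then south twice.
example_policy = {(0, 0): "E", (0, 1): "S", (1, 1): "S"}
example_path = decode_path(example_policy, start=(0, 0), goal=(2, 1))
```

The step cap guards against a policy that loops, which can happen if the solver was stopped before converging.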

**Insights:**
- LP is consistently fastest for large mazes.
- Howard's Policy Iteration performs well on small problems but becomes costly as maze complexity grows.
- Visual comparisons confirm that solved mazes follow intuitive paths with minimal steps.
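
To make the cost of Howard's Policy Iteration concrete, the sketch below alternates a full policy evaluation with a switch of every improvable state. It uses iterative evaluation and a hypothetical `transitions[s][a] = [(prob, next_state, reward), ...]` layout; the repository's solver may evaluate policies differently (for example by solving a linear system).

```python
def q_value(outcomes, V, gamma):
    """Expected return of one action, given its (prob, next_state, reward) outcomes."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)


def evaluate_policy(policy, transitions, gamma, tol=1e-10):
    """Iteratively compute V for a fixed policy (assumes gamma < 1 for convergence)."""
    V = [0.0] * len(transitions)
    while True:
        delta = 0.0
        for s, a in enumerate(policy):
            if a is None:  # terminal state keeps value 0
                continue
            v = q_value(transitions[s][a], V, gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def policy_iteration(transitions, gamma):
    """Howard's Policy Iteration: switch every improvable state in each round."""
    policy = [0 if actions else None for actions in transitions]
    while True:
        V = evaluate_policy(policy, transitions, gamma)
        changed = False
        for s, actions in enumerate(transitions):
            if not actions:
                continue
            q = [q_value(o, V, gamma) for o in actions]
            best = max(range(len(q)), key=q.__getitem__)
            if q[best] > q[policy[s]] + 1e-9:  # switch only on strict improvement
                policy[s] = best
                changed = True
        if not changed:
            return V, policy


# Three-state chain: action 0 stays put, action 1 moves forward; every step costs 1.
chain = [
    [[(1.0, 0, -1.0)], [(1.0, 1, -1.0)]],
    [[(1.0, 1, -1.0)], [(1.0, 2, -1.0)]],
    [],  # terminal state
]
pi_values, pi_policy = policy_iteration(chain, gamma=0.9)
```

Each round pays for a complete evaluation of the current policy, which is one plausible reason the method becomes costly as the number of maze cells grows.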

Benchmarks for runtime across methods and visualizations for grid navigation are included.

### Windy Gridworld (`src/windy_gridworld`)

Tackles the Sutton & Barto Windy Gridworld problem with several RL approaches:
- **Sarsa** (normal and King’s moves)
- **Sarsa with stochastic wind**
- **Q-Learning**
- **Expected Sarsa**
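
These methods differ mainly in the bootstrap target used for the temporal-difference update. The sketch below shows the three targets side by side; the function names are illustrative, and Expected Sarsa is assumed to average over an ε-greedy policy.

```python
def sarsa_target(reward, gamma, q_next, next_action):
    """On-policy target: uses the action actually selected in the next state."""
    return reward + gamma * q_next[next_action]


def q_learning_target(reward, gamma, q_next):
    """Off-policy target: uses the greedy action in the next state."""
    return reward + gamma * max(q_next)


def expected_sarsa_target(reward, gamma, q_next, epsilon):
    """Target that averages next-state values under an epsilon-greedy policy."""
    n = len(q_next)
    greedy = max(range(n), key=q_next.__getitem__)
    probs = [epsilon / n] * n
    probs[greedy] += 1 - epsilon
    return reward + gamma * sum(p * q for p, q in zip(probs, q_next))


def td_update(q_value, target, alpha):
    """Move the current estimate a fraction alpha toward the target."""
    return q_value + alpha * (target - q_value)
```

For example, with `q_next = [1.0, 3.0]`, `reward = -1`, and `gamma = 1`, the Q-Learning target is `2.0`, while the Sarsa target depends on which action the behaviour policy picked next.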

**Key Results:**
- Sarsa with King’s Moves converges fastest due to shorter episodes.
- Q-Learning and Expected Sarsa outperform standard Sarsa on stability and convergence.
- The stochastic wind variant adds realistic randomness but slows convergence.
- Paths from all agents are visualized for both deterministic and windy environments.

The gridworld is defined as an episodic MDP with reward shaping, and convergence is plotted step by step.
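
For readers unfamiliar with the environment, here is a minimal sketch of the textbook Windy Gridworld transition from Sutton & Barto: a 7×10 grid, a per-column upward wind, and a reward of −1 per step. The repository mentions reward shaping, so its rewards may differ; the `step` function and its `stochastic` flag are assumptions for this example.

```python
import random

ROWS, COLS = 7, 10
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]  # upward push per column
START, GOAL = (3, 0), (3, 7)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action, stochastic=False, rng=random):
    """Apply one move plus wind; return (next_state, reward, done)."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    wind = WIND[col]  # wind of the column being left
    if stochastic and wind > 0:
        # Stochastic variant: the wind varies by -1, 0, or +1 with equal probability.
        wind += rng.choice((-1, 0, 1))
    # Row 0 is the top of the grid, so wind subtracts from the row index.
    new_row = min(max(row + d_row - wind, 0), ROWS - 1)
    new_col = min(max(col + d_col, 0), COLS - 1)
    next_state = (new_row, new_col)
    return next_state, -1, next_state == GOAL
```

King's moves would extend `ACTIONS` with the four diagonal offsets, which is why episodes under that variant can be shorter.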

## 🔧 Running Experiments with `run.py`

Use the `run.py` script to run all experiments. It acts as a unified launcher for bandits, MDP solving, verification, visualization, and Windy Gridworld tasks.

Enable `--verbose` to view subprocess outputs and logs in real time.

### Bandits
```bash
python run.py --verbose bandits \
--instance data/bandits/instances/i-1.txt \
--algorithm thompson-sampling \
--rseed 42 \
--epsilon 0.1 \
--horizon 1000
```

### MDP Maze Solver
```bash
python run.py --verbose solve_mdp \
--grid data/mdp/grids/grid10.txt \
--algorithm pi
```

To create a synthetic MDP file:
```bash
python run.py --verbose generate_mdp \
--num_states 10 \
--num_actions 5 \
--gamma 0.95 \
--mdptype episodic \
--rseed 42 \
--output_file src/mdp/tmp/generated_mdp.txt
```

To verify all default grids (10 through 100):
```bash
python run.py --verbose verify_mdp --algorithm vi
```

To verify specific grids:
```bash
python run.py --verbose verify_mdp \
--algorithm lp \
--grid data/mdp/grids/grid40.txt data/mdp/grids/grid50.txt
```

To visualize a grid:
```bash
python run.py --verbose visualize_mdp \
--grid_file data/mdp/grids/grid10.txt \
--output_file plots/mdp/grid10_unsolved.png
```

To visualize a solved grid:
```bash
python run.py --verbose visualize_mdp \
--grid_file data/mdp/grids/grid10.txt \
--path_file data/mdp/paths/path10.txt \
--output_file plots/mdp/grid10_solved.png
```

### Windy Gridworld
```bash
python run.py --verbose windy \
--episodes 200 \
--epsilon 0.15 \
--discount 0.99 \
--learning-rate 0.5
```

## Command Reference (Summary)

| Command | Description |
|------------------|------------------------------------------|
| `bandits` | Run multi-armed bandit experiments |
| `windy` | Run Windy Gridworld RL agents |
| `generate_mdp` | Generate synthetic MDP instance files |
| `solve_mdp` | Solve a maze-based MDP using vi/pi/lp |
| `verify_mdp` | Verify path optimality for maze solvers |
| `visualize_mdp` | Create visual output of MDP grid/paths |

## References

- [`./references/mdp_references.txt`](./references/mdp_references.txt)
- [`./references/bandits_references.txt`](./references/bandits_references.txt)
- [`./references/windy_gridworld_references.txt`](./references/windy_gridworld_references.txt)

## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
