https://github.com/vishal-sys-code/smarttraffic-rl
A Gym-style RL environment and benchmark for adaptive urban traffic signal control using macroscopic flow models (with optional SUMO integration).
- Host: GitHub
- URL: https://github.com/vishal-sys-code/smarttraffic-rl
- Owner: Vishal-sys-code
- License: MIT
- Created: 2025-08-05T09:57:03.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-08-06T13:11:03.000Z (6 months ago)
- Last Synced: 2025-08-06T14:27:43.613Z (6 months ago)
- Language: HTML
- Homepage: https://smart-traffic-rl.vercel.app
- Size: 1.17 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# SmartTraffic-RL
A Gym-style RL environment and benchmark for adaptive urban traffic signal control using macroscopic flow models.
## Architecture Overview
Below is the end-to-end structure of **SmartTraffic-RL**, showing how the components interact:

## Getting Started
### Installation
1. Clone the repository:
```bash
git clone https://github.com/vishal-sys-code/smarttraffic-rl.git
cd smarttraffic-rl
```
2. Install the package in editable mode with its dependencies:
```bash
pip install -e .
```
You will also need `stable-baselines3` and PyTorch. If you are not using a GPU, install the CPU-only build of PyTorch first, then `stable-baselines3`:
```bash
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install "stable-baselines3[extra]"
```
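As a quick sanity check after installation, a Gym-style rollout loop should run end to end. The sketch below is illustrative only: the `SmartTrafficEnv` import path, constructor, and the Gymnasium-style five-tuple `step` return are assumptions, not the repository's confirmed API; see the scripts under `examples/` for the actual usage.
```python
# Hypothetical smoke test of the Gym-style API (reset/step).
# The import path and class name are assumptions for illustration.
from smarttraffic_rl import SmartTrafficEnv  # assumed import

env = SmartTrafficEnv()
obs, info = env.reset(seed=0)
for _ in range(10):
    action = env.action_space.sample()  # random signal-phase action
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```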
### Training a PPO Agent
The `examples/train_ppo.py` script provides a configurable way to train a PPO agent. It includes observation normalization and reward scaling, which are crucial for stable training.
To train an agent with a custom set of hyperparameters:
```bash
python examples/train_ppo.py \
--exp_name "my_experiment" \
--lr 1e-4 \
--n_steps 4096 \
--reward_scale 1e6 \
--total_timesteps 200000
```
Training logs and the best model will be saved in the `logs/` and `ppo_tensorboard/` directories.
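The normalization and scaling that the script applies can also be reproduced directly with stable-baselines3's `VecNormalize` wrapper. The sketch below is a minimal illustration of that setup under the hyperparameters above, not the script itself; the `SmartTrafficEnv` import is assumed, and `train_ppo.py`'s own `--reward_scale` flag may implement scaling differently.
```python
# Minimal sketch: PPO with observation normalization and reward scaling
# via VecNormalize. The environment import is an assumption; the
# hyperparameters mirror the command above.
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

from smarttraffic_rl import SmartTrafficEnv  # assumed import

env = DummyVecEnv([lambda: SmartTrafficEnv()])
env = VecNormalize(env, norm_obs=True, norm_reward=True)

model = PPO("MlpPolicy", env, learning_rate=1e-4, n_steps=4096, verbose=1)
model.learn(total_timesteps=200_000)

model.save("logs/ppo_smarttraffic")
env.save("logs/vecnormalize.pkl")  # keep normalization stats for evaluation
```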
### Evaluating a Trained Agent
The `examples/evaluate.py` script allows you to evaluate a trained model and compare its performance against a fixed-time baseline.
To run an evaluation:
```bash
python -m examples.evaluate --model_path /path/to/your/model.zip
```
This will print the average queue length for both the PPO agent and the fixed-time agent over 10 episodes.
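Conceptually, the evaluation rolls out each policy for a fixed number of episodes and averages a queue-length metric. The loop below illustrates that idea for the PPO agent; the `queue_length` info key and the environment import are assumptions rather than the script's documented interface, and if observations were normalized during training the saved normalization statistics would need to be applied here as well.
```python
# Illustrative evaluation loop: average a per-step queue-length metric
# over 10 episodes. The info key and environment import are assumptions.
import numpy as np
from stable_baselines3 import PPO

from smarttraffic_rl import SmartTrafficEnv  # assumed import

model = PPO.load("/path/to/your/model.zip")
env = SmartTrafficEnv()

episode_means = []
for _ in range(10):
    obs, info = env.reset()
    queues, done = [], False
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        queues.append(info.get("queue_length", 0.0))  # assumed info key
        done = terminated or truncated
    episode_means.append(np.mean(queues))

print(f"PPO average queue length: {np.mean(episode_means):.2f}")
```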
## Results
After implementing observation/reward scaling and tuning hyperparameters, the PPO agent successfully learns a policy that outperforms a fixed-time (equal green splits) baseline.
The following results were obtained after training for 200,000 timesteps with a learning rate of `1e-4`, `n_steps` of `4096`, and a reward scaling factor of `1e6`:
| Agent | Average Queue Length |
|------------|----------------------|
| PPO (Tuned)| **246.07** |
| Fixed-Time | 246.95 |
The training metrics showed a healthy learning process, with `approx_kl` around `0.01` and `explained_variance` consistently above 0, indicating that the value function was learning effectively. This confirms that the PPO pipeline is working as expected.
## Visualization
The repository includes tools to log and visualize simulation metrics. The `examples/evaluate.py` script, when run, will produce JSON log files for both the PPO and fixed-time agents. These logs can be used to generate plots comparing their performance.
For example, the following plot shows the average vehicle delay per episode for a trained PPO agent versus a fixed-time agent:

For more details on logging and visualization, see the [API documentation](docs/api.md).