{"id":50876702,"url":"https://github.com/meolen07/learning-agents-playground","last_synced_at":"2026-06-15T10:31:19.519Z","repository":{"id":361712549,"uuid":"1255293196","full_name":"meolen07/learning-agents-playground","owner":"meolen07","description":"A learning-focused reinforcement learning prototype for training and evaluating simple agents with Q-learning and DQN.","archived":false,"fork":false,"pushed_at":"2026-05-31T23:31:01.000Z","size":238,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-01T00:18:39.086Z","etag":null,"topics":["dqn","gymnasium","learning-agents","pytorch","q-learning","reinforcement-learning"],"latest_commit_sha":null,"homepage":"https://learning-agents-playground.streamlit.app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/meolen07.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-31T16:46:02.000Z","updated_at":"2026-05-31T23:31:01.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/meolen07/learning-agents-playground","commit_stats":null,"previous_names":["meolen07/learning-agents-playground"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/meolen07/learning-agents-playground","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meolen07%2Flearning-agents-playground","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meolen07%2Flearning-agents-playground/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meolen07%2Flearning-agents-playground/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meolen07%2Flearning-agents-playground/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/meolen07","download_url":"https://codeload.github.com/meolen07/learning-agents-playground/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/meolen07%2Flearning-agents-playground/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34357285,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-15T02:00:07.085Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dqn","gymnasium","learning-agents","pytorch","q-learning","reinforcement-learning"],"created_at":"2026-06-15T10:31:18.731Z","updated_at":"2026-06-15T10:31:19.514Z","avatar_url":"https://github.com/meolen07.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Learning Agents Playground\n\n**Author:** Huynh Mai Linh Nguyen · **[View live dashboard](https://learning-agents-playground.streamlit.app/)**\n\nA hands-on reinforcement learning playground for training, evaluating, and visualizing classic and deep RL agents on Gymnasium environments. Built as an educational prototype for undergraduate research and portfolio work — focused on clarity and reproducibility, not state-of-the-art benchmarks.\n\n---\n\n## Overview\n\nLearning Agents Playground is a modular Python project that walks through two core RL paradigms:\n\n| Paradigm | Algorithm | Environment | State type |\n|----------|-----------|-------------|------------|\n| Tabular | Q-Learning | FrozenLake-v1 | Discrete |\n| Deep | DQN | CartPole-v1 | Continuous |\n\nThe codebase covers the full experiment loop: YAML-driven training, CSV logging, checkpoint saving, matplotlib plots, CLI evaluation, and a Streamlit dashboard for exploring saved runs.\n\n---\n\n## Motivation\n\nReinforcement learning tutorials often jump straight to library APIs without showing how the pieces fit together. This project was built to:\n\n- **Implement algorithms from scratch** — tabular Q-learning and a minimal DQN with replay buffer and target network\n- **Practice reproducible experimentation** — centralized seeding, config files, and structured output directories\n- **Bridge theory and tooling** — connect Bellman updates and neural function approximation to working code you can run, plot, and inspect\n\nIt is a learning sandbox, not a production RL framework. The goal is to understand *how* agents learn, not to chase leaderboard scores.\n\n---\n\n## Features\n\n- **Agents:** `RandomAgent` (baseline), `QLearningAgent` (tabular), `DQNAgent` (128→128 MLP)\n- **Environments:** Gymnasium factory with seeding and `is_slippery` support for FrozenLake\n- **Training:** Episode-level CSV logs, automatic plot generation, agent checkpoints\n- **Evaluation:** Success rate and reward summaries via CLI\n- **Dashboard:** Streamlit app with Environment and Agent selectors, moving-average charts, and saved asset viewer\n- **Testing:** Lightweight smoke tests (`pytest`) for imports, agent APIs, and short training loops\n\n---\n\n## Project Structure\n\n```\nlearning-agents-playground/\n├── README.md\n├── requirements.txt\n├── pyproject.toml\n├── app.py                          # Streamlit dashboard (visualization only)\n├── configs/\n│   ├── frozenlake_q_learning.yaml\n│   └── cartpole_dqn.yaml\n├── scripts/\n│   ├── train_q_learning.py         # CLI: tabular Q-learning\n│   ├── train_dqn.py                # CLI: DQN training\n│   ├── evaluate_agent.py           # CLI: evaluate checkpoints\n│   └── plot_results.py             # CLI: re-plot from CSV\n├── src/\n│   ├── agents/\n│   │   ├── random_agent.py\n│   │   ├── q_learning.py\n│   │   └── dqn.py\n│   ├── envs/\n│   │   └── make_env.py\n│   ├── training/\n│   │   ├── train_q_learning.py\n│   │   └── train_dqn.py\n│   ├── evaluation/\n│   │   ├── evaluate.py\n│   │   └── metrics.py\n│   └── utils/\n│       ├── seeding.py\n│       ├── plotting.py\n│       └── replay_buffer.py\n├── results/                        # Training outputs (created on first run)\n│   ├── frozenlake_qlearning/\n│   └── cartpole_dqn/\n└── tests/\n    └── test_smoke.py\n```\n\n---\n\n## Installation\n\n**Requirements:** Python 3.10+\n\n```bash\ngit clone https://github.com/\u003cyour-username\u003e/learning-agents-playground.git\ncd learning-agents-playground\n\npython -m venv .venv\nsource .venv/bin/activate          # Windows: .venv\\Scripts\\activate\n\npip install -r requirements.txt\n```\n\n### PyTorch note\n\n`requirements.txt` installs `torch\u003e=2.0.0` from PyPI. On some platforms (especially macOS or CUDA-specific setups), you may prefer the official install command from [pytorch.org](https://pytorch.org/get-started/locally/) before installing other dependencies:\n\n```bash\n# Example: CPU-only PyTorch (adjust for your platform)\npip install torch --index-url https://download.pytorch.org/whl/cpu\npip install -r requirements.txt\n```\n\nSmoke tests skip DQN-related cases if PyTorch is unavailable. Set `LAP_SKIP_TORCH_TESTS=1` to force-skip torch tests even when installed.\n\n---\n\n## Quick Start\n\nAll training runs via CLI scripts. The Streamlit app only visualizes saved results.\n\n### Train Q-Learning on FrozenLake\n\n```bash\npython scripts/train_q_learning.py --config configs/frozenlake_q_learning.yaml --verbose\n```\n\nDefault config (`configs/frozenlake_q_learning.yaml`): 5,000 episodes on slippery 4×4 FrozenLake, seed 42.\n\n**Outputs** → `results/frozenlake_qlearning/`:\n\n| File | Description |\n|------|-------------|\n| `training_log.csv` | Per-episode reward, success, epsilon, steps |\n| `rewards.png` | Reward curve with moving average |\n| `success_rate.png` | Rolling success rate |\n| `agent.pkl` | Saved Q-table checkpoint |\n\nOverride options:\n\n```bash\npython scripts/train_q_learning.py --episodes 1000 --seed 0 --verbose\npython scripts/train_q_learning.py --output-dir results --verbose\n```\n\n### Train DQN on CartPole\n\n```bash\npython scripts/train_dqn.py --config configs/cartpole_dqn.yaml --verbose\n```\n\nDefault config (`configs/cartpole_dqn.yaml`): 500 episodes on CartPole-v1, 128-hidden MLP, replay buffer, target network sync every 10 episodes.\n\n**Outputs** → `results/cartpole_dqn/`:\n\n| File | Description |\n|------|-------------|\n| `training_log.csv` | Per-episode reward, success, epsilon, steps, avg_loss |\n| `rewards.png` | Reward curve with moving average |\n| `losses.png` | TD-loss curve |\n| `agent.pt` | Saved PyTorch checkpoint |\n\nOverride options:\n\n```bash\npython scripts/train_dqn.py --episodes 200 --seed 0 --verbose\n```\n\n### Evaluate Agents\n\nLoad a checkpoint and run evaluation episodes (default: 100, seed 42):\n\n```bash\n# Q-learning on FrozenLake\npython scripts/evaluate_agent.py \\\n  --agent-type qlearning \\\n  --checkpoint results/frozenlake_qlearning/agent.pkl \\\n  --env-id FrozenLake-v1 \\\n  --episodes 100\n\n# DQN on CartPole\npython scripts/evaluate_agent.py \\\n  --agent-type dqn \\\n  --checkpoint results/cartpole_dqn/agent.pt \\\n  --env-id CartPole-v1 \\\n  --episodes 100\n```\n\nOptional: save summary JSON with `--output results/eval_summary.json`.\n\nFor FrozenLake, control slipperiness with `--is-slippery` / `--no-is-slippery` (default: slippery).\n\n### Re-plot from CSV\n\nRegenerate plots from an existing training log:\n\n```bash\npython scripts/plot_results.py --csv results/frozenlake_qlearning/training_log.csv\n```\n\nPlots are saved alongside the CSV as `replotted_rewards.png` and (if applicable) `replotted_success_rate.png`.\n\n### Streamlit Dashboard\n\n**[View live dashboard](https://learning-agents-playground.streamlit.app/)** — explore pre-trained results online. To run locally:\n\n```bash\nstreamlit run app.py\n```\n\nThe dashboard provides:\n\n- **Sidebar selectors** — Environment (`FrozenLake-v1`, `CartPole-v1`) and Agent (`Random`, `Q-learning`, `DQN`)\n- **Metrics** — Episode count, mean/max reward, success rate\n- **Tabs** — Rewards chart, success rate, raw CSV table, saved PNG/checkpoint assets\n- **Auto-discovery** — Finds the latest matching run under `results/`, or falls back to default run names (`frozenlake_qlearning`, `cartpole_dqn`)\n\n\u003e **Note:** Training is **not** performed in the app. Run the CLI scripts first, then launch the dashboard to explore results.\n\n---\n\n## Configuration\n\nKey hyperparameters are defined in YAML. All configs support `seed`, `run_name`, and `output_dir`.\n\n### FrozenLake Q-Learning\n\n| Key | Default | Description |\n|-----|---------|-------------|\n| `env_id` | `FrozenLake-v1` | Gymnasium environment |\n| `is_slippery` | `true` | Stochastic vs deterministic transitions |\n| `episodes` | `5000` | Training episodes |\n| `learning_rate` | `0.1` | Q-learning step size (α) |\n| `discount_factor` | `0.99` | Bellman discount (γ) |\n| `epsilon` / `epsilon_decay` | `1.0` / `0.995` | Epsilon-greedy exploration schedule |\n\n### CartPole DQN\n\n| Key | Default | Description |\n|-----|---------|-------------|\n| `env_id` | `CartPole-v1` | Gymnasium environment |\n| `episodes` | `500` | Training episodes |\n| `hidden_dim` | `128` | MLP hidden size (128→128) |\n| `batch_size` | `64` | Replay mini-batch size |\n| `buffer_capacity` | `100000` | Replay buffer capacity |\n| `warmup_steps` | `1000` | Steps before gradient updates |\n| `target_update_freq` | `10` | Target network sync frequency |\n| `success_threshold` | `195.0` | Reward threshold for \"solved\" episodes |\n\n---\n\n## Example Results\n\nAfter training, each run directory under `results/\u003crun_name\u003e/` contains logs, plots, and checkpoints. Example paths:\n\n**FrozenLake Q-Learning** (`results/frozenlake_qlearning/`):\n\n![Reward curve](results/frozenlake_qlearning/rewards.png)\n![Success rate](results/frozenlake_qlearning/success_rate.png)\n\n**CartPole DQN** (`results/cartpole_dqn/`):\n\n![Reward curve](results/cartpole_dqn/rewards.png)\n![TD loss](results/cartpole_dqn/losses.png)\n\nTypical outcomes on default configs (your results may vary with seed and hardware):\n\n- **FrozenLake (slippery):** Success rate improves gradually over thousands of episodes; learning is noisy due to stochastic transitions.\n- **CartPole:** DQN typically reaches near-threshold rewards (~195+) within a few hundred episodes, though convergence is not guaranteed on every run.\n\n---\n\n## What I Learned\n\nBuilding this project reinforced several RL concepts in practice:\n\n- **Exploration vs exploitation** — epsilon decay schedules matter; too-fast decay can trap tabular agents in local optima on slippery FrozenLake\n- **Off-policy learning** — Q-learning bootstraps from the max Q-value of the next state; DQN approximates this with a neural network and replay buffer\n- **Stabilizing deep RL** — experience replay, target networks, and warmup steps reduce divergence; without them, CartPole training is noticeably less stable\n- **Reproducibility habits** — seeding Python/NumPy/env RNGs and logging every episode to CSV makes debugging and comparison much easier than printing to stdout\n- **Separation of concerns** — keeping training (CLI), evaluation, and visualization (Streamlit) in distinct entry points keeps the codebase easier to reason about\n\n---\n\n## Limitations\n\nThis is an intentionally minimal prototype. Be aware of the following:\n\n- **Not SOTA** — no Double DQN, Dueling architectures, prioritized replay, or hyperparameter tuning pipelines\n- **Small environment set** — only FrozenLake and CartPole; no Atari, continuous control, or multi-agent setups\n- **Single-run workflow** — no experiment tracking (Weights \u0026 Biases, MLflow), hyperparameter sweeps, or parallel training\n- **Basic evaluation** — no statistical significance testing or confidence intervals across multiple seeds\n- **Dashboard is read-only** — no live training or hyperparameter editing in Streamlit\n- **Slippery FrozenLake is hard** — tabular Q-learning may need many episodes and still show high variance\n\nThese trade-offs are by design for a learning-focused codebase.\n\n---\n\n## Roadmap\n\nPossible next steps if extending the project:\n\n- [ ] Add Double DQN and compare against vanilla DQN on CartPole\n- [ ] Multi-seed training with aggregated mean ± std plots\n- [ ] Policy gradient baseline (REINFORCE) on CartPole\n- [ ] Additional environments (MountainCar, Acrobot)\n- [ ] Experiment logging integration (CSV → W\u0026B or TensorBoard)\n- [ ] Random-agent evaluation baseline with logged metrics\n\n---\n\n## Testing\n\n```bash\npytest tests/test_smoke.py -v\n```\n\nSmoke tests verify imports, agent save/load, metrics helpers, short Q-learning runs, and (when PyTorch is available) DQN forward passes and mini training loops. They use only a handful of episodes — not full training.\n\n---\n\n## License\n\nMIT License — see [LICENSE](LICENSE) file.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmeolen07%2Flearning-agents-playground","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmeolen07%2Flearning-agents-playground","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmeolen07%2Flearning-agents-playground/lists"}