{"id":15108824,"url":"https://github.com/kengz/slm-lab","last_synced_at":"2026-02-11T04:10:39.613Z","repository":{"id":40336195,"uuid":"105591065","full_name":"kengz/SLM-Lab","owner":"kengz","description":"Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book \"Foundations of Deep Reinforcement Learning\".","archived":false,"fork":false,"pushed_at":"2025-02-16T01:19:34.000Z","size":4279,"stargazers_count":1277,"open_issues_count":16,"forks_count":274,"subscribers_count":46,"default_branch":"master","last_synced_at":"2025-04-14T22:07:20.758Z","etag":null,"topics":["a2c","a3c","benchmark","deep-reinforcement-learning","dqn","policy-gradient","ppo","pytorch","reinforcement-learning","sac"],"latest_commit_sha":null,"homepage":"https://slm-lab.gitbook.io/slm-lab/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kengz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-10-02T22:20:22.000Z","updated_at":"2025-04-04T09:01:00.000Z","dependencies_parsed_at":"2025-03-31T17:06:42.514Z","dependency_job_id":"7753e630-dfb8-48a8-bc63-5c91826079e1","html_url":"https://github.com/kengz/SLM-Lab","commit_stats":null,"previous_names":[],"tags_count":27,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kengz%2FSLM-Lab","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kengz%2FSLM-Lab/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kengz%2FSLM-Lab/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kengz%2FSLM-Lab/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kengz","download_url":"https://codeload.github.com/kengz/SLM-Lab/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254160173,"owners_count":22024567,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["a2c","a3c","benchmark","deep-reinforcement-learning","dqn","policy-gradient","ppo","pytorch","reinforcement-learning","sac"],"created_at":"2024-09-25T22:40:59.928Z","updated_at":"2026-02-11T04:10:39.608Z","avatar_url":"https://github.com/kengz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [SLM Lab](https://www.amazon.com/dp/0135172381) \u003cbr\u003e ![GitHub tag (latest SemVer)](https://img.shields.io/github/tag/kengz/slm-lab) ![CI](https://github.com/kengz/SLM-Lab/workflows/CI/badge.svg)\n\n\u003cp align=\"center\"\u003e\n  \u003ci\u003eModular Deep Reinforcement Learning framework in PyTorch.\u003c/i\u003e\n  \u003cbr\u003e\n  \u003ci\u003eCompanion library of the book \u003ca href=\"https://www.amazon.com/dp/0135172381\"\u003eFoundations of Deep Reinforcement Learning\u003c/a\u003e.\u003c/i\u003e\n  \u003cbr\u003e\n  \u003ca href=\"https://slm-lab.gitbook.io/slm-lab/\"\u003eDocumentation\u003c/a\u003e · \u003ca href=\"https://github.com/kengz/SLM-Lab/blob/master/docs/BENCHMARKS.md\"\u003eBenchmark Results\u003c/a\u003e\n\u003c/p\u003e\n\n\u003e**NOTE:** v5.0 updates to Gymnasium, `uv` tooling, and modern dependencies with ARM support - see [CHANGELOG.md](CHANGELOG.md).\n\u003e\n\u003eBook readers: `git checkout v4.1.1` for *Foundations of Deep Reinforcement Learning* code.\n\n|||||\n|:---:|:---:|:---:|:---:|\n| ![ppo beamrider](https://user-images.githubusercontent.com/8209263/63994698-689ecf00-caaa-11e9-991f-0a5e9c2f5804.gif) | ![ppo breakout](https://user-images.githubusercontent.com/8209263/63994695-650b4800-caaa-11e9-9982-2462738caa45.gif) | ![ppo kungfumaster](https://user-images.githubusercontent.com/8209263/63994690-60469400-caaa-11e9-9093-b1cd38cee5ae.gif) | ![ppo mspacman](https://user-images.githubusercontent.com/8209263/63994685-5cb30d00-caaa-11e9-8f35-78e29a7d60f5.gif) |\n| BeamRider | Breakout | KungFuMaster | MsPacman |\n| ![ppo pong](https://user-images.githubusercontent.com/8209263/63994680-59b81c80-caaa-11e9-9253-ed98370351cd.gif) | ![ppo qbert](https://user-images.githubusercontent.com/8209263/63994672-54f36880-caaa-11e9-9757-7780725b53af.gif) | ![ppo seaquest](https://user-images.githubusercontent.com/8209263/63994665-4dcc5a80-caaa-11e9-80bf-c21db818115b.gif) | ![ppo spaceinvaders](https://user-images.githubusercontent.com/8209263/63994624-15c51780-caaa-11e9-9c9a-854d3ce9066d.gif) |\n| Pong | Qbert | Seaquest | Sp.Invaders |\n| ![sac ant](https://user-images.githubusercontent.com/8209263/63994867-ff6b8b80-caaa-11e9-971e-2fac1cddcbac.gif) | ![sac halfcheetah](https://user-images.githubusercontent.com/8209263/63994869-01354f00-caab-11e9-8e11-3893d2c2419d.gif) | ![sac hopper](https://user-images.githubusercontent.com/8209263/63994871-0397a900-caab-11e9-9566-4ca23c54b2d4.gif) | ![sac humanoid](https://user-images.githubusercontent.com/8209263/63994883-0befe400-caab-11e9-9bcc-c30c885aad73.gif) |\n| Ant | HalfCheetah | Hopper | Humanoid |\n| ![sac doublependulum](https://user-images.githubusercontent.com/8209263/63994879-07c3c680-caab-11e9-974c-06cdd25bfd68.gif) | ![sac pendulum](https://user-images.githubusercontent.com/8209263/63994880-085c5d00-caab-11e9-850d-049401540e3b.gif) | ![sac reacher](https://user-images.githubusercontent.com/8209263/63994881-098d8a00-caab-11e9-8e19-a3b32d601b10.gif) | ![sac walker](https://user-images.githubusercontent.com/8209263/63994882-0abeb700-caab-11e9-9e19-b59dc5c43393.gif) |\n| Inv.DoublePendulum | InvertedPendulum | Reacher | Walker |\n\nSLM Lab is a software framework for **reinforcement learning** (RL) research and application in PyTorch. RL trains agents to make decisions by learning from trial and error—like teaching a robot to walk or an AI to play games.\n\n## What SLM Lab Offers\n\n| Feature | Description |\n|---------|-------------|\n| **Ready-to-use algorithms** | PPO, SAC, DQN, A2C, REINFORCE—validated on 70+ environments |\n| **Easy configuration** | JSON spec files fully define experiments—no code changes needed |\n| **Reproducibility** | Every run saves its spec + git SHA for exact reproduction |\n| **Automatic analysis** | Training curves, metrics, and TensorBoard logging out of the box |\n| **Cloud integration** | dstack for GPU training, HuggingFace for sharing results |\n\n## Algorithms\n\n| Algorithm | Type | Best For | Validated Environments |\n|-----------|------|----------|------------------------|\n| **REINFORCE** | On-policy | Learning/teaching | Classic |\n| **SARSA** | On-policy | Tabular-like | Classic |\n| **DQN/DDQN+PER** | Off-policy | Discrete actions | Classic, Box2D, Atari |\n| **A2C** | On-policy | Fast iteration | Classic, Box2D, Atari |\n| **PPO** | On-policy | General purpose | Classic, Box2D, MuJoCo (11), Atari (54) |\n| **SAC** | Off-policy | Continuous control | Classic, Box2D, MuJoCo |\n\nSee [Benchmark Results](docs/BENCHMARKS.md) for detailed performance data.\n\n## Environments\n\nSLM Lab uses [Gymnasium](https://gymnasium.farama.org/) (the maintained fork of OpenAI Gym):\n\n| Category | Examples | Difficulty | Docs |\n|----------|----------|------------|------|\n| **Classic Control** | CartPole, Pendulum, Acrobot | Easy | [Gymnasium Classic](https://gymnasium.farama.org/environments/classic_control/) |\n| **Box2D** | LunarLander, BipedalWalker | Medium | [Gymnasium Box2D](https://gymnasium.farama.org/environments/box2d/) |\n| **MuJoCo** | Hopper, HalfCheetah, Humanoid | Hard | [Gymnasium MuJoCo](https://gymnasium.farama.org/environments/mujoco/) |\n| **Atari** | Breakout, MsPacman, and 54 more | Varied | [ALE](https://ale.farama.org/environments/) |\n\nAny gymnasium-compatible environment works—just specify its name in the spec.\n\n## Quick Start\n\n```bash\n# Install\nuv sync\nuv tool install --editable .\n\n# Run demo (PPO CartPole)\nslm-lab run                                    # PPO CartPole\nslm-lab run --render                           # with visualization\n\n# Run custom experiment\nslm-lab run spec.json spec_name train          # local training\nslm-lab run-remote spec.json spec_name train   # cloud training (dstack)\n\n# Help (CLI uses Typer)\nslm-lab --help                                 # list all commands\nslm-lab run --help                             # options for run command\n\n# Troubleshoot: if slm-lab not found, use uv run\nuv run slm-lab run\n```\n\n## Cloud Training (dstack)\n\nRun experiments on cloud GPUs with automatic result sync to HuggingFace.\n\n```bash\n# Setup\ncp .env.example .env  # Add HF_TOKEN\nuv tool install dstack  # Install dstack CLI\n# Configure dstack server - see https://dstack.ai/docs/quickstart\n\n# Run on cloud\nslm-lab run-remote spec.json spec_name train           # CPU training (default)\nslm-lab run-remote spec.json spec_name search          # CPU ASHA search (default)\nslm-lab run-remote --gpu spec.json spec_name train     # GPU training (for image envs)\n\n# Sync results\nslm-lab pull spec_name    # Download from HuggingFace\nslm-lab list              # List available experiments\n```\n\nConfig options in `.dstack/`: `run-gpu-train.yml`, `run-gpu-search.yml`, `run-cpu-train.yml`, `run-cpu-search.yml`\n\n### Minimal Install (Orchestration Only)\n\nFor a lightweight box that only dispatches dstack runs, syncs results, and generates plots (no local ML training):\n\n```bash\nuv sync --no-default-groups\nuv run --no-default-groups slm-lab run-remote spec.json spec_name train\nuv run --no-default-groups slm-lab pull spec_name\nuv run --no-default-groups slm-lab plot -f folder1,folder2\n```\n\n## Citation\n\nIf you use SLM Lab in your research, please cite:\n\n```bibtex\n@misc{kenggraesser2017slmlab,\n    author = {Keng, Wah Loon and Graesser, Laura},\n    title = {SLM Lab},\n    year = {2017},\n    publisher = {GitHub},\n    journal = {GitHub repository},\n    howpublished = {\\url{https://github.com/kengz/SLM-Lab}},\n}\n```\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkengz%2Fslm-lab","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkengz%2Fslm-lab","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkengz%2Fslm-lab/lists"}