{"id":19630931,"url":"https://github.com/heronsystems/adeptrl","last_synced_at":"2025-04-06T13:10:39.629Z","repository":{"id":48118390,"uuid":"145903300","full_name":"heronsystems/adeptRL","owner":"heronsystems","description":"Reinforcement learning framework to accelerate research","archived":false,"fork":false,"pushed_at":"2021-08-25T16:28:31.000Z","size":2742,"stargazers_count":204,"open_issues_count":20,"forks_count":29,"subscribers_count":20,"default_branch":"master","last_synced_at":"2025-03-30T10:07:59.944Z","etag":null,"topics":["actor-critic","artificial-intelligence","atari","pysc2","pytorch","reinforcement-learning","starcraft2-ai"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/heronsystems.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-08-23T20:24:09.000Z","updated_at":"2025-03-21T16:07:41.000Z","dependencies_parsed_at":"2022-08-12T19:00:45.696Z","dependency_job_id":null,"html_url":"https://github.com/heronsystems/adeptRL","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heronsystems%2FadeptRL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heronsystems%2FadeptRL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heronsystems%2FadeptRL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heronsystems%2FadeptRL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/heronsystems","download_url":"https://codeload.
github.com/heronsystems/adeptRL/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247485287,"owners_count":20946398,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["actor-critic","artificial-intelligence","atari","pysc2","pytorch","reinforcement-learning","starcraft2-ai"],"created_at":"2024-11-11T12:07:10.026Z","updated_at":"2025-04-06T13:10:39.599Z","avatar_url":"https://github.com/heronsystems.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![banner](images/banner.png)\n\nadept is a reinforcement learning framework designed to accelerate research \nby abstracting away engineering challenges associated with deep reinforcement\nlearning. adept provides:\n* multi-GPU training\n* a modular interface for using custom networks, agents, and environments\n* baseline reinforcement learning models and algorithms for PyTorch\n* built-in tensorboard logging, model saving, reloading, evaluation, and \nrendering\n* proven hyperparameter defaults\n\nThis code is early-access, expect rough edges. Interfaces subject to change. 
\nWe're happy to accept feedback and contributions.\n\n### Read More\n* [Installation](#installation)\n* [Quickstart](#quickstart)\n* [Features](#features)\n* [Performance](#performance)\n\n### Documentation\n* [Architecture Overview](docs/api_overview.md)\n* [ModularNetwork Overview](docs/modular_network.md)\n* [Resume training](docs/resume_training.md)\n* Evaluate a model\n* Render environment\n\n### Examples\n* Custom Network ([stub](examples/custom_network_stub.py) | example)\n* Custom SubModule ([stub](examples/custom_submodule_stub.py) | [example](adept/network/net1d/lstm.py))\n* Custom Agent ([stub](examples/custom_agent_stub.py) | [example](adept/agent/actor_critic.py))\n* Custom Environment ([stub](examples/custom_environment_stub.py) | [example](adept/env/openai_gym.py))\n\n## Installation\n```bash\ngit clone https://github.com/heronsystems/adeptRL\ncd adeptRL\npip install -e .[all]\n```\n\n**From docker:**\n* [docker instructions](./docker/)\n\n## Quickstart\n**Train an Agent**\nLogs go to `/tmp/adept_logs/` by default. 
The log directory contains the \ntensorboard file, saved models, and other metadata.\n\n```bash\n# Local Mode (A2C)\n# We recommend 4GB+ GPU memory, 8GB+ RAM, 4+ Cores\npython -m adept.app local --env BeamRiderNoFrameskip-v4\n\n# Distributed Mode (A2C, requires NCCL)\n# We recommend 2+ GPUs, 8GB+ GPU memory, 32GB+ RAM, 4+ Cores\npython -m adept.app distrib --env BeamRiderNoFrameskip-v4\n\n# IMPALA (requires ray, resource intensive)\n# We recommend 2+ GPUs, 8GB+ GPU memory, 32GB+ RAM, 4+ Cores\npython -m adept.app actorlearner --env BeamRiderNoFrameskip-v4\n\n# To see a full list of options:\npython -m adept.app -h\npython -m adept.app help \u003ccommand\u003e\n```\n\n**Use your own Agent, Environment, Network, or SubModule**  \n```python\n\"\"\"\nmy_script.py\n\nTrain an agent on a single GPU.\n\"\"\"\nfrom adept.scripts.local import parse_args, main\nfrom adept.network import NetworkModule, SubModule1D\nfrom adept.agent import AgentModule\nfrom adept.env import EnvModule\n\n\nclass MyAgent(AgentModule):\n    pass  # Implement\n\n\nclass MyEnv(EnvModule):\n    pass  # Implement\n\n\nclass MyNet(NetworkModule):\n    pass  # Implement\n\n\nclass MySubModule1D(SubModule1D):\n    pass  # Implement\n\n\nif __name__ == '__main__':\n    import adept\n    adept.register_agent(MyAgent)\n    adept.register_env(MyEnv)\n    adept.register_network(MyNet)\n    adept.register_submodule(MySubModule1D)\n    main(parse_args())\n```\n* Call your script like this: `python my_script.py --agent MyAgent --env \nenv-id-1 --custom-network MyNet`\n* You can see all the args [here](adept/scripts/local.py) or how to implement\n the stubs in the examples section above.\n\n## Features\n### Scripts\n**Local (Single-node, Single-GPU)**\n* Best place to [start](adept/scripts/local.py) if you're trying to understand code.\n\n**Distributed (Multi-node, Multi-GPU)**\n* Uses NCCL backend to all-reduce gradients across GPUs without a parameter \nserver or host process.\n* Supports NVLINK and InfiniBand 
to reduce communication overhead\n* InfiniBand is untested because we do not have a setup to test on.\n\n**Importance Weighted Actor Learner Architectures, IMPALA (Single Node, Multi-GPU)**\n* Our implementation uses GPU workers rather than CPU workers for forward \npasses.\n* On Atari we achieve ~4k SPS = ~16k FPS with two GPUs and an 8-core CPU.\n* \"Note that the shallow IMPALA experiment completes training over 200 \nmillion frames in less than one hour.\"\n* IMPALA official experiments use 48 cores.\n* Ours: 2000 frames / (second * CPU core); DeepMind: 1157 frames / (second * CPU core)\n* Does not yet support multiple nodes or direct GPU memory transfers.\n\n### Agents\n* Advantage Actor Critic, A2C ([paper](https://arxiv.org/pdf/1708.05144.pdf) | [code](adept/agent/actor_critic.py))\n* Actor Critic Vtrace, IMPALA ([paper](https://arxiv.org/pdf/1802.01561.pdf) | [code](https://arxiv.org/pdf/1802.01561.pdf))\n\n### Networks\n* Modular Network Interface: supports arbitrary input and output shapes up to\n 4D via a SubModule API.\n* Stateful networks (e.g. 
LSTMs)\n* Batch normalization ([paper](https://arxiv.org/pdf/1502.03167.pdf))\n\n### Environments\n* OpenAI Gym Atari\n\n## Performance\n* ~3,000 Steps/second = 12,000 FPS (Atari)\n  * Local Mode\n  * 64 environments\n  * GeForce 2080 Ti\n  * Ryzen 2700x 8-core\n* Used to win a \n[Doom competition](https://www.crowdai.org/challenges/visual-doom-ai-competition-2018-track-2) \n(Ben Bell / Marv2in)\n![architecture](images/benchmark.png)\n* Trained for 50M Steps / 200M Frames\n* Up to 30 no-ops at the start of each episode\n* Evaluated on different seeds than those used for training\n* Architecture: [Four Convs](./adept/network/net3d/four_conv.py) (F=32)\nfollowed by an [LSTM](./adept/network/net1d/lstm.py) (F=512)\n* Reproduce with `python -m adept.app local --logdir ~/local64_benchmark --eval \n-y --nb-step 50e6 --env \u003cenv-id\u003e`\n\n## Acknowledgements\nWe borrow pieces of OpenAI's [gym](https://github.com/openai/gym) and \n[baselines](https://github.com/openai/baselines) code. We indicate where this\n is done.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheronsystems%2Fadeptrl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fheronsystems%2Fadeptrl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheronsystems%2Fadeptrl/lists"}