{"id":13935677,"url":"https://github.com/lcswillems/torch-ac","last_synced_at":"2025-10-10T05:29:29.377Z","repository":{"id":50268978,"uuid":"179939383","full_name":"lcswillems/torch-ac","owner":"lcswillems","description":"Recurrent and multi-process PyTorch implementation of deep reinforcement Actor-Critic algorithms A2C and PPO","archived":false,"fork":false,"pushed_at":"2022-10-05T23:17:23.000Z","size":24,"stargazers_count":192,"open_issues_count":6,"forks_count":66,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-11-27T03:34:32.318Z","etag":null,"topics":["a2c","a3c","actor-critic","advantage-actor-critic","deep-reinforcement-learning","minigrid","multi-process","ppo","proximal-policy-optimization","pytorch","recurrent","recurrent-neural-networks","reinforcement-learning","reward-shaping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lcswillems.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-04-07T08:47:24.000Z","updated_at":"2024-11-19T07:48:00.000Z","dependencies_parsed_at":"2023-01-19T06:15:14.232Z","dependency_job_id":null,"html_url":"https://github.com/lcswillems/torch-ac","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lcswillems/torch-ac","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lcswillems%2Ftorch-ac","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lcswillems%2Ftorch-ac/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lcswillems%2Ftorch-ac/releases","manifests_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lcswillems%2Ftorch-ac/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lcswillems","download_url":"https://codeload.github.com/lcswillems/torch-ac/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lcswillems%2Ftorch-ac/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279002877,"owners_count":26083468,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-10T02:00:06.843Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["a2c","a3c","actor-critic","advantage-actor-critic","deep-reinforcement-learning","minigrid","multi-process","ppo","proximal-policy-optimization","pytorch","recurrent","recurrent-neural-networks","reinforcement-learning","reward-shaping"],"created_at":"2024-08-07T23:01:59.072Z","updated_at":"2025-10-10T05:29:29.351Z","avatar_url":"https://github.com/lcswillems.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# PyTorch Actor-Critic deep reinforcement learning algorithms: A2C and PPO\n\nThe `torch_ac` package contains the PyTorch implementation of two Actor-Critic deep reinforcement learning algorithms:\n\n- [Synchronous A3C (A2C)](https://arxiv.org/pdf/1602.01783.pdf)\n- [Proximal Policy Optimization 
(PPO)](https://arxiv.org/pdf/1707.06347.pdf)\n\n**Note:** An example of how to use this package is given in the [`rl-starter-files` repository](https://github.com/lcswillems/rl-starter-files). More details below.\n\n## Features\n\n- **Recurrent policies**\n- Reward shaping\n- Handle observation spaces that are tensors or _dict of tensors_\n- Handle _discrete_ action spaces\n- Observation preprocessing\n- Multiprocessing\n- CUDA\n\n## Installation\n\n```bash\npip3 install torch-ac\n```\n\n**Note:** If you want to modify `torch-ac` algorithms, you will need to install a cloned version instead, i.e.:\n```bash\ngit clone https://github.com/lcswillems/torch-ac.git\ncd torch-ac\npip3 install -e .\n```\n\n## Package components overview\n\nA brief overview of the components of the package:\n\n- `torch_ac.A2CAlgo` and `torch_ac.PPOAlgo` classes for A2C and PPO algorithms\n- `torch_ac.ACModel` and `torch_ac.RecurrentACModel` abstract classes for non-recurrent and recurrent actor-critic models\n- `torch_ac.DictList` class for making dictionaries of lists list-indexable and hence batch-friendly\n\n## Package components details\n\nThe most important components of the package are detailed below.\n\n`torch_ac.A2CAlgo` and `torch_ac.PPOAlgo` have 2 methods:\n- `__init__` that may take, among other parameters:\n    - an `acmodel` actor-critic model, i.e. an instance of a class inheriting from either `torch_ac.ACModel` or `torch_ac.RecurrentACModel`.\n    - a `preprocess_obss` function that transforms a list of observations into a list-indexable object `X` (e.g. a PyTorch tensor). The default `preprocess_obss` function converts observations into a PyTorch tensor.\n    - a `reshape_reward` function that takes as parameters an observation `obs`, the action `action` taken, the reward `reward` received and the terminal status `done`, and returns a new reward. By default, the reward is not reshaped.\n    - a `recurrence` number to specify over how many timesteps the gradient is backpropagated. 
This number is only taken into account if a recurrent model is used and **must divide** the `num_frames_per_agent` parameter and, for PPO, the `batch_size` parameter.\n- `update_parameters` that first collects experiences, then updates the parameters and finally returns logs.\n\n`torch_ac.ACModel` has 2 abstract methods:\n- `__init__` that takes as parameters an `observation_space` and an `action_space`.\n- `forward` that takes as parameter N preprocessed observations `obs` and returns a PyTorch distribution `dist` and a tensor of values `value`. The tensor of values **must be** of size N, not N x 1.\n\n`torch_ac.RecurrentACModel` has 3 abstract methods:\n- `__init__` that takes the same parameters as `torch_ac.ACModel`.\n- `forward` that takes the same parameters as `torch_ac.ACModel` along with a tensor of N memories `memory` of size N x M, where M is the size of a memory. It returns the same outputs as `torch_ac.ACModel` plus a tensor of N memories `memory`.\n- `memory_size` that returns the size M of a memory.\n\n**Note:** The `preprocess_obss` function must return a list-indexable object (e.g. a PyTorch tensor). 
If your observations are dictionaries, your `preprocess_obss` function may first convert a list of dictionaries into a dictionary of lists and then make it list-indexable using the `torch_ac.DictList` class as follows:\n\n```python\n\u003e\u003e\u003e from torch_ac import DictList\n\u003e\u003e\u003e d = DictList({\"a\": [[1, 2], [3, 4]], \"b\": [[5], [6]]})\n\u003e\u003e\u003e d.a\n[[1, 2], [3, 4]]\n\u003e\u003e\u003e d[0]\nDictList({\"a\": [1, 2], \"b\": [5]})\n```\n\n**Note:** If you use an RNN, you will need to set `batch_first` to `True`.\n\n## Examples\n\nExamples of how to use the package components are given in the [`rl-starter-files` repository](https://github.com/lcswillems/rl-starter-files).\n\n### Example of use of `torch_ac.A2CAlgo` and `torch_ac.PPOAlgo`\n\n```python\n...\n\nalgo = torch_ac.PPOAlgo(envs, acmodel, args.frames_per_proc, args.discount, args.lr, args.gae_lambda,\n                        args.entropy_coef, args.value_loss_coef, args.max_grad_norm, args.recurrence,\n                        args.optim_eps, args.clip_eps, args.epochs, args.batch_size, preprocess_obss)\n\n...\n\nexps, logs1 = algo.collect_experiences()\nlogs2 = algo.update_parameters(exps)\n```\n\nMore details [here](https://github.com/lcswillems/rl-starter-files/blob/master/scripts/train.py).\n\n### Example of use of `torch_ac.DictList`\n\n```python\ntorch_ac.DictList({\n    \"image\": preprocess_images([obs[\"image\"] for obs in obss], device=device),\n    \"text\": preprocess_texts([obs[\"mission\"] for obs in obss], vocab, device=device)\n})\n```\n\nMore details [here](https://github.com/lcswillems/rl-starter-files/blob/master/utils/format.py).\n\n### Example of implementation of `torch_ac.RecurrentACModel`\n\n```python\nclass ACModel(nn.Module, torch_ac.RecurrentACModel):\n    ...\n\n    def forward(self, obs, memory):\n        ...\n\n        return dist, value, memory\n```\n\nMore details [here](https://github.com/lcswillems/rl-starter-files/blob/master/model.py).\n\n### Examples of `preprocess_obss` functions\n\nMore details 
[here](https://github.com/lcswillems/rl-starter-files/blob/master/utils/format.py).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flcswillems%2Ftorch-ac","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flcswillems%2Ftorch-ac","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flcswillems%2Ftorch-ac/lists"}