{"id":13563154,"url":"https://github.com/pytorch/rl","last_synced_at":"2025-05-11T05:47:43.206Z","repository":{"id":37077764,"uuid":"454479855","full_name":"pytorch/rl","owner":"pytorch","description":"A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.","archived":false,"fork":false,"pushed_at":"2025-05-09T16:47:18.000Z","size":127853,"stargazers_count":2735,"open_issues_count":269,"forks_count":365,"subscribers_count":40,"default_branch":"main","last_synced_at":"2025-05-11T05:47:31.877Z","etag":null,"topics":["ai","control","decision-making","distributed-computing","machine-learning","marl","model-based-reinforcement-learning","multi-agent-reinforcement-learning","pytorch","reinforcement-learning","rl","robotics","torch"],"latest_commit_sha":null,"homepage":"https://pytorch.org/rl","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pytorch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-02-01T17:17:36.000Z","updated_at":"2025-05-10T16:25:03.000Z","dependencies_parsed_at":"2023-12-13T18:34:29.782Z","dependency_job_id":"c2dfbd93-775b-4158-ac73-a95742936a2f","html_url":"https://github.com/pytorch/rl","commit_stats":{"total_commits":1554,"total_committers":161,"mean_commits":9.652173913043478,"dds":0.6634491634491635,"last_synced_commit":"36545af5062821dada2cdb91594209442d3dd0e6"},"previous_names":["facebookresearch/rl"],"tags_count":23,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/pytorch%2Frl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Frl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Frl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Frl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pytorch","download_url":"https://codeload.github.com/pytorch/rl/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253523733,"owners_count":21921818,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","control","decision-making","distributed-computing","machine-learning","marl","model-based-reinforcement-learning","multi-agent-reinforcement-learning","pytorch","reinforcement-learning","rl","robotics","torch"],"created_at":"2024-08-01T13:01:15.700Z","updated_at":"2025-05-11T05:47:43.190Z","avatar_url":"https://github.com/pytorch.png","language":"Python","funding_links":[],"categories":["Python","Other","Industry Strength Reinforcement 
Learning","Uncategorized","漏洞库_漏洞靶场"],"sub_categories":["Uncategorized","资源传输下载"],"readme":"[![Unit-tests](https://github.com/pytorch/rl/actions/workflows/test-linux.yml/badge.svg)](https://github.com/pytorch/rl/actions/workflows/test-linux.yml)\n[![Documentation](https://img.shields.io/badge/Documentation-blue.svg)](https://pytorch.org/rl/)\n[![Benchmarks](https://img.shields.io/badge/Benchmarks-blue.svg)](https://pytorch.github.io/rl/dev/bench/)\n[![codecov](https://codecov.io/gh/pytorch/rl/branch/main/graph/badge.svg?token=HcpK1ILV6r)](https://codecov.io/gh/pytorch/rl)\n[![Twitter Follow](https://img.shields.io/twitter/follow/torchrl1?style=social)](https://twitter.com/torchrl1)\n[![Python version](https://img.shields.io/pypi/pyversions/torchrl.svg)](https://www.python.org/downloads/)\n[![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/pytorch/rl/blob/main/LICENSE)\n\u003ca href=\"https://pypi.org/project/torchrl\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/torchrl\" alt=\"pypi version\"\u003e\u003c/a\u003e\n\u003ca href=\"https://pypi.org/project/torchrl-nightly\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/torchrl-nightly?label=nightly\" alt=\"pypi nightly version\"\u003e\u003c/a\u003e\n[![Downloads](https://static.pepy.tech/personalized-badge/torchrl?period=total\u0026units=international_system\u0026left_color=blue\u0026right_color=orange\u0026left_text=Downloads)](https://pepy.tech/project/torchrl)\n[![Downloads](https://static.pepy.tech/personalized-badge/torchrl-nightly?period=total\u0026units=international_system\u0026left_color=blue\u0026right_color=orange\u0026left_text=Downloads%20(nightly))](https://pepy.tech/project/torchrl-nightly)\n[![Discord Shield](https://dcbadge.vercel.app/api/server/cZs26Qq3Dd)](https://discord.gg/cZs26Qq3Dd)\n\n# TorchRL\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/source/_static/img/icon.png\"  width=\"200\" 
\u003e\n\u003c/p\u003e\n\n[**Documentation**](#documentation-and-knowledge-base) | [**TensorDict**](#writing-simplified-and-portable-rl-codebase-with-tensordict) |\n[**Features**](#features) | [**Examples, tutorials and demos**](#examples-tutorials-and-demos) | [**Citation**](#citation) | [**Installation**](#installation) |\n[**Asking a question**](#asking-a-question) | [**Contributing**](#contributing)\n\n**TorchRL** is an open-source Reinforcement Learning (RL) library for PyTorch.\n\n## Key features\n\n- 🐍 **Python-first**: Designed with Python as the primary language for ease of use and flexibility\n- ⏱️ **Efficient**: Optimized for performance to support demanding RL research applications\n- 🧮 **Modular, customizable, extensible**: Highly modular architecture allows for easy swapping, transformation, or creation of new components\n- 📚 **Documented**: Thorough documentation ensures that users can quickly understand and utilize the library\n- ✅ **Tested**: Rigorously tested to ensure reliability and stability\n- ⚙️ **Reusable functionals**: Provides a set of highly reusable functions for cost functions, returns, and data processing\n\n### Design Principles\n\n- 🔥 **Aligns with PyTorch ecosystem**: Follows the structure and conventions of popular PyTorch libraries\n  (e.g., dataset pillar, transforms, models, data utilities)\n- ➖ **Minimal dependencies**: Only requires the Python standard library, NumPy, and PyTorch; optional dependencies for\n  common environment libraries (e.g., OpenAI Gym) and datasets (D4RL, OpenX...)\n\nRead the [full paper](https://arxiv.org/abs/2306.00577) for a more curated description of the library.\n\n## Getting started\n\nCheck our [Getting Started tutorials](https://pytorch.org/rl/stable/index.html#getting-started) to quickly ramp up on the basic \nfeatures of the library!\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/ppo.png\"  width=\"800\" \u003e\n\u003c/p\u003e\n\n## Documentation and knowledge base\n\nThe TorchRL 
documentation can be found [here](https://pytorch.org/rl).\nIt contains tutorials and the API reference.\n\nTorchRL also provides an RL knowledge base to help you debug your code, or simply\nlearn the basics of RL. Check it out [here](https://pytorch.org/rl/stable/reference/knowledge_base.html).\n\nWe have some introductory videos and talks to help you get to know the library better:\n\n- [TalkRL podcast](https://www.talkrl.com/episodes/vincent-moens-on-torchrl)\n- [TorchRL intro at PyTorch day 2022](https://youtu.be/cIKMhZoykEE)\n- [PyTorch 2.0 Q\u0026A: TorchRL](https://www.youtube.com/live/myEfUoYrbts?feature=share)\n\n## Spotlight publications\n\nTorchRL is domain-agnostic, so you can use it across many different fields. Here are a few examples:\n\n- [ACEGEN](https://pubs.acs.org/doi/10.1021/acs.jcim.4c00895): Reinforcement Learning of Generative Chemical Agents\n  for Drug Discovery\n- [BenchMARL](https://www.jmlr.org/papers/v25/23-1612.html): Benchmarking Multi-Agent Reinforcement Learning\n- [BricksRL](https://arxiv.org/abs/2406.17490): A Platform for Democratizing Robotics and Reinforcement Learning\n  Research and Education with LEGO\n- [OmniDrones](https://ieeexplore.ieee.org/abstract/document/10409589): An Efficient and Flexible Platform for Reinforcement Learning in Drone Control\n- [RL4CO](https://arxiv.org/abs/2306.17100): An Extensive Reinforcement Learning for Combinatorial Optimization Benchmark\n- [RoboHive](https://proceedings.neurips.cc/paper_files/paper/2023/file/8a84a4341c375b8441b36836bb343d4e-Paper-Datasets_and_Benchmarks.pdf): A unified framework for robot learning\n\n## Writing simplified and portable RL codebase with `TensorDict`\n\nRL algorithms are very heterogeneous, and it can be hard to recycle a codebase\nacross settings (e.g. 
from online to offline, from state-based to pixel-based \nlearning).\nTorchRL solves this problem through [`TensorDict`](https://github.com/pytorch/tensordict/),\na convenient data structure\u003csup\u003e(1)\u003c/sup\u003e that can be used to streamline one's\nRL codebase.\nWith this tool, one can write a *complete PPO training script in less than 100\nlines of code*!\n\n  \u003cdetails\u003e\n    \u003csummary\u003eCode\u003c/summary\u003e\n\n  ```python\n  import torch\n  from tensordict.nn import TensorDictModule\n  from tensordict.nn.distributions import NormalParamExtractor\n  from torch import nn\n  \n  from torchrl.collectors import SyncDataCollector\n  from torchrl.data.replay_buffers import TensorDictReplayBuffer, \\\n    LazyTensorStorage, SamplerWithoutReplacement\n  from torchrl.envs.libs.gym import GymEnv\n  from torchrl.modules import ProbabilisticActor, ValueOperator, TanhNormal\n  from torchrl.objectives import ClipPPOLoss\n  from torchrl.objectives.value import GAE\n  \n  env = GymEnv(\"Pendulum-v1\") \n  model = TensorDictModule(\n    nn.Sequential(\n        nn.Linear(3, 128), nn.Tanh(),\n        nn.Linear(128, 128), nn.Tanh(),\n        nn.Linear(128, 128), nn.Tanh(),\n        nn.Linear(128, 2),\n        NormalParamExtractor()\n    ),\n    in_keys=[\"observation\"],\n    out_keys=[\"loc\", \"scale\"]\n  )\n  critic = ValueOperator(\n    nn.Sequential(\n        nn.Linear(3, 128), nn.Tanh(),\n        nn.Linear(128, 128), nn.Tanh(),\n        nn.Linear(128, 128), nn.Tanh(),\n        nn.Linear(128, 1),\n    ),\n    in_keys=[\"observation\"],\n  )\n  actor = ProbabilisticActor(\n    model,\n    in_keys=[\"loc\", \"scale\"],\n    distribution_class=TanhNormal,\n    distribution_kwargs={\"low\": -1.0, \"high\": 1.0},\n    return_log_prob=True\n    )\n  buffer = TensorDictReplayBuffer(\n    storage=LazyTensorStorage(1000),\n    sampler=SamplerWithoutReplacement(),\n    batch_size=50,\n    )\n  collector = SyncDataCollector(\n    env,\n    actor,\n    
frames_per_batch=1000,\n    total_frames=1_000_000,\n  )\n  loss_fn = ClipPPOLoss(actor, critic)\n  adv_fn = GAE(value_network=critic, average_gae=True, gamma=0.99, lmbda=0.95)\n  optim = torch.optim.Adam(loss_fn.parameters(), lr=2e-4)\n  \n  for data in collector:  # collect data\n    for epoch in range(10):\n        adv_fn(data)  # compute advantage\n        buffer.extend(data)\n        for sample in buffer:  # consume data\n            loss_vals = loss_fn(sample)\n            loss_val = sum(\n                value for key, value in loss_vals.items() if\n                key.startswith(\"loss\")\n                )\n            loss_val.backward()\n            optim.step()\n            optim.zero_grad()\n    print(f\"avg reward: {data['next', 'reward'].mean().item(): 4.4f}\")\n  ```\n  \u003c/details\u003e\n\nHere is an example of how the [environment API](https://pytorch.org/rl/stable/reference/envs.html)\nrelies on tensordict to carry data from one function to another during a rollout\nexecution:\n![Alt Text](https://github.com/pytorch/rl/blob/main/docs/source/_static/img/rollout.gif)\n\n`TensorDict` makes it easy to re-use pieces of code across environments, models and\nalgorithms.\n  \u003cdetails\u003e\n    \u003csummary\u003eCode\u003c/summary\u003e\n  \n  For instance, here's how to code a rollout in TorchRL:\n\n  ```diff\n  - obs, done = env.reset()\n  + tensordict = env.reset()\n  policy = SafeModule(\n      model,\n      in_keys=[\"observation_pixels\", \"observation_vector\"],\n      out_keys=[\"action\"],\n  )\n  out = []\n  for i in range(n_steps):\n  -     action, log_prob = policy(obs)\n  -     next_obs, reward, done, info = env.step(action)\n  -     out.append((obs, next_obs, action, log_prob, reward, done))\n  -     obs = next_obs\n  +     tensordict = policy(tensordict)\n  +     tensordict = env.step(tensordict)\n  +     out.append(tensordict)\n  +     tensordict = step_mdp(tensordict)  # renames next_observation_* keys to observation_*\n  - obs, 
next_obs, action, log_prob, reward, done = [torch.stack(vals, 0) for vals in zip(*out)]\n  + out = torch.stack(out, 0)  # TensorDict supports multiple tensor operations\n  ```\n  \u003c/details\u003e\n\nUsing this, TorchRL abstracts away the input / output signatures of the modules, env, \ncollectors, replay buffers and losses of the library, allowing all primitives\nto be easily recycled across settings.\n\n  \u003cdetails\u003e\n    \u003csummary\u003eCode\u003c/summary\u003e\n\n  Here's another example of an off-policy training loop in TorchRL (assuming \n  that a data collector, a replay buffer, a loss and an optimizer have been instantiated):\n  \n  ```diff\n  - for i, (obs, next_obs, action, hidden_state, reward, done) in enumerate(collector):\n  + for i, tensordict in enumerate(collector):\n  -     replay_buffer.add((obs, next_obs, action, hidden_state, reward, done))\n  +     replay_buffer.add(tensordict)\n      for j in range(num_optim_steps):\n  -         obs, next_obs, action, hidden_state, reward, done = replay_buffer.sample(batch_size)\n  -         loss = loss_fn(obs, next_obs, action, hidden_state, reward, done)\n  +         tensordict = replay_buffer.sample(batch_size)\n  +         loss = loss_fn(tensordict)\n          loss.backward()\n          optim.step()\n          optim.zero_grad()\n  ```\n  This training loop can be re-used across algorithms as it makes a minimal number of assumptions about the structure of the data.\n  \u003c/details\u003e\n\n  TensorDict supports multiple tensor operations on its device and shape\n  (the shape of a TensorDict, or its batch size, is made of the first N dimensions shared by all its contained tensors, for an arbitrary N):\n\n  \u003cdetails\u003e\n    \u003csummary\u003eCode\u003c/summary\u003e\n\n  ```python\n  # stack and cat\n  tensordict = torch.stack(list_of_tensordicts, 0)\n  tensordict = torch.cat(list_of_tensordicts, 0)\n  # reshape\n  tensordict = tensordict.view(-1)\n  tensordict = tensordict.permute(0, 2, 1)\n  tensordict = 
tensordict.unsqueeze(-1)\n  tensordict = tensordict.squeeze(-1)\n  # indexing\n  tensordict = tensordict[:2]\n  tensordict[:, 2] = sub_tensordict\n  # device and memory location\n  tensordict.cuda()\n  tensordict.to(\"cuda:1\")\n  tensordict.share_memory_()\n  ```\n  \u003c/details\u003e\n\nTensorDict comes with a dedicated [`tensordict.nn`](https://pytorch.github.io/tensordict/reference/nn.html)\nmodule that contains everything you might need to write your model with it.\nAnd it is `functorch` and `torch.compile` compatible!\n\n  \u003cdetails\u003e\n    \u003csummary\u003eCode\u003c/summary\u003e\n\n  ```diff\n  transformer_model = nn.Transformer(nhead=16, num_encoder_layers=12)\n  + td_module = SafeModule(transformer_model, in_keys=[\"src\", \"tgt\"], out_keys=[\"out\"])\n  src = torch.rand((10, 32, 512))\n  tgt = torch.rand((20, 32, 512))\n  + tensordict = TensorDict({\"src\": src, \"tgt\": tgt}, batch_size=[20, 32])\n  - out = transformer_model(src, tgt)\n  + td_module(tensordict)\n  + out = tensordict[\"out\"]\n  ```\n\n  The `TensorDictSequential` class allows chaining sequences of `nn.Module` instances in a highly modular way.\n  For instance, here is an implementation of a transformer using the encoder and decoder blocks:\n  ```python\n  encoder_module = TransformerEncoder(...)\n  encoder = TensorDictModule(encoder_module, in_keys=[\"src\", \"src_mask\"], out_keys=[\"memory\"])\n  decoder_module = TransformerDecoder(...)\n  decoder = TensorDictModule(decoder_module, in_keys=[\"tgt\", \"memory\"], out_keys=[\"output\"])\n  transformer = TensorDictSequential(encoder, decoder)\n  assert transformer.in_keys == [\"src\", \"src_mask\", \"tgt\"]\n  assert transformer.out_keys == [\"memory\", \"output\"]\n  ```\n\n  `TensorDictSequential` makes it possible to isolate subgraphs by querying a set of desired input / output keys:\n  ```python\n  transformer.select_subsequence(out_keys=[\"memory\"])  # returns the encoder\n  transformer.select_subsequence(in_keys=[\"tgt\", 
\"memory\"])  # returns the decoder\n  ```\n  \u003c/details\u003e\n\n  Check [TensorDict tutorials](https://pytorch.github.io/tensordict/) to\n  learn more!\n\n\n## Features\n\n- A common [interface for environments](https://github.com/pytorch/rl/blob/main/torchrl/envs)\n  which supports common libraries (OpenAI gym, deepmind control lab, etc.)\u003csup\u003e(1)\u003c/sup\u003e and state-less execution \n  (e.g. Model-based environments).\n  The [batched environments](https://github.com/pytorch/rl/blob/main/torchrl/envs/batched_envs.py) containers allow parallel execution\u003csup\u003e(2)\u003c/sup\u003e.\n  A common PyTorch-first class of [tensor-specification class](https://github.com/pytorch/rl/blob/main/torchrl/data/tensor_specs.py) is also provided.\n  TorchRL's environments API is simple but stringent and specific. Check the \n  [documentation](https://pytorch.org/rl/stable/reference/envs.html)\n  and [tutorial](https://pytorch.org/rl/stable/tutorials/pendulum.html) to learn more!\n  \u003cdetails\u003e\n    \u003csummary\u003eCode\u003c/summary\u003e\n\n  ```python\n  env_make = lambda: GymEnv(\"Pendulum-v1\", from_pixels=True)\n  env_parallel = ParallelEnv(4, env_make)  # creates 4 envs in parallel\n  tensordict = env_parallel.rollout(max_steps=20, policy=None)  # random rollout (no policy given)\n  assert tensordict.shape == [4, 20]  # 4 envs, 20 steps rollout\n  env_parallel.action_spec.is_in(tensordict[\"action\"])  # spec check returns True\n  ```\n  \u003c/details\u003e\n\n- multiprocess and distributed [data collectors](https://github.com/pytorch/rl/blob/main/torchrl/collectors/collectors.py)\u003csup\u003e(2)\u003c/sup\u003e\n  that work synchronously or asynchronously.\n  Through the use of TensorDict, TorchRL's training loops are made very similar\n  to regular training loops in supervised\n  learning (although the \"dataloader\" -- read data collector -- is modified on-the-fly):\n  \u003cdetails\u003e\n    
\u003csummary\u003eCode\u003c/summary\u003e\n\n  ```python\n  env_make = lambda: GymEnv(\"Pendulum-v1\", from_pixels=True)\n  collector = MultiaSyncDataCollector(\n      [env_make, env_make],\n      policy=policy,\n      devices=[\"cuda:0\", \"cuda:0\"],\n      total_frames=10000,\n      frames_per_batch=50,\n      ...\n  )\n  for i, tensordict_data in enumerate(collector):\n      loss = loss_module(tensordict_data)\n      loss.backward()\n      optim.step()\n      optim.zero_grad()\n      collector.update_policy_weights_()\n  ```\n  \u003c/details\u003e\n\n  Check our [distributed collector examples](https://github.com/pytorch/rl/blob/main/examples/distributed/collectors) to\n  learn more about ultra-fast data collection with TorchRL.\n\n- efficient\u003csup\u003e(2)\u003c/sup\u003e and generic\u003csup\u003e(1)\u003c/sup\u003e [replay buffers](https://github.com/pytorch/rl/blob/main/torchrl/data/replay_buffers/replay_buffers.py) with modularized storage:\n  \u003cdetails\u003e\n    \u003csummary\u003eCode\u003c/summary\u003e\n\n  ```python\n  storage = LazyMemmapStorage(  # memory-mapped (physical) storage\n      cfg.buffer_size,\n      scratch_dir=\"/tmp/\"\n  )\n  buffer = TensorDictPrioritizedReplayBuffer(\n      alpha=0.7,\n      beta=0.5,\n      collate_fn=lambda x: x,\n      pin_memory=device != torch.device(\"cpu\"),\n      prefetch=10,  # multi-threaded sampling\n      storage=storage\n  )\n  ```\n  \u003c/details\u003e\n\n  Replay buffers are also offered as wrappers around common datasets for *offline RL*:\n  \u003cdetails\u003e\n    \u003csummary\u003eCode\u003c/summary\u003e\n\n  ```python\n  from torchrl.data.replay_buffers import SamplerWithoutReplacement\n  from torchrl.data.datasets.d4rl import D4RLExperienceReplay\n  data = D4RLExperienceReplay(\n      \"maze2d-open-v0\",\n      split_trajs=True,\n      batch_size=128,\n      sampler=SamplerWithoutReplacement(drop_last=True),\n  )\n  for sample in data:  # or alternatively sample = 
data.sample()\n      fun(sample)\n  ```\n  \u003c/details\u003e\n\n\n- cross-library [environment transforms](https://github.com/pytorch/rl/blob/main/torchrl/envs/transforms/transforms.py)\u003csup\u003e(1)\u003c/sup\u003e,\n  executed on device and in a vectorized fashion\u003csup\u003e(2)\u003c/sup\u003e,\n  which process and prepare the data coming out of the environments to be used by the agent:\n  \u003cdetails\u003e\n    \u003csummary\u003eCode\u003c/summary\u003e\n\n  ```python\n  env_make = lambda: GymEnv(\"Pendulum-v1\", from_pixels=True)\n  env_base = ParallelEnv(4, env_make, device=\"cuda:0\")  # creates 4 envs in parallel\n  env = TransformedEnv(\n      env_base,\n      Compose(\n          ToTensorImage(),\n          ObservationNorm(loc=0.5, scale=1.0)),  # executes the transforms once and on device\n  )\n  tensordict = env.reset()\n  assert tensordict.device == torch.device(\"cuda:0\")\n  ```\n  Other transforms include: reward scaling (`RewardScaling`), shape operations (concatenation of tensors, unsqueezing etc.), concatenation of\n  successive frames (`CatFrames`), resizing (`Resize`) and many more.\n\n  Unlike other libraries, the transforms are stacked as a list (and not wrapped in each other), which makes it\n  easy to add and remove them at will:\n  ```python\n  env.insert_transform(0, NoopResetEnv())  # inserts the NoopResetEnv transform at index 0\n  ```\n  Nevertheless, transforms can access and execute operations on the parent environment:\n  ```python\n  transform = env.transform[1]  # retrieves the second transform of the list\n  parent_env = transform.parent  # returns the base environment of the second transform, i.e. the base env + the first transform\n  ```\n  \u003c/details\u003e\n\n- various tools for distributed learning (e.g. 
[memory mapped tensors](https://github.com/pytorch/tensordict/blob/main/tensordict/memmap.py))\u003csup\u003e(2)\u003c/sup\u003e;\n- various [architectures](https://github.com/pytorch/rl/blob/main/torchrl/modules/models/) and models (e.g. [actor-critic](https://github.com/pytorch/rl/blob/main/torchrl/modules/tensordict_module/actors.py))\u003csup\u003e(1)\u003c/sup\u003e:\n  \u003cdetails\u003e\n    \u003csummary\u003eCode\u003c/summary\u003e\n\n  ```python\n  # create an nn.Module\n  common_module = ConvNet(\n      bias_last_layer=True,\n      depth=None,\n      num_cells=[32, 64, 64],\n      kernel_sizes=[8, 4, 3],\n      strides=[4, 2, 1],\n  )\n  # Wrap it in a SafeModule, indicating what key to read in and where to\n  # write out the output\n  common_module = SafeModule(\n      common_module,\n      in_keys=[\"pixels\"],\n      out_keys=[\"hidden\"],\n  )\n  # Wrap the policy module in NormalParamsWrapper, such that the output\n  # tensor is split into loc and scale, and scale is mapped onto a positive space\n  policy_module = SafeModule(\n      NormalParamsWrapper(\n          MLP(num_cells=[64, 64], out_features=32, activation=nn.ELU)\n      ),\n      in_keys=[\"hidden\"],\n      out_keys=[\"loc\", \"scale\"],\n  )\n  # Use a SafeProbabilisticTensorDictSequential to combine the SafeModule with a\n  # SafeProbabilisticModule, indicating how to build the\n  # torch.distributions.Distribution object and what to do with it\n  policy_module = SafeProbabilisticTensorDictSequential(  # stochastic policy\n      policy_module,\n      SafeProbabilisticModule(\n          in_keys=[\"loc\", \"scale\"],\n          out_keys=\"action\",\n          distribution_class=TanhNormal,\n      ),\n  )\n  value_module = MLP(\n      num_cells=[64, 64],\n      out_features=1,\n      activation=nn.ELU,\n  )\n  # Wrap the policy and value function in a common module\n  actor_value = ActorValueOperator(common_module, policy_module, value_module)\n  # standalone policy from this\n  
standalone_policy = actor_value.get_policy_operator()\n  ```\n  \u003c/details\u003e\n\n- exploration [wrappers](https://github.com/pytorch/rl/blob/main/torchrl/modules/tensordict_module/exploration.py) and\n  [modules](https://github.com/pytorch/rl/blob/main/torchrl/modules/models/exploration.py) to easily swap between exploration and exploitation\u003csup\u003e(1)\u003c/sup\u003e:\n  \u003cdetails\u003e\n    \u003csummary\u003eCode\u003c/summary\u003e\n\n  ```python\n  policy_explore = EGreedyWrapper(policy)\n  with set_exploration_type(ExplorationType.RANDOM):\n      tensordict = policy_explore(tensordict)  # will use eps-greedy\n  with set_exploration_type(ExplorationType.DETERMINISTIC):\n      tensordict = policy_explore(tensordict)  # will not use eps-greedy\n  ```\n  \u003c/details\u003e\n\n- A series of efficient [loss modules](https://github.com/pytorch/rl/tree/main/torchrl/objectives)\n  and highly vectorized\n  [functional return and advantage](https://github.com/pytorch/rl/blob/main/torchrl/objectives/value/functional.py)\n  computation.\n\n  \u003cdetails\u003e\n    \u003csummary\u003eCode\u003c/summary\u003e\n\n  ### Loss modules\n  ```python\n  from torchrl.objectives import DQNLoss\n  loss_module = DQNLoss(value_network=value_network, gamma=0.99)\n  tensordict = replay_buffer.sample(batch_size)\n  loss = loss_module(tensordict)\n  ```\n\n  ### Advantage computation\n  ```python\n  from torchrl.objectives.value.functional import vec_td_lambda_return_estimate\n  advantage = vec_td_lambda_return_estimate(gamma, lmbda, next_state_value, reward, done, terminated)\n  ```\n\n  \u003c/details\u003e\n\n- a generic [trainer class](https://github.com/pytorch/rl/blob/main/torchrl/trainers/trainers.py)\u003csup\u003e(1)\u003c/sup\u003e that\n  executes the aforementioned training loop. 
Through a hooking mechanism,\n  it also supports any logging or data transformation operation at any given\n  time.\n\n- various [recipes](https://github.com/pytorch/rl/blob/main/torchrl/trainers/helpers/models.py) to build models that\n    correspond to the environment being deployed.\n\nIf you feel a feature is missing from the library, please submit an issue!\nIf you would like to contribute to new features, check our [call for contributions](https://github.com/pytorch/rl/issues/509) and our [contribution](https://github.com/pytorch/rl/blob/main/CONTRIBUTING.md) page.\n\n\n## Examples, tutorials and demos\n\nA series of [State-of-the-Art implementations](https://github.com/pytorch/rl/blob/main/sota-implementations/) are provided with an illustrative purpose:\n\n\u003ctable\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003cstrong\u003eAlgorithm\u003c/strong\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\u003cstrong\u003eCompile Support**\u003c/strong\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\u003cstrong\u003eTensordict-free API\u003c/strong\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\u003cstrong\u003eModular Losses\u003c/strong\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\u003cstrong\u003eContinuous and Discrete\u003c/strong\u003e\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/dqn\"\u003eDQN\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e 1.9x\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e NA\n   \u003c/td\u003e\n   \u003ctd\u003e + (through \u003ca href=\"https://pytorch.org/rl/stable/reference/generated/torchrl.envs.transforms.ActionDiscretizer.html?highlight=actiondiscretizer\"\u003eActionDiscretizer\u003c/a\u003e transform)\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/ddpg/ddpg.py\"\u003eDDPG\u003c/a\u003e\n   
\u003c/td\u003e\n   \u003ctd\u003e 1.87x\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e - (continuous only)\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/iql/\"\u003eIQL\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e 3.22x\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/cql/cql_offline.py\"\u003eCQL\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e 2.68x\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/td3/td3.py\"\u003eTD3\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e 2.27x\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e - (continuous only)\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\n    \u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/td3_bc/td3_bc.py\"\u003eTD3+BC\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e untested\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e - (continuous only)\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\n    \u003ca href=\"https://github.com/pytorch/rl/blob/main/examples/a2c/\"\u003eA2C\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e 2.67x\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e -\n   \u003c/td\u003e\n   \u003ctd\u003e 
+\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\n    \u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/ppo/\"\u003ePPO\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e 2.42x\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e -\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/sac/sac.py\"\u003eSAC\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e 2.62x\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e -\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/redq/redq.py\"\u003eREDQ\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e 2.28x\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e -\n   \u003c/td\u003e\n   \u003ctd\u003e - (continuous only)\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/dreamer/dreamer.py\"\u003eDreamer v1\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e untested\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e + (\u003ca href=\"https://pytorch.org/rl/stable/reference/objectives.html#dreamer\"\u003edifferent classes\u003c/a\u003e)\n   \u003c/td\u003e\n   \u003ctd\u003e - (continuous only)\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/decision_transformer\"\u003eDecision Transformers\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e untested\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e NA\n   \u003c/td\u003e\n   \u003ctd\u003e - (continuous 
only)\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/crossq\"\u003eCrossQ\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e untested\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e - (continuous only)\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/gail\"\u003eGail\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e untested\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e NA\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/impala\"\u003eImpala\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e untested\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e -\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/multiagent/iql.py\"\u003eIQL (MARL)\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e untested\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/multiagent/maddpg_iddpg.py\"\u003eDDPG (MARL)\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e untested\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e - (continuous only)\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca 
href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/multiagent/mappo_ippo.py\"\u003ePPO (MARL)\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e untested\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e -\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/multiagent/qmix_vdn.py\"\u003eQMIX-VDN (MARL)\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e untested\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e NA\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/sota-implementations/multiagent/sac.py\"\u003eSAC (MARL)\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e untested\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e -\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\u003ca href=\"https://github.com/pytorch/rl/blob/main/examples/rlhf\"\u003eRLHF\u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e NA\n   \u003c/td\u003e\n   \u003ctd\u003e +\n   \u003c/td\u003e\n   \u003ctd\u003e NA\n   \u003c/td\u003e\n   \u003ctd\u003e NA\n   \u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n** The number indicates expected speed-up compared to eager mode when executed on CPU. 
Numbers may vary depending on\n  architecture and device.\n\nand many more to come!\n\n[Code examples](examples/) displaying toy code snippets and training scripts are also available:\n- [RLHF](examples/rlhf)\n- [Memory-mapped replay buffers](examples/torchrl_features)\n\n\nCheck the [examples](https://github.com/pytorch/rl/blob/main/sota-implementations/) directory for more details\nabout handling the various configuration settings.\n\nWe also provide [tutorials and demos](https://pytorch.org/rl/stable#tutorials) that give a sense of\nwhat the library can do.\n\n## Citation\n\nIf you're using TorchRL, please refer to this BibTeX entry to cite this work:\n```\n@misc{bou2023torchrl,\n      title={TorchRL: A data-driven decision-making library for PyTorch}, \n      author={Albert Bou and Matteo Bettini and Sebastian Dittert and Vikash Kumar and Shagun Sodhani and Xiaomeng Yang and Gianni De Fabritiis and Vincent Moens},\n      year={2023},\n      eprint={2306.00577},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG}\n}\n```\n\n## Installation\n\nCreate a conda environment where the packages will be installed.\n\n```\nconda create --name torch_rl python=3.9\nconda activate torch_rl\n```\n\n**PyTorch**\n\nDepending on how you intend to use functorch, you may want to\ninstall the latest (nightly) PyTorch release or the latest stable version of PyTorch.\nSee [here](https://pytorch.org/get-started/locally/) for a detailed list of commands,\nincluding `pip3` or other special installation instructions.\n\n**TorchRL**\n\nYou can install the **latest stable release** by using\n```bash\npip3 install torchrl\n```\nThis should work on Linux, Windows 10 and macOS (Intel or Apple Silicon chips).\nOn certain Windows machines (Windows 11), you should install the library locally (see below).\n\nFor AArch64 machines, the binaries are not yet stored on PyPI so you will need to download them directly from\nthe [release page](https://github.com/pytorch/rl/releases/) or 
install the library via\n```\npip3 install git+https://github.com/pytorch/rl@v0.8.0\n```\n\nThe **nightly build** can be installed via\n```bash\npip3 install tensordict-nightly torchrl-nightly\n```\nwhich we currently only ship for Linux machines.\nImportantly, the nightly builds require the nightly builds of PyTorch too.\n\nTo install extra dependencies, call\n```bash\npip3 install \"torchrl[atari,dm_control,gym_continuous,rendering,tests,utils,marl,open_spiel,checkpointing]\"\n```\nor a subset of these.\n\nTo install torchrl with the latest PyTorch, use\n```bash\npip3 install \"torchrl[replay_buffer]\"\n```\nsince some features in the replay buffer require PyTorch 2.7.0 or above.\n\nYou may also want to install the library locally. There are three main reasons to do so:\n- the nightly/stable release isn't available for your platform (e.g., Windows 11, nightlies for Apple Silicon, etc.);\n- contributing to the code;\n- installing torchrl with a previous version of PyTorch (any version \u003e= 2.1). Note that a regular install followed\n  by a downgrade to a previous PyTorch version should also work, but the C++ binaries will not be available, so some features,\n  such as prioritized replay buffers, will not work.\n\n  **Disclaimer**: As of today, TorchRL is roughly compatible with any PyTorch version \u003e= 2.1 and installing it will not\n  directly require a newer version of PyTorch to be installed. Indirectly though, tensordict still requires the latest\n  PyTorch to be installed and we are working hard to loosen that requirement.\n  The C++ binaries of TorchRL (mainly for prioritized replay buffers) will only work with PyTorch 2.7.0 and above.\n  Some features (e.g., working with nested jagged tensors) may also\n  be limited with older versions of PyTorch. 
It is recommended to use the latest TorchRL with the latest PyTorch version\n  unless there is a strong reason not to do so.\n\nTo install the library locally, start by cloning the repo:\n```bash\ngit clone https://github.com/pytorch/rl\n```\nand don't forget to check out the branch or tag you want to use for the build:\n```bash\ngit checkout v0.8.0\n```\n\nGo to the directory where you have cloned the torchrl repo and install it (after\ninstalling `ninja`)\n```bash\ncd /path/to/torchrl/\npip3 install ninja -U\npython setup.py develop\n```\n\nYou can also build wheels to distribute to co-workers using\n```bash\npython setup.py bdist_wheel\n```\nThe wheels will be stored at `./dist/torchrl\u003cname\u003e.whl` and can be installed via\n```bash\npip install torchrl\u003cname\u003e.whl\n```\n\n**Warning**: Unfortunately, `pip3 install -e .` does not currently work. Contributions to help fix this are welcome!\n\nOn M1 machines, this should work out-of-the-box with the nightly build of PyTorch.\nIf the build fails on macOS with an M1 chip, or if the message\n`(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))` appears at runtime, then try\n\n```\nARCHFLAGS=\"-arch arm64\" python setup.py develop\n```\n\nTo run a quick sanity check, leave that directory (e.g. 
by executing `cd ~/`)\nand try to import the library.\n```\npython -c \"import torchrl\"\n```\nThis should not return any warning or error.\n\n**Optional dependencies**\n\nThe following libraries can be installed depending on how you intend to use\ntorchrl:\n```\n# miscellaneous\npip3 install tqdm tensorboard \"hydra-core\u003e=1.1\" hydra-submitit-launcher\n\n# rendering\npip3 install \"moviepy\u003c2.0.0\"\n\n# deepmind control suite\npip3 install dm_control\n\n# gym, atari games\npip3 install \"gym[atari]\" \"gym[accept-rom-license]\" pygame\n\n# tests\npip3 install pytest pyyaml pytest-instafail\n\n# tensorboard\npip3 install tensorboard\n\n# wandb\npip3 install wandb\n```\n\n**Troubleshooting**\n\nIf a `ModuleNotFoundError: No module named 'torchrl._torchrl'` error occurs (or\na warning indicating that the C++ binaries could not be loaded),\nit means that the C++ extensions were not installed or not found.\n\n- One common reason might be that you are trying to import torchrl from within the\n  git repo location. The following code snippet should return an error if\n  torchrl has not been installed in `develop` mode:\n  ```\n  cd ~/path/to/rl/repo\n  python -c 'from torchrl.envs.libs.gym import GymEnv'\n  ```\n  If this is the case, consider executing torchrl from another location.\n- If you're not importing torchrl from within its repo location, it could be\n  caused by a problem during the local installation. Check the log after running\n  `python setup.py develop`. One common cause is a g++/C++ version discrepancy\n  and/or a problem with the `ninja` library.\n- If the problem persists, feel free to open an issue on the topic in the repo,\n  we'll do our best to help!\n- On **macOS**, we recommend installing Xcode first.\n  With Apple Silicon M1 chips, make sure you are using the arm64-built python\n  (e.g. 
[here](https://betterprogramming.pub/how-to-install-pytorch-on-apple-m1-series-512b3ad9bc6)).\n  Running the following lines of code\n  ```\n  wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py\n  python collect_env.py\n  ```\n  should display\n  ```\n  OS: macOS *** (arm64)\n  ```\n  and not\n  ```\n  OS: macOS **** (x86_64)\n  ```\n\nVersioning issues can cause error messages such as `undefined symbol`.\nFor these, refer to the [versioning issues document](https://github.com/pytorch/rl/blob/main/knowledge_base/VERSIONING_ISSUES.md)\nfor a complete explanation and proposed workarounds.\n\n## Asking a question\n\nIf you spot a bug in the library, please raise an issue in this repo.\n\nIf you have a more generic question regarding RL in PyTorch, post it on\nthe [PyTorch forum](https://discuss.pytorch.org/c/reinforcement-learning/6).\n\n## Contributing\n\nInternal collaborations to torchrl are welcome! Feel free to fork, submit issues and PRs.\nYou can check out the detailed contribution guide [here](https://github.com/pytorch/rl/blob/main/CONTRIBUTING.md).\nAs mentioned above, a list of open contributions can be found [here](https://github.com/pytorch/rl/issues/509).\n\nContributors are encouraged to install [pre-commit hooks](https://pre-commit.com/) (using `pre-commit install`). pre-commit checks for linting-related issues when the code is committed locally. You can disable the check by appending `-n` to your commit command: `git commit -m \u003ccommit message\u003e -n`\n\n\n## Disclaimer\n\nThis library is released as a PyTorch beta feature.\nBC-breaking changes are likely to happen but they will be introduced with a deprecation\nwarranty after a few release cycles.\n\n## License\nTorchRL is licensed under the MIT License. 
See [LICENSE](https://github.com/pytorch/rl/blob/main/LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpytorch%2Frl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpytorch%2Frl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpytorch%2Frl/lists"}