{"id":19185509,"url":"https://github.com/rmst/rlrd","last_synced_at":"2025-05-08T00:32:55.252Z","repository":{"id":123043419,"uuid":"328128347","full_name":"rmst/rlrd","owner":"rmst","description":"PyTorch implementation of our paper Reinforcement Learning with Random Delays (ICLR 2020)","archived":false,"fork":false,"pushed_at":"2022-05-25T19:45:36.000Z","size":1540,"stargazers_count":40,"open_issues_count":1,"forks_count":9,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-20T05:32:14.402Z","etag":null,"topics":["deep-learning","deep-reinforcement-learning","pytorch","reinforcement-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rmst.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-09T10:23:54.000Z","updated_at":"2025-02-19T03:20:47.000Z","dependencies_parsed_at":"2023-04-28T04:01:31.416Z","dependency_job_id":null,"html_url":"https://github.com/rmst/rlrd","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rmst%2Frlrd","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rmst%2Frlrd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rmst%2Frlrd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rmst%2Frlrd/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rmst","download_url":"https://codeload.github.c
om/rmst/rlrd/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252978397,"owners_count":21834907,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","deep-reinforcement-learning","pytorch","reinforcement-learning"],"created_at":"2024-11-09T11:10:46.394Z","updated_at":"2025-05-08T00:32:55.244Z","avatar_url":"https://github.com/rmst.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Reinforcement Learning with Random Delays\n\nPyTorch implementation of our paper [Reinforcement Learning with Random Delays (ICLR 2020)](https://openreview.net/forum?id=QFYnKlBJYR) – [[Arxiv]](https://arxiv.org/abs/2010.02966)\n\n### Getting Started\nThis repository can be pip-installed via:\n```bash\npip install git+https://github.com/rmst/rlrd.git\n```\n\nDC/AC can be run on a simple 1-step delayed `Pendulum-v0` task via:\n```bash\npython -m rlrd run rlrd:DcacTraining Env.id=Pendulum-v0\n```\n\nHyperparameters can be set via command line. 
E.g.:\n```bash\npython -m rlrd run rlrd:DcacTraining \\\nEnv.id=Pendulum-v0 \\\nEnv.min_observation_delay=0 \\\nEnv.sup_observation_delay=2 \\\nEnv.min_action_delay=0 \\\nEnv.sup_action_delay=3 \\\nAgent.batchsize=128 \\\nAgent.memory_size=1000000 \\\nAgent.lr=0.0003 \\\nAgent.discount=0.99 \\\nAgent.target_update=0.005 \\\nAgent.reward_scale=5.0 \\\nAgent.entropy_scale=1.0 \\\nAgent.start_training=10000 \\\nAgent.device=cuda \\\nAgent.training_steps=1.0 \\\nAgent.loss_alpha=0.2 \\\nAgent.Model.hidden_units=256 \\\nAgent.Model.num_critics=2\n```\n\nNote that our gym wrapper adds a constant 1-step delay to the action delay, i.e. ```Env.min_action_delay=0``` actually means that the minimum action delay is 1 whereas ```Env.min_observation_delay=0``` means that the minimum observation delay is 0 (we assume that the action delay cannot be less than 1 time-step, e.g. for action inference).\nFor instance:\n- ```Env.min_observation_delay=0 Env.sup_observation_delay=2``` means that the observation delay is randomly 0 or 1.\n- ```Env.min_action_delay=0 Env.sup_action_delay=2``` means that the action delay is randomly 1 or 2.\n- ```Env.min_observation_delay=1 Env.sup_observation_delay=2``` means that the observation delay is always 1.\n- ```Env.min_observation_delay=0 Env.sup_observation_delay=3``` means that the observation delay is randomly 0, 1 or 2.\n- etc.\n\n\n### Mujoco Experiments\nTo install Mujoco, follow the instructions at [openai/gym](https://github.com/openai/gym).\nThe following environments were used in the paper:\n\n![MuJoCo](resources/mujoco_horizontal.png)\n\n\nTo train DC/AC on a 1-step delayed version of `HalfCheetah-v2`, run:\n```bash\npython -m rlrd run rlrd:DcacTraining Env.id=HalfCheetah-v2\n```\n\nTo train SAC on a 1-step delayed version of `Ant-v2` run:\n```bash\npython -m rlrd run rlrd:DelayedSacTraining Env.id=Ant-v2\n```\n\n### Weights and Biases API\nYour curves can be exported directly to the Weights and Biases (wandb) website by using 
`run-wandb`.\nFor example, to run DC/AC on Pendulum with a 1-step delay and export the curves to your wandb project:\n\n```terminal\npython -m rlrd run-wandb \\\nyourWandbID \\\nyourWandbProjectName \\\naNameForTheWandbRun \\\naFileNameForLocalCheckpoints \\\nrlrd:DcacTraining Env.id=Pendulum-v0\n```\n\nUse the optional hyperparameters described above to experiment with more meaningful delays.\n\n### Contribute / known issues\nContributions are welcome.\nPlease submit a PR with your name in the contributors list.\n\nWe have not yet optimized our Python implementation of DC/AC; this is the most important thing to do right now, as it is quite slow.\n\nIn particular, a lot of time is wasted when artificially re-creating a batched tensor for computing the value estimates in one forward pass, and the replay buffer is inefficient.\nSee the `#FIXME` in [dcac.py](https://github.com/rmst/rlrd/blob/master/rlrd/dcac.py)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frmst%2Frlrd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frmst%2Frlrd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frmst%2Frlrd/lists"}