# Random Search
A simple [JAX](https://github.com/google/jax)-based implementation of [random search](https://arxiv.org/abs/1803.07055) for [locomotion tasks](https://github.com/openai/gym/tree/master/gym/envs/mujoco) using [MuJoCo XLA (MJX)](https://mujoco.readthedocs.io/en/stable/mjx.html).

## Installation
Clone the repository:
```sh
git clone https://github.com/thowell/rs
```

Optionally, create a conda environment:
```sh
conda create -n rs python=3.10
conda activate rs
```

Install with pip:
```sh
pip install -e .
```

## Train cheetah
Train cheetah in ~1 minute with an [Nvidia RTX 4090](https://www.nvidia.com/en-us/geforce/graphics-cards/40-series/rtx-4090/) on [Ubuntu 22.04.4 LTS](https://releases.ubuntu.com/jammy/).

<img src="assets/cheetah.gif" alt="cheetah locomotion" />

Run:
```sh
python rs/train.py --env cheetah --search --visualize --nsample 2048 --ntop 512 --niter 50 --neval 5 --nhorizon_search 200 --nhorizon_eval 1000 --random_step 0.1 --update_step 0.1
```

Output:
```
Settings:
  environment: cheetah
  nsample: 2048 | ntop: 512
  niter: 50 | neval: 5
  nhorizon_search: 200 | nhorizon_eval: 1000
  random_step: 0.1 | update_step: 0.1
  nenveval: 128
  reward_shift: 0.0
Search:
iteration (10 / 50): reward = 1172.42 +- 1144.11 | time = 17.52 | avg episode length: 1000 / 1000 | global steps: 8232960 | steps/second: 470022
iteration (20 / 50): reward = 2947.71 +- 1237.87 | time = 5.58 | avg episode length: 1000 / 1000 | global steps: 16465920 | steps/second: 1474670
iteration (30 / 50): reward = 3152.07 +- 1401.50 | time = 5.58 | avg episode length: 1000 / 1000 | global steps: 24698880 | steps/second: 1475961
iteration (40 / 50): reward = 4175.49 +- 783.41 | time = 5.59 | avg episode length: 1000 / 1000 | global steps: 32931840 | steps/second: 1472244
iteration (50 / 50): reward = 4293.36 +- 784.80 | time = 5.59 | avg episode length: 1000 / 1000 | global steps: 41164800 | steps/second: 1473380

total time: 56.43
```

The pretrained policy can be visualized in MuJoCo's passive viewer:
```sh
python train.py --env cheetah --load pretrained/cheetah --visualize
```

## Environments
Environments available:

- [Ant](rs/envs/ant.py)
  - based on [ant_v5](https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/envs/mujoco/ant_v5.py)
  - modified solver settings
  - contact only between feet and floor
  - no rewards or observations dependent on contact forces
- [Cheetah](rs/envs/cheetah.py)
  - based on [half_cheetah_v5](https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/envs/mujoco/half_cheetah_v5.py)
  - modified solver settings
- [Humanoid](rs/envs/humanoid.py)
  - based on [humanoid_v5](https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/envs/mujoco/humanoid_v5.py)
  - modified solver settings
  - contact only between feet and floor
  - no rewards or observations dependent on contact forces
- [Walker](rs/envs/walker.py)
  - based on [walker2d_v5](https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/envs/mujoco/walker2d_v5.py)
  - modified solver settings
  - contact only between feet and floor

## Usage
**Note**: the search is stochastic; run it multiple times to find good policies.

First, change to the `rs/` directory:
```sh
cd rs
```

### Ant
Search:
```sh
python train.py --env ant --search
```

Visualize policy checkpoint:
```sh
python train.py --env ant --load pretrained/ant --visualize
```

### Cheetah
Search:
```sh
python train.py --env cheetah --search
```

Visualize policy checkpoint:
```sh
python train.py --env cheetah --load pretrained/cheetah --visualize
```

### Humanoid
Search:
```sh
python train.py --env humanoid --search
```

Visualize policy checkpoint:
```sh
python train.py --env humanoid --load pretrained/humanoid --visualize
```

### Walker
Search:
```sh
python train.py --env walker --search
```

Visualize policy checkpoint:
```sh
python train.py --env walker --load pretrained/walker --visualize
```

### Command line arguments
Setup:
- `--env`: `ant`, `cheetah`, `humanoid`, `walker`
- `--search`: run random search to improve policy
- `--checkpoint`: filename in `checkpoint/` to save the policy
- `--load`: filename in `checkpoint/` to load a policy checkpoint from
- `--seed`: integer seed for random number generation
- `--visualize`: visualize policy

Search settings:
- `--nsample`: number of random directions to sample
- `--ntop`: number of top-performing random directions used for the policy update
- `--niter`: number of policy updates
- `--neval`: number of policy evaluations during search
- `--nhorizon_search`: number of environment steps during policy improvement
- `--nhorizon_eval`: number of environment steps during policy evaluation
- `--random_step`: step size for random-direction policy perturbations
- `--update_step`: step size for the policy update during policy improvement
- `--nenveval`: number of environments for policy evaluation
- `--reward_shift`: baseline subtracted from the per-timestep reward

## Mapping notation from the paper to code
$\alpha$: `update_step`

$\nu$: `random_step`

$N$: `nsample`

$b$: `ntop`

## Notes
- The environments are based on the [v5 MuJoCo Gym environments](https://github.com/Farama-Foundation/Gymnasium/tree/main/gymnasium/envs/mujoco) but may not match them in all details.
- The search settings are based on [Simple random search provides a competitive approach to reinforcement learning: Table 9](https://arxiv.org/abs/1803.07055) but may not match the paper in all details either.

This repository was developed to:
- understand the [Augmented Random Search](https://arxiv.org/abs/1803.07055) algorithm
- understand how to compute numerically stable running statistics
- understand the details of [Gym environments](https://github.com/openai/gym)
- experiment with code-generation tools that can speed up development, including [ChatGPT](https://chatgpt.com) and [Claude](https://claude.ai/)
- gain experience with [MuJoCo XLA (MJX)](https://mujoco.readthedocs.io/en/stable/mjx.html)
- gain experience with [JAX](https://github.com/google/jax)

MuJoCo models use resources from [Gymnasium](https://github.com/Farama-Foundation/Gymnasium/tree/main/gymnasium/envs/mujoco) and [dm_control](https://github.com/google-deepmind/dm_control).
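To make the notation mapping concrete, here is a minimal NumPy sketch of one augmented random search update, using the paper's $\alpha$ (`update_step`), $\nu$ (`random_step`), $N$ (`nsample`), and $b$ (`ntop`). This is an illustrative toy on a quadratic reward, not the repository's MJX implementation; the names `ars_update` and `reward_fn` are hypothetical.

```python
import numpy as np

def ars_update(theta, reward_fn, nsample, ntop, random_step, update_step, rng):
    """One Augmented Random Search update step (illustrative sketch).

    theta: flat policy parameter vector.
    reward_fn: maps a parameter vector to a scalar return.
    """
    # Sample N Gaussian random directions.
    deltas = rng.standard_normal((nsample, theta.size))
    # Evaluate the policy perturbed in both directions for each sample.
    r_plus = np.array([reward_fn(theta + random_step * d) for d in deltas])
    r_minus = np.array([reward_fn(theta - random_step * d) for d in deltas])
    # Keep the b directions whose best perturbation scored highest.
    top = np.argsort(np.maximum(r_plus, r_minus))[-ntop:]
    # Normalize by the std of the 2b rewards actually used in the update.
    sigma = np.concatenate([r_plus[top], r_minus[top]]).std() + 1e-8
    # Weighted sum of the top directions: (b,) @ (b, d) -> (d,).
    step = (r_plus[top] - r_minus[top]) @ deltas[top]
    return theta + update_step / (ntop * sigma) * step

# Toy usage: maximize -||theta - 1||^2; the update climbs toward all-ones.
rng = np.random.default_rng(0)
theta = np.zeros(5)
reward = lambda p: -np.sum((p - 1.0) ** 2)
for _ in range(200):
    theta = ars_update(theta, reward, nsample=32, ntop=8,
                       random_step=0.05, update_step=0.05, rng=rng)
```

The `(r_plus - r_minus)` weighting acts as a finite-difference estimate of the reward gradient along each sampled direction, and the reward-std normalization keeps the effective step size stable as reward magnitudes change during search.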