{"id":20272155,"url":"https://github.com/mpatacchiola/imujoco","last_synced_at":"2025-09-22T13:30:36.694Z","repository":{"id":173918565,"uuid":"651481850","full_name":"mpatacchiola/imujoco","owner":"mpatacchiola","description":"Official repository of the iMuJoCo (iMitation MuJoCo) dataset","archived":false,"fork":false,"pushed_at":"2023-09-16T17:52:03.000Z","size":44,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-11-14T12:53:49.702Z","etag":null,"topics":["imitation-learning","mujoco","offline-learning","reinforcement-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mpatacchiola.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-09T10:30:56.000Z","updated_at":"2024-10-25T10:20:33.000Z","dependencies_parsed_at":null,"dependency_job_id":"0cdf0788-f248-486a-840a-ad5a805fee14","html_url":"https://github.com/mpatacchiola/imujoco","commit_stats":null,"previous_names":["mpatacchiola/imujoco"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpatacchiola%2Fimujoco","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpatacchiola%2Fimujoco/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpatacchiola%2Fimujoco/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mpatacchiola%2Fimujoco/manifests","owner_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mpatacchiola","download_url":"https://codeload.github.com/mpatacchiola/imujoco/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":233851069,"owners_count":18740155,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["imitation-learning","mujoco","offline-learning","reinforcement-learning"],"created_at":"2024-11-14T12:42:07.193Z","updated_at":"2025-09-22T13:30:36.315Z","avatar_url":"https://github.com/mpatacchiola.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![arXiv](https://img.shields.io/badge/arXiv-2206.09843-b31b1b.svg)](https://arxiv.org/abs/2306.13554)\n\nOfficial repository of the iMuJoCo (iMitation MuJoCo) dataset, an offline dataset for imitation learning. Presented in: \n\n*\"Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot Policy Imitation\", Patacchiola M., Sun M., Hofmann K., Turner R.E., Conference on Lifelong Learning Agents - CoLLAs 2023* [[arXiv]](https://arxiv.org/abs/2306.13554)\n\n**Overview**: iMuJoCo builds on top of OpenAI-Gym MuJoCo providing a heterogeneous benchmark for training and testing imitation learning methods and offline RL methods. Heterogeneity is achieved by producing a large number of variants of three base environments: Hopper, Halfcheetah, and Walker2d. For each variant a policy has been trained via SAC, then the policy has been used to generate 100 offline trajectories. 
\n\n**What's included?** iMuJoCo includes (1) 100 trajectories from pretrained policies for each environment variant (`./imujoco/dataset` folder), (2) pretrained SAC policies for each variant (`./imujoco/policies` folder), and (3) an XML file for each environment variant (`./imujoco/xml` folder). The user can access the environment variant (via the OpenAI-Gym API and the XML configuration file), the offline trajectories (via a Python/PyTorch data loader), and the underlying SAC policy network (using the Stable Baselines API).\n\nThe overall structure of iMuJoCo is the following:\n\n```\n./imujoco\n    dataset\n        sac-halfcheetah-jointdec_25_bfoot.npz\n        sac-halfcheetah-jointdec_25_bshin.npz\n        ...\n        \n    policies\n        sac-halfcheetah-jointdec_25_bfoot.zip\n        sac-halfcheetah-jointdec_25_bshin.zip\n        ...\n        \n    xml\n        halfcheetah-jointdec_25_bfoot.xml\n        halfcheetah-jointdec_25_bshin.xml\n        ...\n```\n\n\nDifferences from previous benchmarks\n------------------------------------\n\nA few benchmarks have been proposed to address meta-learning and offline learning in RL, such as [Meta-World](https://meta-world.github.io/), [Procgen](https://github.com/openai/procgen), and [D4RL](https://arxiv.org/abs/2004.07219). However, unlike the standard meta-learning setting, imitation learning requires a large variety of offline trajectories, collected from policies trained on heterogeneous environments.\n\nExisting benchmarks are not suited for this case because they do not provide pretrained policies and their associated trajectories (e.g. Meta-World and Procgen), lack diversity (Meta-World and D4RL), or do not support continuous control problems (e.g. Procgen).\n\nEnvironment variants\n--------------------\n\nEach environment variant falls into one of these four categories:\n\n- **mass**: increase or decrease the mass of a limb by a percentage; e.g. 
if the mass is 2.5 and the percentage is 200% then the new mass for that limb will be 7.5 (2.5 + 200% of 2.5).\n- **joint**: limit the mobility of a joint by a percentage range, e.g. if the joint range is 180 degrees and the percentage is -50% then the maximum range of motion becomes 90 degrees.\n- **length**: increase or decrease the length of a limb by a percentage; e.g. if the length of a limb is 1.5 and the percentage is 150% then the new length will be 3.75 (1.5 + 150% of 1.5).\n- **friction**: increase or decrease the friction by a percentage (only for body parts that are in contact with the floor); e.g. if the friction is 1.9 and the percentage is -50% then the new friction will be 0.95.\n\nNote that each environment has unique dynamics and agent configurations, resulting in different numbers of variants. Specifically, we have 37 variants for Hopper, 53 for Halfcheetah, and 64 for Walker2d, making a total of 154 variants.\n\nInstallation\n------------\n\n1. Requirements: NumPy and PyTorch are needed to use the sampler; [OpenAI-Gym](https://github.com/openai/gym) and [StableBaselines3](https://stable-baselines3.readthedocs.io) are needed to load the environments and policies.\n\n2. Clone the repository with `git clone https://github.com/mpatacchiola/imujoco.git` and make it the current folder with `cd imujoco`.\n\n3. Download the dataset files (approximately **2.7 GB**) from our page on [zenodo.org](https://zenodo.org/record/7971395):\n\n```\nwget https://zenodo.org/record/7971395/files/xml.zip\nwget https://zenodo.org/record/7971395/files/policies.zip\nwget https://zenodo.org/record/7971395/files/dataset.zip\n```\n\n4. Unzip the files into the `imujoco` folder:\n\n```\nunzip xml.zip\nunzip policies.zip\nunzip dataset.zip\n```\n\nUsage\n------\n\n**Sampling offline trajectories**\n\niMuJoCo provides a set of trajectories collected by agents trained using SAC. 
There are 100 trajectories for each environment variant, stored as compressed NumPy files (npz) in the `./dataset` folder. The following example shows how to use [sampler.py](./sampler.py) to sample offline trajectories (PyTorch).\n\n```python\nimport os\nfrom sampler import Sampler\n\nenv_name = \"Hopper-v3\" # can be: 'Hopper-v3', 'HalfCheetah-v3', 'Walker2d-v3'.\n\n# Here we simply accumulate all the npz files for Hopper-v3.\nfiles_list = list()\nfor filename in os.listdir(\"./dataset\"):\n    if filename.endswith(\".npz\") and env_name.lower()[0:-2] in filename:\n        files_list.append(os.path.abspath(\"./dataset/\"+filename))\nprint(\"\\n\", files_list, \"\\n\")\n\n# Defining train/test samplers by allocating 75% of the\n# trajectories for training and 25% for testing.\ntrain_sampler = Sampler(env_name=env_name, data_list=files_list, portion=(0.0,0.75))\ntest_sampler = Sampler(env_name=env_name, data_list=files_list, portion=(0.75,1.0))\n\n# Sampling 5 trajectories (without replacement) using the train sampler.\n# The sampler returns the states/actions tensor for the sequences.\nx, y = train_sampler.sample(tot_shots=5, replace=False)\n```\n\n**Loading one of the environment variants**\n\nEach environment variant can be loaded as a standard OpenAI-Gym env by using the XML file associated with it. 
For instance, here we load an environment for Hopper where the mass of the leg has been decreased by 25%:\n\n```python\nimport gym\nimport os\n\nenv_name = \"Hopper-v3\"\nxml_file = os.path.abspath(\"./xml/hopper-massdec_25_leggeom.xml\")\n\n# Generate and reset the env.\nenv = gym.make(env_name, xml_file=xml_file)\nenv.reset()\n\n# Move in the env with a random policy for one episode.\nfor _ in range(1000):\n    action = env.action_space.sample()\n    observation, reward, done, _ = env.step(action)\n    if done: break\n\nenv.close()\n```\n\n**Loading a pretrained SAC policy**\n\nEach environment variant has an associated SAC policy that has been trained on it. For this stage we used [Stable Baselines v3](https://stable-baselines3.readthedocs.io).\n\nHere is an example of how to load a pretrained policy for its associated environment and evaluate its performance:\n\n```python\nimport os\nfrom stable_baselines3 import SAC\nimport gym\n\ndef evaluate_policy(env, model, tot_episodes, deterministic=True):\n    model.policy.actor.eval()\n    episode_reward_list = list()\n    for episode in range(tot_episodes):\n        obs_t = env.reset()\n        cumulated_reward = 0.0\n        # Rewards are accumulated over a fixed horizon of 1000 steps;\n        # if the env terminates earlier it is reset and the count continues.\n        for i in range(1000):\n            action, _states = model.predict(obs_t, deterministic=deterministic)\n            obs_t1, reward, done, info = env.step(action)\n            cumulated_reward += reward\n            if done: obs_t1 = env.reset()\n            obs_t = obs_t1\n        episode_reward_list.append(cumulated_reward)\n    return episode_reward_list\n\n\nenv_name = \"Hopper-v3\"\neval_episodes = 10\nxml_file = os.path.abspath(\"./xml/hopper-massdec_25_leggeom.xml\")\npolicy_file = os.path.abspath(\"./policies/sac-hopper-massdec_25_leggeom.zip\")\n\n# Create the Gym env.\nenv = gym.make(env_name, xml_file=xml_file)\n\n# Load the pretrained SAC policy (no need to build a fresh model first).\nmodel = SAC.load(policy_file)\n\n# Evaluate the 
policy on the associated environment\nreward_list = evaluate_policy(env, model, tot_episodes=eval_episodes)\nprint(f\"Average reward on {eval_episodes} episodes .... {sum(reward_list)/len(reward_list)}\")\n```\n\nCitation\n--------\n\n```bibtex\n@inproceedings{patacchiola2023comparing,\n  title={Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot Policy Imitation},\n  author={Patacchiola, Massimiliano and Sun, Mingfei and Hofmann, Katja and Turner, Richard E},\n  booktitle={Conference on Lifelong Learning Agents},\n  year={2023}\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmpatacchiola%2Fimujoco","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmpatacchiola%2Fimujoco","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmpatacchiola%2Fimujoco/lists"}