{"id":32946605,"url":"https://github.com/mees/calvin","last_synced_at":"2026-01-18T04:19:19.105Z","repository":{"id":38084473,"uuid":"387839793","full_name":"mees/calvin","owner":"mees","description":"CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks","archived":false,"fork":false,"pushed_at":"2025-09-08T08:52:01.000Z","size":1668,"stargazers_count":667,"open_issues_count":46,"forks_count":81,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-09-08T10:36:29.665Z","etag":null,"topics":["computer-vision","deep-learning","grounding","manipulation","natural-language-processing","pytorch","robotics","vision","vision-and-language","vision-language"],"latest_commit_sha":null,"homepage":"http://calvin.cs.uni-freiburg.de","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mees.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-07-20T15:43:26.000Z","updated_at":"2025-09-08T08:52:05.000Z","dependencies_parsed_at":"2023-02-09T02:46:00.509Z","dependency_job_id":"e9970efc-f7a9-42ac-88bb-d7bbc60ef645","html_url":"https://github.com/mees/calvin","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mees/calvin","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mees%2Fcalvin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mees%2Fcalvin/tags","releas
es_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mees%2Fcalvin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mees%2Fcalvin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mees","download_url":"https://codeload.github.com/mees/calvin/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mees%2Fcalvin/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28529500,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-18T00:39:45.795Z","status":"online","status_checked_at":"2026-01-18T02:00:07.578Z","response_time":98,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","deep-learning","grounding","manipulation","natural-language-processing","pytorch","robotics","vision","vision-and-language","vision-language"],"created_at":"2025-11-12T19:00:21.697Z","updated_at":"2026-01-18T04:19:19.098Z","avatar_url":"https://github.com/mees.png","language":"Python","funding_links":[],"categories":["Papers","4. 
Datasets and Benchmarks","Benchmarks","📦 Datasets \u0026 Benchmarks"],"sub_categories":["4.4 Evaluation Benchmarks","Medical \u0026 Assistive"],"readme":"# CALVIN\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/mees/calvin.svg?logo=lgtm\u0026logoWidth=18)](https://lgtm.com/projects/g/mees/calvin/context:python)\n[![Total alerts](https://img.shields.io/lgtm/alerts/g/mees/calvin.svg?logo=lgtm\u0026logoWidth=18)](https://lgtm.com/projects/g/mees/calvin/alerts/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n[\u003cb\u003eCALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks\u003c/b\u003e](https://arxiv.org/pdf/2112.03227.pdf)\n\n[Oier Mees](https://www.oiermees.com/), [Lukas Hermann](https://lukashermann.github.io/), [Erick Rosete](https://www.erickrosete.com/), [Wolfram Burgard](http://www2.informatik.uni-freiburg.de/~burgard)\n\n#### CALVIN won the 2022 IEEE Robotics and Automation Letters (RA-L) Best Paper Award!\n\n\n We present **CALVIN** (**C**omposing **A**ctions from **L**anguage and **Vi**sio**n**), an open-source simulated benchmark to learn long-horizon language-conditioned tasks.\nOur aim is to make it possible to develop agents that can solve many robotic manipulation tasks over a long horizon, from onboard sensors, and specified only via human language. 
CALVIN tasks are more complex in terms of sequence length, action space, and language than existing vision-and-language task datasets, and CALVIN supports flexible specification of sensor\nsuites.\n\n![](media/teaser.png)\n\n# :computer:  Quick Start\nTo begin, clone this repository locally:\n```bash\n$ git clone --recurse-submodules https://github.com/mees/calvin.git\n$ export CALVIN_ROOT=$(pwd)/calvin\n\n```\nInstall requirements:\n```bash\n$ cd $CALVIN_ROOT\n$ conda create -n calvin_venv python=3.8  # or use virtualenv\n$ conda activate calvin_venv\n$ sh install.sh\n```\nIf you encounter problems installing pyhash, you might have to downgrade setuptools to a version below 58.\n\nDownload the dataset (choose which split you want to download with the argument `D`, `ABC` or `ABCD`): \\\nIf you want to get started without downloading the whole dataset, use the argument `debug` to download a small debug dataset (1.3 GB).\n```bash\n$ cd $CALVIN_ROOT/dataset\n$ sh download_data.sh D | ABC | ABCD | debug\n```\n## :weight_lifting_man: Train Baseline Agent\nTrain baseline models:\n```bash\n$ cd $CALVIN_ROOT/calvin_models/calvin_agent\n$ python training.py datamodule.root_data_dir=/path/to/dataset/ datamodule/datasets=vision_lang_shm\n```\nThe `vision_lang_shm` option loads the CALVIN dataset into shared memory at the beginning of the training,\nspeeding up the data loading during training.\nThe preparation of the shared memory cache will take some time\n(approx. 20 min on our SLURM cluster). \\\nIf you want to use the original data loader (e.g. for debugging), just override the command with `datamodule/datasets=vision_lang`. \\\nFor an additional speed-up, you can disable the evaluation callbacks during training by adding `~callbacks/rollout` and `~callbacks/rollout_lh`.\n\nDo you want to scale your training to a multi-GPU setup? 
Just specify the [number of GPUs](https://pytorch-lightning.readthedocs.io/en/latest/advanced/multi_gpu.html#select-gpu-devices) and DDP will automatically be used\nfor training thanks to [PyTorch Lightning](https://www.pytorchlightning.ai/).\nTo train on all available GPUs:\n```bash\n$ python training.py trainer.gpus=-1\n```\nIf you have access to a Slurm cluster, follow this [guide](https://github.com/mees/calvin/blob/main/slurm_scripts/README.md).\n\nYou can use [Hydra's](https://hydra.cc/) flexible overriding system for changing hyperparameters.\nFor example, to train a model with RGB images from both the static camera and the gripper camera with relative actions:\n```bash\n$ python training.py datamodule/observation_space=lang_rgb_static_gripper_rel_act model/perceptual_encoder=gripper_cam\n```\nTo train a model with RGB-D from both cameras:\n```bash\n$ python training.py datamodule/observation_space=lang_rgbd_both model/perceptual_encoder=RGBD_both\n```\nTo train a model with RGB images from the static camera and visual tactile observations with absolute actions:\n```bash\n$ python training.py datamodule/observation_space=lang_rgb_static_tactile_abs_act model/perceptual_encoder=static_RGB_tactile\n```\n\nTo see all available hyperparameters:\n```console\n$ python training.py --help\n```\nTo resume training, override the Hydra working directory:\n```console\n$ python training.py hydra.run.dir=runs/my_dir\n```\n\n## :framed_picture: Sensory Observations\nCALVIN supports a range of sensors commonly utilized for visuomotor control:\n1. **Static camera RGB images** - with shape `200x200x3`.\n2. **Static camera Depth maps** - with shape `200x200`.\n3. **Gripper camera RGB images** - with shape `84x84x3`.\n4. **Gripper camera Depth maps** - with shape `84x84`.\n5. **Tactile image** - with shape `120x160x6`.\n6. 
**Proprioceptive state** - EE position (3), EE orientation in Euler angles (3), gripper width (1), joint positions (7), gripper action (1).\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"media/sensors.png\" alt=\"\" width=\"50%\"\u003e\n\u003c/p\u003e\n\n## :joystick: Action Space\nIn CALVIN, the agent must perform closed-loop continuous control to follow unconstrained language instructions characterizing complex robot manipulation tasks, sending continuous actions to the robot at 30 Hz.\nTo give researchers and practitioners the freedom to experiment with different action spaces, CALVIN supports the following action spaces:\n1. **Absolute Cartesian pose** - EE position (3), EE orientation in Euler angles (3), gripper action (1).\n2. **Relative Cartesian displacement** - EE position (3), EE orientation in Euler angles (3), gripper action (1).\n3. **Joint action** - Joint positions (7), gripper action (1).\n\nFor more information, please refer to this more detailed [README](https://github.com/mees/calvin/blob/main/dataset/README.md).\n\n## :muscle: Evaluation: The CALVIN Challenge\n### Long-horizon Multi-task Language Control (LH-MTLC)\nThe aim of the CALVIN benchmark is to evaluate the learning of long-horizon language-conditioned continuous control policies. In this setting, a single agent must solve complex manipulation tasks by understanding a series of unconstrained language expressions in a row, e.g., “open the drawer... pick up the blue block... now push the block into the drawer... 
now open the sliding door”.\nWe provide an evaluation protocol with evaluation modes of varying difficulty by choosing different combinations of sensor suites and numbers of training environments.\nTo avoid a biased initial position, the robot is reset to a neutral position before every multi-step sequence.\n\nTo evaluate a trained CALVIN baseline agent, run the following command:\n\n```\n$ cd $CALVIN_ROOT/calvin_models/calvin_agent\n$ python evaluation/evaluate_policy.py --dataset_path \u003cPATH/TO/DATASET\u003e --train_folder \u003cPATH/TO/TRAINING/FOLDER\u003e\n```\nOptional arguments:\n\n- `--checkpoint \u003cPATH/TO/CHECKPOINT\u003e`: by default, the evaluation loads the last checkpoint in the training log directory.\nYou can instead specify the path to another checkpoint by adding this to the evaluation command.\n- `--debug`: print debug information and visualize the environment.\n\nIf you want to evaluate your own model architecture on the CALVIN challenge, you can implement the `CustomModel` class in `evaluate_policy.py`\nas an interface to your agent. 
You need to implement the following methods:\n\n- `__init__()`: gets called once at the beginning of the evaluation.\n- `reset()`: gets called at the beginning of each evaluation sequence.\n- `step(obs, goal)`: gets called every step and returns the predicted action.\n\nThen evaluate the model by running:\n```\n$ python evaluation/evaluate_policy.py --dataset_path \u003cPATH/TO/DATASET\u003e --custom_model\n```\n\nYou are also free to use your own language model instead of using the precomputed language embeddings provided by CALVIN.\nFor this, implement `CustomLangEmbeddings` in `evaluate_policy.py` and add `--custom_lang_embeddings` to the evaluation command.\n\n### Multi-task Language Control (MTLC)\nAlternatively, you can evaluate the policy on single tasks and without resetting the robot to a neutral position.\nNote that this evaluation is currently only available for our baseline agent.\n```\n$ python evaluation/evaluate_policy_singlestep.py --dataset_path \u003cPATH/TO/DATASET\u003e --train_folder \u003cPATH/TO/TRAINING/FOLDER\u003e [--checkpoint \u003cPATH/TO/CHECKPOINT\u003e] [--debug]\n```\n\n### Pre-trained Model\nDownload the [MCIL](http://calvin.cs.uni-freiburg.de/model_weights/D_D_static_rgb_baseline.zip) model checkpoint trained on the static camera RGB images in environment D.\n```\n$ wget http://calvin.cs.uni-freiburg.de/model_weights/D_D_static_rgb_baseline.zip\n$ unzip D_D_static_rgb_baseline.zip\n```\n## :speech_balloon: Relabeling Raw Language Annotations\nDo you want to try learning language-conditioned policies in CALVIN with a new awesome language model?\n\nWe provide an [example script](https://github.com/mees/calvin/blob/main/calvin_models/calvin_agent/utils/relabel_with_new_lang_model.py) to relabel the annotations with a different language model provided by [SBert](https://www.sbert.net/docs/pretrained_models.html), such as the larger MPNet (paraphrase-mpnet-base-v2) or its corresponding multilingual model 
(paraphrase-multilingual-mpnet-base-v2).\nThe supported options are \"mini\", \"mpnet\" and \"multi\". If you want to try different SBert models, just change the model name [here](https://github.com/mees/calvin/blob/main/calvin_models/calvin_agent/models/encoders/language_network.py#L18).\n```\ncd $CALVIN_ROOT/calvin_models/calvin_agent\npython utils/relabel_with_new_lang_model.py +path=$CALVIN_ROOT/dataset/task_D_D/ +name_folder=new_lang_model_folder model.nlp_model=mpnet\n```\nIf you additionally want to sample different language annotations for each sequence (from the same task annotations) in the training split run the same command with the parameter `reannotate=true`.\n\n## :chart_with_upwards_trend: SOTA Models\nOpen-source models that outperform the MCIL baselines from CALVIN:\n\nFor a detailed overview of the evaluation performances, have a look at our **[LEADERBOARD](http://calvin.cs.uni-freiburg.de/)**.\n\n\u003cbr\u003e\n\u003cb\u003e Grounding Language with Visual Affordances over Unstructured Data\u003c/b\u003e\n\u003cbr\u003e\nOier Mees, Jessica Borja-Diaz, Wolfram Burgard\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2210.01911.pdf\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/mees/hulc2\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies \u003c/b\u003e\n\u003cbr\u003e\nMoritz Reuss, Hongyi Zhou, Marcel Rühle, Ömer Erdinç Yağmurlu, Fabian Otto, Rudolf Lioutikov\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2509.04996\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://intuitive-robots.github.io/flower_vla/\"\u003e Code \u003c/a\u003e\n\n\n\u003cb\u003e Unified Vision-Language-Action Model \u003c/b\u003e\n\u003cbr\u003e\nYuqi Wang, Xinghang Li, Wenxuan Wang, Junbo Zhang, Yingyan Li, Yuntao Chen, Xinlong Wang, Zhaoxiang Zhang\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2506.19850\"\u003e Paper\u003c/a\u003e, \u003ca 
href=\"https://robertwyq.github.io/univla.github.io/\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation \u003c/b\u003e\n\u003cbr\u003e\nYang Tian, Sizhe Yang, Jia Zeng, Ping Wang, Dahua Lin, Hao Dong, Jiangmiao Pang\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2412.15109\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/OpenRobotLab/Seer/\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e Diffusion Transformer Policy: Scaling Diffusion Transformer for Generalist Vision-Language-Action Learning \u003c/b\u003e\n\u003cbr\u003e\nZhi Hou, Tianyi Zhang, Yuwen Xiong, Hengjun Pu, Chengyang Zhao, Ronglei Tong, Yu Qiao, Jifeng Dai, Yuntao Chen\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2410.15959\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/zhihou7/dit_policy_vla\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy \u003c/b\u003e\n\u003cbr\u003e\nPeiyan Li, Hongtao Wu, Yan Huang, Chilam Cheang, Liang Wang, Tao Kong\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2408.14368\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/bytedance/GR-MG/\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e GHIL-Glue: Hierarchical Control with Filtered Subgoal Images \u003c/b\u003e\n\u003cbr\u003e\nKyle B Hatch, Ashwin Balakrishna, Oier Mees, Suraj Nair, Seohong Park, Blake Wulfe, Masha Itkina, Benjamin Eysenbach, Sergey Levine, Thomas Kollar, Benjamin Burchfiel\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2410.20018\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/kyle-hatch-tri/ghil-glue\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning \u003c/b\u003e\n\u003cbr\u003e\nMoritz Reuss, Jyothish Pari, Pulkit Agrawal, Rudolf Lioutikov\n\u003cbr\u003e\n\u003ca 
href=\"https://arxiv.org/pdf/2412.12953\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/intuitive-robots/MoDE_Diffusion_Policy\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e Incorporating Task Progress Knowledge for Subgoal Generation in Robotic Manipulation through Image Edits \u003c/b\u003e\n\u003cbr\u003e\nXuhui Kang, Yen-Ling Kuo\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2410.11013\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/Shua-Kang/TaKSIE\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation \u003c/b\u003e\n\u003cbr\u003e\nQingwen Bu, Jia Zeng, Li Chen, Yanchao Yang, Guyue Zhou, Junchi Yan, Ping Luo, Heming Cui, Yi Ma, Hongyang Li\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2409.09016\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/OpenDriveLab/CLOVER\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution \u003c/b\u003e\n\u003cbr\u003e\nYang Yue, Yulin Wang, Bingyi Kang, Yizeng Han, Shenzhi Wang, Shiji Song, Jiashi Feng, Gao Huang\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2411.02359\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/yueyang130/DeeR-VLA\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation \u003c/b\u003e\n\u003cbr\u003e\nFanfan Liu, Feng Yan, Liming Zheng, Yiyang Huang, Chengjian Feng, Lin Ma\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2406.18977v2\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/liufanfanlff/RoboUniview\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals \u003c/b\u003e\n\u003cbr\u003e\nMoritz Reuss, Ömer Erdinç Yağmurlu, Fabian Wenzel, Rudolf Lioutikov\n\u003cbr\u003e\n\u003ca 
href=\"https://arxiv.org/pdf/2407.05996\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/intuitive-robots/mdt_policy\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations\u003c/b\u003e\n\u003cbr\u003e\nTsung-Wei Ke, Nikolaos Gkanatsios, Katerina Fragkiadaki\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2402.10885.pdf\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/nickgkan/3d_diffuser_actor\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation\u003c/b\u003e\n\u003cbr\u003e\nHongtao Wu, Ya Jing, Chilam Cheang, Guangzeng Chen, Jiafeng Xu, Xinghang Li, Minghuan Liu, Hang Li, Tao Kong\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2312.13139.pdf\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/bytedance/GR-1\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e Vision-Language Foundation Models as Effective Robot Imitators\u003c/b\u003e\n\u003cbr\u003e\nXinghang Li, Minghuan Liu, Hanbo Zhang, Cunjun Yu, Jie Xu, Hongtao Wu, Chilam Cheang, Ya Jing, Weinan Zhang, Huaping Liu, Hang Li, and Tao Kong\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2311.01378.pdf\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/RoboFlamingo/RoboFlamingo\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e Zero-Shot Robotic Manipulation With Pretrained Image-Editing Diffusion Models\u003c/b\u003e\n\u003cbr\u003e\nKevin Black, Mitsuhiko Nakamoto, Pranav Atreya, Homer Walke, Chelsea Finn, Aviral Kumar, Sergey Levine\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2310.10639.pdf\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/kvablack/susie\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks\u003c/b\u003e\n\u003cbr\u003e\nEddie Zhang, Yujie Lu, William Wang, Amy Zhang\n\u003cbr\u003e\n\u003ca 
href=\"https://arxiv.org/pdf/2210.15629.pdf\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/ezhang7423/language-control-diffusion\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e What Matters in Language Conditioned Robotic Imitation Learning over Unstructured Data\u003c/b\u003e\n\u003cbr\u003e\nOier Mees, Lukas Hermann, Wolfram Burgard\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2204.06252.pdf\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/lukashermann/hulc\"\u003e Code \u003c/a\u003e\n\n\u003cb\u003e Language-Conditioned Imitation Learning with Base Skill Priors under Unstructured Data\u003c/b\u003e\n\u003cbr\u003e\nHongkuan Zhou, Zhenshan Bing, Xiangtong Yao, Xiaojie Su, Chenguang Yang, Kai Huang, Alois Knoll\n\u003cbr\u003e\n\u003ca href=\"https://arxiv.org/pdf/2305.19075.pdf\"\u003e Paper\u003c/a\u003e, \u003ca href=\"https://github.com/hk-zh/spil\"\u003e Code \u003c/a\u003e\n\nContact [Oier](https://www.oiermees.com/) to add your model here.\n\n## Reinforcement Learning with CALVIN\nAre you interested in trying reinforcement learning agents for the different manipulation tasks in the CALVIN environment?\nWe provide a [Google Colab notebook](https://github.com/mees/calvin/blob/main/RL_with_CALVIN.ipynb) to showcase how to leverage the CALVIN task indicators to train RL agents with a sparse reward.\n\n## FAQ\n\n#### Why do you use EGL rendering?\nWe use EGL to move the PyBullet rendering from the CPU (which is the default) to the GPU, which is much faster.\nThis way, we can also do rollouts during the training of the agent to track its performance.\nMoving from CPU to GPU rendering changes the rendered textures slightly, so be aware of this if you plan on testing pretrained models.\n\n#### I am training with multiple GPUs. Why do I get OOM errors during rollouts?\nPyBullet only recently added an option to select which GPU to use for rendering when using EGL (fix was committed in 3c4cb80\non Oct 22, 2021, see 
[here](https://github.com/bulletphysics/bullet3/blob/master/examples/OpenGLWindow/EGLOpenGLWindow.cpp#L134)).\nIf you have an old version of PyBullet, there is no way to choose the GPU, which can lead to problems on cluster nodes with multiple GPUs, because all instances would be placed on the same GPU, slowing down the rendering and potentially leading to OOM errors.\n\nThe fix introduced an environment variable EGL_VISIBLE_DEVICES (similar to CUDA_VISIBLE_DEVICES) which lets you specify the GPU device to render on.\nHowever, there is one catch: on some machines, the device IDs of CUDA and EGL do not match (e.g. CUDA device 0 could be EGL device 3).\nWe automatically handle this in our wrapper in calvin_env and find the corresponding EGL device ID, so you don't have to set EGL_VISIBLE_DEVICES yourself, see [here](https://github.com/mees/calvin_env/blob/main/calvin_env/envs/play_lmp_wrapper.py#L31).\n\n#### I am not interested in the recorded manipulation tasks. Can I record different demonstrations with teleoperation?\nYes, although it is not documented right now, all the code to record data with a VR headset is present in\ncalvin_env in [https://github.com/mees/calvin_env/blob/main/calvin_env/vrdatacollector.py](https://github.com/mees/calvin_env/blob/main/calvin_env/vrdatacollector.py)\n\n\n## Changelog\n### 24 Feb 2023\n- Wrong `scene_info.npy` in the D dataset. Note that we have updated the corresponding checksum. Please replace as follows:\n```\ncd task_D_D\nwget http://calvin.cs.uni-freiburg.de/scene_info_fix/task_D_D_scene_info.zip\nunzip task_D_D_scene_info.zip \u0026\u0026 rm task_D_D_scene_info.zip\n```\n\n### 16 Sep 2022\n- **MAJOR BUG IN ABC and ABCD dataset:** If you downloaded these datasets before this date, you have to apply the following fixes:\n   - Wrong language annotations in the ABC and ABCD datasets. 
You can download the corrected language embeddings [here](https://github.com/mees/calvin/blob/main/dataset/README.md#language-embeddings).\n   - Bug in `calvin_env` that only affects the generation of language embeddings.\n   - Wrong `scene_info.npy` in ABC and ABCD dataset. Please replace as follows:\n```\ncd task_ABCD_D\nwget http://calvin.cs.uni-freiburg.de/scene_info_fix/task_ABCD_D_scene_info.zip\nunzip task_ABCD_D_scene_info.zip \u0026\u0026 rm task_ABCD_D_scene_info.zip\n```\n```\ncd task_ABC_D\nwget http://calvin.cs.uni-freiburg.de/scene_info_fix/task_ABC_D_scene_info.zip\nunzip task_ABC_D_scene_info.zip \u0026\u0026 rm task_ABC_D_scene_info.zip\n```\n- Added additional language embeddings to dataset.\n\n\n### 15 May 2022\n- Added shared memory dataset loader for faster training. Refactored data loading classes.\n\n### 7 Feb 2022\n- Minor changes to the distribution of tasks in the long-horizon multi-step sequences.\n- Changes to the task success criteria of pushing and lifting.\n- Set `use_nullspace: true` for robot in hydra cfg of dataset. 
If you downloaded one of the datasets prior to this date,\nedit this line in \u003cPATH_TO_DATASET\u003e/training/.hydra/merged_config.yaml and \u003cPATH_TO_DATASET\u003e/validation/.hydra/merged_config.yaml.\n- Renamed `model.decoder` to `model.action_decoder`.\n\n### 10 Jan 2022\n- Breaking change to evaluation, which now uses different initial states for the environment.\n\n## Citation\n\nIf you find the dataset or code useful, please cite:\n\n```bibtex\n@article{mees2022calvin,\nauthor = {Oier Mees and Lukas Hermann and Erick Rosete-Beas and Wolfram Burgard},\ntitle = {CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks},\njournal={IEEE Robotics and Automation Letters (RA-L)},\nvolume={7},\nnumber={3},\npages={7327-7334},\nyear={2022}\n}\n```\n\n## License\n\nMIT License\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmees%2Fcalvin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmees%2Fcalvin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmees%2Fcalvin/lists"}