{"id":29009879,"url":"https://github.com/tencentarc/moto","last_synced_at":"2025-10-11T09:38:22.230Z","repository":{"id":266389832,"uuid":"897985545","full_name":"TencentARC/Moto","owner":"TencentARC","description":"Latent Motion Token as the Bridging Language for Robot Manipulation","archived":false,"fork":false,"pushed_at":"2025-05-11T09:11:29.000Z","size":7298,"stargazers_count":105,"open_issues_count":2,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-06-25T15:51:53.831Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://chenyi99.github.io/moto/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TencentARC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-03T15:30:57.000Z","updated_at":"2025-06-24T08:14:30.000Z","dependencies_parsed_at":"2025-03-19T10:36:29.539Z","dependency_job_id":null,"html_url":"https://github.com/TencentARC/Moto","commit_stats":null,"previous_names":["tencentarc/moto"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TencentARC/Moto","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FMoto","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FMoto/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FMoto/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FMoto/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/owners/TencentARC","download_url":"https://codeload.github.com/TencentARC/Moto/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FMoto/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279006747,"owners_count":26084181,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-25T15:33:30.054Z","updated_at":"2025-10-11T09:38:22.196Z","avatar_url":"https://github.com/TencentARC.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\u003ch1\u003e\nMoto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos\n\n\u003ca href='https://chenyi99.github.io/moto/'\u003e\u003cimg src='https://img.shields.io/badge/Project-Page-Green'\u003e\u003c/a\u003e\n\u003ca href='https://arxiv.org/abs/2412.04445'\u003e\u003cimg src='https://img.shields.io/badge/Paper-Arxiv-red'\u003e\u003c/a\u003e \n\u003ca href='https://huggingface.co/TencentARC/Moto'\u003e\u003cimg src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Checkpoint-blue'\u003e\u003c/a\u003e\n\u003c/h1\u003e\n\n![image](assets/teaser.png?raw=true)\n \n\u003c/div\u003e\n\n## 🚀Introduction\n\nRecent developments in Large Language 
Models (LLMs) pre-trained on extensive corpora have shown significant success in various natural language processing (NLP) tasks with minimal fine-tuning.\nThis success offers new promise for robotics, which has long been constrained by the high cost of action-labeled data. We ask: given the abundant video data containing interaction-related knowledge available as a rich \"corpus\", \u003cb\u003e\u003ci\u003ecan a similar generative pre-training approach be effectively applied to enhance robot learning?\u003c/i\u003e\u003c/b\u003e The key challenge is to identify an effective representation for autoregressive pre-training that benefits robot manipulation tasks.\nInspired by the way humans learn new skills through observing dynamic environments, we propose that effective robotic learning should emphasize motion-related knowledge, which is closely tied to low-level actions and is hardware-agnostic, facilitating the transfer of learned motions to actual robot actions.\n\nTo this end, we introduce \u003cb\u003eMoto\u003c/b\u003e, which converts video content into latent \u003cb\u003eMo\u003c/b\u003etion \u003cb\u003eTo\u003c/b\u003eken sequences by a Latent Motion Tokenizer, learning a bridging \"language\" of motion from videos in an unsupervised manner.\nWe pre-train Moto-GPT through motion token autoregression, enabling it to capture diverse visual motion knowledge. After pre-training, Moto-GPT demonstrates the promising ability to produce semantically interpretable motion tokens, predict plausible motion trajectories, and assess trajectory rationality through output likelihood.\nTo transfer learned motion priors to real robot actions, we implement a co-fine-tuning strategy that seamlessly bridges latent motion token prediction and real robot control. 
Extensive experiments show that the fine-tuned Moto-GPT exhibits superior robustness and efficiency on robot manipulation benchmarks, underscoring its effectiveness in transferring knowledge from video data to downstream visual manipulation tasks.\n\n## 🛠️Quick Start\n\n### Installation\nClone this repo:\n```bash\ngit clone https://github.com/TencentARC/Moto.git\n```\n\nInstall minimal requirements for Moto training and inference:\n```bash\nconda create -n moto python=3.8\nconda activate moto\ncd Moto\npip install -r requirements.txt\ncd ..\n```\n\n\n[Optional] Set up the conda environment for evaluating Moto-GPT on the [CALVIN](https://github.com/mees/calvin) benchmark:\n\n```bash\nconda create -n moto_for_calvin python=3.8\nconda activate moto_for_calvin\n\ngit clone --recurse-submodules https://github.com/mees/calvin.git\npip install setuptools==57.5.0\ncd calvin\ncd calvin_env; git checkout main\ncd ../calvin_models\nsed -i 's/pytorch-lightning==1.8.6/pytorch-lightning/g' requirements.txt\nsed -i 's/torch==1.13.1/torch/g' requirements.txt\ncd ..\nsh ./install.sh\ncd ..\n\nsudo apt-get install -y libegl1-mesa libegl1\nsudo apt-get install -y libgl1\nsudo apt-get install -y libosmesa6-dev\nsudo apt-get install -y patchelf\n\ncd Moto\npip install -r requirements.txt\ncd ..\n```\n\n\n\n[Optional] Set up the conda environment for evaluating Moto-GPT on the [SIMPLER](https://github.com/simpler-env/SimplerEnv) benchmark:\n```bash\n# Adjust this path to your own conda installation\nsource /data/miniconda3/bin/activate\nconda create -n moto_for_simpler python=3.10 -y\nconda activate moto_for_simpler\n\n\ngit clone https://github.com/simpler-env/SimplerEnv --recurse-submodules\npip install numpy==1.24.4\ncd SimplerEnv/ManiSkill2_real2sim\npip install -e .\ncd ..\npip install -e .\nsudo apt install ffmpeg\npip install setuptools==58.2.0\npip install tensorflow==2.15.0\npip install -r requirements_full_install.txt\npip install tensorflow[and-cuda]==2.15.1\npip install 
git+https://github.com/nathanrooy/simulated-annealing\ncd ..\n\ncd Moto\npip install -r requirements.txt\ncd ..\n```\n\n### Model Weights\nWe release the Latent Motion Tokenizer, the pre-trained Moto-GPT, and the fine-tuned Moto-GPT on [Moto Hugging Face](https://huggingface.co/TencentARC/Moto). Download them separately and save them in the corresponding directories ([`latent_motion_tokenizer/checkpoints/`](latent_motion_tokenizer/checkpoints) and [`moto_gpt/checkpoints/`](moto_gpt/checkpoints)).\n\n## 🤖Inference\n\n### Latent trajectory inference with the pre-trained Moto-GPT and the Latent Motion Tokenizer\n```bash\nconda activate moto\nexport PROJECT_ROOT=[your path to Moto project]\ncd ${PROJECT_ROOT}/scripts\nnohup bash run_latent_motion_generation.sh \u003e run_latent_motion_generation.log 2\u003e\u00261 \u0026\ntail -f run_latent_motion_generation.log\n```\n\n\n### Evaluating the fine-tuned Moto-GPT on robot manipulation benchmarks\n\nEvaluation on CALVIN\n```bash\nconda activate moto_for_calvin\nexport PROJECT_ROOT=[your path to Moto project]\ncd ${PROJECT_ROOT}/scripts\nnohup bash evaluate_moto_gpt_in_calvin.sh \u003e evaluate_moto_gpt_in_calvin.log 2\u003e\u00261 \u0026\ntail -f evaluate_moto_gpt_in_calvin.log\n```\n\nEvaluation on SIMPLER\n```bash\nconda activate moto_for_simpler\nexport PROJECT_ROOT=[your path to Moto project]\ncd ${PROJECT_ROOT}/scripts\nnohup bash evaluate_moto_gpt_in_simpler.sh \u003e evaluate_moto_gpt_in_simpler.log 2\u003e\u00261 \u0026\ntail -f evaluate_moto_gpt_in_simpler.log\n```\n\n## 🔥Training\n### Prepare Datasets\n#### CALVIN dataset\n- Download and preprocess Split ABC-\u003eD dataset from [CALVIN](https://github.com/mees/calvin/tree/main/dataset):\n```bash\nconda activate moto\nexport PROJECT_ROOT=[your path to Moto project]\nexport OUTPUT_ROOT=[your path to save datasets]\ncd ${PROJECT_ROOT}/scripts/\nnohup bash download_and_preprocess_calvin_data.sh \u003e download_and_preprocess_calvin_data.log 2\u003e\u00261 
\u0026\ntail -f download_and_preprocess_calvin_data.log\n```\n\n#### Open X-Embodiment datasets\n- Install [gsutil](https://cloud.google.com/storage/docs/gsutil_install)\n\n- Download and preprocess datasets from [Open X-Embodiment](https://github.com/google-deepmind/open_x_embodiment):\n```bash\nconda activate moto\npip install tensorflow-datasets\nexport PROJECT_ROOT=[your path to Moto project]\nexport OUTPUT_ROOT=[your path to save datasets]\ncd ${PROJECT_ROOT}/scripts/\nnohup bash download_and_preprocess_oxe_data.sh \u003e download_and_preprocess_oxe_data.log 2\u003e\u00261 \u0026\ntail -f download_and_preprocess_oxe_data.log\n```\n\n\u003c!-- - Modify the `video_dir` and `lmdb_dir` fields in data configs from [latent_motion_tokenizer/configs/data/](latent_motion_tokenizer/configs/data/) and [moto_gpt/configs/data/](moto_gpt/configs/data/) --\u003e\n\n### Stage-1: Training Latent Motion Tokenizer\n#### Training on CALVIN dataset\n- Modify the `npz_dir` field in [latent_motion_tokenizer/configs/data/calvin.yaml](latent_motion_tokenizer/configs/data/calvin.yaml)\n\n- Config the paths in [latent_motion_tokenizer/configs/train/data_calvin-vq_size128_dim32_num8_legacyTrue-vision_MaeLarge-decoder_queryFusionModeAdd_Patch196_useMaskFalse-mformer_legacyTrue-train_lr0.0001_bs256-aug_shiftTrue_resizedCropFalse.yaml](latent_motion_tokenizer/configs/train/data_calvin-vq_size128_dim32_num8_legacyTrue-vision_MaeLarge-decoder_queryFusionModeAdd_Patch196_useMaskFalse-mformer_legacyTrue-train_lr0.0001_bs256-aug_shiftTrue_resizedCropFalse.yaml)\n\n- Run the following commands:\n\n```bash\nconda activate moto\nexport PROJECT_ROOT=[your path to Moto project]\nexport CONFIG_NAME=\"data_calvin-vq_size128_dim32_num8_legacyTrue-vision_MaeLarge-decoder_queryFusionModeAdd_Patch196_useMaskFalse-mformer_legacyTrue-train_lr0.0001_bs256-aug_shiftTrue_resizedCropFalse\"\ncd ${PROJECT_ROOT}/scripts/\nnohup bash train_latent_motion_tokenizer_on_calvin.sh \u003e 
train_latent_motion_tokenizer_on_calvin.log 2\u003e\u00261 \u0026\ntail -f train_latent_motion_tokenizer_on_calvin.log\n```\n\n#### Training on Open X-Embodiment datasets\n- Modify the `video_dir` field in [latent_motion_tokenizer/configs/data/rtx.yaml](latent_motion_tokenizer/configs/data/rtx.yaml)\n\n- Config the paths in [latent_motion_tokenizer/configs/train/data_rtx-vq_size128_dim32_num8_legacyTrue-vision_MaeLarge-decoder_queryFusionModeAdd_Patch196_useMaskFalse-mformer_legacyTrue-train_lr0.001_bs256-aug_shiftTrue_resizedCropFalse.yaml](latent_motion_tokenizer/configs/train/data_rtx-vq_size128_dim32_num8_legacyTrue-vision_MaeLarge-decoder_queryFusionModeAdd_Patch196_useMaskFalse-mformer_legacyTrue-train_lr0.001_bs256-aug_shiftTrue_resizedCropFalse.yaml)\n\n- Run the following commands:\n\n```bash\nconda activate moto\nexport PROJECT_ROOT=[your path to Moto project]\nexport CONFIG_NAME=\"data_rtx-vq_size128_dim32_num8_legacyTrue-vision_MaeLarge-decoder_queryFusionModeAdd_Patch196_useMaskFalse-mformer_legacyTrue-train_lr0.001_bs256-aug_shiftTrue_resizedCropFalse\"\ncd ${PROJECT_ROOT}/scripts/\nnohup bash train_latent_motion_tokenizer_on_oxe.sh \u003e train_latent_motion_tokenizer_on_oxe.log 2\u003e\u00261 \u0026\ntail -f train_latent_motion_tokenizer_on_oxe.log\n```\n\n\n\n### Stage-2: Pre-training Moto-GPT\n#### Pre-training on CALVIN dataset\n- Modify the `lmdb_dir` field in [moto_gpt/configs/data/calvin.yaml](moto_gpt/configs/data/calvin.yaml)\n\n- Config the paths in [moto_gpt/configs/train/data_calvin-model_actPredFalse_motionPredTrue_visionMaeLarge_seq2_chunk5_maskProb0.5-train_lr0.0001_bs512-aug_shiftTrue_resizedCropFalse.yaml](moto_gpt/configs/train/data_calvin-model_actPredFalse_motionPredTrue_visionMaeLarge_seq2_chunk5_maskProb0.5-train_lr0.0001_bs512-aug_shiftTrue_resizedCropFalse.yaml)\n\n- Run the following commands:\n\n```bash\nconda activate moto\nexport PROJECT_ROOT=[your path to Moto project]\nexport 
CONFIG_NAME=\"data_calvin-model_actPredFalse_motionPredTrue_visionMaeLarge_seq2_chunk5_maskProb0.5-train_lr0.0001_bs512-aug_shiftTrue_resizedCropFalse\"\ncd ${PROJECT_ROOT}/scripts/\nnohup bash pretrain_moto_gpt_on_calvin.sh \u003e pretrain_moto_gpt_on_calvin.log 2\u003e\u00261 \u0026\ntail -f pretrain_moto_gpt_on_calvin.log\n```\n\n\n\n#### Pre-training on Open X-Embodiment datasets\n- Modify the `video_dir` and `lmdb_dir` fields in [moto_gpt/configs/data/rtx.yaml](moto_gpt/configs/data/rtx.yaml)\n\n- Config the paths in [moto_gpt/configs/train/data_rtx-model_actPredFalse_motionPredTrue_visionMaeLarge_seq2_chunk3_maskProb0.5-train_lr0.001_bs512-aug_shiftTrue_resizedCropFalse.yaml](moto_gpt/configs/train/data_rtx-model_actPredFalse_motionPredTrue_visionMaeLarge_seq2_chunk3_maskProb0.5-train_lr0.001_bs512-aug_shiftTrue_resizedCropFalse.yaml)\n\n- Run the following commands:\n\n```bash\nconda activate moto\nexport PROJECT_ROOT=[your path to Moto project]\nexport CONFIG_NAME=\"data_rtx-model_actPredFalse_motionPredTrue_visionMaeLarge_seq2_chunk3_maskProb0.5-train_lr0.001_bs512-aug_shiftTrue_resizedCropFalse\"\nps aux | grep ${CONFIG_NAME} | awk '{print $2}' | xargs kill -9\ncd ${PROJECT_ROOT}/scripts/\nnohup bash pretrain_moto_gpt_on_oxe.sh \u003e pretrain_moto_gpt_on_oxe.log 2\u003e\u00261 \u0026\ntail -f pretrain_moto_gpt_on_oxe.log\n```\n\n\n### Stage-3: Fine-tuning Moto-GPT\n#### Fine-tuning on CALVIN dataset\n- Modify the `lmdb_dir` field in [moto_gpt/configs/data/calvin.yaml](moto_gpt/configs/data/calvin.yaml)\n\n- Config the paths in [moto_gpt/configs/train/data_calvin-model_actPredTrue_motionPredTrue_visionMaeLarge_seq2_chunk5_maskProb0.5-train_lr0.0002_bs512-aug_shiftTrue_resizedCropFalse-resume_from_predLatentOnly_calvin_Epoch10.yaml](moto_gpt/configs/train/data_calvin-model_actPredTrue_motionPredTrue_visionMaeLarge_seq2_chunk5_maskProb0.5-train_lr0.0002_bs512-aug_shiftTrue_resizedCropFalse-resume_from_predLatentOnly_calvin_Epoch10.yaml)\n\n- Run the 
following commands:\n\n```bash\nconda activate moto\nexport PROJECT_ROOT=[your path to Moto project]\nexport CONFIG_NAME=\"data_calvin-model_actPredTrue_motionPredTrue_visionMaeLarge_seq2_chunk5_maskProb0.5-train_lr0.0002_bs512-aug_shiftTrue_resizedCropFalse-resume_from_predLatentOnly_calvin_Epoch10\"\ncd ${PROJECT_ROOT}/scripts/\nnohup bash finetune_moto_gpt_on_calvin.sh \u003e finetune_moto_gpt_on_calvin.log 2\u003e\u00261 \u0026\ntail -f finetune_moto_gpt_on_calvin.log\n```\n\n#### Fine-tuning on RT-1 dataset\n- Modify the `video_dir` and `lmdb_dir` fields in [moto_gpt/configs/data/rt1.yaml](moto_gpt/configs/data/rt1.yaml)\n\n- Config the paths in [moto_gpt/configs/train/data_rt1-model_actPredTrue_motionPredTrue_visionMaeLarge_seq2_chunk3_maskProb0.5-train_lr0.0001_bs512-aug_shiftTrue_resizedCropFalse-resume_from_predLatentOnly_oxe_Epoch10.yaml](moto_gpt/configs/train/data_rt1-model_actPredTrue_motionPredTrue_visionMaeLarge_seq2_chunk3_maskProb0.5-train_lr0.0001_bs512-aug_shiftTrue_resizedCropFalse-resume_from_predLatentOnly_oxe_Epoch10.yaml)\n\n- Run the following commands:\n\n```bash\nconda activate moto\nexport PROJECT_ROOT=[your path to Moto project]\nexport CONFIG_NAME=\"data_rt1-model_actPredTrue_motionPredTrue_visionMaeLarge_seq2_chunk3_maskProb0.5-train_lr0.001_bs512-aug_shiftTrue_resizedCropFalse-resume_from_predLatentOnly_oxe_Epoch10\"\ncd ${PROJECT_ROOT}/scripts/\nnohup bash finetune_moto_gpt_on_rt1.sh \u003e finetune_moto_gpt_on_rt1.log 2\u003e\u00261 \u0026\ntail -f finetune_moto_gpt_on_rt1.log\n```\n\n## 📝To Do\n- [x] Release the Latent Motion Tokenizer\n- [x] Release the pre-trained and fine-tuned Moto-GPT\n- [x] Release the inference code\n- [x] Release the training code\n\n\n## 📚Citation\nIf you find our project helpful, please star our repository and cite our paper as follows:\n\n```\n@article{chen2024moto,\n  title={Moto: Latent Motion Token as the Bridging Language for Robot Manipulation},\n  author={Chen, Yi and Ge, Yuying and Li, 
Yizhuo and Ge, Yixiao and Ding, Mingyu and Shan, Ying and Liu, Xihui},\n  journal={arXiv preprint arXiv:2412.04445},\n  year={2024}\n}\n```\n\n## 🙌Acknowledgement\nThis repo benefits from [Taming Transformers](https://github.com/CompVis/taming-transformers/), [Phenaki-Pytorch](https://github.com/lucidrains/phenaki-pytorch), [GR-1](https://github.com/bytedance/GR-1), and [GR1-Training](https://github.com/EDiRobotics/GR1-Training). Thanks for their wonderful work!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencentarc%2Fmoto","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftencentarc%2Fmoto","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencentarc%2Fmoto/lists"}