{"id":13656124,"url":"https://github.com/ChenFengYe/motion-latent-diffusion","last_synced_at":"2025-04-23T17:31:13.437Z","repository":{"id":65325969,"uuid":"575661063","full_name":"ChenFengYe/motion-latent-diffusion","owner":"ChenFengYe","description":"[CVPR 2023] Executing your Commands via Motion Diffusion in Latent Space, a fast and high-quality motion diffusion model","archived":false,"fork":false,"pushed_at":"2023-07-11T04:00:32.000Z","size":3604,"stargazers_count":628,"open_issues_count":38,"forks_count":56,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-04-04T17:09:07.139Z","etag":null,"topics":["3d-generation","cvpr2023","diffusion-model","motion","motion-generation","text-driven","text-to-motion"],"latest_commit_sha":null,"homepage":"https://chenxin.tech/mld/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ChenFengYe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-08T02:33:35.000Z","updated_at":"2025-03-23T01:03:02.000Z","dependencies_parsed_at":"2024-12-21T03:07:59.062Z","dependency_job_id":"5f4ebd8a-7453-4e86-aa77-c794eca793c1","html_url":"https://github.com/ChenFengYe/motion-latent-diffusion","commit_stats":{"total_commits":69,"total_committers":5,"mean_commits":13.8,"dds":0.5072463768115942,"last_synced_commit":"081ce3152e5a95b14a4495fb2d32939c257c6216"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChenFengYe%2Fmotion-latent-diffusion","tags_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChenFengYe%2Fmotion-latent-diffusion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChenFengYe%2Fmotion-latent-diffusion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChenFengYe%2Fmotion-latent-diffusion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ChenFengYe","download_url":"https://codeload.github.com/ChenFengYe/motion-latent-diffusion/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250480393,"owners_count":21437536,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-generation","cvpr2023","diffusion-model","motion","motion-generation","text-driven","text-to-motion"],"created_at":"2024-08-02T04:00:51.453Z","updated_at":"2025-04-23T17:31:12.423Z","avatar_url":"https://github.com/ChenFengYe.png","language":"Python","funding_links":[],"categories":["Papers","Frameworks and Libraries"],"sub_categories":["Text-Driven motion generation","Motion Generation and Estimation"],"readme":"# MLD: Motion Latent Diffusion Models\n\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/executing-your-commands-via-motion-diffusion/motion-synthesis-on-humanml3d)](https://paperswithcode.com/sota/motion-synthesis-on-humanml3d?p=executing-your-commands-via-motion-diffusion)\n![Pytorch_lighting](https://img.shields.io/badge/Pytorch_lighting-\u003e=1.7-Blue?logo=Pytorch) 
![Diffusers](https://img.shields.io/badge/Diffusers-\u003e=0.7.2-Red?logo=diffusers)\n\n### [Executing your Commands via Motion Diffusion in Latent Space](https://chenxin.tech/mld)\n\n### [Project Page](https://chenxin.tech/mld) | [Arxiv](https://arxiv.org/abs/2212.04048) - CVPR 2023\n\nMotion Latent Diffusion (MLD) is a **text-to-motion** and **action-to-motion** diffusion model. Our work achieves **state-of-the-art** motion quality and is two orders of magnitude **faster** than previous diffusion models on raw motion data.\n\n\u003cp float=\"center\"\u003e\n  \u003cimg src=\"https://user-images.githubusercontent.com/16475892/209251515-ea88127b-0783-4a88-a8c1-2e478f7210a2.png\" width=\"800\" /\u003e\n\u003c/p\u003e\n\n## 🚩 News\n- [2023/06/20] [MotionGPT](https://github.com/OpenMotionLab/MotionGPT) is released! **A unified motion-language model**. Do all your motion tasks in [MotionGPT](https://github.com/OpenMotionLab/MotionGPT)\n- [2023/03/08] add [the script](https://github.com/ChenFengYe/motion-latent-diffusion/blob/main/scripts/tsne.py) for latent space visualization and [the script](https://github.com/ChenFengYe/motion-latent-diffusion/blob/main/scripts/flops.py) for counting floating point operations (FLOPs)\n- [2023/02/28] **MLD got accepted by CVPR 2023**!\n- [2023/02/02] release action-to-motion task, please refer to [the config](https://github.com/ChenFengYe/motion-latent-diffusion/blob/main/configs/config_mld_humanact12.yaml) and [the pre-trained model](https://drive.google.com/file/d/1G9O5arldtHvB66OPr31oE_rJG1bH_R39/view)\n- [2023/01/18] add a detailed [readme](https://github.com/ChenFengYe/motion-latent-diffusion/tree/main/configs) of the configuration\n- [2023/01/09] release [no VAE config](https://github.com/ChenFengYe/motion-latent-diffusion/blob/main/configs/config_novae_humanml3d.yaml) and [pre-trained model](https://drive.google.com/file/d/1_mgZRWVQ3jwU43tLZzBJdZ28gvxhMm23/view), you can use the MLD framework to train diffusion on raw motion like 
[MDM](https://github.com/GuyTevet/motion-diffusion-model).\n- [2022/12/22] first release, demo, and training for text-to-motion\n- [2022/12/08] upload paper and init project, code will be released in two weeks\n\n## ⚡ Quick Start\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eSetup and download\u003c/b\u003e\u003c/summary\u003e\n  \n### 1. Conda environment\n\n```\nconda create python=3.9 --name mld\nconda activate mld\n```\n\nInstall [PyTorch 1.12.1](https://pytorch.org/) and the packages in `requirements.txt`:\n\n```\npip install -r requirements.txt\n```\n\nWe tested our code on Python 3.9.12 and PyTorch 1.12.1.\n\n### 2. Dependencies\n\nRun the scripts to download the dependency materials:\n\n```\nbash prepare/download_smpl_model.sh\nbash prepare/prepare_clip.sh\n```\n\nFor text-to-motion evaluation:\n\n```\nbash prepare/download_t2m_evaluators.sh\n```\n\n### 3. Pre-trained model\n\nRun the script to download the pre-trained model:\n\n```\nbash prepare/download_pretrained_models.sh\n```\n\n### 4. 
(Optional) Download manually\n\nVisit [the Google Drive folder](https://drive.google.com/drive/folders/1U93wvPsqaSzb5waZfGFVYc4tLCAOmB4C) to download the dependencies and model from the previous steps.\n\n\u003c/details\u003e\n\n## ▶️ Demo\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eText-to-motion\u003c/b\u003e\u003c/summary\u003e\n\nThe demo accepts prompts from a text file or from keyboard input; the generated motions are saved as npy files.\nPlease check `configs/assets.yaml` for the path configuration; `TEST.FOLDER` sets the output folder.\n\nThen, run the following script:\n\n```\npython demo.py --cfg ./configs/config_mld_humanml3d.yaml --cfg_assets ./configs/assets.yaml --example ./demo/example.txt\n```\n\nSome parameters:\n\n- `--example=./demo/example.txt`: input file with text prompts\n- `--task=text_motion`: generate from the test set of the dataset\n- `--task=random_sampling`: random motion sampling from noise\n- `--replication`: generate motions for the same input text multiple times\n- `--allinone`: store all generated motions in a single npy file with the shape of `[num_samples, num_replication, num_frames, num_joints, xyz]`\n\nThe outputs:\n\n- `npy file`: the generated motions with the shape of (nframe, 22, 3)\n- `text file`: the input text prompt\n\u003c/details\u003e\n\n## 💻 Train your own models\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eTraining guidance\u003c/b\u003e\u003c/summary\u003e\n\n### 1. Prepare the datasets\n\nPlease refer to [HumanML3D](https://github.com/EricGuo5513/HumanML3D) for text-to-motion dataset setup.\nWe will provide instructions for other datasets soon.\n\n### 2.1. Ready to train VAE model\n\nPlease first check the parameters in `configs/config_vae_humanml3d.yaml`, e.g. `NAME`, `DEBUG`.\n\nThen, run the following command:\n\n```\npython -m train --cfg configs/config_vae_humanml3d.yaml --cfg_assets configs/assets.yaml --batch_size 64 --nodebug\n```\n\n### 2.2. Ready to train MLD model\n\nPlease update the parameters in `configs/config_mld_humanml3d.yaml`, e.g. 
`NAME`, `DEBUG`, `PRETRAINED_VAE` (set this to the path of the latest checkpoint from the previous step).\n\nThen, run the following command:\n\n```\npython -m train --cfg configs/config_mld_humanml3d.yaml --cfg_assets configs/assets.yaml --batch_size 64 --nodebug\n```\n\n### 3. Evaluate the model\n\nPlease first set `TEST.CHECKPOINT` in `configs/config_mld_humanml3d.yaml` to the path of the trained model checkpoint.\n\nThen, run the following command:\n\n```\npython -m test --cfg configs/config_mld_humanml3d.yaml --cfg_assets configs/assets.yaml\n```\n\n\u003c/details\u003e\n\n## 👀 Visualization\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eRender SMPL\u003c/b\u003e\u003c/summary\u003e\n\n### 1. Set up blender - WIP\n\nRefer to [TEMOS-Rendering motions](https://github.com/Mathux/TEMOS) for blender setup, then install the following dependencies.\n\n```\nYOUR_BLENDER_PYTHON_PATH/python -m pip install -r prepare/requirements_render.txt\n```\n\n### 2. (Optional) Render rigged cylinders\n\nRun the following command using blender:\n\n```\nYOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --dir=YOUR_NPY_FOLDER --mode=video --joint_type=HumanML3D\n```\n\n### 2. Create SMPL meshes with:\n\n```\npython -m fit --dir YOUR_NPY_FOLDER --save_folder TEMP_PLY_FOLDER --cuda\n```\n\nThis outputs:\n\n- `mesh npy file`: the generated SMPL vertices with the shape of (nframe, 6893, 3)\n- `ply files`: the ply mesh files for blender or meshlab\n\n### 3. 
Render SMPL meshes\n\nRun the following command to render SMPL using blender:\n\n```\nYOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --dir=YOUR_NPY_FOLDER --mode=video --joint_type=HumanML3D\n```\n\nOptional parameters:\n\n- `--mode=video`: render mp4 video\n- `--mode=sequence`: render the whole motion in a png image.\n\u003c/details\u003e\n\n## ❓ FAQ\n\n\u003cdetails\u003e\n    \u003csummary\u003e\u003cb\u003eSolve foot sliding issue\u003c/b\u003e\u003c/summary\u003e\n  \n If your demo results show severe foot sliding, please check the code linked below. This can happen when ``self.feats2joints`` (which uses mean and std for de-normalization) is broken. \n https://github.com/ChenFengYe/motion-latent-diffusion/blob/af507c479d771f62a058b5b6abb51276b36d6c6d/mld/models/modeltype/mld.py#L264\n https://github.com/ChenFengYe/motion-latent-diffusion/blob/5c264c31fbc7ffc047be1ce003622f1865417e8f/mld/data/get_data.py#L26-L41\n \n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eDetails of training\u003c/b\u003e\u003c/summary\u003e\n  \n1. **GPUs.** You can list the device IDs in the config to use all your GPUs.  https://github.com/ChenFengYe/motion-latent-diffusion/blob/6643f175fbcd914312fa5f570e3dc7ab57994075/configs/config_vae_humanml3d.yaml#L4\n2.  **Epoch Nums.** 1500~3000 epochs are enough for VAE or MLD. I suggest you use **wandb** (preferred) or **tensorboard** to check the FID curve of your training.\n3. **Training Speed.** 2000 epochs can take about 1 day on a single GPU, and around 12 hours on 8 GPUs. Training speed also depends on ``VAL_EVERY_STEPS`` (validation frequency) and on data I/O speed. If your training is slow, check these settings first.\nhttps://github.com/ChenFengYe/motion-latent-diffusion/blob/6643f175fbcd914312fa5f570e3dc7ab57994075/configs/config_vae_humanml3d.yaml#L77\n4. **Data Log.** Only the loss is printed by default. After validation, more validation metrics are printed. More details are available in wandb (preferred) or tensorboard.\n5. 
**Debug or not.** Please use ``--nodebug`` for all your training.\n6. **VAE loading.** Please load your pre-trained VAE correctly for the MLD diffusion training.\n7. **FID.** The validation FID will drop to 0.5~1 after 1500 epochs for both VAE and MLD training. By default, validation runs on the test split: https://github.com/ChenFengYe/motion-latent-diffusion/blob/6643f175fbcd914312fa5f570e3dc7ab57994075/configs/config_vae_humanml3d.yaml#L30\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eDetails of motion lengths\u003c/b\u003e\u003c/summary\u003e\nOur model is capable of generating motions with arbitrary lengths. To handle different lengths of motions in the same batch, padding and masking are utilized in our motion encoder and decoder. After the latent vector \u003ci\u003ez\u003c/i\u003e is obtained by the diffusion process, the motion length \u003ci\u003eL\u003c/i\u003e, represented as a sequence of sinusoidal positional encodings, is also provided to the motion decoder, so our motion decoder is able to generate output with variable target lengths.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eMLD-1 VS MLD-7\u003c/b\u003e\u003c/summary\u003e\nMLD-7 only works best in evaluating VAE models (Tab. 4), and MLD-1 wins these generation tasks (Tab. 1, 2, 3, 6). In other words, MLD-7 wins the first training stage for the VAE part, while MLD-1 wins the second for the diffusion part. We expected MLD-7 to perform better than MLD-1 in several tasks, but the results differ. The main reason a larger latent size performs worse, we believe, is the small amount of training data. HumanML3D only includes 15k motion sequences, far fewer than the billions of images used in image generation. 
MLD-7 could work better when the amount of motion data reaches the million level.\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eDetails of Inference Time\u003c/b\u003e\u003c/summary\u003e\nWe provide a detailed ablation study with DDIM below. We evaluate the total inference time to generate 2048 motion clips with different diffusion schedules, the floating point operations (FLOPs) counted by the THOP library, the size of the diffusion input, and FID. MLD reduces the computational cost of diffusion models, which is the main reason for faster inference. The number of diffusion iterations further widens the gap in computational cost.\n\u003cimg width=\"839\" alt=\"image\" src=\"https://user-images.githubusercontent.com/24362526/223096066-79ff5879-d685-4ab9-b85e-9b55613df17b.png\"\u003e\nIf you want to test the floating point operations (FLOPs) in your model setting, you can run the following command:\n\n```\npython -m scripts.flops --cfg configs/your_config.yaml\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eLatent Space Visualization\u003c/b\u003e\u003c/summary\u003e\nWe provide a t-SNE visualization of the evolved latent codes \u003ci\u003ez\u003c/i\u003e\u003csup\u003et\u003c/sup\u003e during the reverse diffusion process (inference) on the action-to-motion task below. \u003ci\u003et\u003c/i\u003e is the diffusion step, indexed as in the forward diffusion trajectory. \u003ci\u003ez\u003c/i\u003e\u003csup\u003et\u003c/sup\u003e=49 is the initial random noise. \u003ci\u003ez\u003c/i\u003e\u003csup\u003et\u003c/sup\u003e=0 is our prediction. We sample 30 motions for each action label. 
From left to right, the figure shows the evolved latent codes during the inference of diffusion models.\n\u003cimg width=\"1110\" alt=\"image\" src=\"https://user-images.githubusercontent.com/24362526/223096486-20c497f2-6f75-43af-a892-9e1215954ca4.png\"\u003e\nIf you want to visualize the latent space in your model setting, you can run the following command:\n\n```\npython -m scripts.tsne --cfg configs/your_config.yaml\n```\n\n**Note**: This only supports action-to-motion models for now.\n\n\u003c/details\u003e\n\n**[Details of configuration](./configs)**\n\n## Citation\n\nIf you find our code or paper helpful, please consider citing:\n\n```bibtex\n@inproceedings{chen2023executing,\n  title={Executing your Commands via Motion Diffusion in Latent Space},\n  author={Chen, Xin and Jiang, Biao and Liu, Wen and Huang, Zilong and Fu, Bin and Chen, Tao and Yu, Gang},\n  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n  pages={18000--18010},\n  year={2023}\n}\n```\n\n## Acknowledgments\n\nThanks to [TEMOS](https://github.com/Mathux/TEMOS), [ACTOR](https://github.com/Mathux/ACTOR), [HumanML3D](https://github.com/EricGuo5513/HumanML3D) and [joints2smpl](https://github.com/wangsen1312/joints2smpl); our code partially borrows from them.\n\n## License\n\nThis code is distributed under an [MIT LICENSE](LICENSE).\n\nNote that our code depends on other libraries, including SMPL, SMPL-X, and PyTorch3D, and uses datasets which each have their own respective licenses that must also be followed.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FChenFengYe%2Fmotion-latent-diffusion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FChenFengYe%2Fmotion-latent-diffusion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FChenFengYe%2Fmotion-latent-diffusion/lists"}