{"id":13656118,"url":"https://github.com/priorMDM/priorMDM","last_synced_at":"2025-04-23T17:31:17.803Z","repository":{"id":154780179,"uuid":"608261664","full_name":"priorMDM/priorMDM","owner":"priorMDM","description":"The official implementation of the paper \"Human Motion Diffusion as a Generative Prior\"","archived":false,"fork":false,"pushed_at":"2025-01-25T09:55:17.000Z","size":10395,"stargazers_count":450,"open_issues_count":6,"forks_count":25,"subscribers_count":23,"default_branch":"main","last_synced_at":"2025-01-25T10:26:06.637Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/priorMDM.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-01T16:48:11.000Z","updated_at":"2025-01-25T09:55:22.000Z","dependencies_parsed_at":null,"dependency_job_id":"34598348-29f7-4b6a-bfa7-98777164eca2","html_url":"https://github.com/priorMDM/priorMDM","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/priorMDM%2FpriorMDM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/priorMDM%2FpriorMDM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/priorMDM%2FpriorMDM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/priorMDM%2FpriorMDM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/priorMDM","download_url":"https://codeload.gith
ub.com/priorMDM/priorMDM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250480394,"owners_count":21437536,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T04:00:51.335Z","updated_at":"2025-04-23T17:31:12.794Z","avatar_url":"https://github.com/priorMDM.png","language":"Python","funding_links":[],"categories":["Papers"],"sub_categories":["Text-Driven motion generation"],"readme":"# PriorMDM: Human Motion Diffusion as a Generative Prior\n\n\n[![arXiv](https://img.shields.io/badge/arXiv-\u003c2303.01418\u003e-\u003cCOLOR\u003e.svg)](https://arxiv.org/abs/2303.01418)\n\nThe official PyTorch implementation of the paper [**\"Human Motion Diffusion as a Generative Prior\"(ArXiv)**](https://arxiv.org/abs/2303.01418).\n\nPlease visit our [**webpage**](https://priormdm.github.io/priorMDM-page/) for more details.\n\n![teaser](https://github.com/priorMDM/priorMDM-page/raw/main/static/figures/teaser.gif)\n\n#### Bibtex\nIf you find this code useful in your research, please cite:\n\n```\n@article{shafir2023human,\n  title={Human motion diffusion as a generative prior},\n  author={Shafir, Yonatan and Tevet, Guy and Kapon, Roy and Bermano, Amit H},\n  journal={arXiv preprint arXiv:2303.01418},\n  year={2023}\n}\n```\n\n\n## Release status\n\n|  | Training | Generation | Evaluation |\n| --- | ----------- | ----------- | ----------- |\n| **DoubleTake (long motion)** | ✅ | ✅ | ✅ |\n| **ComMDM (two-person)** | ✅ | ✅ | ✅ |\n| **Fine-tuned motion control** | ✅ | ✅ | ✅ |\n\n## News\n\n📢 **29/Apr/2023** - 
Evaluation release of the long-motions scripts, including both datasets (BABEL \u0026 HumanML3D) - please check the updated readme.\n\n📢 **25/Apr/2023** - Full release of the fine-tuned motion control scripts.\n\n📢 **14/Apr/2023** - First release - DoubleTake/ComMDM - Training and generation with pre-trained models are available.\n\n## Getting started\n\nThis code was tested on `Ubuntu 18.04.5 LTS` and requires:\n\n* Python 3.8\n* conda3 or miniconda3\n* CUDA-capable GPU (one is enough)\n\n### 1. Setup environment \n\nInstall ffmpeg (if not already installed):\n\n```shell\nsudo apt update\nsudo apt install ffmpeg\n```\nFor Windows, use [this](https://www.geeksforgeeks.org/how-to-install-ffmpeg-on-windows/) instead.\n\nSet up the conda environment:\n```shell\nconda env create -f environment.yml\nconda activate PriorMDM\npython -m spacy download en_core_web_sm\npip install git+https://github.com/openai/CLIP.git\npip install git+https://github.com/GuyTevet/smplx.git\n```\n\n### 2. Get MDM dependencies\n\nPriorMDM shares most of its dependencies with the original MDM. 
\nIf you already have an installed MDM from the official repo, you can save time and link the dependencies instead of getting them from scratch.\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eIf you already have an installed MDM\u003c/b\u003e\u003c/summary\u003e\n\n**Link from installed MDM**\n\nBefore running the following bash script, first change the path to the full path to your installed MDM\n\n```bash\nbash prepare/link_mdm.sh\n```\n\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eFirst time user\u003c/b\u003e\u003c/summary\u003e\n\n**Download dependencies:**\n\n```bash\nbash prepare/download_smpl_files.sh\nbash prepare/download_glove.sh\nbash prepare/download_t2m_evaluators.sh\n```\n\n**Get HumanML3D dataset** (For all applications):\n\nFollow the instructions in [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git),\nthen copy the result dataset to our repository:\n\n```shell\ncp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D\n```\n\n\u003c/details\u003e\n\n### 3. 
Get PriorMDM dependencies\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eDoubleTake (Long sequences)\u003c/b\u003e\u003c/summary\u003e\n\n**BABEL dataset**\n\nDownload the processed version [here](https://drive.google.com/file/d/18a4eRh8mbIFb55FMHlnmI8B8tSTkbp4t/view?usp=share_link), and place it at `./dataset/babel`\n\nDownload the following for evaluation [here](https://drive.google.com/file/d/1uTUthP5fzgRLF-q3WgVEQib54zG2ayFc/view?usp=sharing), and place it at `./dataset/babel`\n\nDownload the following [here](https://drive.google.com/file/d/1PBlbxawaeFTxtKkKDsoJwQGuDTdp52DD/view?usp=sharing), and place it at `./dataset/babel`\n\n**SMPLH dependencies**\n\nDownload [here](https://drive.google.com/file/d/1zHTQ1VrVgr-qGl_ahc0UDgHlXgnwx_lM/view?usp=share_link), and place it at `./body_models`\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eComMDM (two-person)\u003c/b\u003e\u003c/summary\u003e\n\n**3DPW dataset**\n\nFor ComMDM, we cleaned [3DPW](https://virtualhumans.mpi-inf.mpg.de/3DPW/) and converted it to HumanML3D format. \n\nDownload the processed version [here](https://drive.google.com/file/d/1INxPiUuyrBAF71WjVj4Ztb1blsI2trth/view?usp=share_link), and place it at `./dataset/3dpw`\n\n\u003c/details\u003e\n\n  **Fine-tuned motion control** - No extra dependencies.\n\n\n### 4. 
Download the pretrained models\n\nDownload the model(s) you wish to use, then unzip and place them in `./save/`.\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eDoubleTake (long motions)\u003c/b\u003e\u003c/summary\u003e\n\n* [my_humanml-encoder-512](https://drive.google.com/file/d/1RCqyKfj7TLSp6VzwrKa84ldEaXmVma1a/view?usp=share_link) (This is a reproduction of the best MDM model without any changes)\n* [Babel_TrasnEmb_GeoLoss](https://drive.google.com/file/d/1sHQncaaYhyheeItnAiDOsxw_mpcbpLYr/view?usp=share_link)\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eComMDM (two-person)\u003c/b\u003e\u003c/summary\u003e\n\n* [pw3d_text](https://drive.google.com/file/d/1QFIEUd8TEto0AoVQnzsWflrrbHJBZJOG/view?usp=share_link) (for text-to-motion)\n* [pw3d_prefix](https://drive.google.com/file/d/10DL9iOr5VlgsikTVvV_sJ8oX86ycd9xE/view?usp=share_link) (for prefix completion)\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eFine-tuned motion control\u003c/b\u003e\u003c/summary\u003e\n\n* [root_horizontal_control](https://drive.google.com/file/d/1xLNza6S8Iz2MqSlMJnL38FPqTQhGnqfY/view?usp=share_link) \n(The base model fine-tuned for 80,000 steps on the horizontal part of the root control objective)\n* [left_wrist_control](https://drive.google.com/file/d/17h98FQhu6dFj70YCopFHT4sL6jZOf42U/view?usp=share_link)\n(The base model fine-tuned for 80,000 steps on the left wrist control objective)\n* [right_foot_control](https://drive.google.com/file/d/1QqHAYZ3hbDtsHwJ2Gy4nsfgMwaHvnSOq/view?usp=share_link)\n(The base model fine-tuned for 80,000 steps on the right foot control objective)\n\n\u003c/details\u003e\n\n## Motion Synthesis \n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eDoubleTake (long motions)\u003c/b\u003e\u003c/summary\u003e\n\nReproduce random text prompts:\n```shell\npython -m sample.double_take --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --num_samples 4 --handshake_size 20 --blend_len 
10\n```\nReproduce out of text file:\n```shell\npython -m sample.double_take --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --handshake_size 20 --blend_len 10 --input_text ./assets/dt_text_example.txt \n```\n\nReproduce out of csv file (can determine each sequence length):\n```shell\npython -m sample.double_take --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --handshake_size 20 --blend_len 10 --input_text ./assets/dt_csv_example.csv \n```\n\nIt will look something like this:\n\n![example](assets/DoubleTake/doubleTake_example.gif)\n\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eComMDM (two-person)\u003c/b\u003e\u003c/summary\u003e\n\n**Text-to-Motion**\n\nReproduce paper text prompts:\n```shell\npython -m sample.two_person_text2motion --model_path ./save/pw3d_text/model000100000.pt --input_text ./assets/two_person_text_prompts.txt\n```\n\nIt will look something like this:\n\n![example](assets/ComMDM/example_capoeira.gif)\n\n**Prefix completion**\n\nComplete unseen motion prefixes:\n```shell\npython -m sample.two_person_prefix_completion --model_path ./save/pw3d_prefix/model000050000.pt\n```\n\nIt will look something like this:\n\n![example](assets/ComMDM/example_prefix.gif)\n\nBlue frames are the input prefix and orange frames are the generated completion.\n\n\n**Visualize dataset**\n\nUnfortunately, 3DPW dataset is not clean, even after our process. To get samples of it run:\n```shell\npython -m sample.two_person_text2motion --model_path ./save/humanml_trans_enc_512/model000200000.pt --sample_gt\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eFine-tuned motion control\u003c/b\u003e\u003c/summary\u003e\n\n**Horizontal Root Control**\n\nSample the horizontal part of the root trajectory from the test set of HumanML3D, and generate a motion with the given trajectory (note that the vertical part of the trajectory is predicted by the model). 
To make the generation unconditioned on text, add `--guidance_param 0`.\n```shell\npython -m sample.finetuned_motion_control --model_path save/root_horizontal_finetuned/model000280000.pt --guidance_param 0\n```\n\nIt will look something like this:\n\n![example](assets/Fine-tuned_motion_control/root_control_example.gif)\n\nUse `--show_input` if you wish to plot the motion from which the control features were taken.\n\nAdd a text condition with `--text_condition`. Note that by default, we use classifier-free guidance with a scale of 2.5.\n```shell\npython -m sample.finetuned_motion_control --model_path save/root_horizontal_finetuned/model000280000.pt --text_condition \"a person is raising hands\"\n```\n\n**Left Wrist Control**\n\nSample the relative trajectory of the left wrist w.r.t. the root trajectory from the test set of HumanML3D, and generate a motion with the given left wrist relative trajectory. To make the generation unconditioned on text, add `--guidance_param 0`.\n```shell\npython -m sample.finetuned_motion_control --model_path save/left_wrist_finetuned/model000280000.pt --guidance_param 0\n```\n\nIt will look something like this:\n\n![example](assets/Fine-tuned_motion_control/left_wrist_control_example.gif)\n\nAdd a text condition with `--text_condition`. Note that by default, we use classifier-free guidance with a scale of 2.5.\n```shell\npython -m sample.finetuned_motion_control --model_path save/left_wrist_finetuned/model000280000.pt --text_condition \"a person is walking in a circle\"\n```\n\n\n**Left Wrist + Right Foot Control With Model Blending**\n\nSample the relative trajectory of the left wrist w.r.t. the root trajectory from the test set of HumanML3D, and generate a motion with the given left wrist relative trajectory. 
To make the generation unconditioned on text we add `--guidance_param 0`.\n```shell\npython -m sample.finetuned_motion_control --model_path save/left_wrist_finetuned/model000280000.pt,save/right_foot_finetuned/model000280000.pt --guidance_param 0\n```\n\nIt will look something like this:\n\n![example](assets/Fine-tuned_motion_control/left_wrist_right_foot_control_example.gif)\n\nAdd a text condition with `--text_condition`. Note that by default, we use classifier-free-guidance with scale of 2.5.\n```shell\npython -m sample.finetuned_motion_control --model_path save/left_wrist_finetuned/model000280000.pt,save/right_foot_finetuned/model000280000.pt --text_condition \"a person is walking in a circle\"\n```\n\n\u003c/details\u003e\n\n\n**You may also define:**\n* `--device` id.\n* `--seed` to sample different prompts.\n* `--motion_length` (text-to-motion only) in seconds (maximum is 9.8[sec]).\n\n**Running those will get you:**\n\n* `results.npy` file with text prompts and xyz positions of the generated animation\n* `sample##_rep##.mp4` - a stick figure animation for each generated motion.\n\n### Render SMPL mesh\n\nTo create SMPL mesh per frame run:\n\n```shell\npython -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file\n```\n\n**This script outputs:**\n* `sample##_rep##_smpl_params.npy` - SMPL parameters (thetas, root translations, vertices and faces)\n* `sample##_rep##_obj` - Mesh per frame in `.obj` format.\n\n**Notes:**\n* The `.obj` can be integrated into Blender/Maya/3DS-MAX and rendered using them.\n* This script is running [SMPLify](https://smplify.is.tue.mpg.de/) and needs GPU as well (can be specified with the `--device` flag).\n* **Important** - Do not change the original `.mp4` path before running the script.\n\n**Notes for 3d makers:**\n* You have two ways to animate the sequence:\n  1. 
Use the [SMPL add-on](https://smpl.is.tue.mpg.de/index.html) and the theta parameters saved to `sample##_rep##_smpl_params.npy` (we always use beta=0 and the gender-neutral model).\n  1. A more straightforward way is using the mesh data itself. All meshes have the same topology (SMPL), so you just need to keyframe vertex locations. \n     Since the OBJ files do not preserve vertex order, we also save this data to the `sample##_rep##_smpl_params.npy` file for your convenience.\n     \n\n## Train your own PriorMDM\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eDoubleTake (long motions)\u003c/b\u003e\u003c/summary\u003e\n\n**HumanML3D best model**\nRetraining on HumanML3D is not needed, as we use the original trained model from MDM. \nFor completeness, this repository supports this training as well:\n```shell\npython -m train.train_mdm --save_dir save/my_humanML_bestmodel --dataset humanml \n```\n\n**Babel best model**\n```shell\npython -m train.train_mdm --save_dir ./save/my_Babel_TrasnEmb_GeoLoss --dataset babel --latent_dim 512 --batch_size 64 --diffusion_steps 1000 --num_steps 10000000 --min_seq_len 45 --max_seq_len 250 --lambda_rcxyz 1.0 --lambda_fc 1.0 --lambda_vel 1.0\n```\n\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eComMDM (two-person)\u003c/b\u003e\u003c/summary\u003e\n\n**Text-to-Motion**\n\nDownload the pretrained model for text-to-motion training [from here](https://drive.google.com/file/d/1PE0PK8e5a5j-7-Xhs5YET5U5pGh0c821/view?usp=sharing) and place it in `./save/`. Then train with:\n\n```shell\npython -m train.train_mdm_multi --pretrained_path ./save/humanml_trans_enc_512/model000200000.pt --multi_train_mode text --multi_train_splits train,validation --save_dir ./save/my_pw3d_text\n```\n\n**Prefix Completion**\n\nDownload the pretrained model for prefix training [from here](https://drive.google.com/file/d/1PrUoHIiM1ICvL_oOBsB-J6YVJ1kzVRu_/view?usp=share_link) and place it in `./save/`. 
Then train with:\n\n```shell\npython -m train.train_mdm_multi --pretrained_path ./save/humanml_trans_enc_512_prefix_finetune/model000330000.pt --multi_train_mode prefix --save_dir ./save/my_pw3d_prefix --save_interval 10000\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cb\u003eFinetuned Motion Control\u003c/b\u003e\u003c/summary\u003e\n\nTrain a model for left wrist control from scratch on HumanML3D dataset.\n```shell\npython -m train.train_mdm_motion_control --save_dir save/left_wrist_finetuned --dataset humanml --inpainting_mask left_wrist\n```\n\n\nFinetune a base model for left wrist control on HumanML3D dataset. We advise setting `--save_interval` to 10,000 to have it saved more frequently, as this is a finetune and not training from scratch.\n```shell\npython -m train.train_mdm_motion_control --save_dir save/left_wrist_finetuned --dataset humanml --inpainting_mask left_wrist --resume_checkpoint save/humanml_trans_enc_512/model000200000.pt --save_interval 10_000\n```\n\n\u003c/details\u003e\n\n* Use `--device` to define GPU id.\n* Add `--train_platform_type {ClearmlPlatform, TensorboardPlatform}` to track results with either [ClearML](https://clear.ml/) or [Tensorboard](https://www.tensorflow.org/tensorboard).\n* Add `--eval_during_training` to run a short evaluation for each saved checkpoint. 
\n  This will slow down training but will give you better monitoring.\n\n## Evaluate\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eDoubleTake (long motions)\u003c/b\u003e\u003c/summary\u003e\n\nTo reproduce the HumanML3D evaluation over the motion, run:\n\n```shell\npython -m eval.eval_humanml_double_take --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --num_unfoldings 2 --handshake_size 20 --transition_margins 40  --eval_on motion --blend_len 10\n```\n\nTo reproduce the HumanML3D evaluation over the transition, run:\n\n```shell\npython -m eval.eval_humanml_double_take --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --num_unfoldings 2 --handshake_size 20 --transition_margins 40  --eval_on transition --blend_len 10\n```\n\nTo reproduce the BABEL evaluation over the motion, run:\n\n```shell\npython -m eval.eval_multi --model_path ./save/Babel_TrasnEmb_GeoLoss/model001250000.pt --num_unfoldings 2 --cropping_sampler --handshake_size 30 --transition_margins 40  --eval_on motion --blend_len 10\n```\n\nTo reproduce the BABEL evaluation over the transition, run:\n\n```shell\npython -m eval.eval_multi --model_path ./save/Babel_TrasnEmb_GeoLoss/model001250000.pt --num_unfoldings 2 --cropping_sampler --handshake_size 30 --transition_margins 40  --eval_on transition --blend_len 10\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eComMDM (two-person)\u003c/b\u003e\u003c/summary\u003e\n\nThe reported evaluation for prefix completion is in `./save/pw3d_prefix/eval_prefix_pw3d_paper_results_000240000_wo_mm_1000samples.log`.\n\nTo reproduce the evaluation, run:\n\n```shell\npython -m eval.eval_multi --model_path ./save/pw3d_prefix/model000240000.pt\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\u003cb\u003eFine-tuned motion control\u003c/b\u003e\u003c/summary\u003e\n\nEvaluate the motion control models on the horizontal part of trajectories sampled from the test set of the HumanML3D dataset.\n```shell\npython 
-m eval.eval_finetuned_motion_control --model_path save/root_horizontal_finetuned/model000280000.pt --replication_times 10\n```\n\nThis code should produce a file named `eval_humanml_root_horizontal_finetuned_000280000_gscale2.5_mask_root_horizontal_wo_mm.log`, or generally:\n`eval_humanml\\_\u003cmodel_name\u003e\\_gscale\u003cguidance_free_scale\u003e\\_mask\\_\u003cname_of_control_features\u003e_\u003cevaluation_mode\u003e.log`\n\n\u003c/details\u003e\n\n## Acknowledgments\n\nThis code is standing on the shoulders of giants. We want to thank the following contributors\nthat our code is based on:\n\n[MDM](https://github.com/GuyTevet/motion-diffusion-model),\n[guided-diffusion](https://github.com/openai/guided-diffusion), \n[MotionCLIP](https://github.com/GuyTevet/MotionCLIP), \n[text-to-motion](https://github.com/EricGuo5513/text-to-motion), \n[actor](https://github.com/Mathux/ACTOR), \n[joints2smpl](https://github.com/wangsen1312/joints2smpl),\n[TEACH](https://github.com/athn-nik/teach).\n\n## License\nThis code is distributed under an [MIT LICENSE](LICENSE).\n\nNote that our code depends on other libraries, including CLIP, SMPL, SMPL-X, PyTorch3D, and uses datasets that each have their own respective licenses that must also be followed.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FpriorMDM%2FpriorMDM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FpriorMDM%2FpriorMDM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FpriorMDM%2FpriorMDM/lists"}