{"id":14961261,"url":"https://github.com/ericguo5513/momask-codes","last_synced_at":"2025-05-16T11:03:46.946Z","repository":{"id":210101589,"uuid":"725261939","full_name":"EricGuo5513/momask-codes","owner":"EricGuo5513","description":"Official implementation of \"MoMask: Generative Masked Modeling of 3D Human Motions (CVPR2024)\"","archived":false,"fork":false,"pushed_at":"2024-09-13T19:09:40.000Z","size":1167,"stargazers_count":1025,"open_issues_count":17,"forks_count":81,"subscribers_count":29,"default_branch":"main","last_synced_at":"2025-05-16T11:02:49.285Z","etag":null,"topics":["3d-generation","animation","motion","motion-generation","motion-synthesis","text-driven","text-to-motion"],"latest_commit_sha":null,"homepage":"https://ericguo5513.github.io/momask/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EricGuo5513.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-29T19:21:27.000Z","updated_at":"2025-05-16T01:35:38.000Z","dependencies_parsed_at":"2023-11-30T20:27:48.801Z","dependency_job_id":"0b503be3-837a-4754-abbf-ac9bf633d324","html_url":"https://github.com/EricGuo5513/momask-codes","commit_stats":{"total_commits":51,"total_committers":4,"mean_commits":12.75,"dds":0.5294117647058824,"last_synced_commit":"94a6636c9c463b7a9414c3401a6f1b67e6c51824"},"previous_names":["ericguo5513/momask-codes"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EricGuo5513%2Fmomask-codes","tags_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/EricGuo5513%2Fmomask-codes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EricGuo5513%2Fmomask-codes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EricGuo5513%2Fmomask-codes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EricGuo5513","download_url":"https://codeload.github.com/EricGuo5513/momask-codes/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254518384,"owners_count":22084374,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-generation","animation","motion","motion-generation","motion-synthesis","text-driven","text-to-motion"],"created_at":"2024-09-24T13:24:17.570Z","updated_at":"2025-05-16T11:03:46.924Z","avatar_url":"https://github.com/EricGuo5513.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MoMask: Generative Masked Modeling of 3D Human Motions (CVPR 2024)\n### [[Project Page]](https://ericguo5513.github.io/momask) [[Paper]](https://arxiv.org/abs/2312.00063) [[Huggingface Demo]](https://huggingface.co/spaces/MeYourHint/MoMask) [[Colab Demo]](https://github.com/camenduru/MoMask-colab)\n![teaser_image](https://ericguo5513.github.io/momask/static/images/teaser.png)\n\nIf you find our code or paper helpful, please consider starring our repository and citing:\n```\n@inproceedings{guo2024momask,\n  title={Momask: Generative masked modeling of 3d human motions},\n  author={Guo, Chuan and Mu, Yuxuan and Javed, Muhammad Gohar and 
Wang, Sen and Cheng, Li},\n  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n  pages={1900--1910},\n  year={2024}\n}\n```\n\n## :postbox: News\n📢 **2024-08-02** --- The [WebUI demo 🤗](https://huggingface.co/spaces/MeYourHint/MoMask) is now running smoothly on a CPU. No GPU is required to use MoMask.\n\n📢 **2024-02-26** --- 🔥🔥🔥 Congrats! MoMask is accepted to CVPR 2024.\n\n📢 **2024-01-12** --- Now you can use MoMask in Blender as an add-on. Thanks to [@makeinufilm](https://twitter.com/makeinufilm) for sharing the [tutorial](https://medium.com/@makeinufilm/notes-on-how-to-set-up-the-momask-environment-and-how-to-use-blenderaddon-6563f1abdbfa).\n\n📢 **2023-12-30** --- For easy WebUI BVH visualization, you could try this website [bvh2vrma](https://vrm-c.github.io/bvh2vrma/) from this [github](https://github.com/vrm-c/bvh2vrma?tab=readme-ov-file).\n\n📢 **2023-12-29** --- Thanks to Camenduru for supporting the [🤗Colab](https://github.com/camenduru/MoMask-colab) demo.\n\n📢 **2023-12-27** --- Released the WebUI demo. Try it now on [🤗HuggingFace](https://huggingface.co/spaces/MeYourHint/MoMask)!\n\n📢 **2023-12-19** --- Released scripts for temporal inpainting.\n\n📢 **2023-12-15** --- Released code and models for MoMask, including training/eval/generation scripts.\n\n📢 **2023-11-29** --- Initialized the webpage and git project.  \n\n\n## :round_pushpin: Get You Ready\n\n\u003cdetails\u003e\n  \n### 1. Conda Environment\n```\nconda env create -f environment.yml\nconda activate momask\npip install git+https://github.com/openai/CLIP.git\n```\nWe test our code on Python 3.7.13 and PyTorch 1.7.1.\n\n#### Alternative: Pip Installation\n\u003cdetails\u003e\nWe provide an alternative pip installation in case you encounter difficulties setting up the conda environment.\n\n```\npip install -r requirements.txt\n```\nWe test this installation on Python 3.10.\n\n\u003c/details\u003e\n\n### 2. 
Models and Dependencies\n\n#### Download Pre-trained Models\n```\nbash prepare/download_models.sh\n```\n\n#### Download Evaluation Models and GloVe\nFor evaluation only.\n```\nbash prepare/download_evaluator.sh\nbash prepare/download_glove.sh\n```\n\n#### Troubleshooting\nIf gdown fails with the error \"Cannot retrieve the public link of the file. You may need to change the permission to 'Anyone with the link', or have had many accesses\", a potential solution is to run `pip install --upgrade --no-cache-dir gdown`, as suggested on https://github.com/wkentaro/gdown/issues/43.\n\n#### (Optional) Download Manually\nVisit [[Google Drive]](https://drive.google.com/drive/folders/1sHajltuE2xgHh91H9pFpMAYAkHaX9o57?usp=drive_link) to download the models and evaluators manually.\n\n### 3. Get Data\n\nYou have two options here:\n* **Skip getting data**, if you just want to generate motions using your *own* descriptions.\n* **Get full data**, if you want to *re-train* and *evaluate* the model.\n\n**(a). Full data (text + motion)**\n\n**HumanML3D** - Follow the instructions in [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git), then copy the resulting dataset to our repository:\n```\ncp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D\n```\n**KIT** - Download from [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git), then place the result in `./dataset/KIT-ML`\n\n#### \n\n\u003c/details\u003e\n\n## :rocket: Demo\n\u003cdetails\u003e\n\n### (a) Generate from a single prompt\n```\npython gen_t2m.py --gpu_id 1 --ext exp1 --text_prompt \"A person is running on a treadmill.\"\n```\n### (b) Generate from a prompt file\nAn example prompt file is given in `./assets/text_prompt.txt`. Please follow the format `\u003ctext description\u003e#\u003cmotion length\u003e` on each line. Motion length indicates the number of poses, which must be an integer and will be rounded to a multiple of 4. 
In our work, motions are at 20 fps.\n\nIf you write `\u003ctext description\u003e#NA`, our model will determine a length. Note that once there is **one** NA, all the others will be **NA** automatically.\n\n```\npython gen_t2m.py --gpu_id 1 --ext exp2 --text_path ./assets/text_prompt.txt\n```\n\n\nA few more parameters you may be interested in:\n* `--repeat_times`: number of replications for generation, default `1`.\n* `--motion_length`: specify the number of poses for generation, only applicable in (a).\n\nThe output files are stored under folder `./generation/\u003cext\u003e/`. They are:\n* `numpy files`: generated motions with shape of (nframe, 22, 3), under subfolder `./joints`.\n* `video files`: stick figure animation in mp4 format, under subfolder `./animation`.\n* `bvh files`: bvh files of the generated motion, under subfolder `./animation`.\n\nWe also apply naive foot IK to the generated motions; see files with suffix `_ik`. It sometimes works well, but sometimes fails.\n  \n\u003c/details\u003e\n\n## :dancers: Visualization\n\u003cdetails\u003e\n\nAll the animations are manually rendered in Blender. We use the characters from [mixamo](https://www.mixamo.com/#/). You need to download the characters in T-Pose with skeleton.\n\n### Retargeting\nFor retargeting, we found rokoko usually leads to large errors on the feet. On the other hand, [keemap.rig.transfer](https://github.com/nkeeline/Keemap-Blender-Rig-ReTargeting-Addon/releases) gives more precise retargeting. You could watch the [tutorial](https://www.youtube.com/watch?v=EG-VCMkVpxg) here.\n\nFollow these steps:\n* Download keemap.rig.transfer from the github, and install it in Blender.\n* Import both the motion files (.bvh) and character files (.fbx) in Blender.\n* `Shift + Select` both the source and target skeletons. 
(They do not need to be in Rest Position.)\n* Switch to `Pose Mode`, then unfold the `KeeMapRig` tool at the top-right corner of the view window.\n* For `bone mapping file`, point to `./assets/mapping.json` (or `mapping6.json` if it doesn't work), and click `Read In Bone Mapping File`. This file is manually made by us. It works for most characters in mixamo.\n* (Optional) You could manually fill in the bone mapping and adjust the rotations on your own, for your own character. `Save Bone Mapping File` can save the mapping configuration to a local file, as specified by the mapping file path.\n* Adjust the `Number of Samples`, `Source Rig`, `Destination Rig Name`.\n* Click `Transfer Animation from Source Destination`, and wait a few seconds.\n\nWe haven't tried other retargeting tools. Comments are welcome if you find others more useful.\n\n### Scene\n\nWe use this [scene](https://drive.google.com/file/d/16SbrnG9JsJ2w7UwCFmh10PcBdl6HxlrA/view?usp=drive_link) for animation.\n\n\n\u003c/details\u003e\n\n## :clapper: Temporal Inpainting\n\u003cdetails\u003e\nWe conduct mask-based editing in the m-transformer stage, followed by the regeneration of residual tokens for the entire sequence. To load your own motion, provide the path through `--source_motion`. Use `-msec` to specify the mask section, given either as ratios or as frame indices. For instance, `-msec 0.3,0.6` with `max_motion_length=196` is equivalent to `-msec 59,118`, indicating the editing of the frame section [59, 118]. \n\n```\npython edit_t2m.py --gpu_id 1 --ext exp3 --use_res_model -msec 0.4,0.7 --text_prompt \"A man picks something from the ground using his right hand.\"\n```\n\nNote: Presently, the source motion must adhere to the format of a HumanML3D dim-263 feature vector. An example motion vector from the HumanML3D test set is available in `example_data/000612.npy`. 
To process your own motion data, you can use the `process_file` function from `utils/motion_process.py`.\n\n\u003c/details\u003e\n\n## :space_invader: Train Your Own Models\n\u003cdetails\u003e\n\n\n**Note**: You have to train RVQ **BEFORE** training masked/residual transformers. The latter two can be trained simultaneously.\n\n### Train RVQ\nYou may also need to download evaluation models to run the scripts.\n```\npython train_vq.py --name rvq_name --gpu_id 1 --dataset_name t2m --batch_size 256 --num_quantizers 6  --max_epoch 50 --quantize_dropout_prob 0.2 --gamma 0.05\n```\n\n### Train Masked Transformer\n```\npython train_t2m_transformer.py --name mtrans_name --gpu_id 2 --dataset_name t2m --batch_size 64 --vq_name rvq_name\n```\n\n### Train Residual Transformer\n```\npython train_res_transformer.py --name rtrans_name  --gpu_id 2 --dataset_name t2m --batch_size 64 --vq_name rvq_name --cond_drop_prob 0.2 --share_weight\n```\n\n* `--dataset_name`: motion dataset, `t2m` for HumanML3D and `kit` for KIT-ML.  \n* `--name`: name your model. This will create the model space at `./checkpoints/\u003cdataset_name\u003e/\u003cname\u003e`\n* `--gpu_id`: GPU id.\n* `--batch_size`: we use `512` for rvq training. For masked/residual transformer, we use `64` on HumanML3D and `16` for KIT-ML.\n* `--num_quantizers`: number of quantization layers, `6` is used in our case.\n* `--quantize_dropout_prob`: quantization dropout ratio, `0.2` is used.\n* `--vq_name`: when training masked/residual transformer, you need to specify the name of the rvq model for tokenization.\n* `--cond_drop_prob`: condition drop ratio, for classifier-free guidance. 
`0.2` is used.\n* `--share_weight`: whether to share the projection/embedding weights in residual transformer.\n\nAll the pre-trained models and intermediate results will be saved in space `./checkpoints/\u003cdataset_name\u003e/\u003cname\u003e`.\n\u003c/details\u003e\n\n## :book: Evaluation\n\u003cdetails\u003e\n\n### Evaluate RVQ Reconstruction:\nHumanML3D:\n```\npython eval_t2m_vq.py --gpu_id 0 --name rvq_nq6_dc512_nc512_noshare_qdp0.2 --dataset_name t2m --ext rvq_nq6\n\n```\nKIT-ML:\n```\npython eval_t2m_vq.py --gpu_id 0 --name rvq_nq6_dc512_nc512_noshare_qdp0.2_k --dataset_name kit --ext rvq_nq6\n```\n\n### Evaluate Text2motion Generation:\nHumanML3D:\n```\npython eval_t2m_trans_res.py --res_name tres_nlayer8_ld384_ff1024_rvq6ns_cdp0.2_sw --dataset_name t2m --name t2m_nlayer8_nhead6_ld384_ff1024_cdp0.1_rvq6ns --gpu_id 1 --cond_scale 4 --time_steps 10 --ext evaluation\n```\nKIT-ML:\n```\npython eval_t2m_trans_res.py --res_name tres_nlayer8_ld384_ff1024_rvq6ns_cdp0.2_sw_k --dataset_name kit --name t2m_nlayer8_nhead6_ld384_ff1024_cdp0.1_rvq6ns_k --gpu_id 0 --cond_scale 2 --time_steps 10 --ext evaluation\n```\n\n* `--res_name`: model name of `residual transformer`.  \n* `--name`: model name of `masked transformer`.  
\n* `--cond_scale`: scale of classifier-free guidance.\n* `--time_steps`: number of iterations for inference.\n* `--ext`: filename for saving evaluation results.\n* `--which_epoch`: checkpoint name of `masked transformer`.\n\nThe final evaluation results will be saved in `./checkpoints/\u003cdataset_name\u003e/\u003cname\u003e/eval/\u003cext\u003e.log`\n\n\u003c/details\u003e\n\n## Acknowledgements\n\nWe sincerely thank the authors of these open-source works, on which our code is based: \n\n[deep-motion-editing](https://github.com/DeepMotionEditing/deep-motion-editing), [Muse](https://github.com/lucidrains/muse-maskgit-pytorch), [vector-quantize-pytorch](https://github.com/lucidrains/vector-quantize-pytorch), [T2M-GPT](https://github.com/Mael-zys/T2M-GPT), [MDM](https://github.com/GuyTevet/motion-diffusion-model/tree/main) and [MLD](https://github.com/ChenFengYe/motion-latent-diffusion/tree/main)\n\n## License\nThis code is distributed under an [MIT LICENSE](https://github.com/EricGuo5513/momask-codes/tree/main?tab=MIT-1-ov-file#readme).\n\nNote that our code depends on other libraries, including SMPL, SMPL-X, PyTorch3D, and uses datasets which each have their own respective licenses that must also be followed.\n\n### Misc\nContact cguo2@ualberta.ca for further questions.\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=EricGuo5513/momask-codes\u0026type=Date)](https://star-history.com/#EricGuo5513/momask-codes\u0026Date)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fericguo5513%2Fmomask-codes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fericguo5513%2Fmomask-codes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fericguo5513%2Fmomask-codes/lists"}