https://github.com/openmotionlab/motiongpt3
MotionGPT3: Human Motion as a Second Modality, a MoT-based framework for unified motion understanding and generation
https://github.com/openmotionlab/motiongpt3
chatgpt gpt language-model motion motiongpt motiongpt3 multi-modal text-to-motion
Last synced: about 1 month ago
JSON representation
MotionGPT3: Human Motion as a Second Modality, a MoT-based framework for unified motion understanding and generation
- Host: GitHub
- URL: https://github.com/openmotionlab/motiongpt3
- Owner: OpenMotionLab
- Created: 2025-06-27T02:52:34.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2026-01-14T07:01:59.000Z (5 months ago)
- Last Synced: 2026-01-14T10:33:30.018Z (5 months ago)
- Topics: chatgpt, gpt, language-model, motion, motiongpt, motiongpt3, multi-modal, text-to-motion
- Language: Python
- Homepage: https://motiongpt3.github.io
- Size: 9.1 MB
- Stars: 164
- Watchers: 4
- Forks: 11
- Open Issues: 6
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
Official repo for MotionGPT3
## 🏃 Intro MotionGPT3
MotionGPT3 is a **bimodal** motion-language framework using MoT architecture designed to address the challenges of **unified** motion understanding and generation.
Technical details
Inspired by the mixture of experts, we propose MotionGPT3, a bimodal motion-language model that treats human motion as a second modality, decoupling motion modeling via separate model parameters and enabling both effective cross-modal interaction and efficient multimodal scaling training.
To preserve language intelligence, the text branch remains the same with the pretrained language model, while a motion branch is integrated via shared attention, enabling bidirectional information flow between two modalities. We employ a motion VAE to encode raw human motion into latent representations, while motion branch predicts motion latents directly from intermediate hidden states using a diffusion head, bypassing discrete tokenization.
Extensive experiments show that our approach achieves competitive performance on both motion understanding and generation tasks while preserving strong language capabilities, establishing a unified bimodal motion diffusion framework within an autoregressive manner.

## 🚩 News
- [2025/10/17] 🔥🔥 Release **MotionGPT3 models** on [huggingface](https://huggingface.co/OpenMotionLab/motiongpt3)
- [2025/06/30] Upload and init project
## ⚡ Quick Start
Setup and download
### 1. Conda environment
```
conda create python=3.11 --name mgpt
conda activate mgpt
```
Install the packages in `requirements.txt` and install [PyTorch 2.0](https://pytorch.org/)
```
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```
We test our code on Python 3.11.11 and PyTorch 2.0.0.
### 2. Dependencies
Run the script to download dependencies materials:
```
bash prepare/download_smpl_model.sh
bash prepare/prepare_gpt2.sh
```
For Text to Motion Evaluation
```
bash prepare/download_t2m_evaluators.sh
```
For pre-trained MotionVAE:
```
bash prepare/download_mld_pretrained_models.sh
```
Then run following script to process checkpoints:
```
python -m scripts.gen_mot_gpt
```
### 3. Pre-trained model
Run the script to download the pre-trained model
```
bash prepare/download_pretrained_motiongpt3_model.sh
```
### 4. (Optional) Download manually
Visit [the Google Driver](https://drive.google.com/drive/folders/1NMDuI2F0UO2Opl778C37DWCZdHcy5DOh?usp=drive_link) to download the previous dependencies.
Visit [the Hugging Face](https://huggingface.co/OpenMotionLab/motiongpt3) to download the pretrained models.
## ▶️ Demo
Webui
Run the following script to launch webui, then visit [0.0.0.0:8888](http://0.0.0.0:8888)
```
python app.py
```
Batch demo
We support txt file input, the output motions are npy files and output texts are txt files. Please check the `configs/assets.yaml` for path config, TEST.FOLDER as output folder.
Then, run the following script:
```
python demo.py --cfg ./configs/test.yaml --example ./assets/texts/t2m.txt
```
Some parameters:
- `--example=./demo/t2m.txt`: input file as text prompts
- `--task=t2m`: evaluation tasks including t2m, m2t, pred, inbetween
The outputs:
- `npy file`: the generated motions with the shape of (nframe, 22, 3)
- `txt file`: the input text prompt or text output
## 💻 Train your own models
Training guidance
### 1. Prepare the datasets
1. Please refer to [HumanML3D](https://github.com/EricGuo5513/HumanML3D) for text-to-motion dataset setup.
2. Put the instructions data in `prepare/instructions` to the same folder of HumanML3D dataset.
4. (Optional) Refer to [MotionGPT-Training guidance](https://github.com/OpenMotionLab/MotionGPT/tree/main#22-ready-to-pretrain-motiongpt-model) to generate motion code for VQ-based training.
```
bash prepare/download_motiongpt_pretrained_models.sh
python -m scripts.get_motion_code --cfg configs/config_motiongpt.yaml
```
### 2.1. Ready to train MotionGPT3 model
Please first check the parameters in `configs/MoT_vae_stage1_t2m.yaml`, e.g. `NAME`, `instruction_type`, `lm_ablation`, `DEBUG`.
Then, run the following command:
```
python -m scripts.gen_mot_gpt
python -m train --cfg configs/MoT_vae_stage1_t2m.yaml --nodebug
```
### 2.2. Ready to pretrain MotionGPT3 model
Please update the parameters in `configs/MoT_vae_stage2_instruct.yaml` and `configs/MoT_vae_stage2_all.yaml`, e.g. `NAME`, `instruction_type`, `lm_ablation`, `DEBUG`, `PRETRAINED_VAE`(change to your `latest ckpt model path` in previous step)
Then, run the following command:
```
python -m train --cfg configs/MoT_vae_stage2_all.yaml --nodebug
python -m train --cfg configs/MoT_vae_stage2_instruct.yaml --nodebug
```
### 2.3. Ready to instruct-tuning MotionGPT3 model
Please update the parameters in `configs/MoT_vae_stage3.yaml`, e.g. `NAME`, `instruction_type`, `lm_ablation`, `DEBUG`, `PRETRAINED` (change to your `latest ckpt model path` in previous step)
Then, run the following command:
```
python -m train --cfg configs/MoT_vae_stage3.yaml --nodebug
```
### 3. Evaluate the model
Please first put the tained model checkpoint path to `TEST.CHECKPOINT` in config files, e.g. `configs/MoT_vae_stage3.yaml`.
Then, run the following command:
```
python -m test --cfg configs/MoT_vae_stage3.yaml --task t2m
```
Some parameters:
- `--task`: evaluation tasks including t2m(Text-to-Motion), m2t(Motion translation), pred(Motion prediction), inbetween(Motion inbetween)
## 👀 Visualization
Render SMPL
### 1. Set up blender - WIP
Refer to [TEMOS-Rendering motions](https://github.com/Mathux/TEMOS) for blender setup, then install the following dependencies.
```
YOUR_BLENDER_PYTHON_PATH/python -m pip install -r prepare/requirements_render.txt
```
### 2. (Optional) Render rigged cylinders
Run the following command using blender:
```
YOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --dir=YOUR_NPY_FOLDER --mode=video
```
### 2. Create SMPL meshes with:
```
python -m fit --dir YOUR_NPY_FOLDER --save_folder TEMP_PLY_FOLDER --cuda
```
This outputs:
- `mesh npy file`: the generate SMPL vertices with the shape of (nframe, 6893, 3)
- `ply files`: the ply mesh file for blender or meshlab
### 3. Render SMPL meshes
Run the following command to render SMPL using blender:
```
YOUR_BLENDER_PATH/blender --background --python render.py -- --cfg=./configs/render.yaml --dir=YOUR_NPY_FOLDER --mode=video
```
optional parameters:
- `--mode=video`: render mp4 video
- `--mode=sequence`: render the whole motion in a png image.
## 📖 Citation
If you find our code or paper helps, please consider citing:
```bibtex
@misc{zhu2025motiongpt3humanmotionsecond,
title={MotionGPT3: Human Motion as a Second Modality},
author={Bingfan Zhu and Biao Jiang and Sunyi Wang and Shixiang Tang and Tao Chen and Linjie Luo and Youyi Zheng and Xin Chen},
year={2025},
eprint={2506.24086},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2506.24086},
}
```
## Acknowledgments
Thanks to [MotionGPT](https://github.com/OpenMotionLab/MotionGPT), [Motion-latent-diffusion](https://github.com/ChenFengYe/motion-latent-diffusion), [HumanML3D](https://github.com/EricGuo5513/HumanML3D) and [MAR](https://github.com/LTH14/mar), our code is partially borrowing from them.
## License
This code is distributed under an [MIT LICENSE](LICENSE).
Note that our code depends on other libraries, including SMPL, SMPL-X, PyTorch3D, and uses datasets which each have their own respective licenses that must also be followed.