
https://github.com/Walter0807/MotionBERT

[ICCV 2023] PyTorch Implementation of "MotionBERT: A Unified Perspective on Learning Human Motion Representations"

3d-pose-estimation iccv2023 mesh-recovery skeleton-based-action-recognition



Host: GitHub
URL: https://github.com/Walter0807/MotionBERT
Owner: Walter0807
License: apache-2.0
Created: 2022-05-02T13:29:37.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-06-18T16:01:46.000Z (7 months ago)
Last Synced: 2024-08-03T04:06:12.698Z (6 months ago)
Topics: 3d-pose-estimation, iccv2023, mesh-recovery, skeleton-based-action-recognition
Language: Python
Homepage:
Size: 103 KB
Stars: 967
Watchers: 22
Forks: 118
Open Issues: 36
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-conditional-content-generation - [Code

README

# MotionBERT: A Unified Perspective on Learning Human Motion Representations

[![arXiv](https://img.shields.io/badge/arXiv-2210.06551-b31b1b.svg)](https://arxiv.org/abs/2210.06551)   [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-ffab41)](https://huggingface.co/walterzhu/MotionBERT)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/motionbert-unified-pretraining-for-human/monocular-3d-human-pose-estimation-on-human3)](https://paperswithcode.com/sota/monocular-3d-human-pose-estimation-on-human3?p=motionbert-unified-pretraining-for-human)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/motionbert-unified-pretraining-for-human/one-shot-3d-action-recognition-on-ntu-rgbd)](https://paperswithcode.com/sota/one-shot-3d-action-recognition-on-ntu-rgbd?p=motionbert-unified-pretraining-for-human)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/motionbert-unified-pretraining-for-human/3d-human-pose-estimation-on-3dpw)](https://paperswithcode.com/sota/3d-human-pose-estimation-on-3dpw?p=motionbert-unified-pretraining-for-human)

This is the official PyTorch implementation of the paper *"[MotionBERT: A Unified Perspective on Learning Human Motion Representations](https://arxiv.org/pdf/2210.06551.pdf)"* (ICCV 2023).



## Installation

```bash
conda create -n motionbert python=3.7 anaconda
conda activate motionbert
# Please install PyTorch according to your CUDA version.
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
pip install -r requirements.txt
```
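After installation, a quick sanity check can confirm that the environment matches the commands above (Python 3.7, a CUDA-enabled PyTorch build). The script below is a minimal sketch that is not part of the repository; the function name `describe_environment` is hypothetical. It only imports PyTorch if it is installed, so the check itself never fails.

```python
import importlib.util
import sys


def describe_environment():
    """Report the Python version and, if installed, the PyTorch version and CUDA availability."""
    info = {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "torch": None,
        "cuda_available": None,
    }
    # Import torch only when it is actually installed, so this check never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("%s: %s" % (key, value))
```

If `cuda_available` prints `False` on a GPU machine, the installed PyTorch build most likely does not match your CUDA version.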

## Getting Started

| Task                              | Document                             |
| --------------------------------- | ------------------------------------ |
| Pretrain                          | [docs/pretrain.md](docs/pretrain.md) |
| 3D human pose estimation          | [docs/pose3d.md](docs/pose3d.md)     |
| Skeleton-based action recognition | [docs/action.md](docs/action.md)     |
| Mesh recovery                     | [docs/mesh.md](docs/mesh.md)         |

## Applications

### In-the-wild inference (for custom videos)

Please refer to [docs/inference.md](docs/inference.md).

### Using MotionBERT for *human-centric* video representations

```python
'''
  x: 2D skeletons
    type = <class 'torch.Tensor'>
    shape = [batch size * frames * joints(17) * channels(3)]

  MotionBERT: pretrained human motion encoder
    type = <class 'lib.model.DSTformer.DSTformer'>

  E: encoded motion representation
    type = <class 'torch.Tensor'>
    shape = [batch size * frames * joints(17) * channels(512)]
'''
E = MotionBERT.get_representation(x)
```
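Before calling the encoder, it can help to check that the input matches the documented shape. The helper below is a hypothetical sketch, not part of the repository. It uses a NumPy array as a stand-in for a PyTorch tensor (the check only relies on `.ndim` and `.shape`, which both provide) and assumes the three input channels are x, y and a keypoint confidence score.

```python
import numpy as np

NUM_JOINTS = 17   # H36M 17-keypoint layout
IN_CHANNELS = 3   # assumption: x, y, confidence for each keypoint
MAX_FRAMES = 243  # maximum clip length the model handles


def validate_motion_input(x):
    """Check that x has shape [batch, frames, 17, 3] with at most 243 frames.

    Raises ValueError on a mismatch and returns (batch, frames) otherwise.
    """
    if x.ndim != 4:
        raise ValueError("expected a 4-D array [batch, frames, joints, channels], got %d-D" % x.ndim)
    batch, frames, joints, channels = x.shape
    if joints != NUM_JOINTS or channels != IN_CHANNELS:
        raise ValueError("expected %d joints x %d channels, got %d x %d"
                         % (NUM_JOINTS, IN_CHANNELS, joints, channels))
    if frames > MAX_FRAMES:
        raise ValueError("clip has %d frames; split it into clips of <= %d frames"
                         % (frames, MAX_FRAMES))
    return batch, frames


# Dummy batch: 2 clips of 81 frames each.
x = np.zeros((2, 81, NUM_JOINTS, IN_CHANNELS), dtype=np.float32)
print(validate_motion_input(x))  # (2, 81)
```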

> **Hints**
>
> 1. The model can handle inputs of different lengths (up to 243 frames). There is no need to specify the input length anywhere else.
> 2. The model uses 17 body keypoints ([H36M format](https://github.com/JimmySuen/integral-human-pose/blob/master/pytorch_projects/common_pytorch/dataset/hm36.py#L32)). If your keypoints use another format, convert them before feeding them to MotionBERT.
> 3. See [model_action.py](lib/model/model_action.py) and [model_mesh.py](lib/model/model_mesh.py) for examples of (easily) adapting MotionBERT to different downstream tasks.
> 4. For RGB videos, extract 2D poses ([inference.md](docs/inference.md)), convert the keypoint format ([dataset_wild.py](lib/data/dataset_wild.py)), and then feed the result to MotionBERT ([infer_wild.py](infer_wild.py)).
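The format conversion from hint 2 and the 243-frame limit from hint 1 can be sketched as follows. This is not the repository's converter (hint 4 points to `dataset_wild.py` for that). It assumes COCO-ordered 2D keypoints as float arrays and the commonly used 17-joint H36M order listed in the comments; check that order against the linked `hm36.py` before relying on it. Root, belly, neck and head have no COCO counterpart, so they are approximated by midpoints. The function names are hypothetical.

```python
import numpy as np

# Assumed H36M 17-joint order:
# 0 root, 1 Rhip, 2 Rknee, 3 Rankle, 4 Lhip, 5 Lknee, 6 Lankle, 7 belly, 8 neck,
# 9 nose, 10 head, 11 Lshoulder, 12 Lelbow, 13 Lwrist, 14 Rshoulder, 15 Relbow, 16 Rwrist
# COCO 17-keypoint order:
# 0 nose, 1 Leye, 2 Reye, 3 Lear, 4 Rear, 5 Lshoulder, 6 Rshoulder, 7 Lelbow, 8 Relbow,
# 9 Lwrist, 10 Rwrist, 11 Lhip, 12 Rhip, 13 Lknee, 14 Rknee, 15 Lankle, 16 Rankle

# H36M joints with a direct COCO counterpart: {h36m_index: coco_index}
DIRECT = {1: 12, 2: 14, 3: 16, 4: 11, 5: 13, 6: 15, 9: 0,
          11: 5, 12: 7, 13: 9, 14: 6, 15: 8, 16: 10}


def coco_to_h36m(kpts):
    """Approximate H36M keypoints from float COCO keypoints of shape [..., 17, C]."""
    out = np.zeros_like(kpts)
    for h_idx, c_idx in DIRECT.items():
        out[..., h_idx, :] = kpts[..., c_idx, :]
    out[..., 0, :] = (kpts[..., 11, :] + kpts[..., 12, :]) / 2   # root: mid-hip
    out[..., 8, :] = (kpts[..., 5, :] + kpts[..., 6, :]) / 2     # neck: mid-shoulder
    out[..., 7, :] = (out[..., 0, :] + out[..., 8, :]) / 2       # belly: between root and neck
    out[..., 10, :] = (kpts[..., 1, :] + kpts[..., 2, :]) / 2    # head: mid-eye (crude)
    return out


def split_into_clips(seq, max_len=243):
    """Split a [frames, joints, channels] sequence into consecutive clips of <= max_len frames."""
    return [seq[start:start + max_len] for start in range(0, len(seq), max_len)]


# 600 frames of (random) COCO keypoints -> H36M order -> clips the model can accept.
video = np.random.rand(600, 17, 3).astype(np.float32)
clips = split_into_clips(coco_to_h36m(video))
print([len(c) for c in clips])  # [243, 243, 114]
```

Because the model accepts clips of varying length, the shorter final clip can be fed as-is.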

## Model Zoo



| Model                           | Download Link                                                | Config                                                       | Performance      |
| ------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ---------------- |
| MotionBERT (162MB)              | [OneDrive](https://1drv.ms/f/s!AvAdh0LSjEOlgS425shtVi9e5reN?e=6UeBa2) | [pretrain/MB_pretrain.yaml](configs/pretrain/MB_pretrain.yaml) | -                |
| MotionBERT-Lite (61MB)          | [OneDrive](https://1drv.ms/f/s!AvAdh0LSjEOlgS27Ydcbpxlkl0ng?e=rq2Btn) | [pretrain/MB_lite.yaml](configs/pretrain/MB_lite.yaml)       | -                |
| 3D Pose (H36M-SH, scratch)      | [OneDrive](https://1drv.ms/f/s!AvAdh0LSjEOlgSvNejMQ0OHxMGZC?e=KcwBk1) | [pose3d/MB_train_h36m.yaml](configs/pose3d/MB_train_h36m.yaml) | 39.2mm (MPJPE)   |
| 3D Pose (H36M-SH, ft)           | [OneDrive](https://1drv.ms/f/s!AvAdh0LSjEOlgSoTqtyR5Zsgi8_Z?e=rn4VJf) | [pose3d/MB_ft_h36m.yaml](configs/pose3d/MB_ft_h36m.yaml)     | 37.2mm (MPJPE)   |
| Action Recognition (x-sub, ft)  | [OneDrive](https://1drv.ms/f/s!AvAdh0LSjEOlgTX23yT_NO7RiZz-?e=nX6w2j) | [action/MB_ft_NTU60_xsub.yaml](configs/action/MB_ft_NTU60_xsub.yaml) | 97.2% (Top1 Acc) |
| Action Recognition (x-view, ft) | [OneDrive](https://1drv.ms/f/s!AvAdh0LSjEOlgTaNiXw2Nal-g37M?e=lSkE4T) | [action/MB_ft_NTU60_xview.yaml](configs/action/MB_ft_NTU60_xview.yaml) | 93.0% (Top1 Acc) |
| Mesh (with 3DPW, ft)            | [OneDrive](https://1drv.ms/f/s!AvAdh0LSjEOlgTmgYNslCDWMNQi9?e=WjcB1F) | [mesh/MB_ft_pw3d.yaml](configs/mesh/MB_ft_pw3d.yaml)              | 88.1mm (MPVE)    |

In most use cases (especially with finetuning), `MotionBERT-Lite` performs similarly to `MotionBERT` at a lower computational cost.
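The Performance column reports MPJPE (mean per-joint position error) and MPVE (mean per-vertex error), both in millimetres where lower is better, and top-1 accuracy for action recognition. The sketch below shows only the core formulas; the function names are hypothetical and this is not the repository's evaluation code. Standard protocols usually also align predictions to the ground truth (for example relative to the root joint) before measuring the error, which this sketch omits.

```python
import numpy as np


def mean_per_point_error(pred, gt):
    """Mean Euclidean distance between corresponding 3D points.

    pred, gt: arrays [..., points, 3] in the same units (e.g. millimetres).
    With body joints as points this is MPJPE; with mesh vertices it is MPVE.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def top1_accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class equals the ground-truth label."""
    return float((np.argmax(logits, axis=-1) == labels).mean())


# Every joint is offset by (30, 40, 0) mm, i.e. 50 mm away from the ground truth.
gt = np.zeros((1, 17, 3))
pred = gt + np.array([30.0, 40.0, 0.0])
print(mean_per_point_error(pred, gt))  # 50.0

# One of two samples is classified correctly.
print(top1_accuracy(np.array([[0.1, 0.9], [0.8, 0.2]]), np.array([1, 1])))  # 0.5
```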

## TODO

- [x] Scripts and docs for pretraining

- [x] Demo for custom videos

## Citation

If you find our work useful for your project, please consider citing the paper:

```bibtex
@inproceedings{motionbert2022,
  title     =   {MotionBERT: A Unified Perspective on Learning Human Motion Representations},
  author    =   {Zhu, Wentao and Ma, Xiaoxuan and Liu, Zhaoyang and Liu, Libin and Wu, Wayne and Wang, Yizhou},
  booktitle =   {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      =   {2023},
}
```