https://github.com/mit-han-lab/litepose

[CVPR'22] Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation

efficient-models litepose pose-estimation


Host: GitHub
URL: https://github.com/mit-han-lab/litepose
Owner: mit-han-lab
License: mit
Created: 2022-03-28T17:08:54.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2024-06-05T22:04:06.000Z (about 1 year ago)
Last Synced: 2025-05-26T08:57:28.962Z (about 2 months ago)
Topics: efficient-models, litepose, pose-estimation
Language: Python
Homepage: https://hanlab.mit.edu
Size: 37.1 MB
Stars: 317
Watchers: 21
Forks: 39
Open Issues: 20
Metadata Files:
- Readme: README.md
- License: LICENSE


# Lite Pose

### [slides](assets/LitePose-slides.pdf)|[paper](https://arxiv.org/abs/2205.01271)|[video](https://www.youtube.com/watch?v=TodvXYrswDI)

![demo](assets/LitePose-Mobile.gif)

## Abstract

Pose estimation plays a critical role in human-centered vision applications. However, it is difficult to deploy state-of-the-art HRNet-based pose estimation models on resource-constrained edge devices due to the high computational cost (more than 150 GMACs per frame). In this paper, we study efficient architecture design for real-time multi-person pose estimation on edge. We reveal that HRNet's high-resolution branches are redundant for models at the low-computation region via our **gradual shrinking** experiments. Removing them improves both efficiency and performance. Inspired by this finding, we design **LitePose**, an efficient single-branch architecture for pose estimation, and introduce two simple approaches to enhance the capacity of LitePose, including **Fusion Deconv Head** and **Large Kernel Convs**. Fusion Deconv Head removes the redundancy in high-resolution branches, allowing scale-aware feature fusion with low overhead. Large Kernel Convs significantly improve the model's capacity and receptive field while maintaining a low computational cost. With only 25\% computation increment, $7\times7$ kernels achieve $+14.0$ mAP better than $3\times 3$ kernels on the CrowdPose dataset. On mobile platforms, LitePose reduces the latency by up to $5.0\times$ without sacrificing performance, compared with prior state-of-the-art efficient pose estimation models, pushing the frontier of real-time multi-person pose estimation on edge.
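To make the cost argument for large kernels concrete, here is a minimal sketch that counts MACs for a single block. It assumes a MobileNetV2-style inverted-residual block (1x1 expand, k x k depthwise, 1x1 project) with illustrative channel and resolution values; the function name and all sizes are hypothetical and are not taken from the LitePose code. It does not reproduce the 25% figure above, which is measured over a whole network.

```python
def inverted_residual_macs(channels, expand, kernel, height, width):
    """MACs of a stride-1 inverted-residual block (assumed structure, see lead-in)."""
    hidden = channels * expand
    pixels = height * width
    expand_macs = channels * hidden * pixels             # 1x1 pointwise expansion
    depthwise_macs = hidden * kernel * kernel * pixels   # one k x k filter per channel
    project_macs = hidden * channels * pixels            # 1x1 pointwise projection
    return expand_macs + depthwise_macs + project_macs


# Illustrative sizes only: 32 channels, expansion ratio 6, 64x64 feature map.
small = inverted_residual_macs(channels=32, expand=6, kernel=3, height=64, width=64)
large = inverted_residual_macs(channels=32, expand=6, kernel=7, height=64, width=64)

print(f"3x3 block: {small / 1e6:.1f} MMACs")
print(f"7x7 block: {large / 1e6:.1f} MMACs")
print(f"kernel area grows {49 / 9:.2f}x, block cost grows {large / small:.2f}x")
```

Only the depthwise term scales with the kernel area, so the block cost grows much more slowly than the kernel itself, because the 1x1 convolutions dominate.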

## Results

### CrowdPose Test

![image](assets/Figure-CrowdPose.png)

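| Model | mAP | #MACs | Latency on Nano (ms) | Latency on Mobile (ms) | Latency on Pi (ms) |
| --- | --- | --- | --- | --- | --- |
| HigherHRNet-W24 | 57.4 | 25.3G | 330 | 289 | 1414 |
| EfficientHRNet-$H_{-1}$ | 56.3 | 14.2G | 283 | 267 | 1229 |
| LitePose-Auto-S (Ours) | 58.3 | 5.0G | 97 | 76 | 420 |
| LitePose-Auto-XS (Ours) | 49.4 | 1.2G | 22 | 27 | 109 |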
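As a worked reading of the CrowdPose numbers above, the following sketch computes the latency speedup and MAC reduction of LitePose-Auto-S relative to HigherHRNet-W24. The values are copied from the table; the dictionary layout and function name are only illustrative.

```python
# (mAP, GMACs, latency in ms per device), copied from the CrowdPose table above.
results = {
    "HigherHRNet-W24": (57.4, 25.3, {"Nano": 330, "Mobile": 289, "Pi": 1414}),
    "LitePose-Auto-S": (58.3, 5.0, {"Nano": 97, "Mobile": 76, "Pi": 420}),
}


def compare(baseline, candidate):
    """Return (MAC reduction factor, per-device latency speedup) of candidate over baseline."""
    _, base_macs, base_latency = results[baseline]
    _, cand_macs, cand_latency = results[candidate]
    speedup = {dev: base_latency[dev] / cand_latency[dev] for dev in base_latency}
    return base_macs / cand_macs, speedup


mac_reduction, speedup = compare("HigherHRNet-W24", "LitePose-Auto-S")
print(f"MAC reduction: {mac_reduction:.2f}x")
for device, factor in speedup.items():
    print(f"{device}: {factor:.2f}x faster")
```

The measured speedups are smaller than the MAC reduction, a reminder that MACs are only a proxy for on-device latency.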

### COCO Val/Test 2017

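| Model | mAP (val) | mAP (test-dev) | #MACs | Latency on Nano (ms) | Latency on Mobile (ms) | Latency on Pi (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| EfficientHRNet-$H_{-1}$ | 59.2 | 59.1 | 14.4G | 283 | 267 | 1229 |
| Lightweight OpenPose | 42.8 | - | 9.0G | - | 97 | - |
| LitePose-Auto-M (Ours) | 59.8 | 59.7 | 7.8G | 144 | 97 | 588 |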

*Note*: For more details, please refer to our paper.

## Usage

- [Prerequisites](#prerequisites)
- [Data Preparation](#data-preparation)
- [Results](#results)
- [Training Process Overview](#training-process-overview)
- [Evaluation](#evaluation)
- [Models](#models)

### Prerequisites

1. Install [PyTorch](https://pytorch.org/) and other dependencies:
```
pip install -r requirements.txt
```

2. Install COCOAPI and CrowdPoseAPI following [Official HigherHRNet Repository](https://github.com/HRNet/HigherHRNet-Human-Pose-Estimation).
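A quick way to confirm the installation is to check that the packages can be found before launching a training job. This is a minimal sketch; the import names `pycocotools` (COCOAPI) and `crowdposetools` (CrowdPoseAPI) are assumptions about how those packages install, so adjust them if your environment differs.

```python
import importlib.util

# Assumed import names: "torch" (PyTorch), "pycocotools" (COCOAPI),
# "crowdposetools" (CrowdPoseAPI). Adjust if your install uses other names.
REQUIRED_MODULES = ("torch", "pycocotools", "crowdposetools")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found on the Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```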

### Data Preparation

1. Download the COCO dataset from [COCO download](http://cocodataset.org/#download). The 2017 Train/Val splits are needed for training and evaluation.
2. Download the CrowdPose dataset from [CrowdPose download](https://github.com/Jeff-sjtu/CrowdPose#dataset). The Train/Val splits are needed for training and evaluation.
3. Refer to [Official HigherHRNet Repository](https://github.com/HRNet/HigherHRNet-Human-Pose-Estimation) for more details about the data arrangement.
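Once the data is in place, a small script can confirm the directory layout. The sketch below assumes a `data/` root with the folder names used in the HigherHRNet README; those paths are an assumption here and should be verified against that repository.

```python
from pathlib import Path

# Assumed layout, following the HigherHRNet README; verify against that repository.
EXPECTED_DIRS = (
    "coco/annotations",
    "coco/images/train2017",
    "coco/images/val2017",
    "crowd_pose/json",
    "crowd_pose/images",
)


def missing_data_dirs(data_root):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


# "data" is a hypothetical root relative to the repository checkout.
for rel in missing_data_dirs("data"):
    print("missing:", rel)
```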

### Training Process Overview

#### Super-net Training

To train a super-net from scratch with the search space specified by [arch_manager.py](https://github.com/mit-han-lab/litepose-dev/blob/main/arch_manager.py), run:

```
python dist_train.py --cfg experiments/crowd_pose/mobilenet/supermobile.yaml
```

#### Weight Transfer

After training the super-net, you can extract a specific sub-network (e.g. search-XS) from it with the following script:

```
python weight_transfer.py --cfg experiments/crowd_pose/mobilenet/supermobile.yaml --superconfig mobile_configs/search-XS.json TEST.MODEL_FILE your_supernet_checkpoint_path
```

#### Normal Training

To train a normal network with a specific architecture (e.g. search-XS), please use the following script:

*Note*: Before training, change the **resolution** in the configuration file (e.g. experiments/crowd_pose/mobilenet/mobile.yaml) so that it matches the architecture configuration (e.g. search-XS.json).

```
python dist_train.py --cfg experiments/crowd_pose/mobilenet/mobile.yaml --superconfig mobile_configs/search-XS.json
```

### Evaluation

To evaluate the model with a specific architecture (e.g. search-XS), please use the following script:

```
python valid.py --cfg experiments/crowd_pose/mobilenet/mobile.yaml --superconfig mobile_configs/search-XS.json TEST.MODEL_FILE your_checkpoint_path
```
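The four steps above (super-net training, weight transfer, normal training, evaluation) can be collected in one place. The sketch below only assembles and prints the commands shown in this README; it does not run them, and the checkpoint arguments are the same placeholders used above. The function and constant names are illustrative.

```python
import shlex

SUPER_CFG = "experiments/crowd_pose/mobilenet/supermobile.yaml"
SUB_CFG = "experiments/crowd_pose/mobilenet/mobile.yaml"
ARCH = "mobile_configs/search-XS.json"


def pipeline_commands(supernet_ckpt, final_ckpt):
    """Return the README commands, in order, as argument lists."""
    return [
        # 1. Train the super-net.
        ["python", "dist_train.py", "--cfg", SUPER_CFG],
        # 2. Extract the sub-network weights from the super-net checkpoint.
        ["python", "weight_transfer.py", "--cfg", SUPER_CFG,
         "--superconfig", ARCH, "TEST.MODEL_FILE", supernet_ckpt],
        # 3. Train the chosen architecture.
        ["python", "dist_train.py", "--cfg", SUB_CFG, "--superconfig", ARCH],
        # 4. Evaluate the resulting checkpoint.
        ["python", "valid.py", "--cfg", SUB_CFG,
         "--superconfig", ARCH, "TEST.MODEL_FILE", final_ckpt],
    ]


for cmd in pipeline_commands("your_supernet_checkpoint_path", "your_checkpoint_path"):
    print(shlex.join(cmd))
```

Keeping each command as an argument list makes it straightforward to pass to `subprocess.run` later if you want to automate the sequence.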

### Models

#### Pre-trained Models

To reproduce the results in the paper, load the pre-trained checkpoints before training super-nets. These checkpoints are provided in [COCO-Pretrain](https://drive.google.com/file/d/18WOtQ6yi-pn69bAOeYojXMI7l8sZZG3p/view?usp=sharing) and [CrowdPose-Pretrain](https://drive.google.com/file/d/1fojt0DJA5WPg3IqdkGTpyOps4mdxpGn9/view?usp=sharing).

#### Result Models

We provide the checkpoints corresponding to the results in our paper.

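| Dataset | Model | #MACs (G) | mAP |
| --- | --- | --- | --- |
| CrowdPose | LitePose-Auto-L | 13.7 | 61.9 |
| CrowdPose | LitePose-Auto-M | 7.8 | 59.9 |
| CrowdPose | LitePose-Auto-S | 5.0 | 58.3 |
| CrowdPose | LitePose-Auto-XS | 1.2 | 49.5 |
| COCO | LitePose-Auto-L | 13.8 | 62.5 |
| COCO | LitePose-Auto-M | 7.8 | 59.8 |
| COCO | LitePose-Auto-S | 5.0 | 56.8 |
| COCO | LitePose-Auto-XS | 1.2 | 40.6 |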

## Acknowledgements

Lite Pose is based on [HRNet-family](https://github.com/HRNet), mainly on [HigherHRNet](https://github.com/HRNet/HigherHRNet-Human-Pose-Estimation). Thanks for their well-organized code!

Regarding Large Kernel Convs, several recent papers have reached similar conclusions: [ConvNeXt](https://arxiv.org/abs/2201.03545), [RepLKNet](https://arxiv.org/abs/2203.06717). We look forward to more applications of large kernels to different tasks!

## Citation

If Lite Pose is useful or relevant to your research, please kindly recognize our contributions by citing our paper:

```bibtex
@article{wang2022lite,
title={Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation},
author={Wang, Yihan and Li, Muyang and Cai, Han and Chen, Wei-Ming and Han, Song},
journal={arXiv preprint arXiv:2205.01271},
year={2022}
}
```
