https://github.com/leeyegy/SimCC

[ECCV'2022 Oral] PyTorch implementation for: SimCC: a Simple Coordinate Classification Perspective for Human Pose Estimation (http://arxiv.org/abs/2107.03332). Old name: SimDR

# Is 2D Heatmap Even Necessary for Human Pose Estimation?

**NOTE: SimDR is the old name of this work; the paper now officially uses SimCC. For simplicity, the name is left unchanged in the code, since many people already use it.**

The 2D heatmap representation has dominated human pose estimation for years due to its high performance. However, heatmap-based approaches suffer from several shortcomings:

- 1) Performance drops dramatically on low-resolution images, which are frequently encountered in real-world scenarios.

- 2) To improve localization precision, multiple upsampling layers may be needed to recover the feature map resolution from low to high, which is computationally expensive.

- 3) Extra coordinate refinement is usually necessary to reduce the quantization error of downscaled heatmaps.
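
To make the quantization error in the third point concrete, here is a minimal sketch. It assumes a heatmap downscaled by a stride of 4 relative to the input and a hypothetical ground-truth coordinate; the function name is illustrative and not part of this repository.

```python
def heatmap_argmax_roundtrip(x, stride):
    """Quantize a coordinate to a stride-downscaled heatmap cell and map it back."""
    bin_index = round(x / stride)   # nearest heatmap cell, the best an argmax can recover
    return bin_index * stride       # back to input-image pixels


stride = 4       # assumed downscaling factor between input image and heatmap
true_x = 101.0   # hypothetical ground-truth x-coordinate in input pixels

recovered = heatmap_argmax_roundtrip(true_x, stride)
print(recovered, abs(recovered - true_x))
```

Without refinement, the recovered coordinate can be off by up to half the stride along each axis, and the relative impact grows as the input resolution shrinks.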

**_Intro:_** Given these shortcomings, we do not think the 2D heatmap is the final answer for keypoint coordinate representation in this field. By contrast, SimDR is a simple yet effective scheme that removes the need for extra post-processing and reduces quantization error through the design of its coordinate representation. **For the first time**, SimDR brings **heatmap-free** methods to the competitive performance level of **heatmap-based** methods, outperforming the latter by a large margin at low input resolutions. Additionally, SimDR allows one to directly remove the time-consuming upsampling module of some methods, which may inspire new research on lightweight models for human pose estimation.

We hope the proposed SimDR will motivate the community to rethink the design of coordinate representation for 2D human pose estimation.

For details see [SimCC: a Simple Coordinate Classification Perspective for Human Pose Estimation](http://arxiv.org/abs/2107.03332) by Yanjie Li, Sen Yang, Peidong Liu, Shoukui Zhang, Yunxiao Wang, Zhicheng Wang, Wankou Yang and Shu-Tao Xia.
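
As a rough illustration of the coordinate-classification idea, the sketch below treats the x-coordinate as a class index over `width * k` bins and decodes it by taking the highest-scoring bin. The splitting factor `k`, the function names, and the one-hot stand-in for a network output are assumptions made for this example; see the paper and the code in this repository for the actual formulation.

```python
def encode_coordinate(x, k):
    """Map a continuous coordinate to its class index using splitting factor k."""
    return int(round(x * k))


def decode_coordinate(scores, k):
    """Pick the highest-scoring bin and convert it back to pixel units."""
    best_bin = max(range(len(scores)), key=lambda i: scores[i])
    return best_bin / k


width, k = 64, 2    # hypothetical input width and splitting factor
true_x = 20.4       # hypothetical ground-truth x-coordinate

target = encode_coordinate(true_x, k)   # class index among width * k bins
scores = [0.0] * (width * k)
scores[target] = 1.0                    # stand-in for a perfect network output

print(decode_coordinate(scores, k))
```

With `k` greater than 1 the bins are finer than one pixel, so the decoded value can land closer to the true coordinate than a pixel-level argmax would, without a separate refinement step. The y-coordinate is handled the same way with its own vector.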

## News!
- [2022.07.17] Our paper "SimCC: a Simple Coordinate Classification Perspective for Human Pose Estimation" has been accepted by **ECCV'2022** as an **Oral** presentation (acceptance rate: 2.7%). If you find this repository useful, please give it a star 🌟.
- [2021.08.17] The pretrained models are released in [Google Drive](https://drive.google.com/drive/folders/1HtIkWDpHasULk_MArlGLtyf-XRAyAsuP?usp=sharing)!
- [2021.07.09] The code for SimDR and SimDR* (space-aware SimDR) is released!

## Experiments
### Results on COCO test-dev set
|Method|Representation|Input size|GFLOPs|AP|AR|
|-|-|-|-|-|-|
|[SimBa-Res50](https://arxiv.org/abs/1804.06208)|heatmap|384x288|20.0|71.5|76.9|
|[SimBa-Res50](https://arxiv.org/abs/1804.06208)|**SimDR\***|384x288|20.2|**72.7**|**78.0**|
|[HRNet-W48](https://arxiv.org/abs/1902.09212)|heatmap|256x192|14.6|74.2|79.5|
|[HRNet-W48](https://arxiv.org/abs/1902.09212)|**SimDR\***|256x192|14.6|**75.4**|**80.5**|
|[HRNet-W48](https://arxiv.org/abs/1902.09212)|heatmap|384x288|32.9|75.5|80.5|
|[HRNet-W48](https://arxiv.org/abs/1902.09212)|**SimDR\***|384x288|32.9|**76.0**|**81.1**|

### Note:
* Flip test is used.
* Person detector has person AP of 60.9 on COCO test-dev2017 dataset.
* GFLOPs is for convolution and linear layers only.
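
For readers unfamiliar with flip testing, the following is a minimal sketch of the usual procedure applied to per-joint 1D x-axis score vectors: run the model on the mirrored image, undo the mirroring, and average with the original prediction. The function names and the `flip_pairs` argument are hypothetical, and the testing code in this repository may differ in its details.

```python
def flip_back(x_vectors, flip_pairs):
    """Undo a horizontal flip on per-joint 1D x-axis score vectors."""
    restored = [list(reversed(v)) for v in x_vectors]   # mirror the x-axis
    for left, right in flip_pairs:                      # swap left/right joints
        restored[left], restored[right] = restored[right], restored[left]
    return restored


def flip_test_average(original, flipped, flip_pairs):
    """Average scores from the original pass and the restored mirrored pass."""
    restored = flip_back(flipped, flip_pairs)
    return [[(a + b) / 2 for a, b in zip(vo, vr)]
            for vo, vr in zip(original, restored)]


# Two hypothetical joints (left eye = 0, right eye = 1) on a 4-bin x-axis.
original = [[0, 1, 0, 0], [0, 0, 0, 1]]
flipped = [[1, 0, 0, 0], [0, 0, 1, 0]]   # what the mirrored image would yield
print(flip_test_average(original, flipped, [(0, 1)]))
```

Only the x-axis vectors need mirroring; for y-axis vectors the left/right swap alone would be enough.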

### Results on COCO validation set


|Method|Representation|Input size|#Params|GFLOPs|Extra post.|AP|AR|
|-|-|-|-|-|-|-|-|
|SimBa-Res50|heatmap|64x64|34.0M|0.7|Y|34.4|43.7|
|SimBa-Res50|heatmap|64x64|34.0M|0.7|N|25.8|36.0|
|SimBa-Res50|SimDR (ours)|64x64|34.1M|0.7|N|40.8|49.6|
|SimBa-Res50|heatmap|128x128|34.0M|3.0|Y|60.3|67.6|
|SimBa-Res50|heatmap|128x128|34.0M|3.0|N|55.4|63.6|
|SimBa-Res50|SimDR (ours)|128x128|34.8M|3.0|N|62.6|69.5|
|SimBa-Res50|heatmap|256x192|34.0M|8.9|Y|70.4|76.3|
|SimBa-Res50|heatmap|256x192|34.0M|8.9|N|68.5|74.8|
|SimBa-Res50|SimDR (ours)|256x192|36.8M|9.0|N|71.4|77.4|
|TokenPose-S|heatmap|64x64|4.9M|1.4|Y|57.1|64.8|
|TokenPose-S|heatmap|64x64|4.9M|1.4|N|35.9|47.0|
|TokenPose-S|SimDR (ours)|64x64|4.9M|1.4|N|62.8|70.1|
|TokenPose-S|heatmap|128x128|5.2M|1.6|Y|65.4|71.6|
|TokenPose-S|heatmap|128x128|5.2M|1.6|N|57.6|64.9|
|TokenPose-S|SimDR (ours)|128x128|5.1M|1.6|N|71.4|76.4|
|TokenPose-S|heatmap|256x192|6.6M|2.2|Y|72.5|78.0|
|TokenPose-S|heatmap|256x192|6.6M|2.2|N|69.9|75.8|
|TokenPose-S|SimDR (ours)|256x192|5.5M|2.2|N|73.6|78.9|
|SimBa-Res101|heatmap|64x64|53.0M|1.0|Y|34.1|43.5|
|SimBa-Res101|heatmap|64x64|53.0M|1.0|N|25.7|36.1|
|SimBa-Res101|SimDR (ours)|64x64|53.1M|1.0|N|39.6|48.9|
|SimBa-Res101|heatmap|128x128|53.0M|4.1|Y|59.2|66.7|
|SimBa-Res101|heatmap|128x128|53.0M|4.1|N|54.4|62.5|
|SimBa-Res101|SimDR (ours)|128x128|53.5M|4.1|N|63.1|70.1|
|SimBa-Res101|heatmap|256x192|53.0M|12.4|Y|71.4|77.1|
|SimBa-Res101|heatmap|256x192|53.0M|12.4|N|69.5|75.6|
|SimBa-Res101|SimDR (ours)|256x192|53.7M|12.4|N|72.3|78.0|
|HRNet-W32|heatmap|64x64|28.5M|0.6|Y|45.8|55.3|
|HRNet-W32|heatmap|64x64|28.5M|0.6|N|34.6|45.6|
|HRNet-W32|SimDR (ours)|64x64|28.6M|0.6|N|56.4|64.9|
|HRNet-W32|heatmap|128x128|28.5M|2.4|Y|67.2|74.1|
|HRNet-W32|heatmap|128x128|28.5M|2.4|N|61.9|69.4|
|HRNet-W32|SimDR (ours)|128x128|29.1M|2.4|N|70.7|76.7|
|HRNet-W32|heatmap|256x192|28.5M|7.1|Y|74.4|79.8|
|HRNet-W32|heatmap|256x192|28.5M|7.1|N|72.3|78.2|
|HRNet-W32|SimDR (ours)|256x192|31.3M|7.1|N|75.3|80.8|
|HRNet-W48|heatmap|64x64|63.6M|1.2|Y|48.5|57.8|
|HRNet-W48|heatmap|64x64|63.6M|1.2|N|36.9|47.8|
|HRNet-W48|SimDR (ours)|64x64|63.7M|1.2|N|59.7|67.5|
|HRNet-W48|heatmap|128x128|63.6M|4.9|Y|68.9|75.3|
|HRNet-W48|heatmap|128x128|63.6M|4.9|N|63.3|70.5|
|HRNet-W48|SimDR (ours)|128x128|64.1M|4.9|N|72.0|77.9|
|HRNet-W48|heatmap|256x192|63.6M|14.6|Y|75.1|80.4|
|HRNet-W48|heatmap|256x192|63.6M|14.6|N|73.1|78.7|
|HRNet-W48|SimDR (ours)|256x192|66.3M|14.6|N|75.9|81.2|

### Note:
* Flip test is used.
* Person detector has person AP of 56.4 on COCO val2017 dataset.
* GFLOPs is for convolution and linear layers only.
* Extra post. = extra post-processing to refine the predicted keypoint coordinates.
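
As an example of what such extra post-processing can look like, heatmap pipelines commonly shift the argmax by a quarter of a pixel toward the higher-scoring neighbour. The sketch below shows a 1D simplification of that idea; the function name and the sample scores are hypothetical, and it is not a copy of the refinement used to produce the numbers above.

```python
def refine_with_quarter_offset(heatmap_row, peak):
    """Shift the argmax by 0.25 px toward the larger neighbouring score."""
    if 0 < peak < len(heatmap_row) - 1:
        diff = heatmap_row[peak + 1] - heatmap_row[peak - 1]
        if diff > 0:
            return peak + 0.25
        if diff < 0:
            return peak - 0.25
    return float(peak)   # border peak or equal neighbours: leave unchanged


row = [0.0, 0.1, 0.9, 0.4, 0.0]   # hypothetical 1D slice through a heatmap
peak = max(range(len(row)), key=lambda i: row[i])
print(refine_with_quarter_offset(row, peak))
```

Rows marked "N" in the table skip this kind of step, which is why the heatmap results drop so sharply at small input sizes while SimDR does not rely on it.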

#### Results on higher input resolution
Results on the COCO validation set with the input size of 384×288.


|Method|Representation|AP|AP_50|AP_75|AP_M|AP_L|AR|
|-|-|-|-|-|-|-|-|
|SimBa-Res50|heatmap|72.2|89.3|78.9|68.1|79.7|77.6|
|SimBa-Res50|SimDR (ours)|73.0|89.3|79.7|69.5|79.9|78.6|
|SimBa-Res50|SimDR\* (ours)|73.4|89.2|80.0|69.7|80.6|78.8|
|SimBa-Res101|heatmap|73.6|89.6|80.3|69.9|81.1|79.1|
|SimBa-Res101|SimDR (ours)|74.2|89.6|80.9|70.7|80.9|79.8|
|SimBa-Res152|heatmap|74.3|89.6|81.1|70.5|81.6|79.7|
|SimBa-Res152|SimDR (ours)|74.9|89.9|81.5|71.4|81.7|80.4|
|HRNet-W48|heatmap|76.3|90.8|82.9|72.3|83.4|81.2|
|HRNet-W48|SimDR\* (ours)|76.9|90.9|83.2|73.2|83.8|82.0|

### Note:
* Flip test is used.
* Person detector has person AP of 56.4 on COCO val2017 dataset.

### Results on MPII val set


|Metric|Method|Representation|Input size|Hea|Sho|Elb|Wri|Hip|Kne|Ank|Mean|
|-|-|-|-|-|-|-|-|-|-|-|-|
|PCKh@0.5|HRNet-W32|heatmap|64x64|89.7|86.6|75.1|65.7|77.2|69.2|63.6|76.4|
|PCKh@0.5|HRNet-W32|SimDR (ours)|64x64|96.5|89.5|77.5|67.6|79.8|71.5|65.0|78.7|
|PCKh@0.5|HRNet-W32|heatmap|256x256|97.1|95.9|90.3|86.4|89.1|87.1|83.3|90.3|
|PCKh@0.5|HRNet-W32|SimDR (ours)|256x256|96.8|95.9|90.0|85.0|89.1|85.4|81.3|89.6|
|PCKh@0.5|HRNet-W32|SimDR\* (ours)|256x256|97.2|96.0|90.4|85.6|89.5|85.8|81.8|90.0|
|PCKh@0.1|HRNet-W32|heatmap|64x64|12.9|11.7|9.7|7.1|7.2|7.2|6.6|9.2|
|PCKh@0.1|HRNet-W32|SimDR (ours)|64x64|30.9|23.3|18.1|15.0|10.5|13.1|12.8|18.5|
|PCKh@0.1|HRNet-W32|heatmap|256x256|44.5|37.3|37.5|36.9|15.1|25.9|27.2|33.1|
|PCKh@0.1|HRNet-W32|SimDR (ours)|256x256|50.1|41.0|45.3|42.4|16.6|29.7|30.3|37.8|

### Note:
* Flip test is used.
* There appears to be a bug in how the original code computes PCKh@0.1; it is fixed in this repo.
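
For reference, PCKh counts a predicted keypoint as correct when its distance to the ground truth is within a threshold fraction of the person's head size (the threshold is what `TEST.PCKH_THRE` sets in the testing command). The sketch below is a simplified version that ignores joint visibility and takes the head size as given; the function name and the sample values are hypothetical, and it is not the evaluation code of this repository.

```python
import math


def pckh(predictions, ground_truth, head_sizes, threshold):
    """Percentage of keypoints within threshold * head size of the ground truth."""
    correct = 0
    total = 0
    for pred, gt, head in zip(predictions, ground_truth, head_sizes):
        for (px, py), (gx, gy) in zip(pred, gt):
            distance = math.hypot(px - gx, py - gy)
            correct += distance <= threshold * head
            total += 1
    return 100.0 * correct / total


# One hypothetical person with head size 10 px and two joints.
preds = [[(10.0, 11.5), (30.0, 34.0)]]
gts = [[(10.0, 12.0), (30.0, 30.0)]]
heads = [10.0]

print(pckh(preds, gts, heads, 0.5), pckh(preds, gts, heads, 0.1))
```

A tighter threshold only rewards predictions that are very close to the annotation, which is why it separates coordinate representations more clearly than a looser one.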

### Results on CrowdPose


|Method|Representation|Input size|AP|AP_50|AP_75|AP_E|AP_M|AP_H|
|-|-|-|-|-|-|-|-|-|
|HRNet-W32|heatmap|64x64|42.4|69.6|45.5|51.2|43.1|31.8|
|HRNet-W32|SimDR (ours)|64x64|46.5|70.9|50.0|56.0|47.5|34.7|
|HRNet-W32|heatmap|256x192|66.4|81.1|71.5|74.0|67.4|55.6|
|HRNet-W32|SimDR (ours)|256x192|66.7|82.1|72.0|74.1|67.8|56.2|

## Start to use
### 1. Dependencies installation & data preparation
Please refer to [THIS](https://github.com/leoxiaobin/deep-high-resolution-net.pytorch) to prepare the environment step by step.

### 2. Model Zoo
Pretrained models are provided in our [model zoo](https://drive.google.com/drive/folders/1HtIkWDpHasULk_MArlGLtyf-XRAyAsuP?usp=sharing).

### 3. Training
#### Training on COCO train2017 dataset
To train with **_SimDR_** as keypoint coordinate representation:
```
python tools/train.py \
--cfg experiments/coco/hrnet/simdr/nmt_w48_256x192_adam_lr1e-3.yaml
```
To train with **_SimDR\*_** as keypoint coordinate representation:
```
python tools/train.py \
--cfg experiments/coco/hrnet/sa_simdr/w48_256x192_adam_lr1e-3_split2_sigma4.yaml
```
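
The SimDR\* config name ends in `split2_sigma4`. Assuming these denote a splitting factor of 2 and a Gaussian standard deviation of 4 bins (an interpretation of the file name, not something this README states), a space-aware target could be sketched as a Gaussian centred on the scaled coordinate instead of a one-hot class. The function name, the normalization to sum to one, and the sample coordinate are choices made for this example.

```python
import math


def gaussian_soft_label(x, num_bins, k, sigma):
    """Build a normalized 1D Gaussian target centred on the scaled coordinate x * k."""
    center = x * k
    weights = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
               for i in range(num_bins)]
    total = sum(weights)
    return [w / total for w in weights]


k, sigma = 2, 4.0   # read off the config name; the interpretation is an assumption
width = 192         # assumed input width for the 256x192 setting

label = gaussian_soft_label(96.3, width * k, k, sigma)
peak = max(range(len(label)), key=lambda i: label[i])
print(peak, peak / k)
```

Compared with a one-hot target, neighbouring bins receive non-zero weight, so a prediction that is slightly off is penalized less than one that is far away.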

**_*Note:_**
With **_SimDR_**, the deconvolution layers of SimpleBaseline can be either kept or removed.

#### Training on MPII dataset
To train with **_SimDR_** as keypoint coordinate representation:
```
python tools/train.py \
--cfg experiments/mpii/hrnet/simdr/norm_w32_256x256_adam_lr1e-3_ls2e1.yaml
```
To train with **_SimDR\*_** as keypoint coordinate representation:
```
python tools/train.py \
--cfg experiments/mpii/hrnet/sa_simdr/w32_256x256_adam_lr1e-3_split2_sigma6.yaml
```
### 4. Testing
#### Testing on COCO val2017 dataset using model zoo's models
```
python tools/test.py \
--cfg experiments/coco/hrnet/simdr/nmt_w48_256x192_adam_lr1e-3.yaml \
TEST.MODEL_FILE _PATH_TO_CHECKPOINT_ \
TEST.USE_GT_BBOX False
```
```
python tools/test.py \
--cfg experiments/coco/hrnet/sa_simdr/w48_256x192_adam_lr1e-3_split2_sigma4.yaml \
TEST.MODEL_FILE _PATH_TO_CHECKPOINT_ \
TEST.USE_GT_BBOX False
```

#### Testing on MPII dataset using model zoo's models
```
python tools/test.py \
--cfg experiments/mpii/hrnet/simdr/norm_w32_256x256_adam_lr1e-3_ls2e1.yaml \
TEST.MODEL_FILE _PATH_TO_CHECKPOINT_ TEST.PCKH_THRE 0.5
```

## Citations
If you use our code or models in your research, please cite with:
```
@misc{li20212d,
title={Is 2D Heatmap Representation Even Necessary for Human Pose Estimation?},
author={Yanjie Li and Sen Yang and Shoukui Zhang and Zhicheng Wang and Wankou Yang and Shu-Tao Xia and Erjin Zhou},
year={2021},
eprint={2107.03332},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
## Acknowledgement
Thanks for the open-source HRNet.
* [Deep High-Resolution Representation Learning for Human Pose Estimation, Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong](https://github.com/leoxiaobin/deep-high-resolution-net.pytorch/)