https://github.com/idea-research/click-pose
[ICCV 2023] Official implementation of the paper "Neural Interactive Keypoint Detection"
annotation-tool human-in-the-loop iccv2023 pose-estimation
- Host: GitHub
- URL: https://github.com/idea-research/click-pose
- Owner: IDEA-Research
- License: other
- Created: 2023-08-21T12:40:18.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2023-10-12T08:29:50.000Z (over 2 years ago)
- Last Synced: 2025-04-13T20:16:33.089Z (about 1 year ago)
- Topics: annotation-tool, human-in-the-loop, iccv2023, pose-estimation
- Language: Python
- Homepage:
- Size: 66.5 MB
- Stars: 80
- Watchers: 3
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# Neural Interactive Keypoint Detection
This is the official PyTorch implementation of our ICCV 2023 paper "[Neural Interactive Keypoint Detection](https://arxiv.org/pdf/2308.10174.pdf)."
[Jie Yang](https://github.com/yangjie-cv), [Ailing Zeng](https://ailingzeng.site/), [Feng Li](https://scholar.google.com/citations?user=ybRe9GcAAAAJ&hl=zh-CN), [Shilong Liu](http://www.lsl.zone/), [Ruimao Zhang](http://www.zhangruimao.site/), [Lei Zhang](https://www.leizhang.org/)
**Keywords**: Multi-person 2D pose estimation, Human-in-the-loop, Interactive model
## Highlights
- Click-Pose is supported in our **[DeepDataSpace](https://github.com/IDEA-Research/deepdataspace)** platform. See *[How to perform intelligent labeling with DDS](https://docs.deepdataspace.com/tutorials/ai-annotation.html)* for details.
- All models for COCO, Human-Art, OCHuman, and CrowdPose are released!
- **Workflow**: the model localizes all keypoints -> the user corrects a few wrong keypoints -> the model refines the remaining keypoints.
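The workflow above can be sketched as a simple loop. This is a minimal, framework-free illustration, not Click-Pose's implementation: `predict`, `refine`, and `ask_user` are hypothetical callables standing in for the detector, the refinement model, and the annotator, and the cap of 17 rounds (one per COCO keypoint) is an assumption.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]


def interactive_annotate(
    predict: Callable[[], List[Point]],
    refine: Callable[[List[Point], Dict[int, Point]], List[Point]],
    ask_user: Callable[[List[Point]], List[Tuple[int, Point]]],
    max_rounds: int = 17,  # assumption: at most one round per COCO keypoint
) -> Tuple[List[Point], int]:
    """Model localizes -> user corrects a few -> model refines the rest."""
    keypoints = list(predict())          # 1. model localizes all keypoints
    corrected: Dict[int, Point] = {}     # every user click so far
    clicks = 0
    for _ in range(max_rounds):
        fixes = ask_user(keypoints)      # 2. user corrects wrong keypoints
        if not fixes:
            break
        clicks += len(fixes)
        corrected.update(fixes)
        keypoints = list(refine(keypoints, corrected))  # 3. model refines the others
        for idx, xy in corrected.items():               # clicks are never overridden
            keypoints[idx] = xy
    return keypoints, clicks


# Toy usage: a simulated user fixes the single worst keypoint per round,
# and an identity function stands in for the (hypothetical) refinement model.
gt = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]


def ask_worst(kps: List[Point]) -> List[Tuple[int, Point]]:
    errs = [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(kps, gt)]
    worst = max(range(len(errs)), key=errs.__getitem__)
    return [(worst, gt[worst])] if errs[worst] > 1.0 else []


final, clicks = interactive_annotate(
    predict=lambda: [(0.0, 0.0), (15.0, 0.0), (20.0, 5.0)],
    refine=lambda kps, fixed: kps,
    ask_user=ask_worst,
)
print(final, clicks)  # [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)] 2
```

Re-applying the user's corrections after every refinement step keeps clicked keypoints fixed, so the model can only move keypoints the user has not touched.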
## Click-Pose
- We are the first to propose the **interactive keypoint detection task** for efficient keypoint annotation.
- We present Click-Pose, the first neural interactive keypoint detection framework: an end-to-end baseline for annotating multi-person 2D keypoints in an image.
- Click-Pose is more than **10** times faster than manual annotation. It also substantially alleviates model bias in out-of-domain annotation (e.g., on Human-Art), reducing annotation time by **83%** compared with annotating via a state-of-the-art model ([ViTPose](https://github.com/ViTAE-Transformer/ViTPose)) followed by manual correction.

## Model Zoo
### 1. Model-Only Results
#### COCO val2017 set
| Model | Backbone | Lr schedule | mAP | AP50 | AP75 | APM | APL | Time (ms) | Download |
|:----------:|:---------:|:-------:|:----:|:---------------:|:---------------:|:--------------:|:--------------:|:---------:|:-----------------------------------------------------------------------------------------------------:|
| ED-Pose | ResNet-50 | 60e | 71.7 | 89.7 | 78.8 | 66.2 | 79.7 | 51 | [GitHub](https://github.com/IDEA-Research/ED-Pose), [Model](https://drive.google.com/file/d/1Q5OpZeCvaSgqC0NlKeRiJFmHBtusxnjX/view?usp=sharing) |
| Click-Pose | ResNet-50 | 40e | 73.0 | 90.4 | 80.0 | 68.1 | 80.5 | 48 | [Google Drive](https://drive.google.com/file/d/1_rp12m0fkpSc7LQ1oXeifdt8SbwcSHtS/view?usp=sharing) |
#### Human-Art val set
| Model | Backbone | mAP | APM | APL | Download |
|:----------:|:-------------:|:----:|:--------------:|:--------------:|:-----------------------------------------------------------------------------------------------------:|
| ED-Pose | ResNet-50 | 37.5 | 7.6 | 41.1 | [GitHub](https://github.com/IDEA-Research/ED-Pose), [Model](https://drive.google.com/file/d/1Q5OpZeCvaSgqC0NlKeRiJFmHBtusxnjX/view?usp=sharing) |
| Click-Pose | ResNet-50 | 40.5 | 8.3 | 44.2 | [Google Drive](https://drive.google.com/file/d/1_rp12m0fkpSc7LQ1oXeifdt8SbwcSHtS/view?usp=sharing) |
#### OCHuman test set
| Model | Backbone | mAP | AP50 | AP75 | Download |
|:----------:|:-------------:|:----:|:---------------:|:---------------:|:-----------------------------------------------------------------------------------------------------:|
| ED-Pose | ResNet-50 | 31.4 | 39.5 | 35.1 | [GitHub](https://github.com/IDEA-Research/ED-Pose), [Model](https://drive.google.com/file/d/1Q5OpZeCvaSgqC0NlKeRiJFmHBtusxnjX/view?usp=sharing) |
| Click-Pose | ResNet-50 | 33.9 | 43.4 | 37.5 | [Google Drive](https://drive.google.com/file/d/1_rp12m0fkpSc7LQ1oXeifdt8SbwcSHtS/view?usp=sharing) |
Note that all models above are trained on the COCO train2017 set and evaluated on the COCO val2017 set, the Human-Art val set, and the OCHuman test set.
### 2. Neural Interactive Results
#### In-domain Annotation (COCO val2017)
| Model | Backbone | NoC@85 | NoC@90 | NoC@95 | Download |
|:----------:|:-------------:|:------:|:------:|:------:|:-----------------------------------------------------------------------------------------------------:|
| ViTPose | ViT-Huge | 1.46 | 2.15 | 2.87 | [GitHub](https://github.com/ViTAE-Transformer/ViTPose), [Model](https://1drv.ms/u/s!AimBgYV7JjTlgShLMI-kkmvNfF_h?e=dEhGHe) |
| Click-Pose | ResNet-50 | 0.95 | 1.48 | 1.97 | [Google Drive](https://drive.google.com/file/d/184RIVxFVrDho4Nw5Yquh6fedTKpsZVYX/view?usp=sharing) |
#### Out-of-domain Annotation (Human-Art val)
| Model | Backbone | NoC@85 | NoC@90 | NoC@95 | Download |
|:----------:|:-------------:|:------:|:------:|:------:|:-----------------------------------------------------------------------------------------------------:|
| ViTPose | ViT-Huge | 9.12 | 9.79 | 10.13 | [GitHub](https://github.com/ViTAE-Transformer/ViTPose), [Model](https://1drv.ms/u/s!AimBgYV7JjTlgShLMI-kkmvNfF_h?e=dEhGHe) |
| Click-Pose | ResNet-50 | 4.82 | 5.81 | 6.45 | [Google Drive](https://drive.google.com/file/d/184RIVxFVrDho4Nw5Yquh6fedTKpsZVYX/view?usp=sharing) |
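The NoC@85/90/95 columns can be read as the average number of clicks (corrected keypoints) needed per person to reach an OKS of 0.85/0.90/0.95; the `NoC_thr=0.95` setting in the Run section follows the same convention. The sketch below is a simplified illustration under stated assumptions, not the repository's evaluation code: OKS mirrors the per-keypoint formula in pycocotools' `computeOks` with the standard COCO sigmas, the simulated annotator always fixes the least similar visible keypoint, and no model refinement happens between clicks (so it models the model-plus-manual-correction baseline rather than Click-Pose itself).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Per-keypoint sigmas of the 17 COCO keypoints, as used by pycocotools.
COCO_SIGMAS = [s / 10.0 for s in (
    .26, .25, .25, .35, .35, .79, .79, .72, .72,
    .62, .62, 1.07, 1.07, .87, .87, .89, .89)]


def keypoint_sims(pred: Sequence[Point], gt: Sequence[Point], area: float) -> List[float]:
    """exp(-d^2 / (2 * area * (2 * sigma)^2)) per keypoint, mirroring computeOks."""
    return [
        math.exp(-((px - gx) ** 2 + (py - gy) ** 2) / (2 * s) ** 2 / (area + 1e-12) / 2)
        for (px, py), (gx, gy), s in zip(pred, gt, COCO_SIGMAS)
    ]


def oks(pred: Sequence[Point], gt: Sequence[Point], visible: Sequence[int], area: float) -> float:
    """Mean similarity over visible ground-truth keypoints (v > 0)."""
    sims = [k for k, v in zip(keypoint_sims(pred, gt, area), visible) if v > 0]
    return sum(sims) / len(sims) if sims else 0.0


def clicks_to_reach(pred, gt, visible, area: float, thr: float) -> int:
    """Simulated annotator: replace the least similar visible keypoint with its
    ground truth, one click at a time, until OKS >= thr (no refinement)."""
    pred = list(pred)
    clicks = 0
    while oks(pred, gt, visible, area) < thr:
        sims = keypoint_sims(pred, gt, area)
        todo = [i for i, v in enumerate(visible) if v > 0 and pred[i] != gt[i]]
        if not todo:
            break
        worst = min(todo, key=lambda i: sims[i])
        pred[worst] = gt[worst]
        clicks += 1
    return clicks


def noc(instances, thr: float) -> float:
    """NoC@thr: average clicks over (pred, gt, visible, area) instances."""
    return sum(clicks_to_reach(*inst, thr=thr) for inst in instances) / len(instances)


# Toy person: two keypoints are off (nose by 50 px, left shoulder by 20 px).
gt = [(10.0 * i, 0.0) for i in range(17)]
pred = list(gt)
pred[0] = (50.0, 0.0)
pred[5] = (70.0, 0.0)
inst = (pred, gt, [2] * 17, 10000.0)
print([clicks_to_reach(*inst, thr=t) for t in (0.85, 0.95, 0.99)])  # [0, 1, 2]
```

With Click-Pose, the model refines the other keypoints after each correction, which is what lets a single click improve more than one keypoint and lowers the count.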
## Environment Setup
### Installation
Our codebase is built on [ED-Pose](https://github.com/IDEA-Research/ED-Pose). We tested our models with `python=3.7.3`, `pytorch=1.9.0`, and `cuda=11.1`; other versions may also work.
1. Clone this repo
```sh
git clone https://github.com/IDEA-Research/Click-Pose.git
cd Click-Pose
```
2. Install PyTorch and torchvision
Follow the instructions at https://pytorch.org/get-started/locally/.
```sh
# an example:
conda install -c pytorch pytorch torchvision
```
3. Install the other required packages
```sh
pip install -r requirements.txt
```
4. Compile the CUDA operators
```sh
cd models/clickpose/ops
python setup.py build install
# unit test (every check should report True)
python test.py
cd ../../..
```
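After these steps, a quick sanity check of the environment can save a failed launch. The sketch below reports the Python, PyTorch, and CUDA versions next to the tested ones and tries to import the compiled operators. The extension module name `MultiScaleDeformableAttention` is an assumption (it is the name used by the Deformable-DETR operators that ED-Pose-based codebases typically build); adjust it if the `setup.py` in `models/clickpose/ops` uses a different name.

```python
import importlib
import platform
from typing import Any, Dict

# Versions the authors report testing with (from the text above).
TESTED = {"python": "3.7.3", "torch": "1.9.0", "cuda": "11.1"}


def check_environment(ext_module: str = "MultiScaleDeformableAttention") -> Dict[str, Any]:
    """Collect interpreter/PyTorch/CUDA versions and whether the compiled ops import."""
    report: Dict[str, Any] = {"python": platform.python_version()}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        report.update(torch=None, cuda=None, cuda_available=False)
    else:
        report.update(torch=torch.__version__, cuda=torch.version.cuda,
                      cuda_available=torch.cuda.is_available())
    try:
        importlib.import_module(ext_module)  # assumed name of the built extension
        report["cuda_ops"] = True
    except (ImportError, OSError):
        report["cuda_ops"] = False
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        tested = TESTED.get(key)
        print(f"{key}: {value}" + (f" (tested: {tested})" if tested else ""))
```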
### Data Preparation
**For COCO data**, please download from [COCO download](http://cocodataset.org/#download).
The coco_dir should look like this:
```
|-- Click-Pose
`-- |-- coco_dir
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ...
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ...
```
**For Human-Art data**, please download from [Human-Art download](https://github.com/IDEA-Research/HumanArt).
The humanart_dir should look like this:
```
|-- Click-Pose
`-- |-- humanart_dir
    `-- |-- annotations
        |   |-- training_humanart.json
        |   `-- validation_humanart.json
        `-- images
            |-- 2D_virtual_human
            |   |-- ...
            |-- 3D_virtual_human
            |   |-- ...
            `-- real_human
                |-- ...
```
**For CrowdPose data**, please download from [CrowdPose download](https://github.com/Jeff-sjtu/CrowdPose#dataset).
The crowdpose_dir should look like this:
```
|-- Click-Pose
`-- |-- crowdpose_dir
    `-- |-- json
        |   |-- crowdpose_train.json
        |   |-- crowdpose_val.json
        |   |-- crowdpose_trainval.json (generated by util/crowdpose_concat_train_val.py)
        |   `-- crowdpose_test.json
        `-- images
            |-- 100000.jpg
            |-- 100001.jpg
            |-- 100002.jpg
            |-- 100003.jpg
            |-- 100004.jpg
            |-- 100005.jpg
            |-- ...
```
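`util/crowdpose_concat_train_val.py` produces `crowdpose_trainval.json`. Its internals are not described here, so the following is only a sketch of the same idea, assuming both inputs are COCO-style dicts with disjoint image and annotation ids; `concat_coco` and `concat_coco_files` are hypothetical names, and the repository's own script should be preferred in practice.

```python
import json
from pathlib import Path
from typing import Any, Dict, Sequence


def concat_coco(datasets: Sequence[Dict[str, Any]]) -> Dict[str, Any]:
    """Concatenate images/annotations; keep every other field from the first dict."""
    merged = {k: v for k, v in datasets[0].items() if k not in ("images", "annotations")}
    merged["images"], merged["annotations"] = [], []
    seen = {"images": set(), "annotations": set()}
    for data in datasets:
        for key in ("images", "annotations"):
            for item in data[key]:
                # Refuse to merge rather than silently breaking image_id links.
                if item["id"] in seen[key]:
                    raise ValueError(f"duplicate {key} id {item['id']}")
                seen[key].add(item["id"])
                merged[key].append(item)
    return merged


def concat_coco_files(paths: Sequence[str], out_path: str) -> None:
    """Read COCO-style JSON files, merge them, and write the result."""
    datasets = [json.loads(Path(p).read_text()) for p in paths]
    Path(out_path).write_text(json.dumps(concat_coco(datasets)))
```

For example: `concat_coco_files(["crowdpose_dir/json/crowdpose_train.json", "crowdpose_dir/json/crowdpose_val.json"], "crowdpose_dir/json/crowdpose_trainval.json")`. Raising on duplicate ids, rather than renumbering, keeps the merge from silently corrupting the `image_id` references between annotations and images.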
**For OCHuman data**, please download from [OCHuman download](https://github.com/liruilong940607/OCHumanApi).
The ochuman_dir should look like this:
```
|-- Click-Pose
`-- |-- ochuman_dir
    `-- |-- annotations
        `-- images
```
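Before training or evaluation, the expected layouts can be checked with a small script. The sketch below is hypothetical (`check_layout.py` and `missing_paths` are illustrative names); the paths are copied from the trees above, and `crowdpose_trainval.json` only exists after running `util/crowdpose_concat_train_val.py`.

```python
import sys
from pathlib import Path
from typing import Dict, List

# Expected entries relative to each dataset root, copied from the trees above.
EXPECTED: Dict[str, List[str]] = {
    "coco": [
        "annotations/person_keypoints_train2017.json",
        "annotations/person_keypoints_val2017.json",
        "images/train2017",
        "images/val2017",
    ],
    "humanart": [
        "annotations/training_humanart.json",
        "annotations/validation_humanart.json",
        "images/2D_virtual_human",
        "images/3D_virtual_human",
        "images/real_human",
    ],
    "crowdpose": [
        "json/crowdpose_train.json",
        "json/crowdpose_val.json",
        "json/crowdpose_trainval.json",  # generated by util/crowdpose_concat_train_val.py
        "json/crowdpose_test.json",
        "images",
    ],
    "ochuman": ["annotations", "images"],
}


def missing_paths(root: str, dataset: str) -> List[str]:
    """Return the expected entries that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED[dataset] if not (base / rel).exists()]


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python check_layout.py <coco|humanart|crowdpose|ochuman> <dataset_dir>
    missing = missing_paths(sys.argv[2], sys.argv[1])
    print("OK" if not missing else "missing: " + ", ".join(missing))
```

For example, `python check_layout.py coco /path/to/your/coco_dir` prints `OK` or the list of missing entries.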
## Run
### Train on COCO:
Model-Only
```
export CLICKPOSE_COCO_PATH=/path/to/your/coco_dir
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/ClickPose_Model-Only" \
-c config/clickpose.cfg.py \
--options batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=FALSE feedback_loop_NOC_test=FALSE feedback_inference=FALSE only_correction=FALSE \
--dataset_file="coco"
```
Neural Interactive
```
export CLICKPOSE_COCO_PATH=/path/to/your/coco_dir
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/ClickPose_Neural_Interactive" \
-c config/clickpose.cfg.py \
--options batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=TRUE feedback_loop_NOC_test=FALSE feedback_inference=FALSE only_correction=FALSE \
--dataset_file="coco"
```
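Each `key=value` pair after `--options` overrides a field of `config/clickpose.cfg.py`. The sketch below is modeled on the mmcv-style `DictAction` value parsing that ED-Pose-derived codebases commonly use; that this repository parses values exactly this way is an assumption, and list values (`a,b,c`) are not handled here. It shows why boolean values need exact spelling: `TRUE`/`FALSE` are matched case-insensitively, but any other word stays a string, and a non-empty string is truthy.

```python
from typing import Any, Dict, List


def parse_value(val: str) -> Any:
    """Try int, then float, then case-insensitive true/false; else keep the string."""
    for cast in (int, float):
        try:
            return cast(val)
        except ValueError:
            pass
    if val.lower() in ("true", "false"):
        return val.lower() == "true"
    return val


def parse_options(pairs: List[str]) -> Dict[str, Any]:
    """Turn ["key=value", ...] into a dict of config overrides."""
    options: Dict[str, Any] = {}
    for pair in pairs:
        key, val = pair.split("=", 1)
        options[key] = parse_value(val)
    return options


opts = parse_options(["batch_size=4", "use_ema=TRUE", "human_feedback=FLASE"])
print(opts)  # {'batch_size': 4, 'use_ema': True, 'human_feedback': 'FLASE'}
print(bool(opts["human_feedback"]))  # True: the misspelling would switch the flag on
```

If a flag is tested for truthiness, a misspelled `FALSE` therefore enables it instead of raising an error.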
### Evaluation on COCO:
Model-Only
```
export CLICKPOSE_COCO_PATH=/path/to/your/coco_dir
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/ClickPose_Model-Only_eval" \
-c config/clickpose.cfg.py \
--options batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=FALSE feedback_loop_NOC_test=FALSE feedback_inference=FALSE only_correction=FALSE \
--dataset_file="coco" \
--pretrain_model_path "./models/ClickPose_model_only_R50.pth" \
--eval
```
Neural Interactive-NoC metric
```
export CLICKPOSE_COCO_PATH=/path/to/your/coco_dir
export CLICKPOSE_NoC_Test="TRUE"
export CLICKPOSE_SAVE_PATH="./NoC_95_coco.json"
export NoC_thr=0.95
python -m torch.distributed.launch --nproc_per_node=1 --master_port 3458 main.py \
--output_dir "logs/ClickPose_Neural_Interactive_eval" \
-c config/clickpose.cfg.py \
--options batch_size=1 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=TRUE feedback_loop_NOC_test=TRUE feedback_inference=TRUE only_correction=FALSE num_select=20 \
--dataset_file="coco" \
--pretrain_model_path "./models/ClickPose_interactive_R50.pth" \
--eval
```
Neural Interactive-AP metric
```
export CLICKPOSE_COCO_PATH=/path/to/your/coco_dir
export CLICKPOSE_NoC_Test="TRUE"
for CLICKPOSE_Click_Number in {1..17}
do
export CLICKPOSE_Click_Number  # child processes only see exported variables
python -m torch.distributed.launch --nproc_per_node=4 --master_port 3458 main.py \
--output_dir "logs/ClickPose_Neural_Interactive_eval" \
-c config/clickpose.cfg.py \
--options batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=TRUE feedback_loop_NOC_test=FALSE feedback_inference=TRUE only_correction=FALSE num_select=20 \
--dataset_file="coco" \
--pretrain_model_path "./models/ClickPose_interactive_R50.pth" \
--eval
done
```
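The loop above passes the click budget through the `CLICKPOSE_Click_Number` environment variable (inferred from the `CLICKPOSE_*` naming); a shell loop variable only reaches `main.py` if it is exported. The sketch below is a hypothetical Python driver that builds the same 17 evaluation runs with the variable set explicitly in each child's environment. The option list is copied from the command above; `build_click_runs` is an illustrative name, not part of the repository.

```python
import os
import sys
from typing import Dict, List, Tuple

# Options copied from the Neural Interactive-AP command above.
AP_OPTIONS = ("batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=TRUE "
              "feedback_loop_NOC_test=FALSE feedback_inference=TRUE only_correction=FALSE "
              "num_select=20").split()


def build_click_runs(
    dataset: str,
    path_var: str,
    data_dir: str,
    checkpoint: str = "./models/ClickPose_interactive_R50.pth",
    clicks: range = range(1, 18),  # 1..17, as in the shell loop
    nproc: int = 4,
) -> List[Tuple[Dict[str, str], List[str]]]:
    """Return one (environment, argv) pair per click budget."""
    base_env = dict(os.environ)
    base_env.update({path_var: data_dir, "CLICKPOSE_NoC_Test": "TRUE"})
    runs = []
    for n in clicks:
        env = dict(base_env, CLICKPOSE_Click_Number=str(n))  # visible to main.py
        argv = [sys.executable, "-m", "torch.distributed.launch",
                f"--nproc_per_node={nproc}", "--master_port", "3458", "main.py",
                "--output_dir", "logs/ClickPose_Neural_Interactive_eval",
                "-c", "config/clickpose.cfg.py",
                "--options", *AP_OPTIONS,
                f"--dataset_file={dataset}",
                "--pretrain_model_path", checkpoint,
                "--eval"]
        runs.append((env, argv))
    return runs
```

To launch the runs sequentially: `for env, argv in build_click_runs("coco", "CLICKPOSE_COCO_PATH", "/path/to/your/coco_dir"): subprocess.run(argv, env=env, check=True)`. For the other datasets, swap in `"humanart"` with `CLICKPOSE_HumanArt_PATH` or `"ochuman"` with `CLICKPOSE_OCHuman_PATH`.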
### Evaluation on Human-Art:
Model-Only
```
export CLICKPOSE_HumanArt_PATH=/path/to/your/humanart_dir
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/ClickPose_Model-Only_eval" \
-c config/clickpose.cfg.py \
--options batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=FALSE feedback_loop_NOC_test=FALSE feedback_inference=FALSE only_correction=FALSE \
--dataset_file="humanart" \
--pretrain_model_path "./models/ClickPose_model_only_R50.pth" \
--eval
```
Neural Interactive-NoC metric
```
export CLICKPOSE_HumanArt_PATH=/path/to/your/humanart_dir
export CLICKPOSE_NoC_Test="TRUE"
export CLICKPOSE_SAVE_PATH="./NoC_95_humanart.json"
export NoC_thr=0.95
python -m torch.distributed.launch --nproc_per_node=1 --master_port 3458 main.py \
--output_dir "logs/ClickPose_Neural_Interactive_eval" \
-c config/clickpose.cfg.py \
--options batch_size=1 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=TRUE feedback_loop_NOC_test=TRUE feedback_inference=TRUE only_correction=FALSE num_select=20 \
--dataset_file="humanart" \
--pretrain_model_path "./models/ClickPose_interactive_R50.pth" \
--eval
```
Neural Interactive-AP metric
```
export CLICKPOSE_HumanArt_PATH=/path/to/your/humanart_dir
export CLICKPOSE_NoC_Test="TRUE"
for CLICKPOSE_Click_Number in {1..17}
do
export CLICKPOSE_Click_Number  # child processes only see exported variables
python -m torch.distributed.launch --nproc_per_node=4 --master_port 3458 main.py \
--output_dir "logs/ClickPose_Neural_Interactive_eval" \
-c config/clickpose.cfg.py \
--options batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=TRUE feedback_loop_NOC_test=FALSE feedback_inference=TRUE only_correction=FALSE num_select=20 \
--dataset_file="humanart" \
--pretrain_model_path "./models/ClickPose_interactive_R50.pth" \
--eval
done
```
### Evaluation on OCHuman:
Model-Only
```
export CLICKPOSE_OCHuman_PATH=/path/to/your/ochuman_dir
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/ClickPose_Model-Only_eval" \
-c config/clickpose.cfg.py \
--options batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=FALSE feedback_loop_NOC_test=FALSE feedback_inference=FALSE only_correction=FALSE \
--dataset_file="ochuman" \
--pretrain_model_path "./models/ClickPose_model_only_R50.pth" \
--eval
```
Neural Interactive-NoC metric
```
export CLICKPOSE_OCHuman_PATH=/path/to/your/ochuman_dir
export CLICKPOSE_NoC_Test="TRUE"
export CLICKPOSE_SAVE_PATH="./NoC_95_ochuman.json"
export NoC_thr=0.95
python -m torch.distributed.launch --nproc_per_node=1 --master_port 3458 main.py \
--output_dir "logs/ClickPose_Neural_Interactive_eval" \
-c config/clickpose.cfg.py \
--options batch_size=1 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=TRUE feedback_loop_NOC_test=TRUE feedback_inference=TRUE only_correction=FALSE num_select=20 \
--dataset_file="ochuman" \
--pretrain_model_path "./models/ClickPose_interactive_R50.pth" \
--eval
```
Neural Interactive-AP metric
```
export CLICKPOSE_OCHuman_PATH=/path/to/your/ochuman_dir
export CLICKPOSE_NoC_Test="TRUE"
for CLICKPOSE_Click_Number in {1..17}
do
export CLICKPOSE_Click_Number  # child processes only see exported variables
python -m torch.distributed.launch --nproc_per_node=4 --master_port 3458 main.py \
--output_dir "logs/ClickPose_Neural_Interactive_eval" \
-c config/clickpose.cfg.py \
--options batch_size=4 epochs=100 lr_drop=80 use_ema=TRUE human_feedback=TRUE feedback_loop_NOC_test=FALSE feedback_inference=TRUE only_correction=FALSE num_select=20 \
--dataset_file="ochuman" \
--pretrain_model_path "./models/ClickPose_interactive_R50.pth" \
--eval
done
```
### Cite Click-Pose
If you find this repository useful for your work, please consider citing it as follows:
```
@inproceedings{yang2023neural,
title={Neural Interactive Keypoint Detection},
author={Yang, Jie and Zeng, Ailing and Li, Feng and Liu, Shilong and Zhang, Ruimao and Zhang, Lei},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={15122--15132},
year={2023}
}
```
```
@inproceedings{yang2022explicit,
title={Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation},
author={Yang, Jie and Zeng, Ailing and Liu, Shilong and Li, Feng and Zhang, Ruimao and Zhang, Lei},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023}
}
```