# OHTA: One-shot Hand Avatar via Data-driven Implicit Priors

Xiaozheng Zheng\* · Chao Wen\* · Zhuo Su · Zeran Xu · Zhaohu Li · Yang Zhao · Zhou Xue

PICO, ByteDance

\*Equal contribution · Corresponding author

:star_struck: Accepted to CVPR 2024

---

OHTA is a novel approach capable of creating implicit animatable hand avatars using just a single image. It facilitates 1) text-to-avatar conversion, 2) hand texture and geometry editing, and 3) interpolation and sampling within the latent space.

---

[![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://youtu.be/VPjjHNgtzJI)

## :mega: Updates

[06/2024] :star_struck: Code released!

[02/2024] :partying_face: OHTA is accepted to CVPR 2024! Working on code release!

## :desktop_computer: Installation
### Environment
Create the conda environment for OHTA with the given script:
```
bash scripts/create_env.sh
```

### SMPL-X
You should accept the [SMPL-X Model License](https://smpl-x.is.tue.mpg.de/modellicense.html) and install [SMPL-X](https://github.com/vchoutas/smplx).

### MANO
You should accept the [MANO License](https://mano.is.tue.mpg.de/license.html) and download the [MANO](https://mano.is.tue.mpg.de/) model from the official website.

### PairOF and MANO-HD
Download the pre-trained PairOF and MANO-HD models from [here](https://drive.google.com/drive/folders/19X0XOPWCrTPx4IAs2jpj34qbO0bC2Pew); they are provided by [HandAvatar](https://github.com/SeanChenxy/HandAvatar).
Our MANO-HD implementation follows [HandAvatar](https://github.com/SeanChenxy/HandAvatar).

## 🔥 Pre-trained Model
We provide a pre-trained model obtained after prior learning, which can be used for one-shot creation. Please download the weights from this [link](https://drive.google.com/file/d/1QnmU5qJcM-TLoVhpZIUA2ct1aXaQ5hvH/).
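
If you prefer to script the download, the weights can also be fetched with `gdown`. This helper is not part of the repository and is only a minimal sketch; it assumes `pip install gdown` and reuses the file ID from the link above:
```
# A minimal sketch (not part of the repository): fetch the pre-trained weights with gdown.
import gdown

# File ID taken from the Google Drive link above; the output path matches the folder layout shown below.
gdown.download(
    id="1QnmU5qJcM-TLoVhpZIUA2ct1aXaQ5hvH",
    output="output/pretrained_prior_learning.tar",
    quiet=False,
)
```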

## :file_folder: Data Preparation

### Training and evaluation on InterHand2.6M
To train the prior model or evaluate one-shot performance on [InterHand2.6M](https://mks0601.github.io/InterHand2.6M/), you should download the dataset from the official website.
After downloading the pre-trained models and data, you should organize the folder as follows:
```
ROOT
├── data
│   └── InterHand
│       └── 5
│           ├── annotations
│           └── InterHand2.6M_5fps_batch1
├── output
│   └── pretrained_prior_learning.tar
└── third_parties
    ├── mano
    │   ├── MANO_RIGHT.pkl -> models/MANO_RIGHT.pkl
    │   └── models
    ├── pairof
    │   └── out
    └── smplx
        └── out
```

For training and evaluation, you also need to generate hand segmentations.
First, follow [HandAvatar](https://github.com/SeanChenxy/HandAvatar) to generate masks by rendering the MANO mesh.
Please refer to `scripts/seg_interhand2.6m_from_mano.py` for generating the MANO segmentations:
```
python scripts/seg_interhand2.6m_from_mano.py
```
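
For reference, producing a hand mask by rendering a MANO mesh silhouette with [PyTorch3D](https://github.com/facebookresearch/pytorch3d) roughly follows the sketch below. This is not the repository's script: the vertices, faces, camera, and image size are placeholders that should come from your MANO fit for each image.
```
# A minimal sketch (placeholders noted): hand mask from a MANO mesh silhouette via PyTorch3D.
import torch
from pytorch3d.structures import Meshes
from pytorch3d.renderer import (
    PerspectiveCameras, RasterizationSettings, MeshRasterizer,
    MeshRenderer, SoftSilhouetteShader, BlendParams,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholders: posed MANO vertices (778 x 3) and faces (1538 x 3) from your fit.
verts = torch.zeros(1, 778, 3, device=device)
faces = torch.zeros(1, 1538, 3, dtype=torch.int64, device=device)
meshes = Meshes(verts=verts, faces=faces)

# Placeholder camera; use intrinsics/extrinsics matching the input image.
cameras = PerspectiveCameras(device=device)
raster_settings = RasterizationSettings(image_size=512, blur_radius=0.0, faces_per_pixel=1)
renderer = MeshRenderer(
    rasterizer=MeshRasterizer(cameras=cameras, raster_settings=raster_settings),
    shader=SoftSilhouetteShader(blend_params=BlendParams(sigma=1e-4, gamma=1e-4)),
)

silhouette = renderer(meshes)[..., 3]   # alpha channel holds the soft silhouette
mask = (silhouette > 0.5).float()       # binarize to obtain the hand mask
```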

To better train the prior model, we further utilize [SAM](https://github.com/facebookresearch/segment-anything) to generate more hand-aligned segmentations using joint and bounding-box prompts.
We strongly recommend using segmentations that are as accurate as possible for prior learning.
Please refer to `scripts/seg_with_sam.py` for more details:
```
python scripts/seg_with_sam.py
```
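
For reference, prompting SAM with projected hand joints and a bounding box roughly follows the sketch below. This is not the repository's script; the checkpoint path, image path, and coordinates are placeholders:
```
# A minimal sketch (placeholders noted): hand mask from SAM with point and box prompts.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")  # placeholder checkpoint path
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("hand.jpg"), cv2.COLOR_BGR2RGB)       # placeholder image path
predictor.set_image(image)

joints_2d = np.array([[320, 240], [300, 200]])   # placeholder: projected hand joints (x, y)
labels = np.ones(len(joints_2d))                 # 1 = foreground point
box = np.array([250, 150, 400, 330])             # placeholder: hand box (x0, y0, x1, y1)

masks, scores, _ = predictor.predict(
    point_coords=joints_2d,
    point_labels=labels,
    box=box,
    multimask_output=False,
)
hand_mask = masks[0]                             # boolean H x W mask
```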

### Data for One-shot Creation
For one-shot creation, you should use a hand pose estimator to predict the MANO parameters of the input image, and then process the data into the input format.

We provide a tool for obtaining the hand mesh through fitting, along with metadata in the required format. You can refer to [HandMesh](https://github.com/walsvid/HandMesh) for data preparation tools. Our method is not limited to [HandMesh](https://github.com/walsvid/HandMesh); you can also use other hand mesh estimators such as [HaMeR](https://github.com/geopavlakos/hamer). You can also refer to `scripts/seg_with_sam.py` for generating hand masks for in-the-wild hand images.

We provide a processing script, `scripts/process_interhand2.6m.py`, which converts InterHand2.6M data into the format required for one-shot creation.
```
python scripts/process_interhand2.6m.py
```

We also provide some processed samples in `example_data`.

## :runner: Avatar Creation
### One-shot creation
After processing the image into the input format, you can use the `create.py` script to create the hand avatar as shown below:
```
python create.py --cfg configs/interhand/ohta_create.yaml \
--input example_data/in_the_wild/img/02023.jpg \
--checkpoint output/pretrained_prior_learning.tar
```

### Texture editing
You can also edit the avatar with the given content and the corresponding mask:
```
python create.py --cfg configs/interhand/ohta_create.yaml \
--input example_data/editing/img/rainbow.jpg \
--checkpoint output/pretrained_prior_learning.tar \
--edit
```

### Text-to-avatar
If you are interested in generating hand avatars from text prompts, you can utilize image generation tools (e.g., [ControlNet](https://github.com/lllyasviel/ControlNet)) conditioned on a text prompt and a depth map (obtained by MANO rendering). After that, you can convert the result into the input format described above for avatar creation.
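
For reference, generating such an image with a depth-conditioned ControlNet via the `diffusers` library roughly looks like the sketch below. This is only an illustration, not part of the repository; the model IDs, prompt, and file names are placeholders:
```
# A minimal sketch (placeholders noted): text + depth-conditioned hand image with ControlNet.
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

depth = load_image("mano_depth.png")  # placeholder: depth map rendered from the MANO mesh
image = pipe(
    "a photo of a human hand with a rainbow tattoo",  # placeholder text prompt
    image=depth,
    num_inference_steps=30,
).images[0]
image.save("generated_hand.png")
```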

## :running_woman: Evaluation on InterHand2.6M
After creating the one-shot avatar using InterHand2.6M, you can evaluate its performance on the corresponding subset:
```
python train.py --cfg configs/interhand/ohta_create.yaml
```

## :walking: Prior learning on InterHand2.6M
Use the following script to train the prior model on InterHand2.6M:
```
python train.py --cfg configs/interhand/ohta_train.yaml
```

## :love_you_gesture: Citation
If you find our work useful for your research, please consider citing the paper:
```
@inproceedings{zheng2024ohta,
  title={OHTA: One-shot Hand Avatar via Data-driven Implicit Priors},
  author={Zheng, Xiaozheng and Wen, Chao and Su, Zhuo and Xu, Zeran and Li, Zhaohu and Zhao, Yang and Xue, Zhou},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}
```

## :newspaper_roll: License

Distributed under the MIT License. See `LICENSE` for more information.

## :raised_hands: Acknowledgements
This project is built on source code shared by [HandAvatar](https://github.com/SeanChenxy/HandAvatar) and [PyTorch3D](https://github.com/facebookresearch/pytorch3d). We thank the authors for their great work!