{"id":20464526,"url":"https://github.com/bytedance/ohta","last_synced_at":"2025-04-13T08:38:04.182Z","repository":{"id":244331911,"uuid":"814651300","full_name":"bytedance/OHTA","owner":"bytedance","description":"[CVPR2024] OHTA: One-shot Hand Avatar via Data-driven Implicit Priors","archived":false,"fork":false,"pushed_at":"2024-06-14T01:38:11.000Z","size":3630,"stargazers_count":25,"open_issues_count":1,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-27T00:13:18.610Z","etag":null,"topics":["3d-hand-reconstruction","3d-vision","avatar","computer-vision","cvpr","cvpr2024","deep-learning","hand-pose-estimation","neural-rendering","research"],"latest_commit_sha":null,"homepage":"https://zxz267.github.io/OHTA","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bytedance.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-13T12:29:51.000Z","updated_at":"2025-03-12T11:52:58.000Z","dependencies_parsed_at":"2024-06-14T03:59:30.525Z","dependency_job_id":null,"html_url":"https://github.com/bytedance/OHTA","commit_stats":null,"previous_names":["bytedance/ohta"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FOHTA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FOHTA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytedance%2FOHTA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/byt
edance%2FOHTA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bytedance","download_url":"https://codeload.github.com/bytedance/OHTA/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248685148,"owners_count":21145215,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-hand-reconstruction","3d-vision","avatar","computer-vision","cvpr","cvpr2024","deep-learning","hand-pose-estimation","neural-rendering","research"],"created_at":"2024-11-15T13:15:32.217Z","updated_at":"2025-04-13T08:38:04.143Z","avatar_url":"https://github.com/bytedance.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003ch1\u003eOHTA: One-shot Hand Avatar via Data-driven Implicit Priors\u003c/h1\u003e\n\n\u003cdiv\u003e\n    \u003ca href='https://scholar.google.com/citations?user=3hSD41oAAAAJ' target='_blank'\u003eXiaozheng Zheng\u003csup\u003e*\u003c/sup\u003e\u003c/a\u003e\u0026emsp;\n    \u003ca href='https://scholar.google.com/citations?user=v8TFZI4AAAAJ' target='_blank'\u003eChao Wen\u003csup\u003e*\u003c/sup\u003e\u003c/a\u003e\u0026emsp;\n    \u003ca href='https://suzhuo.github.io/' target='_blank'\u003eZhuo Su\u003csup\u003e\u003c/sup\u003e\u003c/a\u003e\u0026emsp;\n    \u003ca href='https://scholar.google.com/citations?user=Yeawk5sAAAAJ' target='_blank'\u003eZeran Xu\u003csup\u003e\u003c/sup\u003e\u003c/a\u003e\u0026emsp;\n    \u003ca href='https://github.com/lizhaohu' target='_blank'\u003eZhaohu 
Li\u003csup\u003e\u003c/sup\u003e\u003c/a\u003e\u0026emsp;\n    \u003ca href='https://github.com/uzhaoyang' target='_blank'\u003eYang Zhao\u003csup\u003e\u003c/sup\u003e\u003c/a\u003e\u0026emsp;\n    \u003ca href='https://scholar.google.com/citations?\u0026user=ECKq3aUAAAAJ' target='_blank'\u003eZhou Xue\u003csup\u003e†\u003c/sup\u003e\u003c/a\u003e\u0026emsp;\n\n\n\u003c/div\u003e\n\u003cdiv\u003e\n    PICO, ByteDance\n\u003c/div\u003e\n\u003cdiv\u003e\n    \u003csup\u003e*\u003c/sup\u003eEqual contribution \u0026emsp; \u003csup\u003e†\u003c/sup\u003eCorresponding author\n\u003c/div\u003e\n\u003cdiv\u003e\n    :star_struck: \u003cstrong\u003eAccepted to CVPR 2024\u003c/strong\u003e\n\u003c/div\u003e\n\n---\n\n\u003cimg src=\"assets/teaser.png\" width=\"100%\"/\u003e\n\n\u003cstrong\u003e OHTA is a novel approach capable of creating implicit animatable hand avatars using just a single image. It facilitates 1) text-to-avatar conversion, 2) hand texture and geometry editing, and 3) interpolation and sampling within the latent space.\u003c/strong\u003e\n\n---\n\n\u003ca href='https://zxz267.github.io/OHTA/'\u003e\u003cimg src='https://img.shields.io/badge/Project-Page-Green'\u003e\u003c/a\u003e \u003ca href='http://arxiv.org/abs/2402.18969'\u003e\u003cimg src='https://img.shields.io/badge/Paper-Arxiv-red'\u003e\u003c/a\u003e [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://youtu.be/VPjjHNgtzJI)\n\n\n\u003c/div\u003e\n\n## :mega: Updates\n\n[06/2024] :star_struck: Code released!\n\n[02/2024] :partying_face: OHTA is accepted to CVPR 2024! Working on code release!\n\n## :desktop_computer: Installation\n### Environment\nCreate the conda environment for OHTA with the given script:\n```\nbash scripts/create_env.sh\n```\n\n### SMPL-X\nYou should accept [SMPL-X Model License](https://smpl-x.is.tue.mpg.de/modellicense.html) and install [SMPL-X](https://github.com/vchoutas/smplx). 
\n\n### MANO\nYou should accept [MANO License](https://mano.is.tue.mpg.de/license.html) and download the [MANO](https://mano.is.tue.mpg.de/) model from the official website.\n\n### PairOF and MANO-HD\nDownload the pre-trained PairOF and MANO-HD from [here](https://drive.google.com/drive/folders/19X0XOPWCrTPx4IAs2jpj34qbO0bC2Pew), which are provided by [HandAvatar](https://github.com/SeanChenxy/HandAvatar). \nWe refer to the MANO-HD implementation from [HandAvatar](https://github.com/SeanChenxy/HandAvatar).\n\n## 🔥 Pre-trained Model \nWe provide the pre-trained model after prior learning, which can be used for one-shot creation. Please download the weights from [link](https://drive.google.com/file/d/1QnmU5qJcM-TLoVhpZIUA2ct1aXaQ5hvH/).\n\n\n## :file_folder: Data Preparation\n\n### Training and evaluation on InterHand2.6M\nYou should download the dataset from the official website to train the prior model or evaluate the one-shot performance on [InterHand2.6M](https://mks0601.github.io/InterHand2.6M/).\nAfter downloading the pre-trained models and data, you should organize the folder as follows:\n```\nROOT\n    ├── data\n    │   └── InterHand\n    │       └── 5\n    │           └── annotations\n    │           └── InterHand2.6M_5fps_batch1\n    ├── output\n    │   └── pretrained_prior_learning.tar\n    ├── third_parties\n    │   ├── mano\n    │   │   ├── MANO_RIGHT.pkl -\u003e models/MANO_RIGHT.pkl\n    │   │   ├── models\n    │   ├── pairof\n    │   │   ├── out\n    │   ├── smplx\n    │   │   ├── out\n```\n\nFor training and evaluation, you also need to generate hand segmentations.\nFirst, you should follow [HandAvatar](https://github.com/SeanChenxy/HandAvatar) to generate masks by MANO rendering.\nPlease refer to `scripts/seg_interhand2.6m_from_mano.py` for generating the MANO segmentation:\n```\npython scripts/seg_interhand2.6m_from_mano.py\n```\n\nTo better train the prior model, we further utilize [SAM](https://github.com/facebookresearch/segment-anything) to 
generate more hand-aligned segmentations with joint and bounding box prompts.\nWe strongly recommend using the most accurate segmentations you can obtain for prior learning.\nPlease refer to `scripts/seg_with_sam.py` for more details:\n```\npython scripts/seg_with_sam.py\n```\n\n\n### Data for One-shot Creation\nFor one-shot creation, you should use a hand pose estimator to predict the MANO parameters of the input image, and then process the data into the input format.\n\nWe have provided a tool for obtaining HandMesh through fitting, along with metadata in the required format. You can refer to [HandMesh](https://github.com/walsvid/HandMesh) for data preparation tools. Our method is not limited to [HandMesh](https://github.com/walsvid/HandMesh); you can also use other hand mesh estimators such as [HaMeR](https://github.com/geopavlakos/hamer). You can also refer to `scripts/seg_with_sam.py` for generating the hand mask of in-the-wild hand images.\n\n\nWe provide the processing script `scripts/process_interhand2.6m.py`, which converts InterHand2.6M data into the format for one-shot creation.\n```\npython scripts/process_interhand2.6m.py\n```\n\nWe also provide some processed samples in `example_data`.\n\n\n## :runner: Avatar Creation\n### One-shot creation\nAfter processing the image into the input format, you can use the `create.py` script to create the hand avatar as follows:\n```\npython create.py --cfg configs/interhand/ohta_create.yaml \\\n--input example_data/in_the_wild/img/02023.jpg \\\n--checkpoint output/pretrained_prior_learning.tar\n```\n\n### Texture editing\nYou can also edit the avatar with the given content and the corresponding mask:\n```\npython create.py --cfg configs/interhand/ohta_create.yaml \\\n--input example_data/editing/img/rainbow.jpg \\\n--checkpoint output/pretrained_prior_learning.tar \\\n--edit\n```\n\n### Text-to-avatar\nIf you are interested in generating hand avatars using text prompts, you can utilize image generation tools (e.g., 
[ControlNet](https://github.com/lllyasviel/ControlNet)) with text and depth-map prompts (the depth map is obtained by MANO rendering). After that, you can convert the data to the input format described above for avatar generation.\n\n\n## :running_woman: Evaluation on InterHand2.6M\nAfter creating the one-shot avatar using InterHand2.6M, you can evaluate its performance on the subset:\n```\npython train.py --cfg configs/interhand/ohta_create.yaml\n```\n\n## :walking: Prior learning on InterHand2.6M\nYou can train the prior model on InterHand2.6M with the following command:\n```\npython train.py --cfg configs/interhand/ohta_train.yaml\n```\n\n\n\n\n## :love_you_gesture: Citation\nIf you find our work useful for your research, please consider citing the paper:\n```\n@inproceedings{\n  zheng2024ohta,\n  title={OHTA: One-shot Hand Avatar via Data-driven Implicit Priors},\n  author={Zheng, Xiaozheng and Wen, Chao and Su, Zhuo and Xu, Zeran and Li, Zhaohu and Zhao, Yang and Xue, Zhou},\n  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n  year={2024}\n}\n```\n\n## :newspaper_roll: License\n\nDistributed under the MIT License. See `LICENSE` for more information.\n\n\n\n## :raised_hands: Acknowledgements\nThis project is built on source code shared by [HandAvatar](https://github.com/SeanChenxy/HandAvatar) and [PyTorch3D](https://github.com/facebookresearch/pytorch3d). We thank the authors for their great work!","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytedance%2Fohta","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbytedance%2Fohta","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytedance%2Fohta/lists"}