{"id":18614450,"url":"https://github.com/chungyiweng/humannerf","last_synced_at":"2025-04-11T00:30:43.699Z","repository":{"id":37354887,"uuid":"501380780","full_name":"chungyiweng/humannerf","owner":"chungyiweng","description":"HumanNeRF turns a monocular video of moving people into a 360 free-viewpoint video.","archived":false,"fork":false,"pushed_at":"2023-09-29T07:36:39.000Z","size":52,"stargazers_count":791,"open_issues_count":40,"forks_count":87,"subscribers_count":18,"default_branch":"main","last_synced_at":"2024-11-07T03:31:37.434Z","etag":null,"topics":["cvpr2022","humannerf","nerf","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chungyiweng.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-06-08T19:15:35.000Z","updated_at":"2024-10-23T08:12:09.000Z","dependencies_parsed_at":"2024-11-07T03:40:36.324Z","dependency_job_id":null,"html_url":"https://github.com/chungyiweng/humannerf","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chungyiweng%2Fhumannerf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chungyiweng%2Fhumannerf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chungyiweng%2Fhumannerf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chungyiweng%2Fhumannerf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owne
rs/chungyiweng","download_url":"https://codeload.github.com/chungyiweng/humannerf/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248322220,"owners_count":21084333,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cvpr2022","humannerf","nerf","pytorch"],"created_at":"2024-11-07T03:25:56.901Z","updated_at":"2025-04-11T00:30:43.046Z","avatar_url":"https://github.com/chungyiweng.png","language":"Python","funding_links":[],"categories":["Papers"],"sub_categories":["NeRF Related Tasks"],"readme":"# HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video (CVPR 2022)\n\n[Project Page](https://grail.cs.washington.edu/projects/humannerf/) | [Paper](https://arxiv.org/abs/2201.04127) | [Video](https://youtu.be/GM-RoZEymmw)\n\nThis is an official implementation. 
The codebase is implemented using [PyTorch](https://pytorch.org/) and tested on [Ubuntu](https://ubuntu.com/) 20.04.4 LTS.\n\n## Prerequisite\n\n### `Configure environment`\n\nInstall [Miniconda](https://docs.conda.io/en/latest/miniconda.html) (recommended) or [Anaconda](https://www.anaconda.com/).\n\nCreate and activate a virtual environment.\n\n    conda create --name humannerf python=3.7\n    conda activate humannerf\n\nInstall the required packages.\n\n    pip install -r requirements.txt\n\n### `Download SMPL model`\n\nDownload the gender-neutral SMPL model from [here](https://smplify.is.tue.mpg.de/), and unpack **mpips_smplify_public_v2.zip**.\n\nCopy the SMPL model.\n\n    SMPL_DIR=/path/to/smpl\n    MODEL_DIR=$SMPL_DIR/smplify_public/code/models\n    cp $MODEL_DIR/basicModel_neutral_lbs_10_207_0_v1.0.0.pkl third_parties/smpl/models\n\nFollow [this page](https://github.com/vchoutas/smplx/tree/master/tools) to remove Chumpy objects from the SMPL model.\n\n\n## Run on ZJU-Mocap Dataset\n\nBelow we take subject 387 as a running example.\n\n### `Prepare a dataset`\n\nFirst, download the ZJU-Mocap dataset from [here](https://github.com/zju3dv/neuralbody/blob/master/INSTALL.md#zju-mocap-dataset). \n\nSecond, modify the yaml file of subject 387 at `tools/prepare_zju_mocap/387.yaml`. In particular, `zju_mocap_path` should be the directory path of the ZJU-Mocap dataset.\n\n```\ndataset:\n    zju_mocap_path: /path/to/zju_mocap\n    subject: '387'\n    sex: 'neutral'\n\n...\n```\n    \nFinally, run the data preprocessing script.\n\n    cd tools/prepare_zju_mocap\n    python prepare_dataset.py --cfg 387.yaml\n    cd ../../\n\n### `Train/Download models`\n\nNow you can either download a pre-trained model by running the script\n\n    ./scripts/download_model.sh 387\n\nor train a model yourself. We used 4 GPUs (NVIDIA RTX 2080 Ti) to train a model. 
\n\n    python train.py --cfg configs/human_nerf/zju_mocap/387/adventure.yaml\n\nAs a sanity check, we provide a configuration that supports training on a single GPU (NVIDIA RTX 2080 Ti). Note that performance is not guaranteed for this configuration.\n\n    python train.py --cfg configs/human_nerf/zju_mocap/387/single_gpu.yaml\n\n### `Render output`\n\nRender the frame input (i.e., observed motion sequence).\n\n    python run.py \\\n        --type movement \\\n        --cfg configs/human_nerf/zju_mocap/387/adventure.yaml \n\nRun free-viewpoint rendering on a particular frame (e.g., frame 128).\n\n    python run.py \\\n        --type freeview \\\n        --cfg configs/human_nerf/zju_mocap/387/adventure.yaml \\\n        freeview.frame_idx 128\n\n\nRender the learned canonical appearance (T-pose).\n\n    python run.py \\\n        --type tpose \\\n        --cfg configs/human_nerf/zju_mocap/387/adventure.yaml \n\nIn addition, you can find the rendering scripts in `scripts/zju_mocap`.\n\n\n## Run on a Custom Monocular Video\n\nTo get the best result, we recommend a video clip that meets these requirements:\n\n- The clip has fewer than 600 frames (~20 seconds).\n- The human subject shows most body regions (e.g., front and back view of the body) in the clip.\n\n### `Prepare a dataset`\n\nTo train on a monocular video, prepare your video data in `dataset/wild/monocular` with the following structure:\n\n    monocular\n        ├── images\n        │   └── ${item_id}.png\n        ├── masks\n        │   └── ${item_id}.png\n        └── metadata.json\n\nWe use `item_id` to match a video frame with its subject mask and metadata. 
An `item_id` is typically an alphanumeric string such as `000128`.\n\n#### **images**\n\nA collection of video frames, stored as PNG files.\n\n#### **masks**\n\nA collection of subject segmentation masks, stored as PNG files.\n\n#### **metadata.json**\n\nThis json file contains metadata for video frames, including:\n\n- human body pose (SMPL poses and betas coefficients)\n- camera pose (camera intrinsic and extrinsic matrices). We follow the [OpenCV](https://learnopencv.com/geometry-of-image-formation/) camera coordinate system and use the [pinhole camera model](https://staff.fnwi.uva.nl/r.vandenboomgaard/IPCV20162017/LectureNotes/CV/PinholeCamera/PinholeCamera.html).\n\nYou can run SMPL-based human pose detectors (e.g., [SPIN](https://github.com/nkolot/SPIN), [VIBE](https://github.com/mkocabas/VIBE), or [ROMP](https://github.com/Arthur151/ROMP)) on a monocular video to get body poses as well as camera poses. \n\n\n```javascript\n{\n  // Replace the string item_id with the file name of your video frame.\n  \"item_id\": {\n        // A (72,) array: SMPL coefficients controlling body pose.\n        \"poses\": [\n            -3.1341, ..., 1.2532\n        ],\n        // A (10,) array: SMPL coefficients controlling body shape. 
\n        \"betas\": [\n            0.33019, ..., 1.0386\n        ],\n        // A 3x3 camera intrinsic matrix.\n        \"cam_intrinsics\": [\n            [23043.9, 0.0,940.19],\n            [0.0, 23043.9, 539.23],\n            [0.0, 0.0, 1.0]\n        ],\n        // A 4x4 camera extrinsic matrix.\n        \"cam_extrinsics\": [\n            [1.0, 0.0, 0.0, -0.005],\n            [0.0, 1.0, 0.0, 0.2218],\n            [0.0, 0.0, 1.0, 47.504],\n            [0.0, 0.0, 0.0, 1.0],\n        ],\n  }\n\n  ...\n\n  // Iterate every video frame.\n  \"item_id\": {\n      ...\n  }\n}\n```\n\nOnce the dataset is properly created, run the script to complete dataset preparation.\n\n    cd tools/prepare_wild\n    python prepare_dataset.py --cfg wild.yaml\n    cd ../../\n\n### `Train a model`\n\nNow we are ready to launch training. By default, we used 4 GPUs (NVIDIA RTX 2080 Ti) to train a model. \n\n    python train.py --cfg configs/human_nerf/wild/monocular/adventure.yaml\n\nAs a sanity check, we provide a single-GPU (NVIDIA RTX 2080 Ti) training config. 
Note that performance is not guaranteed for this configuration.\n\n    python train.py --cfg configs/human_nerf/wild/monocular/single_gpu.yaml\n\n### `Render output`\n\nRender the frame input (i.e., observed motion sequence).\n\n    python run.py \\\n        --type movement \\\n        --cfg configs/human_nerf/wild/monocular/adventure.yaml \n\nRun free-viewpoint rendering on a particular frame (e.g., frame 128).\n\n    python run.py \\\n        --type freeview \\\n        --cfg configs/human_nerf/wild/monocular/adventure.yaml \\\n        freeview.frame_idx 128\n\n\nRender the learned canonical appearance (T-pose).\n\n    python run.py \\\n        --type tpose \\\n        --cfg configs/human_nerf/wild/monocular/adventure.yaml \n\nIn addition, you can find the rendering scripts in `scripts/wild`.\n\n## Acknowledgement\n\nThe implementation draws on [NeRF-PyTorch](https://github.com/yenchenlin/nerf-pytorch), [Neural Body](https://github.com/zju3dv/neuralbody), [Neural Volume](https://github.com/facebookresearch/neuralvolumes), [LPIPS](https://github.com/richzhang/PerceptualSimilarity), and [YACS](https://github.com/rbgirshick/yacs). We thank the authors for generously releasing their code.\n\n## Citation\n\nIf you find our work useful, please consider citing:\n\n```BibTeX\n@InProceedings{weng_humannerf_2022_cvpr,\n    title     = {Human{N}e{RF}: Free-Viewpoint Rendering of Moving People From Monocular Video},\n    author    = {Weng, Chung-Yi and \n                 Curless, Brian and \n                 Srinivasan, Pratul P. and \n                 Barron, Jonathan T. 
and \n                 Kemelmacher-Shlizerman, Ira},\n    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n    month     = {June},\n    year      = {2022},\n    pages     = {16210-16220}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchungyiweng%2Fhumannerf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchungyiweng%2Fhumannerf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchungyiweng%2Fhumannerf/lists"}