# BodyNet: Volumetric Inference of 3D Human Body Shapes
[Gül Varol](http://www.di.ens.fr/~varol/), [Duygu Ceylan](http://www.duygu-ceylan.com/), [Bryan Russell](http://bryanrussell.org/), [Jimei Yang](https://research.adobe.com/person/jimei-yang/), [Ersin Yumer](http://www.meyumer.com/), [Ivan Laptev](http://www.di.ens.fr/~laptev/) and [Cordelia Schmid](http://lear.inrialpes.fr/~schmid/),
*BodyNet: Volumetric Inference of 3D Human Body Shapes*, ECCV 2018.[[Project page]](http://www.di.ens.fr/willow/research/bodynet/) [[arXiv]](https://arxiv.org/abs/1804.04875)
## Contents
* [1. Preparation](https://github.com/gulvarol/bodynet#1-preparation)
* [2. Training](https://github.com/gulvarol/bodynet#2-training)
* [3. Testing](https://github.com/gulvarol/bodynet#3-testing)
* [4. Fitting SMPL model](https://github.com/gulvarol/bodynet#4-fitting-smpl-model)
* [Citation](https://github.com/gulvarol/bodynet#citation)
* [Acknowledgements](https://github.com/gulvarol/bodynet#acknowledgements)

## 1. Preparation
### 1.1. Requirements
* Datasets
* Download [SURREAL](https://github.com/gulvarol/surreal#1-download-surreal-dataset) and/or [Unite the People (UP)](http://files.is.tuebingen.mpg.de/classner/up/) dataset(s)
* Training
* Install [Torch](https://github.com/torch/distro) with [cuDNN](https://developer.nvidia.com/cudnn) support.
* Install [matio](https://github.com/soumith/matio-ffi.torch) by `luarocks install matio`
* Install [OpenCV-Torch](https://github.com/VisionLabs/torch-opencv) by `luarocks install cv`
* Tested on Linux with CUDA v8 and cuDNN v5.1.
* Pre-processing and fitting Python scripts
* Python 2 environment with the following installed:
* [OpenDr](https://github.com/mattloper/opendr)
* [Chumpy](https://github.com/mattloper/chumpy)
* [OpenCV](https://pypi.org/project/opencv-python/)
* SMPL related
* Download [SMPL for python](http://smpl.is.tue.mpg.de/) and set `SMPL_PATH`
* Fix the naming: `mv basicmodel_m_lbs_10_207_0_v1.0.0 basicModel_m_lbs_10_207_0_v1.0.0`
* Make the following changes in `smpl_webuser/verts.py` (a quick load check follows this list):
```diff
- v_template, J, weights, kintree_table, bs_style, f,
+ v_template, J_regressor, weights, kintree_table, bs_style, f,
- if sp.issparse(J):
- regressor = J
- J_tmpx = MatVecMult(regressor, v_shaped[:,0])
- J_tmpy = MatVecMult(regressor, v_shaped[:,1])
- J_tmpz = MatVecMult(regressor, v_shaped[:,2])
+ if sp.issparse(J_regressor):
+ J_tmpx = MatVecMult(J_regressor, v_shaped[:,0])
+ J_tmpy = MatVecMult(J_regressor, v_shaped[:,1])
+ J_tmpz = MatVecMult(J_regressor, v_shaped[:,2])
- assert(ischumpy(J))
+ assert(ischumpy(J_regressor))
+ result.J_regressor = J_regressor
```
* Download the [neutral SMPL model](https://github.com/classner/up/blob/master/models/3D/basicModel_neutral_lbs_10_207_0_v1.0.0.pkl) and place it under the `models` folder of SMPL
* Download [SMPLify](http://smplify.is.tue.mpg.de/) and set `SMPLIFY_PATH`
* Voxelization related
* Download [binvox executable](http://www.patrickmin.com/binvox/) and set `BINVOX_PATH`
* Download the [binvox python package](https://github.com/dimatura/binvox-rw-py) and set `BINVOX_PYTHON_PATH`
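Before moving on, it can help to verify the SMPL setup mentioned above. Below is a minimal, hedged sanity check (Python 2), assuming `SMPL_PATH` is on `PYTHONPATH`; the model path is an assumption based on the download step above.

```python
# Minimal sanity check that SMPL loads after the verts.py edits above.
# Assumes SMPL_PATH is on PYTHONPATH; the model path is an assumption.
import numpy as np
from smpl_webuser.serialization import load_model  # ships with SMPL for python

model = load_model('models/basicModel_neutral_lbs_10_207_0_v1.0.0.pkl')
model.pose[:] = np.random.rand(model.pose.size) * 0.2    # small random pose
model.betas[:] = np.random.rand(model.betas.size) * 0.5  # random shape params
print(model.r.shape)  # (6890, 3): posed, shaped vertex positions
print(model.f.shape)  # triangle faces
```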
### 1.2. Pre-processing for training

#### SURREAL voxelization
Loop over the dataset and run `preprocess_surreal_voxelize.py` for each `_info.mat` file by passing it with the `--input` option (add the `--parts` option for part voxels in addition to foreground voxels). The surface voxels are filled with `imfill` in the `preprocess_surreal_fillvoxels.m` script, but you could also do it in python (e.g., `ndimage.binary_fill_holes(binvoxModel.data)`); see the sketch below. Sample preprocessed data is included in `preprocessing/sample_data/surreal`.
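A hedged Python sketch of that loop follows; the directory layout under `~/datasets/SURREAL` and the example `.binvox` file name are assumptions, while the script name and flags come from the text above.

```python
# Sketch of the SURREAL pre-processing loop described above. The directory
# layout and the example .binvox file name are assumptions.
import glob
import os
import subprocess

from scipy import ndimage
import binvox_rw  # from BINVOX_PYTHON_PATH

root = os.path.expanduser('~/datasets/SURREAL/data/cmu/train')  # assumed layout
for info in glob.glob(os.path.join(root, '*', '*_info.mat')):
    # add '--parts' here to also produce body-part voxels
    subprocess.check_call(['python', 'preprocess_surreal_voxelize.py',
                           '--input', info])

# Python alternative to preprocess_surreal_fillvoxels.m for one output file:
with open('sample.binvox', 'rb') as f:  # assumed output name
    vox = binvox_rw.read_as_3d_array(f)
filled = ndimage.binary_fill_holes(vox.data)  # fill the surface voxelization
```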
#### Preparing UP data
Loop over the dataset by running `preprocess_up_voxelize.py` to voxelize and re-organize the dataset. Fill the voxels with `preprocess_up_fillvoxels.m`. Pre-process the segmentation maps with `preprocess_up_segm.m`. Sample preprocessed data is included in `preprocessing/sample_data/up`.
### 1.3. Setup paths for training
Place the data under `~/datasets/SURREAL` and `~/datasets/UP`, or change `opt.dataRoot` in `opts.lua`. The outputs will be written to `~/cnn_saves/<datasetname>/<experiment>`; change `opt.logRoot` to use a different `cnn_saves` location.

### 1.4. Download pre-trained models
We provide several pre-trained models used in the paper: [bodynet.tar.gz (980MB)](https://lsh.paris.inria.fr/bodynet/bodynet.tar.gz). The contents are explained in the [training section](https://github.com/gulvarol/bodynet#2-training). Extract the `.t7` files and place them under the `models/t7` directory.
``` bash
# Trained on SURREAL
model_segm_cmu.t7
model_joints3D_cmu.t7
model_voxels_cmu.t7
model_voxels_FVSV_cmu.t7
model_partvoxels_FVSV_cmu.t7
model_bodynet_cmu.t7
# Trained on UP
model_segm_UP.t7
model_joints3D_UP.t7
model_voxels_FVSV_UP.t7
model_voxels_FVSV_UP_manualsegm.t7
model_bodynet_UP.t7
# Trained on MPII
model_joints2D.t7
```

## 2. Training
There are sample scripts under the `training/exp/backup` directory. These were created automatically using the `training/exp/run.sh` script. For example, the following `run.sh` script:
``` bash
source create_exp.sh -h

input="rgb"
supervision="segm15joints2Djoints3Dvoxels"
inputtype="gt"
extra_args="_FVSV"
running_mode="train"
#modelno=1

dataset="cmu"

create_cmd
cmd="${return_str} \\
-batchSize 4 \\
-modelVoxels models/t7/model_voxels_FVSV_cmu.t7 \\
-proj silhFVSV \\
"
run_cmd
```

generates and runs the following script:
``` bash
cd ..
qlua main.lua \
-dirName segm15joints2Djoints3Dvoxels/rgb/gt_FVSV \
-input rgb \
-supervision segm15joints2Djoints3Dvoxels \
-datasetname cmu \
-batchSize 4 \
-modelVoxels models/t7/model_voxels_FVSV_cmu.t7 \
-proj silhFVSV \
```

This trains the final version of the model described in the paper, i.e., the end-to-end network, initialized with pre-trained subnetworks, trained with multi-task losses and multi-view re-projection losses. If you manage to run this on the SURREAL dataset, the standard output should resemble the following:
```
Epoch: [1][1/2000] Time: 66.197, Err: 0.170 PCK: 87.50, PixelAcc: 68.36, IOU: 55.03, RMSE: 0.00, PE3Dvol: 33.39, IOUvox: 66.56, IOUprojFV: 92.89, IOUprojSV: 75.56, IOUpartvox: 0.00, LR: 1e-03, DataLoadingTime 192.286
Epoch: [1][2/2000] Time: 1.240, Err: 0.472 PCK: 87.50, PixelAcc: 21.38, IOU: 18.79, RMSE: 0.00, PE3Dvol: 44.63, IOUvox: 44.89, IOUprojFV: 73.05, IOUprojSV: 65.19, IOUpartvox: 0.00, LR: 1e-03, DataLoadingTime 0.237
Epoch: [1][3/2000] Time: 1.040, Err: 0.318 PCK: 65.00, PixelAcc: 49.58, IOU: 35.99, RMSE: 0.00, PE3Dvol: 52.92, IOUvox: 57.04, IOUprojFV: 86.97, IOUprojSV: 66.29, IOUpartvox: 0.00, LR: 1e-03, DataLoadingTime 0.570
Epoch: [1][4/2000] Time: 1.678, Err: 0.771 PCK: 50.00, PixelAcc: 42.95, IOU: 36.04, RMSE: 0.00, PE3Dvol: 99.04, IOUvox: 52.74, IOUprojFV: 83.87, IOUprojSV: 64.07, IOUpartvox: 0.00, LR: 1e-03, DataLoadingTime 0.101
```

2D pose (`PCK`), 2D body-part segmentation (`PixelAcc`, `IOU`), depth (`RMSE`), 3D pose (`PE3Dvol`), voxel prediction (`IOUvox`), and front-view/side-view re-projection (`IOUprojFV`, `IOUprojSV`) performance is reported at each iteration.
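As an illustration of how one of these numbers can be computed, here is a hedged numpy sketch of a voxel IOU in the spirit of `IOUvox`; the 0.5 threshold and percentage scaling are assumptions, not a transcription of the Lua evaluation code.

```python
# Hedged sketch of a voxel IOU metric in the spirit of IOUvox above.
# The 0.5 threshold and percentage scaling are assumptions.
import numpy as np

def voxel_iou(pred_scores, gt_occupancy, thresh=0.5):
    """IOU between thresholded predicted occupancies and binary ground truth."""
    pred = pred_scores >= thresh
    gt = gt_occupancy.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 100.0 * intersection / max(union, 1)  # percent; guard empty union
```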
The final network is the result of multi-stage training; a schematic sketch of the cascade follows the list below.
* SubNet1 - `model_segm_cmu.t7`. RGB -> **Segm**
* obtained from [here](https://github.com/gulvarol/surreal) and the first two stacks are extracted
* SubNet2 - `model_joints2D.t7`. RGB -> **Joints2D**
* trained on MPII with 8 stacks, and the first two stacks are extracted
* SubNet3 - `model_joints3D_cmu.t7`. RGB + Segm + Joints2D -> **Joints3D**
* trained from scratch with 2 stacks using predicted segmentation (SubNet1) and 2D pose (SubNet2)
* SubNet4 - `model_voxels_cmu.t7`. RGB + Segm + Joints2D + Joints3D -> **Voxels**
* trained from scratch with 2 stacks using predicted segmentation (SubNet1), 2D pose (SubNet2), and 3D pose (SubNet3)
* SubNet5 - `model_voxels_FVSV_cmu.t7`. RGB + Segm + Joints2D + Joints3D -> **Voxels + FV + SV**
* pre-trained from SubNet4 with the additional losses on re-projection
* BodyNet - `model_bodynet_cmu.t7`. RGB -> **Segm + Joints2D + Joints3D + Voxels + FV + SV**
* a combination of SubNet1, SubNet2, SubNet3, SubNet4, and SubNet5
* fine-tuned end-to-end with multi-task losses

Note that the performance with 8 stacks is generally better, but we preferred to reduce the complexity at the cost of a small drop in performance.
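The sketch below summarizes the cascade in Python; `subnet1` through `subnet5` stand for the loaded `.t7` subnetworks and are placeholders, not actual APIs from this repository.

```python
# Schematic sketch of the BodyNet cascade described above; the subnet
# callables are placeholders for the loaded .t7 models, not real APIs.
def bodynet_forward(rgb, subnet1, subnet2, subnet3, subnet5):
    segm = subnet1(rgb)                      # SubNet1: RGB -> Segm
    joints2d = subnet2(rgb)                  # SubNet2: RGB -> Joints2D
    joints3d = subnet3(rgb, segm, joints2d)  # SubNet3: -> Joints3D
    # SubNet5: -> Voxels plus front-view (FV) and side-view (SV) re-projections
    voxels, fv, sv = subnet5(rgb, segm, joints2d, joints3d)
    return segm, joints2d, joints3d, voxels, fv, sv
```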
The recipe above is used for the SURREAL dataset. For the UP dataset, we first fine-tuned SubNet1 `model_segm_UP.t7` (SubNet1_UP). Then, we fine-tuned SubNet3 `model_joints3D_UP.t7` (SubNet3_UP) using SubNet1_UP and SubNet2. Finally, we fine-tuned SubNet5 `model_voxels_FVSV_UP.t7` (SubNet5_UP) using SubNet1_UP, SubNet2, and SubNet3_UP. All of these were fine-tuned end-to-end to obtain `model_bodynet_UP.t7`. The model used in the paper for the experiments with manual segmentations is also provided (`model_voxels_FVSV_UP_manualsegm.t7`).
### Part Voxels
We use the script `models/init_partvoxels.lua` to copy the last-layer weights 7 times (6 body parts + 1 background) to initialize the part-voxels model (`models/t7/init_partvoxels.t7`). After training this model without re-projection losses, we fine-tune it with the re-projection loss; `model_partvoxels_FVSV_cmu.t7` is the best model obtained. With end-to-end fine-tuning, we had divergence problems and did not put much effort into making it work. Note that this model is preliminary and needs improvement.
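A hedged numpy sketch of that initialization idea follows; the tensor shapes and function name are illustrative, not the actual Torch code in `models/init_partvoxels.lua`.

```python
# Hedged sketch of the part-voxel initialization idea: replicate the
# single-class voxel output layer 7 times (6 parts + 1 background).
# Shapes and names are illustrative, not the Torch code.
import numpy as np

def replicate_voxel_head(weight, bias, n_labels=7):
    """weight: (out_ch, in_ch, kH, kW) conv weights of the 1-class voxel head."""
    w = np.tile(weight, (n_labels, 1, 1, 1))  # 7 copies along output channels
    b = np.tile(bias, n_labels)
    return w, b
```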
### Misc
A few functionalities of the code are not used in the paper but are still provided. These include training the 3D pose and voxels networks using ground-truth (GT) segmentation/2D pose/3D pose inputs, as well as mixing predicted and GT inputs in each batch, which is enabled by setting the `mix` option to true. The results of using only predicted inputs are often comparable to using a mix, so we always used predictions only. Predictions are passed as input using the `applyHG` option, which is not very efficient.

## 3. Testing
Use the demo script to apply the provided models to sample images.
```
qlua demo/demo.lua
```
You can also use the `demo/demo.m` Matlab script to produce visualizations.

## 4. Fitting SMPL model
Fitting scripts for the SURREAL (`fitting/fit_surreal.py`) and UP (`fitting/fit_up.py`) datasets are provided with sample experiment outputs. The scripts use the optimization functions from `tools/smpl_utils.py`.
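For a rough idea of what such a fit looks like, below is a hedged Python 2 sketch of a joint-based SMPL fit using `chumpy`, in the spirit of SMPLify-style optimization; `joints3D_pred`, the model path, and the two objective terms are assumptions, and the actual scripts use additional terms from `tools/smpl_utils.py`.

```python
# Hedged sketch of fitting SMPL to predicted 3D joints (in the spirit of
# the fitting scripts, not a transcription of them). joints3D_pred, the
# model path, and the objective terms are assumptions.
import chumpy as ch
import numpy as np
from smpl_webuser.serialization import load_model

model = load_model('models/basicModel_neutral_lbs_10_207_0_v1.0.0.pkl')
joints3D_pred = np.zeros((24, 3))  # placeholder; use the network prediction

objective = {
    'joints': model.J_transformed - joints3D_pred,  # joint data term
    'betas': 1e-2 * model.betas,                    # mild shape regularizer
}
ch.minimize(objective, x0=[model.pose, model.betas],
            method='dogleg', options={'maxiter': 100})
```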
## Citation
If you use this code, please cite the following:
```
@INPROCEEDINGS{varol18_bodynet,
title = {{BodyNet}: Volumetric Inference of {3D} Human Body Shapes},
author = {Varol, G{\"u}l and Ceylan, Duygu and Russell, Bryan and Yang, Jimei and Yumer, Ersin and Laptev, Ivan and Schmid, Cordelia},
booktitle = {ECCV},
year = {2018}
}
```

## Acknowledgements
The training code is an extension of the [SURREAL training code](https://github.com/gulvarol/surreal), which is largely built on the [ImageNet multi-GPU training example](https://github.com/soumith/imagenet-multiGPU.torch) by [Soumith Chintala](https://github.com/soumith/) and [Stacked Hourglass Networks](https://github.com/umich-vl/pose-hg-train) by [Alejandro Newell](https://github.com/anewell). The fitting code is an extension of the [SMPLify code](http://smplify.is.tue.mpg.de/).