{"id":29166296,"url":"https://github.com/lxxue/hsr-data-preprocessing","last_synced_at":"2025-07-29T23:05:36.575Z","repository":{"id":299853477,"uuid":"906598760","full_name":"lxxue/HSR-data-preprocessing","owner":"lxxue","description":"A preprocessing pipeline for monocular videos containing human motion in static scenes","archived":false,"fork":false,"pushed_at":"2025-06-18T16:04:59.000Z","size":53,"stargazers_count":4,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-18T17:23:30.008Z","etag":null,"topics":["camera-pose-estimation","human-pose-estimation"],"latest_commit_sha":null,"homepage":"https://lxxue.github.io/human-scene-recon/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lxxue.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-21T11:00:14.000Z","updated_at":"2025-06-18T16:05:03.000Z","dependencies_parsed_at":"2025-06-18T17:23:41.044Z","dependency_job_id":"6fd0c9a1-961f-4a07-aba2-decb3c1d7539","html_url":"https://github.com/lxxue/HSR-data-preprocessing","commit_stats":null,"previous_names":["lxxue/hsr-data-preprocessing"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lxxue/HSR-data-preprocessing","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lxxue%2FHSR-data-preprocessing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lxxue%2FHSR-data-preprocessing/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lxxue%2F
HSR-data-preprocessing/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lxxue%2FHSR-data-preprocessing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lxxue","download_url":"https://codeload.github.com/lxxue/HSR-data-preprocessing/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lxxue%2FHSR-data-preprocessing/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262927625,"owners_count":23385994,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["camera-pose-estimation","human-pose-estimation"],"created_at":"2025-07-01T08:30:53.166Z","updated_at":"2025-07-01T08:33:17.260Z","avatar_url":"https://github.com/lxxue.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HSR-data-preprocessing\n\nThis repository provides a preprocessing pipeline for monocular videos containing human motion in static scenes. \n\nGiven an input video, our pipeline estimates camera poses, reconstructs human poses in world coordinates, and extracts monocular geometric cues (depth and surface normals). \nThe processed data can then be used by [HSR](https://github.com/lxxue/HSR) to create human-scene reconstructions.\n\nThis preprocessing pipeline is maintained as a standalone repository to facilitate its use in other applications beyond HSR.\n\n\n\n## General pipeline\n\nThe pipeline consists of the following sequential steps:\n\n0. Extract and select sharp frames from a video or an image sequence\n\n1. 
Generate human masks\n\n2. Estimate camera poses\n\n3. Generate monocular depth and normal maps\n\n4. Estimate human poses in the camera coordinate frame\n\n5. Extract human 2D keypoints\n\n6. Refine human poses with 2D keypoints and temporal smoothness\n\n7. Align human poses in the world coordinate frame and scale scene to metric units using human body scale\n\n8. Save processed data in HSR-compatible format\n\n## Setup\n\n\nClone the repository and its submodules:\n\n```bash\ngit clone https://github.com/lxxue/HSR-data-preprocessing.git --recursive\n```\n\nSet up the environment for the main repository and `Grounded-SAM2` / `hloc` / `ROMP`:\n```bash\nconda create -n hsr-data python=3.10\nconda activate hsr-data\n# SAM2.1 requires torch \u003e=2.5.1\npip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121\n\n# For Grounded-SAM2\ncd third_party/Grounded-SAM-2\ncd checkpoints\nbash download_ckpts.sh\ncd ../\ncd gdino_checkpoints\nbash download_ckpts.sh\ncd ../\nexport CUDA_HOME=\"/usr/local/cuda-12.1\"\npip install -e .\npip install --no-build-isolation -e grounding_dino\npip install opencv-python supervision transformers addict yapf pycocotools timm\n\n# For hloc\ncd ../../\ncd third_party/Hierarchical-Localization\ngit submodule update --init --recursive\npip install -e .\npip install pyquaternion scipy\n\npip install cython \npip install simple-romp\npip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py310_cu121_pyt251/download.html\npip install smplx open3d\n```\n\nCreate a separate environment for Metric3Dv2 following the [official instructions](https://github.com/YvanYin/Metric3D?tab=readme-ov-file#-installation).\n\nBuild the OpenPose Python package following the [official guide](https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/installation/0_index.md).\n\nUpdate the Python paths in [process_data.py](./process_data.py) and 
[cmd.sh](./cmd.sh).\n\n```python\n# process_data.py\nSAM2_PYTHON_PATH = \"/home/lixin/miniconda3/envs/hsr-data/bin/python\"\nMETRIC3D_PYTHON_PATH = \"/home/lixin/miniconda3/envs/metric3d/bin/python\"\nOPENPOSE_PYTHON_PATH = \"/usr/bin/python3\"\nOPENPOSE_MODEL_PATH = \"/home/lixin/softwares/openpose/models/\"\n\n# cmd.sh\nSAM2_PYTHON_PATH=\"/home/lixin/miniconda3/envs/hsr-data/bin/python\"\nMETRIC3D_PYTHON_PATH=\"/home/lixin/miniconda3/envs/metric3d/bin/python\"\nOPENPOSE_PYTHON_PATH=\"/usr/bin/python3\"\nOPENPOSE_MODEL_PATH=\"/home/lixin/softwares/openpose/models/\"\n\n```\n\nDownload [SMPL model](https://smpl.is.tue.mpg.de/download.php) (version 1.1.0 for Python 2.7 (female/male)) and place them under `checkpoints/smpl`:\n\n```bash\nmkdir -p checkpoints/smpl\nmv /path_to_smpl_models/basicmodel_f_lbs_10_207_0_v1.1.0.pkl checkpoints/smpl/SMPL_FEMALE.pkl\nmv /path_to_smpl_models/basicmodel_m_lbs_10_207_0_v1.1.0.pkl checkpoints/smpl/SMPL_MALE.pkl\n```\n\nPrepare SMPL model files needed by ROMP according to the [official instructions](https://github.com/Arthur151/ROMP/blob/a8558aed480af850756f84e2a7c787e359bddbd0/trace/README.md#metadata) and place them under `checkpoints/romp`:\n\n```bash\nmkdir -p checkpoints/romp\nmv /path_to_romp_models/SMPL_MALE.pth checkpoints/romp/SMPL_MALE.pth\nmv /path_to_romp_models/SMPL_FEMALE.pth checkpoints/romp/SMPL_FEMALE.pth\n```\n\n## Usage\n\nWe provide a python script [process_data.py](./process_data.py) and a shell script [run_process_data.sh](./run_process_data.sh) as examples to process the data.\n\n```bash\n# Modify the arguments in run_process_data.sh first to fit your data\n# Run each step with indices, e.g. 0 1 2 (modify indices as needed)\nbash run_process_data.sh 0 1 2\n```\n\nYou can also run each step separately by uncommenting the corresponding command in [cmd.sh](./cmd.sh).\n\n```bash\nbash cmd.sh\n```\n\nEach script contains detailed documentation of its functionality. 
For example, in [select_frames.py](./select_frames.py):\n\n```python\n\"\"\"\nFrame Selection Utility for Videos and Image Sequences\n\nArguments:\n    --input_path: path to the input video file or a directory of images\n    --data_dir: output directory for the processed data \n    --window_size: number of frames to consider in each selection window (default: 10)\n    --frame_start: starting frame number to process (default: 0)\n    --frame_end: ending frame number (inclusive) to process (default: 1000000)\n    --image_resize_factor: factor by which to reduce image size (1, 2, 4, or 8)\n\nOutput Structure:\n    data_dir/\n    ├── images/\n    │   ├── all_frames/         # Contains all processed frames\n    │   ├── selected_frames/    # Contains selected sharp frames\n    │   └── selected_idxs.npy   # Numpy array of selected frame indices\n\"\"\"\n```\n\n\n## Acknowledgements\n\n\nThis work builds upon several excellent open-source projects. We would like to thank the authors of:\n[Vid2Avatar](https://github.com/MoyGcc/vid2avatar), \n[NeuMAN](https://github.com/apple/ml-neuman), \n[hloc](https://github.com/cvg/Hierarchical-Localization),\n[colmap](https://github.com/colmap/colmap),\n[Metric3D](https://github.com/YvanYin/Metric3D), \n[Grounded-SAM2](https://github.com/IDEA-Research/Grounded-SAM-2),\n[openpose](https://github.com/CMU-Perceptual-Computing-Lab/openpose), and\n[ROMP](https://github.com/Arthur151/ROMP).\n\n## BibTeX\n\nIf you find this work useful for your research, please consider citing our paper:\n\n```\n@inproceedings{xue2024hsr,\n    author={Xue, Lixin and Guo, Chen and Zheng, Chengwei and Wang, Fangjinhua and Jiang, Tianjian and Ho, Hsuan-I and Kaufmann, Manuel and Song, Jie and Hilliges, Otmar},\n    title={{HSR:} Holistic 3D Human-Scene Reconstruction from Monocular Videos},\n    booktitle={European Conference on Computer Vision (ECCV)},\n    
year={2024}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flxxue%2Fhsr-data-preprocessing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flxxue%2Fhsr-data-preprocessing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flxxue%2Fhsr-data-preprocessing/lists"}