{"id":13545496,"url":"https://github.com/facebookresearch/VideoPose3D","last_synced_at":"2025-04-02T15:31:15.219Z","repository":{"id":37692422,"uuid":"153697606","full_name":"facebookresearch/VideoPose3D","owner":"facebookresearch","description":"Efficient 3D human pose estimation in video using 2D keypoint trajectories","archived":true,"fork":false,"pushed_at":"2022-12-10T16:22:15.000Z","size":9757,"stargazers_count":3791,"open_issues_count":161,"forks_count":760,"subscribers_count":103,"default_branch":"main","last_synced_at":"2025-02-23T00:14:07.352Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facebookresearch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-10-18T22:59:49.000Z","updated_at":"2025-02-21T01:52:21.000Z","dependencies_parsed_at":"2022-07-09T04:17:02.761Z","dependency_job_id":null,"html_url":"https://github.com/facebookresearch/VideoPose3D","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FVideoPose3D","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FVideoPose3D/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FVideoPose3D/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2FVideoPose3D/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/facebookresearch","download_url":"https:/
/codeload.github.com/facebookresearch/VideoPose3D/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246841666,"owners_count":20842630,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T11:01:04.008Z","updated_at":"2025-04-02T15:31:10.209Z","avatar_url":"https://github.com/facebookresearch.png","language":"Python","readme":"# 3D human pose estimation in video with temporal convolutions and semi-supervised training\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/convolutions_anim.gif\" width=\"50%\" alt=\"\" /\u003e\u003c/p\u003e\n\nThis is the implementation of the approach described in the paper:\n\u003e Dario Pavllo, Christoph Feichtenhofer, David Grangier, and Michael Auli. [3D human pose estimation in video with temporal convolutions and semi-supervised training](https://arxiv.org/abs/1811.11742). 
In Conference on Computer Vision and Pattern Recognition (CVPR), 2019.\n\nMore demos are available at https://dariopavllo.github.io/VideoPose3D\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/demo_yt.gif\" width=\"70%\" alt=\"\" /\u003e\u003c/p\u003e\n\n![](images/demo_temporal.gif)\n\n### Results on Human3.6M\nUnder Protocol 1 (mean per-joint position error) and Protocol 2 (mean per-joint position error after rigid alignment).\n\n| 2D Detections | BBoxes | Blocks | Receptive Field | Error (P1) | Error (P2) |\n|:-------|:-------:|:-------:|:-------:|:-------:|:-------:|\n| CPN | Mask R-CNN  | 4 | 243 frames | **46.8 mm** | **36.5 mm** |\n| CPN | Ground truth | 4 | 243 frames | 47.1 mm | 36.8 mm |\n| CPN | Ground truth | 3 | 81 frames | 47.7 mm | 37.2 mm |\n| CPN | Ground truth | 2 | 27 frames | 48.8 mm | 38.0 mm |\n| Mask R-CNN | Mask R-CNN | 4 | 243 frames | 51.6 mm | 40.3 mm |\n| Ground truth | -- | 4 | 243 frames | 37.2 mm | 27.2 mm |\n\n## Quick start\nTo get started as quickly as possible, follow the instructions in this section. This should allow you to train a model from scratch, test our pretrained models, and produce basic visualizations. For more detailed instructions, please refer to [`DOCUMENTATION.md`](DOCUMENTATION.md).\n\n### Dependencies\nMake sure you have the following dependencies installed before proceeding:\n- Python 3+ distribution\n- PyTorch \u003e= 0.4.0\n\nOptional:\n- Matplotlib, if you want to visualize predictions. Additionally, you need *ffmpeg* to export MP4 videos, and *imagemagick* to export GIFs.\n- MATLAB, if you want to experiment with HumanEva-I (you need this to convert the dataset). \n\n### Dataset setup\nYou can find the instructions for setting up the Human3.6M and HumanEva-I datasets in [`DATASETS.md`](DATASETS.md). For this short guide, we focus on Human3.6M. 
You are not required to set up HumanEva, unless you want to experiment with it.\n\nIn order to proceed, you must also copy CPN detections (for Human3.6M) and/or Mask R-CNN detections (for HumanEva).\n\n### Evaluating our pretrained models\nThe pretrained models can be downloaded from AWS. Put `pretrained_h36m_cpn.bin` (for Human3.6M) and/or `pretrained_humaneva15_detectron.bin` (for HumanEva) in the `checkpoint/` directory (create it if it does not exist).\n```sh\nmkdir -p checkpoint\ncd checkpoint\nwget https://dl.fbaipublicfiles.com/video-pose-3d/pretrained_h36m_cpn.bin\nwget https://dl.fbaipublicfiles.com/video-pose-3d/pretrained_humaneva15_detectron.bin\ncd ..\n```\n\nThese models allow you to reproduce our top-performing baselines, which are:\n- 46.8 mm for Human3.6M, using fine-tuned CPN detections, bounding boxes from Mask R-CNN, and an architecture with a receptive field of 243 frames.\n- 33.0 mm for HumanEva-I (on 3 actions), using pretrained Mask R-CNN detections, and an architecture with a receptive field of 27 frames. This is the multi-action model trained on 3 actions (Walk, Jog, Box).\n\nTo test on Human3.6M, run:\n```\npython run.py -k cpn_ft_h36m_dbb -arc 3,3,3,3,3 -c checkpoint --evaluate pretrained_h36m_cpn.bin\n```\n\nTo test on HumanEva, run:\n```\npython run.py -d humaneva15 -k detectron_pt_coco -str Train/S1,Train/S2,Train/S3 -ste Validate/S1,Validate/S2,Validate/S3 -a Walk,Jog,Box --by-subject -c checkpoint --evaluate pretrained_humaneva15_detectron.bin\n```\n\n[`DOCUMENTATION.md`](DOCUMENTATION.md) provides a precise description of all command-line arguments.\n\n### Inference in the wild\nWe have introduced an experimental feature to run our model on custom videos. 
See [`INFERENCE.md`](INFERENCE.md) for more details.\n\n### Training from scratch\nIf you want to reproduce the results of our pretrained models, run the following commands.\n\nFor Human3.6M:\n```\npython run.py -e 80 -k cpn_ft_h36m_dbb -arc 3,3,3,3,3\n```\nBy default the application runs in training mode. This will train a new model for 80 epochs, using fine-tuned CPN detections. Expect a training time of 24 hours on a high-end Pascal GPU. If you feel that this is too much, or your GPU is not powerful enough, you can train a model with a smaller receptive field, e.g.\n- `-arc 3,3,3,3` (81 frames) should require 11 hours and achieve 47.7 mm. \n- `-arc 3,3,3` (27 frames) should require 6 hours and achieve 48.8 mm.\n\nYou could also lower the number of epochs from 80 to 60 with a negligible impact on the result.\n\nFor HumanEva:\n```\npython run.py -d humaneva15 -k detectron_pt_coco -str Train/S1,Train/S2,Train/S3 -ste Validate/S1,Validate/S2,Validate/S3 -b 128 -e 1000 -lrd 0.996 -a Walk,Jog,Box --by-subject\n```\nThis will train for 1000 epochs, using Mask R-CNN detections and evaluating each subject separately.\nSince HumanEva is much smaller than Human3.6M, training should require about 50 minutes.\n\n### Semi-supervised training\nTo perform semi-supervised training, you just need to add the `--subjects-unlabeled` argument. In the example below, we use ground-truth 2D poses as input, and train supervised on just 10% of Subject 1 (specified by `--subset 0.1`). The remaining subjects are treated as unlabeled data and are used for semi-supervision.\n```\npython run.py -k gt --subjects-train S1 --subset 0.1 --subjects-unlabeled S5,S6,S7,S8 -e 200 -lrd 0.98 -arc 3,3,3 --warmup 5 -b 64\n```\nThis should give you an error around 65.2 mm. 
By contrast, if we only train supervised\n```\npython run.py -k gt --subjects-train S1 --subset 0.1 -e 200 -lrd 0.98 -arc 3,3,3 -b 64\n```\nwe get around 80.7 mm, which is significantly higher.\n\n### Visualization\nIf you have the original Human3.6M videos, you can generate nice visualizations of the model predictions. For instance:\n```\npython run.py -k cpn_ft_h36m_dbb -arc 3,3,3,3,3 -c checkpoint --evaluate pretrained_h36m_cpn.bin --render --viz-subject S11 --viz-action Walking --viz-camera 0 --viz-video \"/path/to/videos/S11/Videos/Walking.54138969.mp4\" --viz-output output.gif --viz-size 3 --viz-downsample 2 --viz-limit 60\n```\nThe script can also export MP4 videos, and supports a variety of parameters (e.g. downsampling/FPS, size, bitrate). See [`DOCUMENTATION.md`](DOCUMENTATION.md) for more details.\n\n## License\nThis work is licensed under CC BY-NC. See LICENSE for details. Third-party datasets are subject to their respective licenses.\nIf you use our code/models in your research, please cite our paper:\n```\n@inproceedings{pavllo:videopose3d:2019,\n  title={3D human pose estimation in video with temporal convolutions and semi-supervised training},\n  author={Pavllo, Dario and Feichtenhofer, Christoph and Grangier, David and Auli, Michael},\n  booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},\n  year={2019}\n}\n```\n","funding_links":[],"categories":["Python","🎭 3D Human Pose Estimation","**Programming (learning)**","Pose Estimation"],"sub_categories":["📚 Key Resources","**Developer\\'s Tools**","Implementations"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2FVideoPose3D","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffacebookresearch%2FVideoPose3D","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2FVideoPose3D/lists"}