{"id":34681588,"url":"https://github.com/guiggh/hand_pose_action","last_synced_at":"2025-12-24T21:12:02.478Z","repository":{"id":41070638,"uuid":"149117007","full_name":"guiggh/hand_pose_action","owner":"guiggh","description":"Dataset and code for the paper \"First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations\", CVPR 2018.","archived":false,"fork":false,"pushed_at":"2019-02-20T17:41:37.000Z","size":22099,"stargazers_count":253,"open_issues_count":1,"forks_count":32,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-07-23T13:04:35.812Z","etag":null,"topics":["action-recognition","benchmark","computer-vision","dataset","hand-pose-estimation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/guiggh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-09-17T11:47:40.000Z","updated_at":"2024-05-26T02:33:28.000Z","dependencies_parsed_at":"2022-08-28T21:11:05.850Z","dependency_job_id":null,"html_url":"https://github.com/guiggh/hand_pose_action","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/guiggh/hand_pose_action","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guiggh%2Fhand_pose_action","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guiggh%2Fhand_pose_action/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guiggh%2Fhand_pose_action/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guiggh%2Fhand_pose_action/manifests","owner_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/owners/guiggh","download_url":"https://codeload.github.com/guiggh/hand_pose_action/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guiggh%2Fhand_pose_action/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28008543,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-24T02:00:07.193Z","response_time":83,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["action-recognition","benchmark","computer-vision","dataset","hand-pose-estimation"],"created_at":"2025-12-24T21:12:01.978Z","updated_at":"2025-12-24T21:12:02.458Z","avatar_url":"https://github.com/guiggh.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations\nThis repository contains instructions on getting the data and code of the work `First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations` presented at CVPR 2018. 
For more information on the benchmark please check out [[1]](#refs).\n\n### Downloading the data\nPlease fill in this [form](https://goo.gl/forms/FIsXpYVIUov0j7Wv2) to download the dataset after reading the [terms and conditions](#terms).\n\n### Dataset structure:\n\nThe dataset is organized as in the following example:\n\n- File `Video_files/Subject_1/put_salt/1/color/color_0015.jpeg`\nConsists of frame number 15 of the color stream of the 1st repetition of \naction class \"put salt\" by subject number 1.\n\n- File `Video_files/Subject_1/put_salt/1/depth/depth_0015.png`\nConsists of frame number 15 of the depth stream of the 1st repetition of \naction class \"put salt\" by subject number 1.\n\n- File `Hand_pose_annotation_v1_1/Subject_1/put_salt/1/skeleton.txt`\nContains the hand pose (in world coordinates) for the sequence: repetition \n1 of action class \"put salt\" by subject number 1. \n\n- File `Object_6D_pose_annotation_v1/Subject_1/put_salt/1/object_pose.txt`\nContains the 6D object pose for the sequence: repetition \n1 of action class \"put salt\" by subject number 1. \n\nComment: Check Figures 3 and 4 of the paper to learn about action categories. We used a slightly different nomenclature for some actions compared to the paper. These are: \"dish soap -\u003e liquid soap\"; \"read paper -\u003e read letter\"; \"use spray -\u003e use flash\". \nNote: Check the `Subjects_info` folder for details on the number of sequences, frames, etc. for each subject. The following sequences can be ignored (they were not used in the paper): 'Subject_2/close_milk/4', 'Subject_2/put_tea_bag/2' and 'Subject_4/flip_sponge/2'.\n\n### Image data details\n* Camera: Intel RealSense SR300.\n* Color data: 1920x1080 32bit, jpeg format.\n* Depth data: 640x480 16bit, png format.\n\n### Hand pose data:\nFormat of each line of skeleton.txt:\n`t x_1 y_1 z_1 x_2 y_2 z_2 ... 
x_21 y_21 z_21`\n\nwhere `t` is the frame number and `x_i y_i z_i` are the world coordinates (in mm) of joint `i` at frame `t`.\n\nHand joints are organised as follows:\n`[Wrist, TMCP, IMCP, MMCP, RMCP, PMCP, TPIP, TDIP, TTIP, IPIP, IDIP, ITIP, MPIP, MDIP, MTIP, RPIP, RDIP, RTIP, PPIP, PDIP, PTIP]`, where ’T’, ’I’, ’M’, ’R’, ’P’ denote ’Thumb’, ’Index’, ’Middle’, ’Ring’, ’Pinky’ fingers.  \n\n\u003cimg src=\"hand_model.png\" alt=\"hand_model\" width=\"400\" class=\"center\"/\u003e\n\nCheck out the scripts `load_example.x` (.py for Python and .m for Matlab) for examples of how to visualise the hand pose on both color and depth images.\n\n**Updated 20/02/2019**:  We also provide action sequences with normalized hand poses.  Normalization of hand poses is essential to replicate the action recognition results in the paper. It is only briefly mentioned in the paper, but if you want to normalize the hand poses you will need to: compute the average distance between joints across subjects, normalize the distance between joints so that it is the same on every frame and for every subject, make the wrist the origin of coordinates for each frame and (optional, but it helps) align the wrist with one of the axes by rotating the 3D skeleton. \n\n### Object pose data:\nAvailable objects: 'juice carton', 'milk bottle', 'salt' and 'liquid soap'.\nFormat of each line of object_pose.txt:\n\n`t M11 M21 M31 M41 M12 ... Mij... M44`\n\nwhere `t` is the frame number and `Mij` is the element of the transformation matrix `M` at row `i` and column `j` (the elements are listed column by column). \n\nCheck the Python code `load_example.py` to see an example of how to visualise the object model for a given pose on top of the image.\n\n### Object models\nAvailable objects: 'juice carton', 'milk bottle', 'salt' and 'liquid soap'.\n\nFormat: [.PLY](https://en.wikipedia.org/wiki/PLY_(file_format)). Each object comes with a texture file `texture.jpg`. 
Coordinates are in meters (in contrast to mm for hand poses).\n\nThe juice carton and milk bottle objects also appear in this popular [6D object pose estimation dataset](http://rkouskou.gitlab.io/research/LCHF.html) and are part of the recent [6D ECCV 2018 benchmark](https://arxiv.org/abs/1808.08319). We recaptured the object models in an attempt to improve their quality. Feel free to use the older [models](http://rkouskou.gitlab.io/research/LCHF.html); however, our object pose data is annotated for the new models.\n\nComment: The milk bottle model is not exactly the same as the one used when capturing the dataset. The object got lost (campus cleaning services) and when we bought the milk bottle again the brand had (slightly) changed the bottle design.\n\n### Camera parameters:\n#### Depth sensor (intrinsics)\nImage center:\n* u0 = 315.944855;\n* v0 = 245.287079;\n\nFocal Length:\n* fx = 475.065948;\n* fy = 475.065857;\n\n#### RGB sensor (intrinsics)\nImage center:\n* u0 = 935.732544;\n* v0 = 540.681030;\n\nFocal Length:\n* fx = 1395.749023;\n* fy = 1395.749268;\n\n#### Extrinsics\nR = [0.999988496304, -0.00468848412856, 0.000982563360594;\n     0.00469115935266, 0.999985218048, -0.00273845880292;\n    -0.000969709653873, 0.00274303671904, 0.99999576807;\n     0,0,0];\n \nt = [25.7; 1.22; 3.902; 1];\n\n### Benchmark tasks \nIn this section we describe the protocols used for the experiments in the paper.\n#### Action recognition\n`data_split_action_recognition.txt` contains the 1:1 split reported in the paper. Use this split for training and testing if you want to compare with the reported results.\n\n#### Hand pose estimation\n- Cross subject: training subjects are 1, 3, 4. The rest are used for testing.\n- Cross object:  the test scenario includes all actions with the following objects: 'peanut butter', 'fork', 'milk', 'tea', 'liquid soap', 'spray/flash', 'paper' (including reading letter), 'calculator', 'phone', 'coin', 'card' and 'wine bottle'. 
The rest of the objects are for training.\n\n### Terms and conditions\n\u003ca name=\"terms\"\u003e\u003c/a\u003e\nThe dataset is released for academic research only and is free to researchers from educational or research institutes for non-commercial purposes. By downloading the dataset you agree (unless you have the express permission of the authors) not to redistribute, modify, or commercially use this dataset in any way or form, either partially or entirely.\n\nIf using this dataset, please cite the following paper:\n\n```\n@inproceedings{FirstPersonAction_CVPR2018,\n  title={First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations},\n  author={Garcia-Hernando, Guillermo and Yuan, Shanxin and Baek, Seungryul and Kim, Tae-Kyun},\n  booktitle = {Proceedings of Computer Vision and Pattern Recognition ({CVPR})},\n  year = {2018}\n}\n```\n\n### Acknowledgments\nThis dataset is part of the Imperial College London-Samsung Research project, supported by Samsung Electronics.\n\nThe authors thank Gabriel Garcia for object model acquisition and [Yana Hasson](https://github.com/hassony2) for providing Python scripts and feedback on the dataset.\n\n### References\n\u003ca name=\"refs\"\u003e\u003c/a\u003e\n\n[1] *First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations*, Guillermo Garcia-Hernando, Shanxin Yuan, Seungryul Baek and Tae-Kyun Kim, CVPR 2018. [arXiv](https://arxiv.org/abs/1704.02463)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguiggh%2Fhand_pose_action","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fguiggh%2Fhand_pose_action","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguiggh%2Fhand_pose_action/lists"}