{"id":26478659,"url":"https://github.com/kenshohara/3d-resnets","last_synced_at":"2026-03-17T23:35:41.681Z","repository":{"id":41454807,"uuid":"100664462","full_name":"kenshohara/3D-ResNets","owner":"kenshohara","description":"3D ResNets for Action Recognition","archived":false,"fork":false,"pushed_at":"2017-11-29T05:25:31.000Z","size":29,"stargazers_count":109,"open_issues_count":1,"forks_count":21,"subscribers_count":6,"default_branch":"master","last_synced_at":"2023-11-07T17:25:31.776Z","etag":null,"topics":["action-recognition","computer-vision","deep-learning","lua","torch7","video-recognition"],"latest_commit_sha":null,"homepage":"","language":"Lua","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kenshohara.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-08-18T02:27:56.000Z","updated_at":"2023-09-10T15:57:59.000Z","dependencies_parsed_at":"2022-09-21T09:31:50.472Z","dependency_job_id":null,"html_url":"https://github.com/kenshohara/3D-ResNets","commit_stats":null,"previous_names":[],"tags_count":1,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kenshohara%2F3D-ResNets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kenshohara%2F3D-ResNets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kenshohara%2F3D-ResNets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kenshohara%2F3D-ResNets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kenshohara","download_url":"https://codeload.github.com/kenshohara/3D-ResNets/tar.gz/refs/heads/
master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244531021,"owners_count":20467392,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["action-recognition","computer-vision","deep-learning","lua","torch7","video-recognition"],"created_at":"2025-03-20T01:16:48.953Z","updated_at":"2026-03-17T23:35:41.629Z","avatar_url":"https://github.com/kenshohara.png","language":"Lua","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 3D ResNets for Action Recognition\nThis is the Torch (Lua) code for the following papers:\n\n[\nKensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,  \n\"Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?\",  \narXiv preprint, arXiv:1711.09577, 2017.\n](https://arxiv.org/abs/1711.09577)\n\n[\nKensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,  \n\"Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition\",  \nProceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, 2017.\n](http://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w44/Hara_Learning_Spatio-Temporal_Features_ICCV_2017_paper.pdf)\n\nThis code includes only training and testing on the ActivityNet and Kinetics datasets.  
\n**If you want to classify your videos using our pretrained models,\nuse [this code](https://github.com/kenshohara/video-classification-3d-cnn).**\n\n**The PyTorch (python) version of this code is available [here](https://github.com/kenshohara/3D-ResNets-PyTorch).**  \nThe PyTorch version includes additional models, such as pre-activation ResNet, Wide ResNet, ResNeXt, and DenseNet.\n\n## Citation\nIf you use this code or pre-trained models, please cite the following:\n```\n@article{hara3dcnns,\n  author={Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh},\n  title={Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?},\n  journal={arXiv preprint},\n  volume={arXiv:1711.09577},\n  year={2017},\n}\n```\n\n## Pre-trained models\nPre-trained models are available at [releases](https://github.com/kenshohara/3D-ResNets/releases/tag/1.0).\n\n## Requirements\n* [Torch](http://torch.ch/)\n```\ngit clone https://github.com/torch/distro.git ~/torch --recursive\ncd ~/torch; bash install-deps;\n./install.sh\n```\n* [json package](https://github.com/clementfarabet/lua---json)\n```\nluarocks install json\n```\n* FFmpeg, FFprobe\n```\nwget http://johnvansickle.com/ffmpeg/releases/ffmpeg-release-64bit-static.tar.xz\ntar xvf ffmpeg-release-64bit-static.tar.xz\ncd ./ffmpeg-3.3.3-64bit-static/; sudo cp ffmpeg ffprobe /usr/local/bin;\n```\n* Python 3\n\n## Preparation\n### ActivityNet\n* Download datasets using official crawler codes\n* Convert from avi to jpg files using ```utils/video_jpg.py```\n```\npython utils/video_jpg.py avi_video_directory jpg_video_directory\n```\n* Generate fps files using ```utils/fps.py```\n```\npython utils/fps.py avi_video_directory jpg_video_directory\n```\n\n### Kinetics\n* Download datasets using official crawler codes\n  * Locate test set in ```video_directory/test```.\n* Convert from avi to jpg files using ```utils/video_jpg_kinetics.py```\n```\npython utils/video_jpg_kinetics.py avi_video_directory jpg_video_directory\n```\n* 
Generate n_frames files using ```utils/n_frames_kinetics.py```\n```\npython utils/n_frames_kinetics.py jpg_video_directory\n```\n* Generate an annotation file in JSON format, similar to ActivityNet, using ```utils/kinetics_json.py```\n```\npython utils/kinetics_json.py train_csv_path val_csv_path test_csv_path json_path\n```\n\n## Running the code\nAssume the data directories are structured as follows:\n```\n~/\n  data/\n    activitynet_videos/\n      jpg/\n        .../ (directories of video names)\n          ... (jpg files)\n    kinetics_videos/\n      jpg/\n        .../ (directories of class names)\n          .../ (directories of video names)\n            ... (jpg files)\n    models/\n      resnet.t7\n    results/\n      model_100.t7\n    LR/\n      ActivityNet/\n        lr.lua\n      Kinetics/\n        lr.lua\n    kinetics.json\n    activitynet.json\n```\n\nList all options.\n```\nth main.lua -h\n```\n\nTrain ResNet-34 on the Kinetics dataset (400 classes) with 4 CPU threads (for data loading) and 2 GPUs.  \nThe batch size is 128.  \nModels are saved every 5 epochs.\n```\nth main.lua --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \\\n--result_path results --lr_path LR/Kinetics/lr.lua --dataset kinetics --model resnet \\\n--resnet_depth 34 --n_classes 400 --batch_size 128 --n_gpu 2 --n_threads 4 --checkpoint 5\n```\n\nContinue training from epoch 101. 
(~/data/results/model_100.t7 is loaded.)\n```\nth main.lua --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \\\n--result_path results --lr_path LR/Kinetics/lr.lua --dataset kinetics --begin_epoch 101 \\\n--batch_size 128 --n_gpu 2 --n_threads 4 --checkpoint 5\n```\n\nPerform recognition for each video in the validation set using a pretrained model.\nThis operation outputs the top-10 labels for each video.\n```\nth main.lua --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \\\n--result_path results --premodel_path models/resnet.t7 --dataset kinetics \\\n--no_train --no_val --test_video --test_subset val --n_gpu 2 --n_threads 4\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkenshohara%2F3d-resnets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkenshohara%2F3d-resnets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkenshohara%2F3d-resnets/lists"}