{"id":13807849,"url":"https://github.com/rohitgirdhar/ActionVLAD","last_synced_at":"2025-05-14T00:32:15.596Z","repository":{"id":146949587,"uuid":"84336558","full_name":"rohitgirdhar/ActionVLAD","owner":"rohitgirdhar","description":"ActionVLAD for video action classification (CVPR 2017)","archived":false,"fork":false,"pushed_at":"2019-02-14T22:45:12.000Z","size":13639,"stargazers_count":216,"open_issues_count":6,"forks_count":61,"subscribers_count":11,"default_branch":"master","last_synced_at":"2024-11-18T23:53:43.614Z","etag":null,"topics":["action-recognition","deep-learning","tensorflow","video-processing","video-understanding"],"latest_commit_sha":null,"homepage":"https://rohitgirdhar.github.io/ActionVLAD/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rohitgirdhar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-03-08T15:35:13.000Z","updated_at":"2024-08-05T06:24:39.000Z","dependencies_parsed_at":"2023-05-16T13:45:14.496Z","dependency_job_id":null,"html_url":"https://github.com/rohitgirdhar/ActionVLAD","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rohitgirdhar%2FActionVLAD","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rohitgirdhar%2FActionVLAD/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rohitgirdhar%2FActionVLAD/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rohitgirdhar%2FActionVLAD/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/owners/rohitgirdhar","download_url":"https://codeload.github.com/rohitgirdhar/ActionVLAD/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254046440,"owners_count":22005595,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["action-recognition","deep-learning","tensorflow","video-processing","video-understanding"],"created_at":"2024-08-04T01:01:31.204Z","updated_at":"2025-05-14T00:32:10.566Z","avatar_url":"https://github.com/rohitgirdhar.png","language":"Python","funding_links":[],"categories":["Video Understanding"],"sub_categories":[],"readme":"# [ActionVLAD: Learning spatio-temporal aggregation for action classification](https://rohitgirdhar.github.io/ActionVLAD/)\n\nIf this code helps with your work/research, please consider citing:\n\nRohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic and Bryan Russell.\n**ActionVLAD: Learning spatio-temporal aggregation for action classification**.\nIn Conference on Computer Vision and Pattern Recognition (CVPR), 2017.\n\n```txt\n@inproceedings{Girdhar_17a_ActionVLAD,\n    title = {{ActionVLAD}: Learning spatio-temporal aggregation for action classification},\n    author = {Girdhar, Rohit and Ramanan, Deva and Gupta, Abhinav and Sivic, Josef and Russell, Bryan},\n    booktitle = {CVPR},\n    year = 2017\n}\n```\n\n## Updates\n\n- July 15, 2017: Released Charades models\n- May 7, 2017: First release\n\n## Quick Fusion\nIf you only need our final-layer features (logits) to combine with your own method, we provide them\nfor the following datasets:\n\n1. 
HMDB51: `data/hmdb51/final_logits`\n2. Charades v1: `data/charadesV1/final_logits`\n\nNote: these files follow our own filename and class ordering, so re-order them to match yours before fusing.\n\n## Docker installation\n\nCreate a `docker_files` folder containing cuDNN 5.1 (its `include` and `lib` directories) and the `models` folder, then build the image:\n```\n$ docker build -t action:latest .\n```\n\n\n## Prerequisites\nThis code has been tested on a Linux (CentOS 6.5) system, though it should be compatible with any OS that runs Python and TensorFlow.\n\n1. TensorFlow (0.12.0rc0)\n   - There were breaking API changes in v1.0, so this code is not directly compatible with TensorFlow 1.0 or later. \n     You can try to use my pre-compiled [WHL file](https://cmu.box.com/shared/static/ayc9oeuwrmi5dnamdrz99n63bwnznely.whl).\n   - You may want to install TensorFlow into a separate environment. With Anaconda, this can be done by:\n\n        ```bash\n        $ conda create --name tf_v0.12.0rc0\n        $ source activate tf_v0.12.0rc0\n        $ conda install pip  # need to install pip into this env,\n                             # else it will use global pip and overwrite your\n                             # main TF installation\n        $ pip install h5py  # and any other libraries you need\n        $ git clone https://github.com/tensorflow/tensorflow.git\n        $ cd tensorflow\n        $ git checkout tags/v0.12.0-rc0\n        $ # Compile tensorflow. Refer to https://www.tensorflow.org/install/install_sources\n        $ # If compiling on a CentOS (\u003c7) machine, you might find the following instructions useful:\n        $ # http://rohitgirdhar.github.io/tools/2016/12/23/compile-tf.html\n        $ pip install --upgrade --ignore-installed /path/to/tensorflow_pkg.whl\n        ```\n\n2. Standard Python libraries\n   - pip\n   - scikit-learn 0.17.1\n   - h5py\n   - pickle, cPickle, etc.\n\n\n## Quick Demo\n\nThis demo runs the RGB ActionVLAD model on a video. 
You will need the pretrained\nmodels, which can be downloaded using the `get_models.sh` script, as described later\nin this document.\n\n```bash\n$ cd demo\n$ bash run.sh \u003cvideo_path\u003e\n```\n\n## Setting up the data\n\n\nThe videos need to be stored on disk as individual JPEG frames, and similarly for optical flow.\nThe list of train/test videos is specified by text files, similar to\n`data/hmdb51/train_test_lists/train_split1.txt`. Each line consists of:\n\n```txt\nvideo_path number_of_frames class_id\n```\n\nSample train/test files are in `data/hmdb51/train_test_lists`. The frames must be named in the format `image_%05d.jpg`.\nFlow is stored similarly: a video with n frames has 2(n-1) flow images (n-1 each for the x and y components), named `flow_%c_%05d.jpg`, where\n`%c` is `x` or `y`. This follows the data format used in\nvarious [previous works](http://yjxiong.me/others/action_recog/).\n\nNOTE: For HMDB51, I renamed the videos to avoid issues with special characters in the filenames,\nwhich is why the train/test files contain numbers instead of names.\nThe list of actual filenames is provided in `data/hmdb51/train_test_lists/AllVideos.txt`, and the new\nname for each video in that list is the 1-indexed line number of that video.\nThe file `AllVideos_renamed.txt` contains all the HMDB videos that appear in at least one of the train/test splits\n(it has fewer entries than `AllVideos.txt` because some videos are not in any split). So, the video `brush_hair/19`\nin that file (and in the train/test split files) corresponds to line 19 of `AllVideos.txt`.\n\nCreate soft links to the directories where the frames are stored as follows, so that the provided scripts work out of the box.\n\n```bash\n$ ln -s /path/to/hmdb51/frames data/hmdb51/frames\n$ ln -s /path/to/hmdb51/flow data/hmdb51/flow\n```\n\nand so on. 
Since the code requires random access to this data\nwhile training, it is advisable to store the frames/flow on a\nfast disk/SSD.\n\nFor ease of reproduction, you can download our [frames](https://cmu.box.com/shared/static/i3q01shr30ziccf4b500g16t3podvsor.tgz) (`.tgz`, 9.3GB) and\n[optical flow](https://cmu.box.com/shared/static/prpeizkk9ohil8yx40cdlodp84u8ttva.tgz) (`.tgz`, 4.7GB) for HMDB51.\nOur UCF101 models should be compatible with the data provided with the [Good Practices](http://yjxiong.me/others/action_recog/) paper.\n\n\n### Charades Data\n\nThe Charades data can be downloaded directly from the [official website](http://allenai.org/plato/charades/).\nThis code expects the [480px scaled frames](http://ai2-website.s3.amazonaws.com/data/Charades_v1_480.zip)\nto be stored at `data/charadesV1/frames`.\n\n## Testing pre-trained models\n\nDownload the models using the `get_models.sh` script. Comment out specific lines\nto download only a subset of the models.\n\nTest all the models using the following scripts:\n\n```bash\n$ cd experiments\n$ bash ext_all_logits.sh  # Stores all the features for each split\n$ bash combine_streams.sh \u003csplit_id\u003e  # run once per split_id to get the final number for each split\n```\n\nThe above scripts (with the provided models) should reproduce the following accuracies (%) on HMDB51. The\niDT features are available from [Varol16]. 
You can also run these with the pre-computed\nfeatures provided in the `data/` folder.\n\n| Split  | RGB | Flow | Combined (1:2) | iDT [Varol16] | ActionVLAD+iDT |\n|--------|-----|------|----------------|---------------|----------------|\n| 1      | 51.4 | 59.0 | 66.7 | 56.7 | 70.1 |\n| 2      | 49.2 | 59.7 | 66.5 | 57.2 | 69.0 |\n| 3      | 48.6 | 60.6 | 66.3 | 57.8 | 70.1 |\n| Avg    | 49.7 | 59.8 | 66.5 | 57.2 | 69.7 |\n\nNOTE: There is a very small difference (\u003c0.1%) between the final numbers above and those reported in the paper.\nThis was due to an [undocumented behavior of TensorFlow's `tf.train.batch`](https://github.com/tensorflow/tensorflow/issues/9441),\nwhich is slightly non-deterministic when used with multiple threads.\nThis can locally shuffle the order of videos at test time, which\nleads to inconsistent results when late-fusing different methods.\nThis is now fixed by\nforcing the use of a single thread when saving features to disk.\n\n### Charades testing\nThe Charades models were trained using a slightly different version of TF, so they need a\nbit more work to test. Download the model data file as mentioned\nin the `get_data.sh` script (it is downloaded by default).\nThen:\n\n```bash\n$ cp models/PreTrained/ActionVLAD-pretrained/charadesV1/checkpoint.example models/PreTrained/ActionVLAD-pretrained/charadesV1/checkpoint\n$ vim models/PreTrained/ActionVLAD-pretrained/charadesV1/checkpoint\n$ # in this file, replace $BASE_DIR with the absolute path of your ActionVLAD clone\n$ # Now, for testing\n$ cd experiments \u0026\u0026 bash 006_InceptionV2TSN_RGB_Charades_eval.sh\n$ cd .. 
\u0026\u0026 bash eval/charades_eval.sh data/charadesV1/feats.h5\n```\n\nThe above should reproduce the following numbers:\n\n| Model | mAP | wAP |\n|------|-----|-----|\n| ActionVLAD (RGB) | 17.66 | 25.17 |\n\n## Training\n\nNote that in the following training steps, the RGB model is trained directly on top of the ImageNet\ninitialization, while the flow models are trained on top of the flow stream of a two-stream\nmodel. This is because we found that training only the last few layers of the RGB\nstream (of a two-stream model) already gives good enough performance, so everything up to and including conv5_3\nis left at its ImageNet initialization. Since we build our model\non top of conv5_3, we effectively end up training on top of the ImageNet initialization.\n\n\n### RGB model\n\n```bash\n$ ### Initialization for ActionVLAD (KMeans)\n$ cd experiments\n$ bash 001_VGG_RGB_HMDB_netvlad_feats_for_clustering.sh  # extract random subset of features\n$ bash 001_VGG_RGB_HMDB_netvlad_cluster.sh  # cluster the features to initialize ActionVLAD\n$ ### Training the model\n$ bash 001_VGG_RGB_HMDB_netvlad_stage1.sh  # trains the last layer with fixed ActionVLAD\n$ bash 001_VGG_RGB_HMDB_netvlad_stage2.sh  # trains the last layer+ActionVLAD+conv5\n$ bash 001_VGG_RGB_HMDB_netvlad_eval.sh  # evaluates the final trained model\n```\n\n### Flow model\n\n```bash\n$ ### Initialization for ActionVLAD (KMeans)\n$ cd experiments\n$ bash 001_VGG_Flow_HMDB_netvlad_feats_for_clustering.sh  # extract random subset of features\n$ bash 001_VGG_Flow_HMDB_netvlad_cluster.sh  # cluster the features to initialize ActionVLAD\n$ ### Training the model\n$ bash 001_VGG_Flow_HMDB_netvlad_stage1.sh  # trains the last layer with fixed ActionVLAD\n$ bash 001_VGG_Flow_HMDB_netvlad_stage2.sh  # trains the last layer+ActionVLAD+conv5\n$ bash 001_VGG_Flow_HMDB_netvlad_eval.sh  # evaluates the final trained model\n```\n\n\n## Miscellaneous\n\n### Two-stream models\n\nThe following scripts run testing on the flow stream of our two-stream 
models.\nAs mentioned earlier, we did not need an RGB stream model for ActionVLAD training\nsince we could train directly on top of the ImageNet initialization.\n\n```bash\n$ cd experiments\n$ bash 005_VGG_Flow_HMDB_TestTwoStream.sh\n```\n\nYou can also train two-stream models using this codebase. Here is a sample script\nto train an RGB stream (untested, so it may require some hyperparameter tuning):\n\n```bash\n$ cd experiments\n$ bash 005_VGG_RGB_HMDB_TrainTwoStream.sh\n$ bash 005_VGG_RGB_HMDB_TestTwoStream.sh\n```\n\n## References\n\n[Varol16]: Gul Varol, Ivan Laptev and Cordelia Schmid.\n[Long-term Convolutions for Action Recognition.](https://www.di.ens.fr/willow/research/ltc/)\narXiv 2016.\n\n\n## Acknowledgements\nThis code is based on the [tensorflow/models](https://github.com/tensorflow/models/tree/master/slim) repository,\nso thanks to the original authors/maintainers for releasing the code.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frohitgirdhar%2FActionVLAD","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frohitgirdhar%2FActionVLAD","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frohitgirdhar%2FActionVLAD/lists"}