{"id":13653394,"url":"https://github.com/mit-han-lab/temporal-shift-module","last_synced_at":"2025-05-15T07:05:17.136Z","repository":{"id":47127826,"uuid":"178086202","full_name":"mit-han-lab/temporal-shift-module","owner":"mit-han-lab","description":"[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding","archived":false,"fork":false,"pushed_at":"2024-07-11T18:54:08.000Z","size":251,"stargazers_count":2104,"open_issues_count":100,"forks_count":420,"subscribers_count":41,"default_branch":"master","last_synced_at":"2025-04-11T15:57:14.031Z","etag":null,"topics":["acceleration","efficient-model","low-latency","nvidia-jetson-nano","temporal-modeling","tsm","video-understanding"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/1811.08383","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mit-han-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-03-27T22:49:33.000Z","updated_at":"2025-04-07T13:39:11.000Z","dependencies_parsed_at":"2022-08-12T13:11:49.698Z","dependency_job_id":"2efb7e7c-5f5c-4adb-8a82-fadbde2ef00e","html_url":"https://github.com/mit-han-lab/temporal-shift-module","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Ftemporal-shift-module","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Ftemporal-shift-module/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Ftemporal-shift-module/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Ftemporal-shift-module/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mit-han-lab","download_url":"https://codeload.github.com/mit-han-lab/temporal-shift-module/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254292040,"owners_count":22046426,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["acceleration","efficient-model","low-latency","nvidia-jetson-nano","temporal-modeling","tsm","video-understanding"],"created_at":"2024-08-02T02:01:09.850Z","updated_at":"2025-05-15T07:05:12.128Z","avatar_url":"https://github.com/mit-han-lab.png","language":"Python","readme":"# TSM: Temporal Shift Module for Efficient Video Understanding [[Website]](https://hanlab18.mit.edu/projects/tsm/) [[arXiv]](https://arxiv.org/abs/1811.08383)[[Demo]](https://www.youtube.com/watch?v=0T6u7S_gq-4)\n\n```\n@inproceedings{lin2019tsm,\n  title={TSM: Temporal Shift Module for Efficient Video Understanding},\n  author={Lin, Ji and Gan, Chuang and Han, 
## News

- TSM is featured by [MIT News](http://news.mit.edu/2019/faster-video-recognition-smartphone-era-1011) / [MIT Technology Review](https://www.technologyreview.com/f/614551/ai-computer-vision-algorithms-on-your-phone-mit-ibm/) / [WIRED](https://www.wired.com/story/technique-easier-ai-understand-videos/) / [Engadget](https://www.engadget.com/2019/10/09/mit-ibm-machine-learning-faster-video-recognition/?guccounter=1&guce_referrer=aHR0cHM6Ly90LmNvL3hQSHBUMlJtdXc_YW1wPTE&guce_referrer_sig=AQAAAMPjElPjCfQqcJfbfckoSUJnh3OuqTR0KC_Z6S8-3h4ruHQ2z2RA5uiy_RQPVGmDJ8JghLtfI4XH0gIQr9-UlAQuA_4MJwfEEY9GMq6Tl8YolX6AVBlObRlvSMQ2M35zqGnzhp7-Av5dyfUUBxJQhH7Zo8Y_p9uOkhgU_FKl9oYB) / [NVIDIA News](https://news.developer.nvidia.com/new-mit-video-recognition-model-dramatically-improves-latency-on-edge-devices/?ncid=em-news-24390&mkt_tok=eyJpIjoiWm1JeU9UVTBNVGRpT1RVeCIsInQiOiJCVXIyUkhsdUFtcFBNY1NoTElpUytUOHJnMjdFN2pUTGY4UWpHMEZGQXNSRHRJUmxJMXpFa0FyOGF5Zk1US0NLMWZ1SU90anRiN3lCU0xGOWNNajdTazB4ajFVK2g4RnBxYXpiVFZLSWFKRzFkSURZZ0pGUVdodUYwek1vT2NSWiJ9#cid=dlz_em-news_en-us)
- **(09/2020)** We updated the environment setup for the `online_demo`; it should now be much easier to set up. Check the folder to give it a try!
- **(01/2020)** We have released the pre-trained **optical flow** model on Kinetics. We believe the pre-trained weights will help the training of two-stream models on other datasets.
- **(10/2019)** We scaled up the training of the TSM model to 1,536 GPUs, finishing Kinetics pre-training in 15 minutes. See the tech report [here](https://arxiv.org/abs/1910.00932).
- **(09/2019)** We have released the code for online hand gesture recognition on the NVIDIA Jetson Nano. It achieves real-time recognition at only 8 watts. See the [`online_demo`](online_demo) folder for details. [[Full Video]](https://hanlab18.mit.edu/projects/tsm/#live_demo)

![tsm-demo](https://hanlab18.mit.edu/projects/tsm/external/tsm-demo2.gif)

## Overview

We release the PyTorch code of the [Temporal Shift Module](https://arxiv.org/abs/1811.08383).

![framework](https://hanlab18.mit.edu/projects/tsm/external/TSM-module.png)

## Content

- [Prerequisites](#prerequisites)
- [Data Preparation](#data-preparation)
- [Code](#code)
- [Pretrained Models](#pretrained-models)
  * [Kinetics-400](#kinetics-400)
    + [Dense Sample](#dense-sample)
    + [Uniform Sampling](#uniform-sampling)
    + [Optical Flow](#optical-flow)
  * [Something-Something](#something-something)
    + [Something-Something-V1](#something-something-v1)
    + [Something-Something-V2](#something-something-v2)
- [Testing](#testing)
- [Training](#training)
- [Live Demo on NVIDIA Jetson Nano](#live-demo-on-nvidia-jetson-nano)

## Prerequisites

The code is built with the following libraries:

- [PyTorch](https://pytorch.org/) 1.0 or higher
- [TensorboardX](https://github.com/lanpa/tensorboardX)
- [tqdm](https://github.com/tqdm/tqdm.git)
- [scikit-learn](https://scikit-learn.org/stable/)

For video data pre-processing, you may need [ffmpeg](https://www.ffmpeg.org/).

## Data Preparation

We first need to extract videos into frames for fast reading. Please refer to the [TSN](https://github.com/yjxiong/temporal-segment-networks) repo for a detailed guide to data pre-processing.

We have successfully trained on the [Kinetics](https://deepmind.com/research/open-source/open-source-datasets/kinetics/), [UCF101](http://crcv.ucf.edu/data/UCF101.php), [HMDB51](http://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/), [Something-Something-V1](https://20bn.com/datasets/something-something/v1) and [V2](https://20bn.com/datasets/something-something/v2), and [Jester](https://20bn.com/datasets/jester) datasets with this codebase. Basically, the processing of video data can be summarized into three steps (a rough sketch of the first two follows the list):

- Extract frames from videos (refer to [tools/vid2img_kinetics.py](tools/vid2img_kinetics.py) for a Kinetics example and [tools/vid2img_sthv2.py](tools/vid2img_sthv2.py) for a Something-Something-V2 example)
- Generate the annotations needed by the dataloader (refer to [tools/gen_label_kinetics.py](tools/gen_label_kinetics.py) for a Kinetics example, [tools/gen_label_sthv1.py](tools/gen_label_sthv1.py) for a Something-Something-V1 example, and [tools/gen_label_sthv2.py](tools/gen_label_sthv2.py) for a Something-Something-V2 example)
- Add the dataset information to [ops/dataset_configs.py](ops/dataset_configs.py)
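As a hedged illustration of the first two steps (the scripts in [tools](tools) handle resizing, parallelism, and the exact label layout; the frame naming pattern and the `folder num_frames label` line format below are assumptions that should be checked against `tools/gen_label_*.py` and [ops/dataset_configs.py](ops/dataset_configs.py)):

```python
import os
import subprocess

def extract_frames(video_path, out_dir, short_side=256):
    """Dump JPEG frames with ffmpeg, resizing the short side.
    Rough stand-in for tools/vid2img_kinetics.py; options are illustrative."""
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run([
        "ffmpeg", "-i", video_path,
        "-vf", f"scale=-2:{short_side}",
        "-q:v", "2",
        os.path.join(out_dir, "img_%05d.jpg"),  # assumed frame naming pattern
    ], check=True)
    return len(os.listdir(out_dir))

def append_annotation(label_file, frame_dir, num_frames, class_idx):
    """Append one annotation line, assuming the common TSN-style
    '<frame folder> <num frames> <label>' layout."""
    with open(label_file, "a") as f:
        f.write(f"{frame_dir} {num_frames} {class_idx}\n")
```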
## Code

This code is based on the [TSN](https://github.com/yjxiong/temporal-segment-networks) codebase. The core code implementing the Temporal Shift Module is [ops/temporal_shift.py](ops/temporal_shift.py). It is a plug-and-play module that enables temporal reasoning, at the cost of *zero parameters* and *zero FLOPs*.

Here we provide a naive implementation of TSM. It takes just a few lines of code:

```python
import torch

def temporal_shift(x, fold_div=8):
    # shape of x: [N, T, C, H, W]
    fold = x.size(2) // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]  # shift left
    out[:, 1:, fold: 2 * fold] = x[:, :-1, fold: 2 * fold]  # shift right
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]  # not shift
    return out
```

Note that the naive implementation involves large data copying and increases memory consumption during training. It is suggested to use the **in-place** version of TSM to improve speed (see [ops/temporal_shift.py](ops/temporal_shift.py) Line 12 for details).
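To see where the shift sits in a network, here is a minimal sketch of the `blockres`-style placement described in the paper: the input of a convolution inside a residual branch is shifted along the time axis before the convolution runs. It reuses the `temporal_shift` function above; the class name and wiring are illustrative only, not the actual plug-in code in [ops/temporal_shift.py](ops/temporal_shift.py).

```python
import torch.nn as nn

class ShiftConv(nn.Module):
    """Illustrative wrapper: temporally shift, then apply the wrapped conv.

    The backbone sees frames as a batch of shape [N*T, C, H, W], so we reshape
    to [N, T, C, H, W], call temporal_shift() (from the snippet above), and
    flatten back before the convolution.
    """

    def __init__(self, conv, n_segment=8, fold_div=8):
        super().__init__()
        self.conv = conv
        self.n_segment = n_segment
        self.fold_div = fold_div

    def forward(self, x):
        nt, c, h, w = x.size()
        n = nt // self.n_segment
        x = temporal_shift(x.view(n, self.n_segment, c, h, w),
                           fold_div=self.fold_div)
        return self.conv(x.view(nt, c, h, w))
```

Wrapping the first convolution of every residual block of a ResNet-50 this way (e.g. `block.conv1 = ShiftConv(block.conv1, n_segment=8)`) roughly corresponds to the 8-frame `--shift_place=blockres` configuration used in the training commands below.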
## Pretrained Models

Training video models is computationally expensive. Here we provide some of the pretrained models. The accuracy might vary a little compared to the paper, since we retrained some of the models.

### Kinetics-400

#### Dense Sample

In the latest version of our paper, we report the results of TSM trained and tested with **I3D dense sampling** (Table 1&4, 8-frame and 16-frame), using the same training and testing hyper-parameters as in the [Non-local Neural Networks](https://arxiv.org/abs/1711.07971) paper to directly compare with I3D.

We compare against the I3D performance reported in the Non-local paper:

| method          | n-frame      | Kinetics Acc. |
| --------------- | ------------ | ------------- |
| I3D-ResNet50    | 32 * 10clips | 73.3%         |
| TSM-ResNet50    | 8 * 10clips  | **74.1%**     |
| I3D-ResNet50 NL | 32 * 10clips | 74.9%         |
| TSM-ResNet50 NL | 8 * 10clips  | **75.6%**     |

TSM outperforms I3D under the same dense sampling protocol. The NL TSM model also achieves better performance than the NL I3D model. The non-local module itself improves the accuracy by about 1.5%.

Here is a list of pre-trained models that we provide (see Table 3 of the paper). The accuracy is tested using the full-resolution setting following [here](https://github.com/facebookresearch/video-nonlocal-net). The list is continually updated.

| model             | n-frame     | Kinetics Acc. | checkpoint | test log |
| ----------------- | ----------- | ------------- | ---------- | -------- |
| TSN ResNet50 (2D) | 8 * 10clips | 70.6%         | [link](https://hanlab18.mit.edu/projects/tsm/models/TSM_kinetics_RGB_resnet50_avg_segment5_e50.pth) | [link](https://hanlab18.mit.edu/projects/tsm/models/log/testlog_TSM_kinetics_RGB_resnet50_avg_segment5_e50.log) |
| TSM ResNet50      | 8 * 10clips | 74.1%         | [link](https://hanlab18.mit.edu/projects/tsm/models/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_dense.pth) | [link](https://hanlab18.mit.edu/projects/tsm/models/log/testlog_TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_dense.log) |
| TSM ResNet50 NL   | 8 * 10clips | 75.6%         | [link](https://hanlab18.mit.edu/projects/tsm/models/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_dense_nl.pth) | [link](https://hanlab18.mit.edu/projects/tsm/models/log/testlog_TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_dense_nl.log) |
| TSM ResNext101    | 8 * 10clips | 76.3%         | TODO | TODO |
| TSM MobileNetV2   | 8 * 10clips | 69.5%         | [link](https://hanlab18.mit.edu/projects/tsm/models/TSM_kinetics_RGB_mobilenetv2_shift8_blockres_avg_segment8_e100_dense.pth) | [link](https://hanlab18.mit.edu/projects/tsm/models/log/testlog_TSM_kinetics_RGB_mobilenetv2_shift8_blockres_avg_segment8_e100_dense.log) |

#### Uniform Sampling

We also provide checkpoints of TSN and TSM models using **uniformly sampled frames** as in the [Temporal Segment Networks](https://arxiv.org/abs/1608.00859) paper, which is more sample-efficient and very useful for fine-tuning on other datasets. Our TSM module improves consistently over the TSN baseline.

| model             | n-frame    | acc (1-crop) | acc (10-crop) | checkpoint | test log |
| ----------------- | ---------- | ------------ | ------------- | ---------- | -------- |
| TSN ResNet50 (2D) | 8 * 1clip  | 68.8%        | 69.9%         | [link](https://hanlab18.mit.edu/projects/tsm/models/TSM_kinetics_RGB_resnet50_avg_segment5_e50.pth) | [link](https://hanlab18.mit.edu/projects/tsm/models/log/testlog_uniform_TSM_kinetics_RGB_resnet50_avg_segment5_e50.log) |
| TSM ResNet50      | 8 * 1clip  | 71.2%        | 72.8%         | [link](https://hanlab18.mit.edu/projects/tsm/models/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth) | [link](https://hanlab18.mit.edu/projects/tsm/models/log/testlog_TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.log) |
| TSM ResNet50      | 16 * 1clip | 72.6%        | 73.7%         | [link](https://hanlab18.mit.edu/projects/tsm/models/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment16_e50.pth) | - |
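For intuition about the two protocols, here is a rough sketch of how test-time frame indices could be chosen. The repository's dataloader implements the real logic, and the stride, clip count, and boundary handling below are illustrative assumptions only.

```python
import numpy as np

def uniform_test_indices(num_frames, num_segments=8):
    """TSN-style uniform sampling: split the video into `num_segments` equal
    chunks and take the center frame of each (one clip spans the whole video)."""
    tick = num_frames / float(num_segments)
    return np.array([int(tick / 2.0 + tick * i) for i in range(num_segments)])

def dense_test_indices(num_frames, num_segments=8, stride=8, num_clips=10):
    """I3D-style dense sampling: each clip is `num_segments` frames taken with a
    fixed stride, and `num_clips` start points are spread evenly over the video."""
    span = num_segments * stride
    starts = np.linspace(0, max(num_frames - span, 0), num=num_clips, dtype=int)
    return [np.minimum(s + np.arange(num_segments) * stride, num_frames - 1)
            for s in starts]
```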
#### Optical Flow

We provide the optical flow model pre-trained on Kinetics. The model is trained using uniform sampling. We did not carefully tune the training hyper-parameters, so the model is intended for transfer learning on other datasets rather than for performance evaluation.

| model        | n-frame   | top-1 acc | top-5 acc | checkpoint | test log |
| ------------ | --------- | --------- | --------- | ---------- | -------- |
| TSM ResNet50 | 8 * 1clip | 55.7%     | 79.5%     | [link](https://hanlab18.mit.edu/projects/tsm/models/TSM_kinetics_Flow_resnet50_shift8_blockres_avg_segment8_e50.pth) | - |
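A common way to use this checkpoint is late fusion in a two-stream setup: run an RGB TSM and the flow TSM separately and average their class probabilities. This is a generic sketch of that fusion; the equal weighting is a placeholder, not a number from the paper.

```python
import torch
import torch.nn.functional as F

def two_stream_predict(rgb_logits, flow_logits, flow_weight=1.0):
    """Late fusion of per-clip logits from the RGB and optical-flow streams.

    rgb_logits, flow_logits: tensors of shape [num_clips, num_classes].
    Averages probabilities over clips, then over the two streams."""
    rgb_prob = F.softmax(rgb_logits, dim=-1).mean(dim=0)
    flow_prob = F.softmax(flow_logits, dim=-1).mean(dim=0)
    fused = (rgb_prob + flow_weight * flow_prob) / (1.0 + flow_weight)
    return int(fused.argmax())
```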
### Something-Something

The Something-Something [V1](https://20bn.com/datasets/something-something/v1) & [V2](https://20bn.com/datasets/something-something) datasets are highly temporal. TSM achieves state-of-the-art performance on these datasets: the **first place** on V1 (50.72% test acc.) and the **second place** on V2 (66.55% test acc.), using just a ResNet-50 backbone (as of 09/28/2019).

Here we provide some of the models on these datasets. The accuracy is tested under both the efficient setting (center crop * 1clip) and the accurate setting ([full resolution](https://github.com/facebookresearch/video-nonlocal-net) * 2clip).

#### Something-Something-V1

| model         | n-frame | acc (center crop * 1clip) | acc (full res * 2clip) | checkpoint | test log |
| ------------- | ------- | ------------------------- | ---------------------- | ---------- | -------- |
| TSM ResNet50  | 8       | 45.6                      | 47.2                   | [link](https://hanlab18.mit.edu/projects/tsm/models/TSM_something_RGB_resnet50_shift8_blockres_avg_segment8_e45.pth) | [link1](https://hanlab18.mit.edu/projects/tsm/models/log/testlog_1clip_TSM_something_RGB_resnet50_shift8_blockres_avg_segment8_e45.log) [link2](https://hanlab18.mit.edu/projects/tsm/models/log/testlog_2clip_TSM_something_RGB_resnet50_shift8_blockres_avg_segment8_e45.log) |
| TSM ResNet50  | 16      | 47.2                      | 48.4                   | [link](https://hanlab18.mit.edu/projects/tsm/models/TSM_something_RGB_resnet50_shift8_blockres_avg_segment16_e45.pth) | [link1](https://hanlab18.mit.edu/projects/tsm/models/log/testlog_1clip_TSM_something_RGB_resnet50_shift8_blockres_avg_segment16_e45.log) [link2](https://hanlab18.mit.edu/projects/tsm/models/log/testlog_2clip_TSM_something_RGB_resnet50_shift8_blockres_avg_segment16_e45.log) |
| TSM ResNet101 | 8       | 46.9                      | 48.7                   | [link](https://hanlab18.mit.edu/projects/tsm/models/TSM_something_RGB_resnet101_shift8_blockres_avg_segment8_e45.pth) | [link1](https://hanlab18.mit.edu/projects/tsm/models/log/testlog_1clip_TSM_something_RGB_resnet101_shift8_blockres_avg_segment8_e45.log) [link2](https://hanlab18.mit.edu/projects/tsm/models/log/testlog_2clip_TSM_something_RGB_resnet101_shift8_blockres_avg_segment8_e45.log) |

#### Something-Something-V2

On the V2 dataset, the accuracy is reported under the accurate setting (full resolution * 2clip).

| model         | n-frame    | accuracy | checkpoint | test log |
| ------------- | ---------- | -------- | ---------- | -------- |
| TSM ResNet50  | 8 * 2clip  | 61.2     | [link](https://hanlab18.mit.edu/projects/tsm/models/TSM_somethingv2_RGB_resnet50_shift8_blockres_avg_segment8_e45.pth) | [link](https://hanlab18.mit.edu/projects/tsm/models/log/testlog_2clip_TSM_somethingv2_RGB_resnet50_shift8_blockres_avg_segment8_e45.log) |
| TSM ResNet50  | 16 * 2clip | 63.1     | [link](https://hanlab18.mit.edu/projects/tsm/models/TSM_somethingv2_RGB_resnet50_shift8_blockres_avg_segment16_e45.pth) | [link](https://hanlab18.mit.edu/projects/tsm/models/log/testlog_2clip_TSM_somethingv2_RGB_resnet50_shift8_blockres_avg_segment16_e45.log) |
| TSM ResNet101 | 8 * 2clip  | 63.3     | [link](https://hanlab18.mit.edu/projects/tsm/models/TSM_somethingv2_RGB_resnet101_shift8_blockres_avg_segment8_e45.pth) | [link](https://hanlab18.mit.edu/projects/tsm/models/log/testlog_2clip_TSM_somethingv2_RGB_resnet101_shift8_blockres_avg_segment8_e45.log) |
## Testing

For example, to test the downloaded pretrained models on Kinetics, you can run `scripts/test_tsm_kinetics_rgb_8f.sh`. The script tests both TSN and TSM in the 8-frame setting by running:

```bash
# test TSN
python test_models.py kinetics \
    --weights=pretrained/TSM_kinetics_RGB_resnet50_avg_segment5_e50.pth \
    --test_segments=8 --test_crops=1 \
    --batch_size=64

# test TSM
python test_models.py kinetics \
    --weights=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth \
    --test_segments=8 --test_crops=1 \
    --batch_size=64
```

Change to `--test_crops=10` for 10-crop evaluation. With the above commands, you should get around 68.8% and 71.2% accuracy, respectively.

To get the Kinetics performance of our dense-sampling model under the Non-local protocol, run:

```bash
# test TSN using non-local testing protocol
python test_models.py kinetics \
    --weights=pretrained/TSM_kinetics_RGB_resnet50_avg_segment5_e50.pth \
    --test_segments=8 --test_crops=3 \
    --batch_size=8 --dense_sample --full_res

# test TSM using non-local testing protocol
python test_models.py kinetics \
    --weights=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_dense.pth \
    --test_segments=8 --test_crops=3 \
    --batch_size=8 --dense_sample --full_res

# test NL TSM using non-local testing protocol
python test_models.py kinetics \
    --weights=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_dense_nl.pth \
    --test_segments=8 --test_crops=3 \
    --batch_size=8 --dense_sample --full_res
```

You should get around 70.6%, 74.1%, and 75.6% top-1 accuracy, as shown in Table 1.

For the efficient setting (center crop and 1 clip) and the accurate setting (full resolution and 2 clips) on Something-Something, you can try something like this:

```bash
# efficient setting: center crop and 1 clip
python test_models.py something \
    --weights=pretrained/TSM_something_RGB_resnet50_shift8_blockres_avg_segment8_e45.pth \
    --test_segments=8 --batch_size=72 -j 24 --test_crops=1

# accurate setting: full resolution and 2 clips (--twice_sample)
python test_models.py something \
    --weights=pretrained/TSM_something_RGB_resnet50_shift8_blockres_avg_segment8_e45.pth \
    --test_segments=8 --batch_size=72 -j 24 --test_crops=3 --twice_sample
```

## Training

We provide several examples for training TSM with this repo:

- To train on Kinetics from ImageNet pretrained models, you can run `scripts/train_tsm_kinetics_rgb_8f.sh`, which contains:

  ```bash
  # You should get TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth
  python main.py kinetics RGB \
       --arch resnet50 --num_segments 8 \
       --gd 20 --lr 0.02 --wd 1e-4 --lr_steps 20 40 --epochs 50 \
       --batch-size 128 -j 16 --dropout 0.5 --consensus_type=avg --eval-freq=1 \
       --shift --shift_div=8 --shift_place=blockres --npb
  ```

  This should reproduce `TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth` as provided for download above. Note that you should scale the learning rate with the batch size; for example, with a batch size of 256 the learning rate should be 0.04.

- After obtaining the Kinetics pretrained models, we can fine-tune them on other datasets. For example, we can fine-tune the 8-frame Kinetics pre-trained model on the UCF-101 dataset using **uniform sampling** by running the command below (a rough sketch of the partial weight loading behind `--tune_from` follows this list):

  ```bash
  python main.py ucf101 RGB \
       --arch resnet50 --num_segments 8 \
       --gd 20 --lr 0.001 --lr_steps 10 20 --epochs 25 \
       --batch-size 64 -j 16 --dropout 0.8 --consensus_type=avg --eval-freq=1 \
       --shift --shift_div=8 --shift_place=blockres \
       --tune_from=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e50.pth
  ```

- To train on the Something-Something dataset (V1 & V2), using ImageNet pre-training is usually better:

  ```bash
  python main.py something RGB \
       --arch resnet50 --num_segments 8 \
       --gd 20 --lr 0.01 --lr_steps 20 40 --epochs 50 \
       --batch-size 64 -j 16 --dropout 0.5 --consensus_type=avg --eval-freq=1 \
       --shift --shift_div=8 --shift_place=blockres --npb
  ```
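Conceptually, initializing from a `--tune_from` checkpoint amounts to copying every pretrained tensor whose name and shape match the new model and leaving the rest (typically the classifier head for a different number of classes) at its fresh initialization. The sketch below shows that kind of partial loading in generic PyTorch; it is an assumption-laden illustration, not the exact logic in `main.py`.

```python
import torch

def load_for_finetune(model, checkpoint_path):
    """Partially initialize `model` from a pretrained checkpoint, skipping
    tensors whose name or shape does not match (e.g. the classifier head)."""
    ckpt = torch.load(checkpoint_path, map_location="cpu")
    state = ckpt.get("state_dict", ckpt)
    # checkpoints saved with DataParallel prefix parameter names with "module."
    state = {k.replace("module.", "", 1): v for k, v in state.items()}
    own = model.state_dict()
    kept = {k: v for k, v in state.items() if k in own and v.shape == own[k].shape}
    skipped = [k for k in own if k not in kept]
    model.load_state_dict(kept, strict=False)
    print(f"loaded {len(kept)} tensors, kept {len(skipped)} at random init")
    return model
```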
## Live Demo on NVIDIA Jetson Nano

We have built an online hand gesture recognition demo using TSM. The model uses a MobileNetV2 backbone and is trained on the Jester dataset.

- Recorded video of the live demo [[link]](https://hanlab18.mit.edu/projects/tsm/#live_demo)
- Code of the live demo and setup tutorial: [`online_demo`](online_demo)
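The low latency of the demo comes from running the shift in a uni-directional (online) fashion: instead of looking at future frames, each shifted layer keeps a small buffer of channels from the previous frame and substitutes them into the current activation. Below is a much-simplified sketch of that idea, assuming one cache per shifted layer and a 1/8 channel fold; the real buffer management lives in the [`online_demo`](online_demo) code.

```python
import torch

class OnlineShiftCache:
    """Causal (uni-directional) temporal shift for streaming inference.

    The first 1/`fold_div` of the channels of the current frame's activation
    are replaced by the values cached from the previous frame, so temporal
    information flows forward without access to future frames."""

    def __init__(self, fold_div=8):
        self.fold_div = fold_div
        self.buffer = None  # cached channels from the previous frame

    def __call__(self, x):
        # x: [1, C, H, W] activation for the current video frame
        fold = x.size(1) // self.fold_div
        shifted = x.clone()
        if self.buffer is not None:
            shifted[:, :fold] = self.buffer   # bring in information from the past
        self.buffer = x[:, :fold].detach()    # remember this frame's channels
        return shifted
```

One such cache would be attached to every shifted layer and carried across frames of the video stream.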