https://github.com/kenshohara/video-classification-3d-cnn-pytorch

Video classification tools using 3D ResNet

Host: GitHub
URL: https://github.com/kenshohara/video-classification-3d-cnn-pytorch
Owner: kenshohara
License: mit
Created: 2017-09-21T04:48:33.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2018-11-23T14:26:39.000Z (over 6 years ago)
Last Synced: 2025-04-04T00:09:50.761Z (3 months ago)
Topics: action-recognition, computer-vision, computer-vision-tools, deep-learning, python, pytorch, video-classification
Language: Python
Size: 154 KB
Stars: 1,119
Watchers: 18
Forks: 260
Open Issues: 37
Metadata Files:
- Readme: README.md
- License: LICENSE


# Video Classification Using 3D ResNet

This is PyTorch code for video (action) classification using a 3D ResNet trained with [this code](https://github.com/kenshohara/3D-ResNets-PyTorch).

The 3D ResNet is trained on the Kinetics dataset, which includes 400 action classes.  

In score mode, this code takes videos as input and outputs the predicted class name and class scores for every 16 frames.
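To make "every 16 frames" concrete, the sketch below splits a video's frame indices into consecutive, non-overlapping 16-frame clips. It illustrates the idea under stated assumptions and is not the repository's code: `segment_frames` is a hypothetical helper, indices are 0-based half-open ranges, and the final shorter clip is kept as-is, whereas the actual implementation may pad or drop it.

```python
def segment_frames(num_frames, clip_len=16):
    """Split frame indices [0, num_frames) into consecutive, non-overlapping clips.

    Hypothetical helper: returns (start, end) pairs with `end` exclusive. The last
    clip may be shorter than clip_len; the repository may pad or drop it instead.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [
        (start, min(start + clip_len, num_frames))
        for start in range(0, num_frames, clip_len)
    ]


print(segment_frames(40))  # [(0, 16), (16, 32), (32, 40)]
```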

In feature mode, this code outputs 512-dimensional features (after global average pooling) for every 16 frames.
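Global average pooling is what fixes the feature length: the last convolutional stage emits a C × T × H × W block per clip, and averaging over the temporal and spatial axes leaves one number per channel, so the output has as many dimensions as that stage has channels (512 here). The pure-Python sketch below only illustrates that arithmetic on a toy map; the PyTorch model presumably performs the equivalent step with a pooling layer on tensors, and `global_average_pool` is a hypothetical name.

```python
def global_average_pool(feature_map):
    """Average a nested [C][T][H][W] feature map over T, H and W.

    Returns one float per channel C, which is why the pooled vector length
    equals the channel count of the last convolutional stage.
    """
    pooled = []
    for channel in feature_map:
        # Flatten the T x H x W values of this channel and take their mean.
        values = [v for frame in channel for row in frame for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Toy map: C=2 channels, T=2 time steps, H=1, W=2.
toy = [
    [[[1.0, 1.0]], [[1.0, 1.0]]],
    [[[0.0, 2.0]], [[4.0, 6.0]]],
]
print(global_average_pool(toy))  # [1.0, 3.0]
```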

**Torch (Lua) version of this code is available [here](https://github.com/kenshohara/video-classification-3d-cnn).**

## Requirements

* [PyTorch](http://pytorch.org/)

```
conda install pytorch torchvision cuda80 -c soumith
```

* FFmpeg, FFprobe

```
wget http://johnvansickle.com/ffmpeg/releases/ffmpeg-release-64bit-static.tar.xz
tar xvf ffmpeg-release-64bit-static.tar.xz
# The extracted directory name includes the release version, so match it with a glob.
cd ./ffmpeg-*-static/ && sudo cp ffmpeg ffprobe /usr/local/bin
```

* Python 3
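Before running anything, it can help to confirm that the requirements above are visible from Python. The following is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper, and it only checks that the packages are importable and that `ffmpeg`/`ffprobe` are on `PATH`, not that their versions match what this 2017-era code expects.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Report whether the README's requirements appear to be installed."""
    return {
        "python3": sys.version_info.major == 3,
        # find_spec returns None when a package cannot be imported.
        "torch": importlib.util.find_spec("torch") is not None,
        "torchvision": importlib.util.find_spec("torchvision") is not None,
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "ffprobe": shutil.which("ffprobe") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```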

## Preparation

* Download this code.

* Download the [pretrained model](https://drive.google.com/drive/folders/1zvl89AgFAApbH0At-gMuZSeQB_LpNP-M?usp=sharing).  

  * ResNeXt-101 achieved the best performance in our experiments (see the [paper](https://arxiv.org/abs/1711.09577) for details).
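After downloading, a checkpoint can be inspected to confirm which architecture it holds before passing it to `--model`. This is a sketch under assumptions: it supposes the `.pth` file loads into a dict with `arch` and `state_dict` keys, and that parameter names may carry the `module.` prefix that `nn.DataParallel` adds when a wrapped model is saved. Both helpers are hypothetical and operate on the already-loaded dict.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict):
    """Drop a leading 'module.' (added by nn.DataParallel) from every key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len("module."):] if key.startswith("module.") else key
        cleaned[new_key] = value
    return cleaned


def describe_checkpoint(checkpoint):
    """Summarize a checkpoint dict assumed to hold 'arch' and 'state_dict' keys."""
    state_dict = checkpoint.get("state_dict", {})
    return {
        "arch": checkpoint.get("arch", "unknown"),
        "num_tensors": len(state_dict),
        "data_parallel": any(k.startswith("module.") for k in state_dict),
    }
```

For example, `describe_checkpoint(torch.load("./resnet-34-kinetics.pth", map_location="cpu"))` loads the weights on the CPU and summarizes them. `strip_module_prefix` is only needed if the `state_dict` is loaded into a model that is not wrapped in `nn.DataParallel`.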

## Usage

Assume the input video files are located in `./videos`.

To calculate class scores for every 16 frames, use `--mode score`.

```
python main.py --input ./input --video_root ./videos --output ./output.json --model ./resnet-34-kinetics.pth --mode score
```
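The command above passes `--input ./input` alongside `--video_root ./videos`. Assuming `./input` is a plain-text file naming one video (relative to `--video_root`) per line, which is worth confirming in `main.py`, it can be generated as follows. `write_input_list` and the extension set are hypothetical; every line, including the last, ends with a newline in case the reader strips the final character of each line.

```python
from pathlib import Path

# Hypothetical set of extensions treated as videos; adjust to your data.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mkv", ".webm", ".mov"}


def write_input_list(video_root, list_path):
    """Write one video filename per line (relative to video_root) to list_path."""
    names = sorted(
        p.name
        for p in Path(video_root).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )
    # Terminate every line, including the last, with a newline.
    Path(list_path).write_text("".join(name + "\n" for name in names))
    return names
```

For the layout in this README, `write_input_list("./videos", "./input")` would produce the list file expected by the command.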

To visualize the classification results, use `generate_result_video/generate_result_video.py`.
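Score mode reports one prediction per clip. If a single label per video is needed, a common heuristic is to average the clip scores and take the arg-max. The sketch below assumes `output.json` is a list of entries shaped like `{"video": ..., "clips": [{"scores": [...]}, ...]}`; that layout is an assumption to check against the file the script actually writes, and `video_level_predictions` is a hypothetical helper.

```python
def video_level_predictions(results, class_names=None):
    """Average clip scores per video and return the arg-max class for each video.

    Assumes `results` is a list of {"video": str, "clips": [{"scores": [...]}, ...]}.
    Returns class indices, or names if `class_names` (index -> name list) is given.
    """
    predictions = {}
    for entry in results:
        clips = entry["clips"]
        if not clips:
            continue  # no clips, no prediction
        num_classes = len(clips[0]["scores"])
        mean = [
            sum(clip["scores"][k] for clip in clips) / len(clips)
            for k in range(num_classes)
        ]
        best = max(range(num_classes), key=mean.__getitem__)
        predictions[entry["video"]] = class_names[best] if class_names else best
    return predictions
```

The results can be loaded with `json.load` from `./output.json` and passed in directly. Averaging works whether the stored scores are probabilities or raw logits, although the two can rank classes differently.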

To calculate video features for every 16 frames, use `--mode feature`.

```
python main.py --input ./input --video_root ./videos --output ./output.json --model ./resnet-34-kinetics.pth --mode feature
```
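Feature mode likewise yields one vector per clip. To obtain a single descriptor per video, the clip features can be averaged. As with the scores, this sketch assumes a layout in which each clip entry carries a `features` list (512 floats per the description above); `mean_video_features` is an illustrative helper, not part of this repository.

```python
def mean_video_features(results):
    """Average per-clip feature vectors into one vector per video.

    Assumes `results` is a list of {"video": str, "clips": [{"features": [...]}, ...]}.
    """
    pooled = {}
    for entry in results:
        feats = [clip["features"] for clip in entry["clips"]]
        if not feats:
            continue  # no clips, no descriptor
        dim = len(feats[0])
        if any(len(f) != dim for f in feats):
            raise ValueError(f"inconsistent feature sizes in {entry['video']}")
        pooled[entry["video"]] = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
    return pooled
```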

## Citation

If you use this code, please cite the following:

```
@article{hara3dcnns,
  author={Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh},
  title={Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?},
  journal={arXiv preprint},
  volume={arXiv:1711.09577},
  year={2017},
}
```
