Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Action recognition using soft attention based deep recurrent neural networks
- Host: GitHub
- URL: https://github.com/kracwarlock/action-recognition-visual-attention
- Owner: kracwarlock
- Created: 2015-09-28T17:45:48.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-10-30T22:19:10.000Z (over 8 years ago)
- Last Synced: 2025-01-28T20:08:27.938Z (5 days ago)
- Topics: action-recognition, attention-mechanism, deep, deep-learning, deep-neural-networks, deeplearning, paper, soft-attention, video
- Language: Jupyter Notebook
- Homepage: http://www.cs.toronto.edu/~shikhar/projects/action-recognition-attention
- Size: 985 KB
- Stars: 350
- Watchers: 25
- Forks: 158
- Open Issues: 8
Metadata Files:
- Readme: README.md
README
## Action Recognition using Visual Attention
We propose a soft attention based model for the task of action recognition in videos.
We use multi-layered Recurrent Neural Networks (RNNs) with Long Short-Term Memory
(LSTM) units which are deep both spatially and temporally. Our model learns to focus
selectively on parts of the video frames and classifies videos after taking a few
glimpses. The model essentially learns which parts of the frames are relevant for the
task at hand and attaches higher importance to them. We evaluate the model on the
UCF-11 (YouTube Action), HMDB-51, and Hollywood2 datasets and analyze how the model
focuses its attention depending on the scene and the action being performed.
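The core idea is straightforward to sketch: at each time step the model scores every spatial location of a frame's convolutional feature cube, turns the scores into a softmax distribution, and feeds the resulting expected feature vector to the LSTM. Below is a minimal NumPy sketch of that soft-attention step; it is not the repository's Theano implementation, and the shapes and tanh-based scoring parametrization are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Illustrative shapes (assumptions): a 7x7 grid of 512-d conv features
# per frame, a 256-d LSTM state, and a 128-d attention space.
K, D, H, A = 49, 512, 256, 128
features = np.random.randn(K, D)  # conv-feature cube for one frame, flattened over space
h_prev = np.random.randn(H)       # LSTM hidden state from the previous time step

# Hypothetical attention parameters (learned jointly with the LSTM in the real model)
W_x = np.random.randn(D, A)
W_h = np.random.randn(H, A)
w = np.random.randn(A)

# One unnormalized score per location, conditioned on the current LSTM state
scores = np.tanh(features.dot(W_x) + h_prev.dot(W_h)).dot(w)
alpha = softmax(scores)           # soft attention weights over the K locations, sum to 1
glimpse = alpha.dot(features)     # expected feature vector: the LSTM input at this step
```

Because the attention is soft (a weighted average rather than a hard crop), the whole pipeline stays differentiable and can be trained end to end with backpropagation.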
## Dependencies

* Python 2.7
* [NumPy](http://www.numpy.org/)
* [scikit learn](http://scikit-learn.org/stable/index.html)
* [skimage](http://scikit-image.org/docs/dev/api/skimage.html)
* [Theano](http://www.deeplearning.net/software/theano/)
* [h5py](http://docs.h5py.org/en/latest/)
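Before running anything, a quick import check along these lines can confirm the environment; this is just a sketch, since the repository does not pin exact package versions.

```python
# Rough environment check for the dependencies listed above.
import sys
print(sys.version)  # the code targets Python 2.7

import numpy
import sklearn
import skimage
import theano
import h5py

for mod in (numpy, sklearn, skimage, theano, h5py):
    print("%s %s" % (mod.__name__, getattr(mod, "__version__", "unknown")))
```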
## Input data format

This is provided in [util/README.md](https://github.com/kracwarlock/action-recognition-visual-attention/blob/master/util/README.md).
## Reference
If you use this code as part of any published research, please acknowledge the
following papers:

**"Action Recognition using Visual Attention."**
Shikhar Sharma, Ryan Kiros, Ruslan Salakhutdinov. *[arXiv](http://arxiv.org/abs/1511.04119)*

    @article{sharma2015attention,
        title={Action Recognition using Visual Attention},
        author={Sharma, Shikhar and Kiros, Ryan and Salakhutdinov, Ruslan},
        journal={arXiv preprint arXiv:1511.04119},
        year={2015}
    }

**"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention."**
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio. *To appear ICML (2015)*

    @article{Xu2015show,
        title={Show, Attend and Tell: Neural Image Caption Generation with Visual Attention},
        author={Xu, Kelvin and Ba, Jimmy and Kiros, Ryan and Cho, Kyunghyun and Courville, Aaron and Salakhutdinov, Ruslan and Zemel, Richard and Bengio, Yoshua},
        journal={arXiv preprint arXiv:1502.03044},
        year={2015}
    }

## License
This repository is released under a [revised (3-clause) BSD License](http://directory.fsf.org/wiki/License:BSD_3Clause). It
is the implementation for our paper [Action Recognition using Visual Attention](http://arxiv.org/abs/1511.04119). The repository uses some code from the project
[arctic-captions](https://github.com/kelvinxu/arctic-captions), which is the original implementation of the paper
[Show, Attend and Tell: Neural Image Caption Generation with Visual Attention](http://arxiv.org/abs/1502.03044) and is also licensed
under a [revised (3-clause) BSD License](http://directory.fsf.org/wiki/License:BSD_3Clause).