https://github.com/fork123aniket/encoder-decoder-based-video-captioning
Implementation of Encoder-Decoder Model for Video Captioning in Tensorflow
- Host: GitHub
- URL: https://github.com/fork123aniket/encoder-decoder-based-video-captioning
- Owner: fork123aniket
- License: mit
- Created: 2022-11-26T13:15:14.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-11-27T07:38:31.000Z (about 3 years ago)
- Last Synced: 2025-04-13T18:43:56.481Z (10 months ago)
- Topics: encoder-decoder, encoder-decoder-model, keras-model, keras-tensorflow, tensorflow, video-caption, video-captioning
- Language: Python
- Homepage:
- Size: 76.7 MB
- Stars: 5
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Encoder-Decoder-based Video Captioning
This repository provides an ***Encoder-Decoder Sequence-to-Sequence*** model to generate captions for input videos. A ***pre-trained VGG16 model*** is used to extract features from every frame of the video.
***Video Captioning*** owes its importance to its wide range of applications. For example, it can be used to search videos across web pages efficiently, and it can cluster videos whose generated captions are highly similar.
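The encoder reads the sequence of per-frame features and the decoder generates the caption word by word. Below is a minimal sketch of how such a model can be built in Keras; all layer sizes, sequence lengths, and the vocabulary size are illustrative assumptions, not necessarily the repository's actual configuration.

```python
# Minimal sketch of an LSTM encoder-decoder for video captioning in Keras.
# All layer sizes, sequence lengths, and the vocabulary size below are
# illustrative assumptions, not the repository's actual configuration.
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

num_frames = 80          # frames sampled per video (assumption)
feature_dim = 4096       # VGG16 fully connected feature size
max_caption_len = 10     # decoder time steps (assumption)
vocab_size = 1500        # caption vocabulary size (assumption)
latent_dim = 512         # LSTM hidden size (assumption)

# Encoder: consumes the sequence of per-frame VGG16 features.
encoder_inputs = Input(shape=(num_frames, feature_dim))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: generates the caption word by word, conditioned on the encoder states.
decoder_inputs = Input(shape=(max_caption_len, vocab_size))
decoder_lstm = LSTM(latent_dim, return_sequences=True)
decoder_outputs = decoder_lstm(decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = Dense(vocab_size, activation="softmax")(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```

At inference time the decoder is typically run one step at a time, feeding back the most probable word until an end-of-sequence token is produced.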
## Requirements
- `Tensorflow`
- `Keras`
- `OpenCV`
- `NumPy`
- `functools`
## Usage
### Data
- The ***MSVD*** dataset developed by Microsoft can be downloaded from [***here***](https://www.dropbox.com/sh/whatkfg5mr4dr63/AACKCO3LwSsHK4_GOmHn4oyYa?dl=0).
- The dataset contains 1,450 manually labelled short YouTube clips for training and 100 videos for testing.
- Each video is assigned a unique ID, and each ID has about 15–20 captions (a minimal caption-tokenization sketch follows this list).
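As a rough illustration of how such captions can be prepared for the decoder, the sketch below builds a word index and padded integer sequences with the Keras `Tokenizer`; the example captions, the `<bos>`/`<eos>` markers, and the sequence length are assumptions, not the repository's exact preprocessing.

```python
# Hedged sketch: turning captions into padded integer sequences with the Keras
# Tokenizer. The example captions, <bos>/<eos> markers, and maxlen are
# illustrative assumptions; the repository's actual preprocessing may differ.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

captions = [
    "<bos> a woman is mixing some food <eos>",
    "<bos> a man is performing on a stage <eos>",
]

# Custom filter list keeps '<' and '>' so the <bos>/<eos> tokens survive.
tokenizer = Tokenizer(filters='!"#$%&()*+,./:;=?@[\\]^_`{|}~')
tokenizer.fit_on_texts(captions)

sequences = pad_sequences(tokenizer.texts_to_sequences(captions),
                          maxlen=10, padding="post")
print("vocabulary size:", len(tokenizer.word_index))
print("padded sequences shape:", sequences.shape)  # (2, 10)
```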
### Training and Testing
- To extract features from the frames of every input video using the pre-trained VGG16 model, run `Extract_Features_Using_VGG.py` (a minimal sketch of this step follows this list).
- To train the developed model, run `training_model.py`.
- To use the trained ***Video Captioning*** model for inference, run `predict_model.py`.
- To use the trained model for ***real-time Video-Caption generation***, run `Video_Captioning.py`.
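For reference, the sketch below shows roughly how per-frame VGG16 features can be extracted with OpenCV and Keras, in the spirit of `Extract_Features_Using_VGG.py`; the frame count, the chosen layer (`fc2`), and the video path are assumptions for illustration only.

```python
# Hedged sketch of per-frame feature extraction with a pre-trained VGG16.
# Frame count, layer choice, and file path are assumptions for illustration.
import cv2
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model

# Take features from the second fully connected layer (4096-d) rather than
# the classification head.
base = VGG16(weights="imagenet")
feature_extractor = Model(inputs=base.input, outputs=base.get_layer("fc2").output)

def extract_video_features(video_path, num_frames=80):
    """Sample up to `num_frames` frames and return their VGG16 features."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while len(frames) < num_frames:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, (224, 224))           # VGG16 input size
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR
        frames.append(frame)
    cap.release()
    batch = preprocess_input(np.array(frames, dtype=np.float32))
    return feature_extractor.predict(batch, verbose=0)  # shape: (n, 4096)

features = extract_video_features("example_clip.avi")   # hypothetical path
print(features.shape)
```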
## Results
Below are a few results of the developed ***Video Captioning*** approach on test videos:
| Test Video | Generated Caption |
| ------------------- |:----------------------------:|
| *(video frame not shown)* | a woman is mixing some food |
| *(video frame not shown)* | a man is performing on a stage |
| *(video frame not shown)* | a man is mixing ingredients in a bowl |
| *(video frame not shown)* | a man is spreading a tortilla |
| *(video frame not shown)* | a woman is seasoning some food |
| *(video frame not shown)* | a cat is playing the piano |