Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/iamshnoo/udacity-cvnd-p2---image-captioning
Uses a CNN Encoder and a RNN Decoder to generate captions for input images.
https://github.com/iamshnoo/udacity-cvnd-p2---image-captioning
Last synced: 8 days ago
JSON representation
Uses a CNN Encoder and a RNN Decoder to generate captions for input images.
- Host: GitHub
- URL: https://github.com/iamshnoo/udacity-cvnd-p2---image-captioning
- Owner: iamshnoo
- License: mit
- Created: 2019-03-21T13:53:35.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-10-31T21:29:15.000Z (about 4 years ago)
- Last Synced: 2023-03-07T11:33:48.012Z (over 1 year ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 2.75 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# UDACITY-CVND-P2---Image-Captioning
[![Udacity Computer Vision Nanodegree](http://tugan0329.bitbucket.io/imgs/github/cvnd.svg)](https://www.udacity.com/course/computer-vision-nanodegree--nd891)Uses a CNN Encoder and a RNN Decoder to generate captions for input images.
The Project has been reviewed by Udacity and graded Meets Specifications.Here's a sumary of the steps involved.
- Dataset used is the COCO data set by Microsoft.
- Feature vectors for images are generated using a CNN based on the ResNet architecture by Google.
- Word embeddings are generated from captions for training images. NLTK was used for working with processing of captions.
- Implemented an RNN decoder using LSTM cells.
- Trained the network for nearly 3 hrs using GPU to achieve average loss of about 2%.
- Obtained outputs for some test images to understand efficiency of the trained network.![Alt](https://raw.githubusercontent.com/udacity/CVND---Image-Captioning-Project/master/images/encoder-decoder.png)