Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/shahardekel/image-captions-predictions
In this mini-project I'll take images and their possible captions from the Flickr8k dataset and predict captions for images.
- Host: GitHub
- URL: https://github.com/shahardekel/image-captions-predictions
- Owner: shahardekel
- Created: 2023-09-03T11:44:37.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-09-03T12:07:51.000Z (about 1 year ago)
- Last Synced: 2023-09-04T14:40:29.562Z (about 1 year ago)
- Language: Jupyter Notebook
- Size: 4.26 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Image-Captions-Predictions
In this mini-project I'll take images and their possible captions from the Flickr8k dataset and predict captions for images. Libraries used: numpy, pandas, keras, pickle, tensorflow, matplotlib, nltk and more.
First, I'll clean the raw caption data and create a vocabulary of the unique words that can appear in a caption.
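A minimal sketch of the cleaning and vocabulary step. The captions file name (`Flickr8k.token.txt`) and the `startseq`/`endseq` markers are assumptions, not something fixed by this README:

```python
import string

def load_captions(path="Flickr8k.token.txt"):
    """Map each image id to its list of raw captions (file name is an assumption)."""
    captions = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split("\t")
            if len(parts) != 2:
                continue
            image_id, caption = parts[0].split("#")[0], parts[1]
            captions.setdefault(image_id, []).append(caption)
    return captions

def clean_captions(captions):
    """Lowercase, strip punctuation, keep alphabetic tokens, add start/end markers."""
    table = str.maketrans("", "", string.punctuation)
    for image_id, caps in captions.items():
        for i, cap in enumerate(caps):
            tokens = [w for w in cap.lower().translate(table).split() if w.isalpha()]
            caps[i] = "startseq " + " ".join(tokens) + " endseq"
    return captions

def build_vocabulary(captions):
    """Collect the set of unique words across all cleaned captions."""
    vocab = set()
    for caps in captions.values():
        for cap in caps:
            vocab.update(cap.split())
    return vocab
```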
The training data will consist of the images and their captions, while the test data will contain only the images.
For image preprocessing I'll use ResNet50, a pretrained model, to extract features from the training images and then encode each image as a single numerical vector.
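A sketch of the ResNet50 feature extraction, assuming a 2048-dimensional pooled feature vector per image; the image folder and pickle file name are placeholders:

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# include_top=False with global average pooling gives one 2048-dim vector per image
feature_model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def encode_image(path):
    """Load an image, apply ResNet50 preprocessing, and return its feature vector."""
    img = load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(img_to_array(img), axis=0))
    return feature_model.predict(x, verbose=0)[0]  # shape: (2048,)

# Example (paths are assumptions):
# features = {img_id: encode_image(f"Flicker8k_Dataset/{img_id}") for img_id in train_ids}
```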
For caption preprocessing, I'll create a mapping between the encoded vector and its translation into a sequence of words, and then use word embeddings: each word is converted to a vector using the GloVe algorithm.
The unique words will be loaded into the prediction model in the form of an embedding matrix.
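A sketch of loading the GloVe vectors and building the embedding matrix. The file name `glove.6B.50d.txt` matches the Kaggle download linked below; `word_index` is assumed to be a word-to-integer mapping (e.g. from a Keras `Tokenizer`):

```python
import numpy as np

EMBEDDING_DIM = 50  # the linked GloVe file contains 50-dimensional vectors

def load_glove(path="glove.6B.50d.txt"):
    """Map each word in the GloVe file to its 50-dimensional vector."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            values = line.split()
            embeddings[values[0]] = np.asarray(values[1:], dtype="float32")
    return embeddings

def build_embedding_matrix(word_index, embeddings, vocab_size):
    """Row i holds the GloVe vector of the word with index i (zeros if unknown)."""
    matrix = np.zeros((vocab_size, EMBEDDING_DIM))
    for word, i in word_index.items():
        vector = embeddings.get(word)
        if vector is not None:
            matrix[i] = vector
    return matrix
```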
After all that, I'll create a predictive model with two parts, image and caption, that will be able to predict a probability for each candidate word in the caption for the image.
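A sketch of such a two-branch model in Keras, in the spirit of the classic "merge" captioning architecture; the layer sizes, `vocab_size`, and `max_length` values are assumptions, not the exact configuration used in the notebook:

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 8000      # assumed vocabulary size (including padding index)
max_length = 34        # assumed maximum caption length in words
embedding_dim = 50     # matches the GloVe 50d vectors

# Image branch: the 2048-dim ResNet50 feature vector compressed to 256 units
image_input = Input(shape=(2048,))
img = Dropout(0.5)(image_input)
img = Dense(256, activation="relu")(img)

# Caption branch: word indices -> frozen GloVe embeddings -> LSTM
caption_input = Input(shape=(max_length,))
embedding_layer = Embedding(vocab_size, embedding_dim, mask_zero=True, trainable=False)
cap = embedding_layer(caption_input)
cap = Dropout(0.5)(cap)
cap = LSTM(256)(cap)

# Decoder: merge both branches and predict a probability for every vocabulary word
merged = add([img, cap])
merged = Dense(256, activation="relu")(merged)
output = Dense(vocab_size, activation="softmax")(merged)

model = Model(inputs=[image_input, caption_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
# embedding_layer.set_weights([embedding_matrix])  # load the GloVe matrix built above
```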
Finally, I'll evaluate my model with the BLEU score, an algorithm for evaluating the quality of text that has been machine-translated from one natural language to another, using different weights (1-gram, 2-gram, 3-gram and 4-gram).
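A sketch of the BLEU evaluation with NLTK; `references` (lists of tokenized ground-truth captions per test image) and `hypotheses` (tokenized predicted captions) are assumed to come from the steps above:

```python
from nltk.translate.bleu_score import corpus_bleu

def evaluate_bleu(references, hypotheses):
    """Print corpus-level BLEU-1 through BLEU-4 with cumulative n-gram weights."""
    print("BLEU-1: %.4f" % corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0)))
    print("BLEU-2: %.4f" % corpus_bleu(references, hypotheses, weights=(0.5, 0.5, 0, 0)))
    print("BLEU-3: %.4f" % corpus_bleu(references, hypotheses, weights=(0.33, 0.33, 0.33, 0)))
    print("BLEU-4: %.4f" % corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25)))
```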
Image dataset: https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_Dataset.zip
GloVe vectors (download as a txt file): https://www.kaggle.com/datasets/watts2/glove6b50dtxt