Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/shahardekel/image-captions-predictions
In this mini-project I'll take images and their possible captions from the Flickr8k dataset and predict captions for images.
- Host: GitHub
- URL: https://github.com/shahardekel/image-captions-predictions
- Owner: shahardekel
- Created: 2023-09-03T11:44:37.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-09-03T12:07:51.000Z (about 1 year ago)
- Last Synced: 2023-09-04T14:40:29.562Z (about 1 year ago)
- Language: Jupyter Notebook
- Size: 4.26 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Image-Captions-Predictions
In this mini-project I'll take images and their possible captions from the Flickr8k dataset and predict captions for images. Libraries used: numpy, pandas, keras, pickle, tensorflow, matplotlib, nltk and more.
First, I'll clean the raw caption data and create a vocabulary of the unique words that can appear in a caption.
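A minimal sketch of the cleaning and vocabulary step. The captions file name (`Flickr8k.token.txt`) and the `startseq`/`endseq` markers are assumptions, not something fixed by this README:

```python
import string

def load_captions(path="Flickr8k.token.txt"):
    """Map each image id to its list of raw captions (file name is an assumption)."""
    captions = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split("\t")
            if len(parts) != 2:
                continue
            image_id, caption = parts[0].split("#")[0], parts[1]
            captions.setdefault(image_id, []).append(caption)
    return captions

def clean_captions(captions):
    """Lowercase, strip punctuation, keep alphabetic tokens, add start/end markers."""
    table = str.maketrans("", "", string.punctuation)
    for image_id, caps in captions.items():
        for i, cap in enumerate(caps):
            tokens = [w for w in cap.lower().translate(table).split() if w.isalpha()]
            caps[i] = "startseq " + " ".join(tokens) + " endseq"
    return captions

def build_vocabulary(captions):
    """Collect the set of unique words across all cleaned captions."""
    vocab = set()
    for caps in captions.values():
        for cap in caps:
            vocab.update(cap.split())
    return vocab
```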
The training data will consist of the images and their captions, while the test data will contain only the images.
For image preprocessing I'll use ResNet50, a pretrained model, to extract features from the training images and then encode each image as a single numerical vector.
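A sketch of the ResNet50 feature extraction, assuming a 2048-dimensional pooled feature vector per image; the image folder and pickle file name are placeholders:

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# include_top=False with global average pooling gives one 2048-dim vector per image
feature_model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def encode_image(path):
    """Load an image, apply ResNet50 preprocessing, and return its feature vector."""
    img = load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(img_to_array(img), axis=0))
    return feature_model.predict(x, verbose=0)[0]  # shape: (2048,)

# Example (paths are assumptions):
# features = {img_id: encode_image(f"Flicker8k_Dataset/{img_id}") for img_id in train_ids}
```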
For caption preprocessing, I'll create a mapping between the encoded vector and its translation into a sequence of words, and then use word embeddings: each word is converted to a vector using the GloVe algorithm.
The unique words will be loaded into the prediction model in the form of an embedding matrix.
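A sketch of loading the GloVe vectors and building the embedding matrix. The file name `glove.6B.50d.txt` matches the Kaggle download linked below; `word_index` is assumed to be a word-to-integer mapping (e.g. from a Keras `Tokenizer`):

```python
import numpy as np

EMBEDDING_DIM = 50  # the linked GloVe file contains 50-dimensional vectors

def load_glove(path="glove.6B.50d.txt"):
    """Map each word in the GloVe file to its 50-dimensional vector."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            values = line.split()
            embeddings[values[0]] = np.asarray(values[1:], dtype="float32")
    return embeddings

def build_embedding_matrix(word_index, embeddings, vocab_size):
    """Row i holds the GloVe vector of the word with index i (zeros if unknown)."""
    matrix = np.zeros((vocab_size, EMBEDDING_DIM))
    for word, i in word_index.items():
        vector = embeddings.get(word)
        if vector is not None:
            matrix[i] = vector
    return matrix
```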
After all that, I'll create a predictive model with two parts, image and caption, that will be able to predict a probability for each candidate word in the caption for the image.
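A sketch of such a two-branch model in Keras, in the spirit of the classic "merge" captioning architecture; the layer sizes, `vocab_size`, and `max_length` values are assumptions, not the exact configuration used in the notebook:

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 8000      # assumed vocabulary size (including padding index)
max_length = 34        # assumed maximum caption length in words
embedding_dim = 50     # matches the GloVe 50d vectors

# Image branch: the 2048-dim ResNet50 feature vector compressed to 256 units
image_input = Input(shape=(2048,))
img = Dropout(0.5)(image_input)
img = Dense(256, activation="relu")(img)

# Caption branch: word indices -> frozen GloVe embeddings -> LSTM
caption_input = Input(shape=(max_length,))
embedding_layer = Embedding(vocab_size, embedding_dim, mask_zero=True, trainable=False)
cap = embedding_layer(caption_input)
cap = Dropout(0.5)(cap)
cap = LSTM(256)(cap)

# Decoder: merge both branches and predict a probability for every vocabulary word
merged = add([img, cap])
merged = Dense(256, activation="relu")(merged)
output = Dense(vocab_size, activation="softmax")(merged)

model = Model(inputs=[image_input, caption_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
# embedding_layer.set_weights([embedding_matrix])  # load the GloVe matrix built above
```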
Finally, I'll evaluate my model with the BLEU score, an algorithm for evaluating the quality of text that has been machine-translated from one natural language to another, using different weights (1-gram, 2-gram, 3-gram and 4-gram).
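A sketch of the BLEU evaluation with NLTK; `references` (lists of tokenized ground-truth captions per test image) and `hypotheses` (tokenized predicted captions) are assumed to come from the steps above:

```python
from nltk.translate.bleu_score import corpus_bleu

def evaluate_bleu(references, hypotheses):
    """Print corpus-level BLEU-1 through BLEU-4 with cumulative n-gram weights."""
    print("BLEU-1: %.4f" % corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0)))
    print("BLEU-2: %.4f" % corpus_bleu(references, hypotheses, weights=(0.5, 0.5, 0, 0)))
    print("BLEU-3: %.4f" % corpus_bleu(references, hypotheses, weights=(0.33, 0.33, 0.33, 0)))
    print("BLEU-4: %.4f" % corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25)))
```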
Image dataset: https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_Dataset.zip
GloVe vectors (download as a txt file): https://www.kaggle.com/datasets/watts2/glove6b50dtxt