https://github.com/zolppy/recommendation-system
This project demonstrates the creation of a content-based image recommendation system. It leverages a pre-trained VGG16 deep learning model to extract meaningful feature vectors from images. These features are then compared using cosine similarity to identify and recommend visually similar images.
https://github.com/zolppy/recommendation-system
computer-vision deep-learning keras machine-learning numpy recommendation-system sklearn tensorflow vgg16
Last synced: 3 months ago
JSON representation
This project demonstrates the creation of a content-based image recommendation system. It leverages a pre-trained VGG16 deep learning model to extract meaningful feature vectors from images. These features are then compared using cosine similarity to identify and recommend visually similar images.
- Host: GitHub
- URL: https://github.com/zolppy/recommendation-system
- Owner: zolppy
- License: mit
- Created: 2025-08-22T19:53:26.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-08-22T20:01:45.000Z (5 months ago)
- Last Synced: 2025-08-22T21:56:51.476Z (5 months ago)
- Topics: computer-vision, deep-learning, keras, machine-learning, numpy, recommendation-system, sklearn, tensorflow, vgg16
- Language: Jupyter Notebook
- Homepage:
- Size: 11.7 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Image Recommendation System using VGG16 and Cosine Similarity
This project demonstrates the creation of a content-based image recommendation system. It leverages a pre-trained VGG16 deep learning model to extract meaningful feature vectors from images. These features are then compared using cosine similarity to identify and recommend visually similar images.
## Table of Contents
- [Project Overview](#project-overview)
- [Dataset](#dataset)
- [Methodology](#methodology)
- [How to Run](#how-to-run)
- [Evaluation](#evaluation)
- [Dependencies](#dependencies)
## Project Overview
The core idea is to transform images into a high-dimensional space where their proximity represents visual similarity. A powerful, pre-trained Convolutional Neural Network (CNN) is used for this transformation. Once images are represented as numerical vectors, we can calculate the similarity between them and build a recommendation engine.
The notebook accomplishes the following:
1. Loads and preprocesses the `tf_flowers` image dataset.
2. Uses a pre-trained VGG16 model on ImageNet to extract feature vectors from all images.
3. Implements a recommendation function based on the cosine similarity between these feature vectors.
4. Evaluates the recommendation system's performance using an average precision metric.
## Dataset
The project uses the **`tf_flowers`** dataset, available through `tensorflow_datasets`.
- **Total Images:** 3,670
- **Number of Classes:** 5
- **Class Names:** `dandelion`, `daisy`, `tulips`, `sunflowers`, `roses`
The dataset is split as follows:
- **Training Set:** 80%
- **Validation Set:** 10%
- **Test Set:** 10%
## Methodology
The workflow is divided into four main stages:
1. **Data Loading and Preprocessing:**
- Images are loaded from the `tf_flowers` dataset.
- Each image is resized to $224 \\times 224$ pixels to match the input dimensions required by the VGG16 model.
- Pixel values are normalized from the `[0, 255]` range to `[0, 1]`.
- The datasets are batched for efficient processing.
2. **Feature Extraction:**
- A VGG16 model, pre-trained on the ImageNet dataset, is loaded without its final classification layer (`include_top=False`).
- This base model acts as a powerful feature extractor. Each image is passed through the network, and the output from the last convolutional block is flattened to produce a high-dimensional feature vector.
- This process is applied to all images in the training, validation, and test sets.
3. **Similarity Calculation:**
- **Cosine Similarity** is used to measure the similarity between the feature vectors of two images. It calculates the cosine of the angle between two vectors, providing a score between -1 and 1 (or 0 and 1 for non-negative vectors). A score closer to 1 indicates higher similarity.
- The formula for cosine similarity between two vectors $A$ and $B$ is:
$$\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \|B\|}$$
4. **Recommendation and Evaluation:**
- To test the system, random images are selected from the test set to act as queries.
- For each query image, its feature vector is compared against the feature vectors of all images in the training set.
- The images from the training set with the highest cosine similarity scores are returned as recommendations.
## How to Run
1. **Clone the repository or download the `main.ipynb` file.**
2. **Install the necessary dependencies.** You can install them using pip:
```bash
pip install numpy tensorflow tensorflow-datasets matplotlib scikit-learn
```
3. **Open and run the notebook.**
- Open the `main.ipynb` file in a Jupyter environment such as JupyterLab, Jupyter Notebook, or Google Colab.
- Execute the cells in sequential order. The notebook will automatically download the dataset, build the model, extract features, and run the evaluation.
## Evaluation
The performance of the recommendation system is evaluated using **Average Precision**.
For each query image from the test set:
1. The system retrieves the top 10 most similar images from the training set.
2. **Precision** is calculated as the proportion of these 10 recommended images that belong to the same class as the query image.
$$\text{Precision} = \frac{\text{Number of relevant recommendations}}{\text{Total number of recommendations}}$$
3. The **Average Precision** is the mean of the precision scores calculated for all query images. This final score provides a measure of the system's ability to retrieve visually similar and semantically relevant images.
The notebook's output includes the precision score for each query and the final average precision, along with a brief discussion of the result.
## Dependencies
- `Python 3.x`
- `numpy`
- `tensorflow`
- `tensorflow_datasets`
- `matplotlib`
- `scikit-learn`