https://github.com/rodolfoferro/cv-demo

This repository contains a technical demo for an image classification system using a one-shot learning model.

Host: GitHub
URL: https://github.com/rodolfoferro/cv-demo
Owner: RodolfoFerro
Created: 2025-09-10T06:19:56.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-09-10T14:32:20.000Z (9 months ago)
Last Synced: 2025-09-10T18:37:38.019Z (9 months ago)
Language: Jupyter Notebook
Size: 3.43 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Image classification using One-Shot Learning Techniques

This repository contains a technical demo of an **Image Classification** system for CAD figures and tickets.

![Overview](assets/samples.png)

The goal of this project is to showcase a production-ready workflow that combines:

- **Computer vision (OpenCV)** for preprocessing and image handling.

- **Deep learning (TensorFlow)** for image classification.

- **Optical Character Recognition (OCR with pytesseract)** for text extraction.

- **Web service (Flask/FastAPI)** to expose inference via REST API.

## 📊 System Diagram

_A general overview of the system integrating the complete workflow:_

![Overview](assets/overview.png)
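
As a rough illustration of how the stages in the diagram could be wired together, here is a minimal sketch. It is not the repository's code: `run_pipeline`, `InferenceResult`, the lowercase label strings, and the three stage callables are hypothetical names, and the stand-in lambdas replace OpenCV, TensorFlow, and Tesseract so the example runs on its own. It also assumes OCR receives the original image path rather than the preprocessed array.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class InferenceResult:
    label: str                  # predicted class, e.g. "cylinder" or "ticket"
    score: float                # score reported by the classifier
    text: Optional[str] = None  # OCR output, only filled for tickets


def run_pipeline(
    image_path: str,
    preprocess: Callable[[str], object],
    classify: Callable[[object], Tuple[str, float]],
    ocr: Callable[[str], str],
    with_ocr: bool = False,
) -> InferenceResult:
    """Preprocess and classify an image, then run OCR only on tickets."""
    features = preprocess(image_path)
    label, score = classify(features)
    result = InferenceResult(label=label, score=score)
    # OCR is attempted only when requested and the image is a ticket.
    if with_ocr and label == "ticket":
        result.text = ocr(image_path)
    return result


if __name__ == "__main__":
    # Stand-in stages (assumptions) so the sketch runs without any model.
    result = run_pipeline(
        "ticket.png",
        preprocess=lambda path: path,
        classify=lambda features: ("ticket", 0.93),
        ocr=lambda path: "TOTAL 12.50",
        with_ocr=True,
    )
    print(result)
```

Passing the stages as callables keeps the routing logic testable without loading a model; calling with `with_ocr=False` would correspond to a plain classification request, and `with_ocr=True` to one that also returns detected text.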

## 🧩 Workflow

The system is divided into several stages:

1. **Preprocessing (OpenCV).** The module `src/dataset.py` contains a function to prepare images, which includes an OpenCV pipeline that:

    - Loads an image from a given path and converts it to grayscale.

    - Binarizes the image using an Otsu threshold.

    - Applies a morphological transformation (erosion, dilation or none).

    - Resizes the image to a given size.

    - Adds a channel dimension (1).

    - Normalizes the image to range [0, 1].

    - The dataset includes 2 classes:

        - Cylinder

        - Ticket (this class is later processed with `pytesseract`)

2. **Model training (TensorFlow).** The code for this task is spread across several files:

    - The module `src/fsl.py` includes a function that builds a model using a Few-Shot Learning (FSL) technique with a Siamese Network for image classification tasks.

    - The model is trained using the `notebooks/Train Siamese Network.ipynb` Jupyter Notebook.

    - The trained model is saved in the `models/siamese.weights.h5` file.

3. **OCR Extraction (Tesseract).** For this task, the web service (API) integrates `pytesseract`:

   - If the classified image corresponds to the `ticket` class, OCR is run on the image.

4. **API Service (Flask/FastAPI).** The service exposes three endpoints:

   - `/` endpoint: Health check for monitoring.  

   - `/inference` endpoint: Receives an image and returns the resulting score.  

   - `/inference-ocr` endpoint: Receives an image and returns the resulting score, as well as the detected text.

## Future work/Improvements

- **Containerization (Docker)** for deployment in scalable environments.

- **Security** with an API bearer/auth token.

- **Deployment** in a production environment.

- **Model improvement** and extension to use more classes.

## Credits

The dataset used in this project was built entirely from the following public datasets from Kaggle:

- [3D Geometric Objects in 2D plane (Sketch-like)](https://www.kaggle.com/datasets/breadzin/3d-geometric-objects-in-2d-plane-sketch-like)

- [Find it again! Dataset](https://www.kaggle.com/datasets/nikita2998/find-it-again-dataset)
