https://github.com/rodolfoferro/cv-demo
This repository contains a technical demo for an image classification system using a one-shot learning model.
https://github.com/rodolfoferro/cv-demo
Last synced: 4 months ago
JSON representation
This repository contains a technical demo for an image classification system using a one-shot learning model.
- Host: GitHub
- URL: https://github.com/rodolfoferro/cv-demo
- Owner: RodolfoFerro
- Created: 2025-09-10T06:19:56.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-09-10T14:32:20.000Z (9 months ago)
- Last Synced: 2025-09-10T18:37:38.019Z (9 months ago)
- Language: Jupyter Notebook
- Size: 3.43 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Image classification using One-Shot Learning Techniques
This repository contains a technical demo for an **Image Classification** system of CAD figures and tickets.

The goal of this project is to showcase a production-ready workflow that combines:
- **Computer vision (OpenCV)** for preprocessing and image handling.
- **Deep learning (TensorFlow)** for image classification.
- **Optical Character Recognition (OCR with pytesseract)** for text extraction.
- **Web service (Flask/FastAPI)** to expose inference via REST API.
## 📊 System Diagram
_A general overview of the system integrating the complete workflow:_

## 🧩 Workflow
The system is divided into several stages:
1. **Preprocessing (OpenCV).** The module `src/dataset.py` contains a function to prepare images, which includes an OpenCV pipeline that:
- Loads an image and converts it to grayscale from a given path.
- Binarizes the image using an Otsu threshold.
- Applies a morphological transformation (erosion, dilation or none).
- Resizes the image to a given size.
- Adds the dimension channel (1).
- Normalizes the image to range [0, 1].
- The dataset includes 2 classes:
- Cylinder
- Ticket (this ticket is used with `pytesseract`in a posterior process)
2. **Model training (TensorFlow).** The content for this task is conteined in several files:
- The module `src/fsl.py` includes a function to build a model that uses a Few-Shot Learning (FSL) technique with a Siamese Network for image classifiaction tasks.
- The model is trained using the `notebooks/Train Siamese Network.ipynb` Jupyter Notebook.
- The trained model was saved in the `models/siamese.weights.h5` file.
3. **OCR Extraction (Tesseract).** For this task, an integration with `pytesseeract`is added in the web service (API):
- If the classified images corresponds to the `ticket` class, then the OCR is run over the image.
4. **API Service (Flask/FastAPI)**
- `/` endpoint: Health check for monitoring.
- `/inference` endpoint: Receives an image and returns the resulting score.
- `/inference-ocr` endpoint: Receives an image and returns the resulting score, as well as the detected text.
## Future work/Improvements
- **Containerization (Docker)** for deployment in scalable environments.
- **Security** with an API bearer/auth token.
- **Deployment** in a production environment.
- **Model improvement** and extension to use more classes.
## Credits
The dataset used in this project was completely built using the following public datsets from Kaggle:
- [3D Geometric Objects in 2D plane (Sketch-like)](https://www.kaggle.com/datasets/breadzin/3d-geometric-objects-in-2d-plane-sketch-like)
- [Find it again! Dataset](https://www.kaggle.com/datasets/nikita2998/find-it-again-dataset)