https://github.com/mohamedelareeg/documentclassificationzonalocr

Document Classification with Zonal OCR streamlines document processing by automating the categorization and extraction of information from various types of documents. By leveraging advanced OCR techniques and image processing capabilities, the system offers a reliable solution for businesses dealing with large volumes of documents.

Host: GitHub
URL: https://github.com/mohamedelareeg/documentclassificationzonalocr
Owner: mohamedelareeg
License: mit
Created: 2024-05-31T20:31:35.000Z (about 2 years ago)
Default Branch: master
Last Pushed: 2024-06-05T18:25:36.000Z (about 2 years ago)
Last Synced: 2025-01-12T17:27:29.968Z (over 1 year ago)
Topics: dotnet, image-classification, image-processing, ocr, serilog, webapi, zonal-ocr
Language: JavaScript
Homepage:
Size: 3.77 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Document Classification with Zonal OCR

To watch a demo of the project, click [here](https://www.youtube.com/watch?v=zibVcrxsx9c).

[![Watch the video](https://img.youtube.com/vi/zibVcrxsx9c/0.jpg)](https://www.youtube.com/watch?v=zibVcrxsx9c)

This project is a .NET 8 web application (ASP.NET Core) that supports document classification and Zonal OCR (Optical Character Recognition). It allows users to define document types, create fields within these documents (indexing fields), upload sample documents for training, define anchor points to assist in document detection, and perform OCR to extract text from documents based on the defined fields.

## Features

### Document Classification
- **Form Creation:** Users can create document types/forms by defining various fields within them.
- **Anchor Points:** Anchor points can be added to assist in document detection. They also serve as reference points that guide the OCR process.
- **Rectangle Placement:** Rectangles can be added to uploaded images to specify the locations of indexing fields within the document.
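
The relationship between forms, indexing fields, anchor points, and rectangles can be sketched as a small data model. The following is a minimal illustration in Python, not the project's C# code. All names (`Rect`, `Field`, `Anchor`, `FormTemplate`, `shift_zones`) are hypothetical and are not taken from the repository. It also assumes that an anchor found at a different position on a new page implies a pure translation of every field rectangle, which may be simpler than what the application actually does.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in pixel coordinates (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int

    def shifted(self, dx: int, dy: int) -> "Rect":
        """Return a copy of this rectangle moved by (dx, dy)."""
        return Rect(self.x + dx, self.y + dy, self.width, self.height)


@dataclass
class Field:
    """An indexing field and the zone where its value is expected."""
    name: str
    zone: Rect


@dataclass
class Anchor:
    """A reference point recorded on the sample document."""
    name: str
    x: int
    y: int


@dataclass
class FormTemplate:
    """A document type: its fields plus the anchors used to locate it."""
    name: str
    fields: list[Field] = field(default_factory=list)
    anchors: list[Anchor] = field(default_factory=list)


def shift_zones(template: FormTemplate, anchor_name: str,
                found_x: int, found_y: int) -> dict[str, Rect]:
    """Translate every field zone by the offset between the anchor's
    position in the template and its position on a new page.
    Assumes translation only (no rotation or scaling)."""
    anchor = next(a for a in template.anchors if a.name == anchor_name)
    dx, dy = found_x - anchor.x, found_y - anchor.y
    return {f.name: f.zone.shifted(dx, dy) for f in template.fields}


invoice = FormTemplate(
    name="Invoice",
    fields=[Field("InvoiceNumber", Rect(400, 50, 150, 30)),
            Field("Total", Rect(400, 700, 150, 30))],
    anchors=[Anchor("logo", 20, 20)],
)

# The logo was detected 5 px to the right and 12 px lower than on the sample.
zones = shift_zones(invoice, "logo", found_x=25, found_y=32)
print(zones["InvoiceNumber"])
```

Keeping the rectangles relative to an anchor, rather than to the page corner, is what lets the same template tolerate scans that are slightly offset.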

### OCR (Optical Character Recognition)
- **OpenCV Integration:** Uses OpenCV for image processing that supports the OCR process.
- **Field Mapping:** Extracted text is assigned to the corresponding indexing fields based on predefined mappings.
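
Zonal OCR with field mapping amounts to cropping each predefined rectangle and storing the recognized text under the corresponding field name. The sketch below illustrates the idea in Python, not the project's C# code. It assumes a page can be modeled as rows of characters standing in for pixels, and `recognize` is only a placeholder for a real OCR engine such as Tesseract. The names `crop`, `recognize`, and `extract_fields` are hypothetical.

```python
from typing import Callable

Zone = tuple[int, int, int, int]  # (x, y, width, height)
Page = list[str]                  # stand-in for an image: rows of characters


def crop(page: Page, zone: Zone) -> Page:
    """Return only the part of the page that lies inside the zone."""
    x, y, width, height = zone
    return [row[x:x + width] for row in page[y:y + height]]


def recognize(region: Page) -> str:
    """Placeholder OCR: joins the non-empty cropped rows.
    A real system would pass the cropped image to an OCR engine here."""
    return " ".join(row.strip() for row in region if row.strip())


def extract_fields(page: Page, zones: dict[str, Zone],
                   ocr: Callable[[Page], str] = recognize) -> dict[str, str]:
    """Run OCR on each zone and map the text to its indexing field."""
    return {name: ocr(crop(page, zone)) for name, zone in zones.items()}


# A toy "scanned page"; ljust keeps the column positions explicit.
page = [
    "ACME Corp.".ljust(20) + "Invoice: INV-001",
    " " * 20 + "Date: 2024-06-05",
    "Total due:".ljust(20) + "149.90",
]

# Predefined mapping from indexing field to its rectangle on the page.
zones = {
    "InvoiceNumber": (29, 0, 7, 1),
    "Date": (26, 1, 10, 1),
    "Total": (20, 2, 6, 1),
}

print(extract_fields(page, zones))
# {'InvoiceNumber': 'INV-001', 'Date': '2024-06-05', 'Total': '149.90'}
```

Passing the OCR step in as a function keeps the zone and mapping logic independent of whichever engine performs the recognition.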

## Project Structure

The project follows a typical ASP.NET Core web application structure:
- **DocumentClassificationZonalOcr.Api:** Contains the API endpoints for interacting with the application.
- **DocumentClassificationZonalOcr.MVC:** Provides the user interface for interacting with the application.
- **DocumentClassificationZonalOcr.Shared:** Contains shared code and utilities used across the solution.

## Dependencies

- **Microsoft.EntityFrameworkCore:** ORM for database interactions.
- **OpenCvSharp4:** OpenCV wrapper for image processing tasks.
- **Serilog.AspNetCore:** Logging framework for ASP.NET Core applications.
- **SixLabors.ImageSharp:** Image processing library.
- **Tesseract:** OCR engine for text extraction from images.

## Usage

To use the application:
1. Clone the repository.
2. Open the solution file (`DocumentClassificationZonalOcr.sln`) in Visual Studio.
3. Restore the NuGet packages listed under Dependencies (Visual Studio normally does this automatically on build).
4. Build and run the application.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
