https://github.com/robertknight/ocrs-models

PyTorch models for the ocrs OCR engine

Host: GitHub
URL: https://github.com/robertknight/ocrs-models
Owner: robertknight
Created: 2022-06-19T11:46:07.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2024-08-20T18:04:03.000Z (11 months ago)
Last Synced: 2025-02-08T05:11:13.643Z (5 months ago)
Topics: ocr
Language: Python
Homepage:
Size: 676 KB
Stars: 62
Watchers: 2
Forks: 9
Open Issues: 9
Metadata Files:
- Readme: README.md

# ocrs-models

This project contains tools for training PyTorch models for use with the
[**Ocrs**](https://github.com/robertknight/ocrs/) OCR engine.

## About the models

The ocrs engine splits text detection and recognition into three phases, each
of which corresponds to a different model in this repository:

1. **Text detection**: This is a semantic segmentation model which classifies
each pixel in a greyscale input image as text/non-text. Consumers then
post-process clusters of text pixels to get oriented bounding boxes for
words.
2. **Layout analysis (VERY WIP)**: This is a graph model which takes word
bounding boxes as input nodes and classifies each node's relation to nearby
nodes (e.g. start / middle / end of line).
3. **Text recognition**: This is a CRNN model that takes a greyscale image of a
text line as input and returns a sequence of characters.
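
As a rough illustration of the post-processing mentioned for the text detection model, the sketch below groups text pixels into connected clusters and returns one box per cluster. It is not the code used by Ocrs. The function name `text_boxes`, the nested-list mask format and the `threshold` default are assumptions made for this example, and it produces axis-aligned boxes rather than oriented ones to keep the example dependency-free.

```python
from collections import deque


def text_boxes(mask, threshold=0.5):
    """Return (left, top, right, bottom) boxes for clusters of text pixels.

    `mask` is a 2D list of per-pixel text probabilities (assumed format).
    Coordinates are inclusive. Boxes are axis-aligned, which is a
    simplification: oriented boxes would need a minimum-area rectangle fit.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []

    for y in range(height):
        for x in range(width):
            if seen[y][x] or mask[y][x] < threshold:
                continue

            # Flood-fill one 4-connected cluster, tracking its extent.
            seen[y][x] = True
            queue = deque([(x, y)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    in_bounds = 0 <= nx < width and 0 <= ny < height
                    if in_bounds and not seen[ny][nx] and mask[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((nx, ny))

            boxes.append((left, top, right, bottom))

    return boxes
```

A production pipeline would typically run this kind of step with an image-processing library and fit a rotated rectangle to each cluster so that slanted words get tight boxes.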

All models can be exported to ONNX for downstream use.
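
Downstream consumers of the recognition model have to turn its per-timestep output into text. CRNN models are commonly trained with CTC loss, so the sketch below assumes CTC-style output and applies greedy decoding: take the best class at each step, collapse consecutive repeats, and drop blanks. The function name `ctc_greedy_decode`, the blank index of 0 and the alphabet mapping are assumptions for illustration, not details taken from this repository.

```python
BLANK = 0  # Assumed index of the CTC blank class.


def ctc_greedy_decode(step_scores, alphabet):
    """Greedy CTC decoding of a sequence of per-timestep class scores.

    `step_scores` is a list with one list of class scores per timestep.
    Class 0 is the blank; class i (i >= 1) maps to alphabet[i - 1].
    Both conventions are assumptions made for this sketch.
    """
    chars = []
    prev = BLANK
    for scores in step_scores:
        # Pick the highest-scoring class for this timestep.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a character only when it is not blank and not a repeat
        # of the class chosen at the previous timestep.
        if best != BLANK and best != prev:
            chars.append(alphabet[best - 1])
        prev = best
    return "".join(chars)
```

Greedy decoding is the simplest option. Beam search can give better results at higher cost, and whether it is worthwhile depends on the model and the application.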

## Datasets

The models are trained exclusively on datasets that are open and have non-restrictive licenses. This currently includes:
- [HierText](https://github.com/google-research-datasets/hiertext) (CC-BY-SA 4.0)

## Pre-trained models

Pre-trained models are available from [Hugging
Face](https://huggingface.co/robertknight/ocrs) as PyTorch checkpoints,
[ONNX](https://onnx.ai) and [RTen](https://github.com/robertknight/rten) models.

## Training custom models

See the [Training guide](docs/training.md) for a walk-through of the process to
train models from scratch or fine-tune existing models.
