https://github.com/maxhalford/orc

🧌 Parsing structured information from OCR outputs

# orc 🧌

orc is a tool for parsing structured information from (messy) OCR outputs. This toolkit doesn't use fancy deep learning models. It focuses on simple and efficient algorithms that are practical enough to be used in battle.

## Usage

### `fuzz`: fuzzy string matching 😶‍🌫️

This module focuses on [approximate string matching](https://www.wikiwand.com/en/Approximate_string_matching). Not only does it calculate distances between words, it also records the operations that were performed to transform one word into another.
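To illustrate the idea of computing a distance while also recording the edits, here is a minimal, self-contained sketch of Levenshtein distance with a backtrace. It is not orc's actual API: the function name `edit_ops` and the `(kind, source_char, target_char)` tuple format are assumptions made for this example only.

```python
from typing import List, Tuple

# Hypothetical operation format for this sketch: (kind, source_char, target_char)
Op = Tuple[str, str, str]


def edit_ops(source: str, target: str) -> Tuple[int, List[Op]]:
    """Return the Levenshtein distance and one optimal list of edit operations."""
    n, m = len(source), len(target)

    # dist[i][j] is the distance between source[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every character of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every character of target[:j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )

    # Walk back from the bottom-right corner to recover the operations.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if source[i - 1] == target[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                if cost:
                    ops.append(("substitute", source[i - 1], target[j - 1]))
                i -= 1
                j -= 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", source[i - 1], ""))
            i -= 1
        else:
            ops.append(("insert", "", target[j - 1]))
            j -= 1

    ops.reverse()  # operations were collected from right to left
    return dist[n][m], ops


if __name__ == "__main__":
    # A typical OCR confusion: lowercase "l" read in place of "i".
    print(edit_ops("lnvoice", "invoice"))  # (1, [('substitute', 'l', 'i')])
```

Keeping the operations alongside the distance is useful with OCR output because recurring confusions (such as `l` for `i`) show up directly in the recorded edits.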

### `spell`: spell checking

### `ocr`: optical character recognition 🔬

### `lines`: line segmentation

## Development

```sh
git clone https://github.com/MaxHalford/orc
cd orc
pip install poetry
poetry install
poetry shell
pytest
```

## License

The MIT License (MIT). Please see the [license file](LICENSE) for more information.