https://github.com/cneud/alto-tools

# ALTO Tools

Python tools for performing various operations on ALTO XML files.

---

## Installation

You can install from [PyPI](https://pypi.org/project/alto-tools/) by running

```bash
pip install alto-tools
```

or clone the repository, change into its directory, and run

```bash
pip install .
```
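
After either install method, a quick sanity check is to confirm that the distribution is visible to Python and that the `alto-tools` command resolves on your `PATH`. This is a minimal sketch using only the standard library; it assumes the installed distribution name matches the PyPI project name `alto-tools`, and `check_install` is a hypothetical helper, not part of the package.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version


def check_install(dist_name: str = "alto-tools", command: str = "alto-tools") -> dict:
    """Report the installed version and the resolved CLI path (None when not found).

    Assumption: the distribution is registered under the PyPI name "alto-tools".
    """
    try:
        dist_version = version(dist_name)
    except PackageNotFoundError:
        dist_version = None
    # shutil.which returns the full path of the executable, or None if absent.
    return {"version": dist_version, "executable": shutil.which(command)}
```

If both values come back as `None`, the install did not complete or the environment's script directory is not on `PATH`.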

## Usage

```bash
alto-tools <INPUT> [OPTION]
```

`INPUT` should be the path to an ALTO XML file or to a directory containing ALTO XML files.
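
Since `INPUT` may be either a single file or a directory, a caller typically resolves it to a list of files first. The sketch below shows one way to do that. The helper name `collect_alto_files` and the rule that only `.xml` files directly inside the directory count as ALTO are assumptions for illustration, not necessarily what alto-tools does internally.

```python
from pathlib import Path


def collect_alto_files(input_path: str) -> list:
    """Resolve INPUT (a file or a directory) to a list of ALTO file paths.

    Assumption: in the directory case, only regular files directly inside it
    with a ".xml" suffix (case-insensitive) are treated as ALTO files.
    """
    path = Path(input_path)
    if path.is_file():
        return [path]
    if path.is_dir():
        # Sorted so that files are processed in a deterministic order.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() == ".xml"
        )
    raise FileNotFoundError(f"No such file or directory: {input_path}")
```

For example, `collect_alto_files("scans")` would list the `.xml` files in a (hypothetical) `scans` directory, while `collect_alto_files("scans/page_001.xml")` returns just that one path.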

The following values of `OPTION` are currently supported:

| OPTION | Description |
|------------------------|:------------------------------------------------------------------|
| `-t` `--text` | Extract UTF-8 encoded text content |
| `-c` `--confidence` | Extract mean OCR word confidence score |
| `-i` `--illustrations` | Extract bounding box coordinates of `<Illustration>` elements |
| `-g` `--graphics` | Extract bounding box coordinates of `<GraphicalElement>` elements |
| `-s` `--statistics` | Extract statistical information (number of text lines, words, glyphs, etc.) |
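
To make the options above more concrete, here is a minimal sketch of the kind of information each one reads from an ALTO file, using only the standard library's `xml.etree.ElementTree`. It is not the alto-tools implementation, and all helper names are hypothetical. Assumptions: a text line is built by joining the `CONTENT` of its `String` elements with single spaces (hyphenation via `HYP`/`SUBS_CONTENT` is ignored), the confidence is the plain mean of whatever `WC` attributes are present, and the statistics count `TextBlock`, `TextLine`, `String` and `Glyph` elements. Namespaces are stripped so the same code works across ALTO schema versions.

```python
import statistics
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the namespace part, e.g. '{http://...ns-v3#}String' -> 'String'."""
    return tag.rsplit("}", 1)[-1]


def elements(root: ET.Element, name: str) -> list:
    """Every element in the tree (root included) with the given local name."""
    return [el for el in root.iter() if local_name(el.tag) == name]


def extract_text(root: ET.Element) -> str:
    """One output line per TextLine, joining the CONTENT of its String children."""
    lines = []
    for text_line in elements(root, "TextLine"):
        words = [
            child.get("CONTENT", "")
            for child in text_line
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return "\n".join(lines)


def mean_word_confidence(root: ET.Element):
    """Mean of the WC attributes on String elements, or None if there are none."""
    scores = [float(s.get("WC")) for s in elements(root, "String") if s.get("WC") is not None]
    return statistics.fmean(scores) if scores else None


def count_elements(root: ET.Element) -> dict:
    """Counts of the structural elements a statistics summary might report."""
    return {name: len(elements(root, name)) for name in ("TextBlock", "TextLine", "String", "Glyph")}


def bounding_boxes(root: ET.Element, name: str) -> list:
    """(HPOS, VPOS, WIDTH, HEIGHT) per element, e.g. name='Illustration' or 'GraphicalElement'."""
    return [
        tuple(float(el.attrib[attr]) for attr in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
        for el in elements(root, name)
    ]
```

For a real file, `root = ET.parse("page.xml").getroot()` (with `page.xml` standing in for your own file) gives the element these helpers expect.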

All output is sent to `stdout`.
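
Because everything goes to `stdout`, the results can be captured from another Python program with `subprocess`. This is a minimal sketch, assuming the argument order `alto-tools INPUT OPTION` from the usage line; `build_command` and `capture_stdout` are hypothetical helpers, not part of alto-tools.

```python
import subprocess


def build_command(input_path: str, option: str) -> list:
    """Assemble an alto-tools call, assuming the order `alto-tools INPUT OPTION`."""
    return ["alto-tools", input_path, option]


def capture_stdout(cmd: list) -> str:
    """Run cmd and return its stdout as text; raise CalledProcessError on a non-zero exit."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

With alto-tools installed, `capture_stdout(build_command("page.xml", "--text"))` would return the extracted text as a string, where `page.xml` is a placeholder for your own file.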