https://github.com/altomator/alto-iiif

## Extracting illustrations from ALTO files with IIIF

### Synopsis
Extracts the illustrations described in OCRed documents (ALTO format) using the IIIF Image API.
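The core idea can be shown with a minimal Python sketch. This is not the repository's Perl code. It reads the `<Illustration>` blocks of an ALTO page, takes their `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes, and turns each block into an IIIF Image API region request of the form `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The base URL `https://example.org/iiif`, the `identifier` argument and the helper names are hypothetical. The sketch also assumes the ALTO coordinates are in the same pixel space as the IIIF image. ALTO files that use another measurement unit (e.g. `mm10`) would need to be scaled first.

```python
import xml.etree.ElementTree as ET

# Hypothetical IIIF image server; replace with the one serving the page images.
IIIF_BASE = "https://example.org/iiif"


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}Illustration' -> 'Illustration'."""
    return tag.rsplit("}", 1)[-1]


def illustration_urls(alto_xml, identifier, base=IIIF_BASE):
    """Return one IIIF Image API URL per <Illustration> block of an ALTO page.

    Assumes ALTO coordinates are pixels of the image served by IIIF.
    """
    root = ET.fromstring(alto_xml)
    urls = []
    for elem in root.iter():
        if local_name(elem.tag) != "Illustration":
            continue
        # ALTO block geometry: left, top, width, height
        x, y, w, h = (round(float(elem.get(a))) for a in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
        # IIIF Image API 2.x syntax: region x,y,w,h / full size / no rotation / default quality
        urls.append(f"{base}/{identifier}/{x},{y},{w},{h}/full/0/default.jpg")
    return urls
```

For example, an `<Illustration>` with `HPOS="100" VPOS="200" WIDTH="300" HEIGHT="400"` on page `page1` yields `https://example.org/iiif/page1/100,200,300,400/full/0/default.jpg`. Matching on the local tag name lets the same code read ALTO files from different schema versions (namespaces).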

[Full presentation in French](https://altomator.wordpress.com/2015/11/15/extraire-les-illustrations-dune-collection-de-documents-alto-avec-iiif/)

### Installation
You will need four scripts:

1. filterIMG.sh (shell)
2. processURLs.pl (Perl)
3. extractIMG.pl (Perl)
4. extractMD.pl (Perl)

The batch.sh script chains these commands.

The documents must be stored in a "DOCS" folder.
The images are generated in an "IMG" folder.
The metadata is generated in an "MD" folder.
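The contents of batch.sh are not reproduced here. As a hypothetical illustration of the chain it performs, the following Python sketch checks the folder layout and runs the four steps in the order given in the Tests section. Several details are assumptions:

- filterIMG.sh is invoked through `sh`.
- `IMG` and `MD` are created up front.
- The function names are invented for this example.

With `dry_run=True` the folders are still prepared, but no script is executed and only the command list is returned.

```python
import subprocess
from pathlib import Path


def pipeline_commands(list_file="illustrations.txt", min_kb=200):
    """Commands of the workflow, in the order listed in the Tests section."""
    url_file = f"{list_file}_URL"  # the *_URL file used by the later steps
    return [
        ["sh", "filterIMG.sh"],  # assumption: run through sh
        ["perl", "processURLs.pl", list_file],
        ["perl", "extractIMG.pl", url_file, str(min_kb)],
        ["perl", "extractMD.pl", url_file],
    ]


def prepare_folders(root="."):
    """Check that DOCS exists and create the IMG and MD output folders."""
    root = Path(root)
    if not (root / "DOCS").is_dir():
        raise FileNotFoundError(f"ALTO documents expected in {root / 'DOCS'}")
    for name in ("IMG", "MD"):
        (root / name).mkdir(exist_ok=True)
    return root


def run_pipeline(root=".", dry_run=True):
    """Prepare folders, then run each step, stopping at the first failure."""
    root = prepare_folders(root)
    commands = pipeline_commands()
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, cwd=root, check=True)  # raises on non-zero exit
    return commands
```

Using `check=True` mirrors a shell script run with `set -e`: a failing step stops the chain instead of feeding incomplete output to the next script.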

### Tests
1. Open a command-line terminal.
2. `filterIMG.sh`
3. `perl processURLs.pl illustrations.txt`
4. `perl extractIMG.pl illustrations.txt_URL 200` (200 is the minimum size, in KB, of the extracted images)
5. `perl extractMD.pl illustrations.txt_URL`
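The README does not say exactly how extractIMG.pl applies the size threshold. Suppose it is a file-size floor, with 1 KB taken as 1024 bytes. Under that assumption, a post-hoc check of the "IMG" folder could look like this hypothetical Python sketch. The function names and the `*.jpg` pattern are assumptions, and the sketch only lists files; it deletes nothing.

```python
from pathlib import Path


def large_enough(size_bytes, min_kb=200):
    """True if a file of size_bytes meets the floor (assumption: 1 KB = 1024 bytes)."""
    return size_bytes >= min_kb * 1024


def split_by_size(folder="IMG", min_kb=200, pattern="*.jpg"):
    """Partition the images of a folder into (kept, too_small) lists of paths."""
    kept, too_small = [], []
    for path in sorted(Path(folder).glob(pattern)):
        target = kept if large_enough(path.stat().st_size, min_kb) else too_small
        target.append(path)
    return kept, too_small
```

Keeping the comparison in its own function makes the unit convention explicit and easy to change if the threshold turns out to use 1000-byte kilobytes.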

## License
CC0