https://github.com/lmullen/chronam-ocr-debatcher
Turn a batch of OCR files from Chronicling America into a CSV that can be imported into a database
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/lmullen/chronam-ocr-debatcher
- Owner: lmullen
- License: mit
- Created: 2018-11-08T04:36:20.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2021-12-13T16:47:53.000Z (about 3 years ago)
- Last Synced: 2024-10-28T04:59:17.756Z (3 months ago)
- Language: Go
- Size: 10.7 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
[![Build Status](https://travis-ci.org/lmullen/chronam-ocr-debatcher.svg?branch=master)](https://travis-ci.org/lmullen/chronam-ocr-debatcher)
# Chronicling America OCR debatcher
This program takes paths to `.tar.bz2` batches of OCR files from the
*Chronicling America* [bulk data
downloads](https://chroniclingamerica.loc.gov/about/api/#bulk-data). It converts
each batch into a CSV file, which you can load into a database or do whatever
you like with. It will process the batches concurrently.

Usage:

```
./chronam-ocr-debatcher [--processes=8]
```

You can download binaries from the [releases page](https://github.com/lmullen/chronam-ocr-debatcher/releases).