https://github.com/brawer/bern-addresses
- Host: GitHub
- URL: https://github.com/brawer/bern-addresses
- Owner: brawer
- License: mit
- Created: 2023-04-06T13:48:37.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-15T15:27:14.000Z (about 1 month ago)
- Last Synced: 2024-11-15T15:34:25.741Z (about 1 month ago)
- Language: Python
- Size: 274 MB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# Bern Address Book
## Setup
```sh
# [SEMI-DEPRECATED]
$ git clone https://github.com/brawer/bern-addresses.git
$ cd bern-addresses
$ python3 -m venv venv
$ venv/bin/pip3 install -r requirements.txt
$ venv/bin/python3 src/fetch.py
```
## Input Pipeline
```sh
git restore proofread/
python3 src/convert_hocr_to_plaintext.py
python3 src/cleanup/blackhole_lines.py
python3 src/cleanup/fix_line_order.py
python3 src/cleanup/sanitize.py
python3 src/cleanup/fix_conjunctions.py
python3 src/cleanup/apply_replacement.py
python3 src/cleanup/fix_conjunctions.py
python3 src/cleanup/fix_indentation.py
```
### Processing specific volumes
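Because the input pipeline is a fixed sequence of scripts, limiting it to a few volumes means every step has to see the same `PROCESS_VOLUMES` value. The sketch below shows one possible way to arrange that from Python. The names `PIPELINE`, `build_env`, and `run_pipeline` are hypothetical and not part of the repository. It also assumes the scripts read `PROCESS_VOLUMES` from the environment as a comma-separated list of volume dates, and it leaves out the initial `git restore proofread/` step.

```python
import os
import subprocess

# Pipeline steps in the order listed in this README.
# fix_conjunctions.py appears twice because the README runs it twice.
PIPELINE = [
    "src/convert_hocr_to_plaintext.py",
    "src/cleanup/blackhole_lines.py",
    "src/cleanup/fix_line_order.py",
    "src/cleanup/sanitize.py",
    "src/cleanup/fix_conjunctions.py",
    "src/cleanup/apply_replacement.py",
    "src/cleanup/fix_conjunctions.py",
    "src/cleanup/fix_indentation.py",
]


def build_env(volumes, base_env=None):
    """Return a copy of the environment with PROCESS_VOLUMES set.

    Assumption: the scripts expect a comma-separated list of volume dates.
    With an empty list, the environment is copied unchanged.
    """
    env = dict(os.environ if base_env is None else base_env)
    if volumes:
        env["PROCESS_VOLUMES"] = ",".join(volumes)
    return env


def run_pipeline(volumes, python="python3"):
    """Run every step in order, stopping at the first failing script."""
    env = build_env(volumes)
    for script in PIPELINE:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run([python, script], env=env, check=True)
```

Calling `run_pipeline(["1862-07-31", "1877-03-31"])` from the repository root would run all eight steps with `PROCESS_VOLUMES` set to `1862-07-31,1877-03-31`.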
Use `PROCESS_VOLUMES='1862-07-31,1877-03-31'` to only process a subset of volumes.
## Check a single Excel spreadsheet (before import)
```sh
git pull
venv/bin/python3 src/check_reviewed.py NAME.xlsx
```