https://github.com/gnames/ds-ruhoff-mollusca
Data source derived from the publication "Index to the species of Mollusca introduced from 1850 to 1870" (Ruhoff, Florence A., 1980)
- Host: GitHub
- URL: https://github.com/gnames/ds-ruhoff-mollusca
- Owner: gnames
- Created: 2023-06-16T10:40:33.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2024-12-02T21:35:18.000Z (over 1 year ago)
- Last Synced: 2025-09-05T00:09:04.934Z (10 months ago)
- Language: Ruby
- Size: 14.9 MB
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Citation: CITATION.cff
# Scientific names extracted from Ruhoff 1980
[DOI: 10.5281/zenodo.14262688](https://doi.org/10.5281/zenodo.14262688)
The goal of this project is to extract scientific names from
[Ruhoff 1980](https://doi.org/10.5479/si.00810282.294).
The cleaned-up data is in [data/08-reconcile.csv](data/08-reconcile.csv).
## Process
- [x] Make [OCR](data/01-ocr.txt)
- [x] [Concatenate lines](data/03-concat.txt)
- [x] [Fix spaces](data/04-sortfix.txt) in species names
- [x] [Fix commas](data/04-sortfix.txt) which were recognized as periods.
- [x] [Fix years](data/05-year.txt)
- [x] [Extract name part](data/06-names.csv) (06-names.csv first column is the place to fix errors)
- [x] [Reformat name part](data/07-fmt-names.csv)
- [x] [Fix spellings in names](data/07-fmt-names.csv)
- [x] [Run reconciliation using GNverifier with OpenRefine](data/08-reconcile.csv)
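As an illustration of the "Fix commas" step above, here is a minimal sketch of how a period misread for a comma could be repaired. The rule (a period directly before a four-digit year), the names `MISREAD_COMMA` and `fix_misread_commas`, and the sample line are all assumptions made for this example; the project's actual cleanup rules may differ.

```ruby
# Hypothetical rule: a period that follows a letter and sits directly
# before a four-digit year (e.g. "Adams. 1855") is treated as a comma
# that OCR misread. This is an assumption, not the project's own rule.
MISREAD_COMMA = /(?<=\p{L})\.(?=\s+1[89]\d\d\b)/

# Returns a copy of the line with such periods replaced by commas.
def fix_misread_commas(line)
  line.gsub(MISREAD_COMMA, ",")
end

# Invented sample line, not taken from the data files.
sample = "Helix exampla Adams. 1855"
puts fix_misread_commas(sample)
```

Periods in author initials such as `H. Adams` are left alone, because they are not followed by a year.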
## Stats
| Names | Number | Percentage |
| ----------------------- | ------ | ---------- |
| Total | 35487 | 100% |
| All Matches | 26799 | 75.5% |
| No Match | 8688 | 24.5% |
| Canonical + Auth. Match | 22311 | 62.9% |
| Canonical Match | 3448 | 9.7% |
| Fuzzy Canonical Match | 1040 | 2.9% |
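
The percentage column can be recomputed from the counts in the table. The sketch below does this with one-decimal rounding and also checks that the categories add up. The constant and method names are chosen for this example only.

```ruby
# Counts copied from the Stats table.
TOTAL = 35_487
COUNTS = {
  "All Matches"             => 26_799,
  "No Match"                => 8_688,
  "Canonical + Auth. Match" => 22_311,
  "Canonical Match"         => 3_448,
  "Fuzzy Canonical Match"   => 1_040
}.freeze

# Share of the total, as a percentage rounded to one decimal place.
def percentage(count, total)
  (100.0 * count / total).round(1)
end

# The three match categories should sum to "All Matches",
# and matches plus non-matches should sum to the total.
MATCH_PARTS = ["Canonical + Auth. Match", "Canonical Match", "Fuzzy Canonical Match"].freeze
parts_sum = COUNTS.values_at(*MATCH_PARTS).sum
raise "match categories do not add up" unless parts_sum == COUNTS["All Matches"]
raise "total does not add up" unless COUNTS["All Matches"] + COUNTS["No Match"] == TOTAL

COUNTS.each do |label, count|
  puts format("%-24s %6d %5.1f%%", label, count, percentage(count, TOTAL))
end
```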