https://github.com/wragge/loc-datajam
- Host: GitHub
- URL: https://github.com/wragge/loc-datajam
- Owner: wragge
- Created: 2022-10-25T02:59:01.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2022-10-25T06:03:48.000Z (about 2 years ago)
- Last Synced: 2024-11-13T11:16:57.709Z (2 months ago)
- Language: Jupyter Notebook
- Size: 38.1 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# LoC Data Jam notebooks
I used these notebooks to extract country names from a very large collection of digitised books from the Library of Congress.
* [Download text files from Amazon S3 using the API](download_from_amazon_api.ipynb)
* [Extracting place names using spaCy and Named Entity Recognition](spacy.ipynb)
* [Look for countries in Wikidata matching the place names extracted through NER](get_countries.ipynb)
* [Filter place references using the Wikidata results](linking-countries-to-references.ipynb)
* [Extract sentences containing country names from the texts](process_sentences.ipynb)
* [Process metadata](process_metadata.ipynb)
* [Save data to an SQLite database for delivery via Datasette](prepare_for_datasette.ipynb)

You can view the results in [this database](https://loc-books-yajhxrvxsa-ts.a.run.app/), or using this [simple app](https://wragge.github.io/loc-books-demo/).
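The notebooks themselves aren't reproduced here. As a rough illustration of the filtering, sentence-extraction and SQLite steps in the list above, here is a minimal Python sketch. It assumes the place names have already been extracted (for example as spaCy `GPE` entities) and that the Wikidata results have been reduced to a plain dictionary mapping place names to country labels. `COUNTRY_MATCHES`, `link_places`, `sentences_with_countries` and `save_to_sqlite` are hypothetical names used for illustration, not code taken from the notebooks.

```python
import re
import sqlite3

# Hypothetical stand-in for the Wikidata lookup results:
# place name found by NER -> country label. Unmatched names are absent.
COUNTRY_MATCHES = {
    "France": "France",
    "Japan": "Japan",
    "Brasil": "Brazil",
}

# Naive sentence splitter: break on whitespace that follows ., ! or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def link_places(place_names, matches):
    """Map NER place names to countries, dropping names with no country match."""
    return {name: matches[name] for name in place_names if name in matches}


def sentences_with_countries(book_id, text, linked):
    """Yield (book_id, country, sentence) for each sentence naming a linked place."""
    for sentence in SENTENCE_END.split(text):
        for name, country in linked.items():
            # Word boundaries stop "Japan" from matching inside "Japanese".
            if re.search(rf"\b{re.escape(name)}\b", sentence):
                yield (book_id, country, sentence.strip())


def save_to_sqlite(rows, conn):
    """Store (book_id, country, sentence) rows in a `sentences` table."""
    with conn:  # commits the transaction if no exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sentences "
            "(book_id TEXT, country TEXT, sentence TEXT)"
        )
        conn.executemany("INSERT INTO sentences VALUES (?, ?, ?)", rows)


if __name__ == "__main__":
    text = (
        "He sailed from France. The weather was fine. "
        "Later he reached Japan and Brasil."
    )
    # e.g. GPE entities from spaCy; "Atlantis" has no country match and is dropped
    linked = link_places(["France", "Japan", "Brasil", "Atlantis"], COUNTRY_MATCHES)
    # Use a file path such as "books.db" instead to serve the data with Datasette
    conn = sqlite3.connect(":memory:")
    save_to_sqlite(sentences_with_countries("book-001", text, linked), conn)
    for row in conn.execute("SELECT country, sentence FROM sentences"):
        print(row)
```

With the data written to a file such as `books.db`, running `datasette books.db` serves it as a browsable, queryable site. The regex splitter keeps the sketch dependency-free; spaCy's own sentence boundaries (`doc.sents`) would cope better with abbreviations like "U.S.".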