https://github.com/ma-fi-94/letters

An 'end to end' data science project analysing the letters between two German poets. Includes a simple scraper for getting raw data, data cleaning, preprocessing, analysis and visualisation.

bag-of-words beautifulsoup4 data-science goethe jupyter-notebook keras keras-tensorflow letters natural-language-processing python3 schiller web-scraping

# Letters
A small data science project I just started to work on for fun, analysing the letters between two famous German poets -- J. W. v. Goethe and J. C. F. v. Schiller :).

# Contents
- scrape.py downloads 14 HTML files from Projekt Gutenberg (www.projekt-gutenberg.org) containing ~1000 letters exchanged between Goethe and Schiller.
- preprocess.py extracts the letter numbers, letter writers and letter contents from the raw HTML files and writes them to a single CSV file, which is used for further analysis.
- all_letters.csv is this CSV file.
- The Jupyter notebooks show the results of the analyses.
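
As a rough illustration of the preprocessing step described above, here is a minimal sketch of turning raw HTML into rows of letter number, writer and content. It is not the code from preprocess.py. It assumes a hypothetical page layout in which every letter starts with an `<h3>` heading such as `1. An Goethe.` followed by `<p>` paragraphs, and it assumes that a letter addressed to one poet was written by the other. The names `LetterParser`, `parse_letters` and `letters_to_csv` are made up for this example, and the sample HTML is placeholder text rather than real letter content. The sketch uses only the standard library (`html.parser`, `csv`) instead of BeautifulSoup so that it runs without extra dependencies.

```python
import csv
import io
import re
from html.parser import HTMLParser

# Hypothetical heading format, e.g. "12. An Schiller." (an assumption, not taken from the repo).
HEADING_RE = re.compile(r"^\s*(\d+)\.\s*An\s+(Goethe|Schiller)")
# Assumption: a letter addressed to one poet was written by the other.
OTHER = {"Goethe": "Schiller", "Schiller": "Goethe"}


class LetterParser(HTMLParser):
    """Collects letters from <h3> headings followed by <p> paragraphs."""

    def __init__(self):
        super().__init__()
        self.letters = []   # one dict per letter: number, writer, content
        self._tag = None    # tag currently being read ("h3", "p" or None)
        self._buffer = []   # text fragments collected inside that tag

    def handle_starttag(self, tag, attrs):
        if tag in ("h3", "p"):
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # closing tag of an inline element such as <i>
        text = " ".join("".join(self._buffer).split())  # normalise whitespace
        self._tag = None
        if tag == "h3":
            match = HEADING_RE.match(text)
            if match:
                self.letters.append({
                    "number": int(match.group(1)),
                    "writer": OTHER[match.group(2)],
                    "content": "",
                })
        elif tag == "p" and self.letters and text:
            # Append the paragraph to the most recently started letter.
            letter = self.letters[-1]
            letter["content"] = (letter["content"] + " " + text).strip()


def parse_letters(html):
    """Return a list of letter dicts found in one HTML document."""
    parser = LetterParser()
    parser.feed(html)
    parser.close()
    return parser.letters


def letters_to_csv(letters):
    """Serialise the letters to CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["number", "writer", "content"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(letters)
    return out.getvalue()


# Placeholder input, not real letter text.
sample = """
<h3>1. An Goethe.</h3>
<p>Erster <i>Absatz</i>.</p>
<p>Zweiter Absatz.</p>
<h3>2. An Schiller.</h3>
<p>Antwort.</p>
"""

letters = parse_letters(sample)
print(letters_to_csv(letters))
```

To process several downloaded files, one could call `parse_letters` on each file's text, concatenate the resulting lists, and write the output of `letters_to_csv` to a single file.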

# Writeup
A writeup of the analyses and results can be found on my blog: https://mmfischer.de/003_letters/003_letters.html