https://github.com/ma-fi-94/letters
An 'end to end' data science project analysing the letters between two German poets. Includes a simple scraper for getting raw data, data cleaning, preprocessing, analysis and visualisation.
bag-of-words beautifulsoup4 data-science goethe jupyter-notebook keras keras-tensorflow letters natural-language-processing python3 schiller web-scraping
- Host: GitHub
- URL: https://github.com/ma-fi-94/letters
- Owner: Ma-Fi-94
- Created: 2021-10-17T11:48:05.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-10-17T16:10:55.000Z (over 2 years ago)
- Last Synced: 2025-02-12T06:24:26.731Z (4 months ago)
- Topics: bag-of-words, beautifulsoup4, data-science, goethe, jupyter-notebook, keras, keras-tensorflow, letters, natural-language-processing, python3, schiller, web-scraping
- Language: Jupyter Notebook
- Homepage: https://mmfischer.de/003_letters/003_letters.html
- Size: 4.9 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Letters
A small data science project I just started to work on for fun, analysing the letters between two famous German poets -- J. W. v. Goethe and J. C. F. v. Schiller :).

# Contents
- scrape.py downloads 14 HTML files from Projekt Gutenberg (www.projekt-gutenberg.org) containing ~1000 letters exchanged between Goethe and Schiller.
- preprocess.py extracts all letter numbers, letter writers and letter contents from the raw HTML files, and writes them to a single CSV file, which is used for further analysis.
- all_letters.csv is this CSV file.
- The Jupyter notebooks show the results of the analyses.

# Writeup
A writeup of the analyses and results can be found on my blog: https://mmfischer.de/003_letters/003_letters.html
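
To give a rough idea of the preprocessing step listed under Contents (raw HTML in, one CSV row per letter out), here is a minimal sketch. It is not the code in preprocess.py: it uses only the Python standard library instead of BeautifulSoup, and the HTML layout it expects (an `<h3>` heading like "12. An Schiller" followed by `<p>` paragraphs) is an assumption made for illustration, as are the names `LetterParser` and `letters_to_csv` and the column names. The real Projekt Gutenberg markup may look different.

```python
import csv
import io
import re
from html.parser import HTMLParser

# ASSUMPTION: every letter starts with a heading such as "12. An Schiller".
HEADING_RE = re.compile(r"^\s*(\d+)\.\s*An\s+(Goethe|Schiller)\s*$")


class LetterParser(HTMLParser):
    """Collects number, writer and content of each letter on one HTML page."""

    def __init__(self):
        super().__init__()
        self.letters = []   # one dict per letter found so far
        self._tag = None    # "h3" or "p" while inside such an element
        self._buffer = []   # text fragments of the current element

    def handle_starttag(self, tag, attrs):
        if tag in ("h3", "p"):
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # closing tag of a nested element such as <i>
        text = " ".join("".join(self._buffer).split())  # normalise whitespace
        self._tag = None
        if tag == "h3":
            match = HEADING_RE.match(text)
            if match:
                number, recipient = match.groups()
                # The heading names the recipient, so the writer is the other poet.
                writer = "Schiller" if recipient == "Goethe" else "Goethe"
                self.letters.append(
                    {"number": int(number), "writer": writer, "content": ""}
                )
        elif tag == "p" and self.letters and text:
            # Paragraphs are appended to the most recently started letter.
            letter = self.letters[-1]
            letter["content"] = (letter["content"] + " " + text).strip()


def letters_to_csv(html_pages, out_file):
    """Parse each HTML string and write all letters to one CSV file object."""
    writer = csv.DictWriter(out_file, fieldnames=["number", "writer", "content"])
    writer.writeheader()
    for html in html_pages:
        parser = LetterParser()
        parser.feed(html)
        parser.close()
        writer.writerows(parser.letters)


# Made-up sample input, only to show the expected shape of the data.
sample = (
    "<h3>1. An Goethe</h3><p>Hochverehrter Herr,</p>"
    "<p>ich schreibe <i>Ihnen</i>.</p>"
    "<h3>2. An Schiller</h3><p>Antwort.</p>"
)
out = io.StringIO()
letters_to_csv([sample], out)
print(out.getvalue())
```

Keeping the parsing in a small class and the CSV writing in a separate function means the same parser could be fed each of the downloaded HTML files in turn, with all rows ending up in one file such as all_letters.csv.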