https://github.com/pythoncoderunicorn/harrypotterbooks
Harry Potter books for Text Analysis
- Host: GitHub
- URL: https://github.com/pythoncoderunicorn/harrypotterbooks
- Owner: PythonCoderUnicorn
- Created: 2022-09-25T15:32:29.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-09-25T16:01:35.000Z (over 3 years ago)
- Last Synced: 2025-01-17T20:24:32.617Z (about 1 year ago)
- Topics: books, harrypotter, nlp, text-analysis
- Language: R
- Homepage:
- Size: 2.22 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README

# Harry Potter Books
Harry Potter books for Text Analysis.
This work was inspired by [Bradley Boehmke's R package](https://github.com/bradleyboehmke/harrypotter), which claims to provide
clean and tidy text data. On closer inspection, the text needed further
cleaning, including adding paragraphs to the end of a chapter and removing the many
special characters.
This repository keeps each book in its own `csv` file, so anyone who wants to do text analysis can use the data without dealing with a `.rda` file. Each `csv` holds one book and has 2 columns: chapter (in uppercase) and text.
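As a minimal sketch of working with this layout, the base R snippet below loads one book and counts the words in each chapter. The file name `philosophers_stone.csv`, the column names `chapter` and `text`, and the helper functions `read_book()` and `words_per_chapter()` are assumptions made for illustration; check them against the actual files before relying on them.

```r
# Minimal sketch: load one book and count words per chapter (base R only).
# Assumptions: the file name and the column names `chapter` and `text`
# are illustrative; check them against the actual csv header.
read_book <- function(path) {
  book <- read.csv(path, stringsAsFactors = FALSE)
  stopifnot(all(c("chapter", "text") %in% names(book)))
  book
}

words_per_chapter <- function(book) {
  # Split each text entry on runs of whitespace and count non-empty tokens
  tokens <- strsplit(trimws(book$text), "\\s+")
  n_words <- vapply(tokens, function(x) sum(nzchar(x)), integer(1))
  # Keep chapters in the order they appear in the file, not alphabetical order
  chapter <- factor(book$chapter, levels = unique(book$chapter))
  counts <- tapply(n_words, chapter, sum)
  data.frame(chapter = names(counts), words = as.integer(counts),
             stringsAsFactors = FALSE)
}

path <- "philosophers_stone.csv"  # hypothetical file name
if (file.exists(path)) {
  book <- read_book(path)
  print(head(words_per_chapter(book)))
}
```

Splitting on whitespace is only a rough word count; a tokenizer from a text-analysis package would handle punctuation more carefully.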
The book order:
- `philosophers_stone`: Harry Potter and the Philosopher's Stone, published in 1997
- `chamber_of_secrets`: Harry Potter and the Chamber of Secrets, published in 1998
- `prisoner_of_azkaban`: Harry Potter and the Prisoner of Azkaban, published in 1999
- `goblet_of_fire`: Harry Potter and the Goblet of Fire, published in 2000
- `order_of_the_phoenix`: Harry Potter and the Order of the Phoenix, published in 2003
- `half_blood_prince`: Harry Potter and the Half-Blood Prince, published in 2005
- `deathly_hallows`: Harry Potter and the Deathly Hallows, published in 2007
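
The order above can be carried into an analysis by reading the books into a single data frame with a `book` column. This is a sketch under stated assumptions: it supposes each file is named `<identifier>.csv` and sits in one directory, and `read_all_books()` is a hypothetical helper, not something the repository provides.

```r
# Minimal sketch: read the books, in publication order, into one data frame.
# Assumption: files are named `<identifier>.csv` inside `dir`;
# adjust to the actual repository layout.
book_ids <- c("philosophers_stone", "chamber_of_secrets", "prisoner_of_azkaban",
              "goblet_of_fire", "order_of_the_phoenix", "half_blood_prince",
              "deathly_hallows")

read_all_books <- function(dir = ".", ids = book_ids) {
  books <- lapply(ids, function(id) {
    path <- file.path(dir, paste0(id, ".csv"))
    if (!file.exists(path)) return(NULL)  # skip books that are not present
    book <- read.csv(path, stringsAsFactors = FALSE)
    # Tag every row with its book; the factor levels keep publication order
    book$book <- factor(id, levels = ids)
    book
  })
  # Drop the skipped books, then stack the rest row-wise
  do.call(rbind, Filter(Negate(is.null), books))
}
```

Calling `read_all_books("path/to/csvs")` returns the rows of every book found, and storing `book` as an ordered set of factor levels means grouped summaries and plots follow publication order rather than alphabetical order.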