https://github.com/pythoncoderunicorn/harrypotterbooks


# Harry Potter Books

Harry Potter books for Text Analysis.

This work was inspired by [Bradley Boehmke's R package](https://github.com/bradleyboehmke/harrypotter), which claims to provide
clean and tidy text data. On closer inspection, the text needed further
cleaning, including adding paragraphs to the end of a chapter and removing the many
special characters.

This repository stores each book as a `csv` file so that anyone who wants to do text analysis can use the data without dealing with a `.rda` file. Each `csv` file is one book, and each book has 2 columns: chapter (in uppercase) and text.
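As a rough illustration of the two-column layout described above, here is a minimal Python sketch that reads one book and groups its text by chapter. It assumes the file has a header row with columns named `chapter` and `text`; the function name `load_book` and any file path passed to it are hypothetical and not part of this repository.

```python
import csv
from collections import defaultdict

# Text fields may be long, so raise the csv module's per-field limit
# (the default is 131072 characters). Note this is a module-wide setting.
csv.field_size_limit(10_000_000)


def load_book(path):
    """Return a dict mapping each chapter label to its list of text rows.

    Assumes a header row with columns named "chapter" and "text".
    """
    chapters = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            chapters[row["chapter"]].append(row["text"])
    return dict(chapters)
```

Calling `load_book("philosophers_stone.csv")` (an assumed file name) would return the chapters in the order they first appear in the file, since Python dicts preserve insertion order.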

The book order:

- `philosophers_stone`: Harry Potter and the Philosopher's Stone, published in 1997
- `chamber_of_secrets`: Harry Potter and the Chamber of Secrets, published in 1998
- `prisoner_of_azkaban`: Harry Potter and the Prisoner of Azkaban, published in 1999
- `goblet_of_fire`: Harry Potter and the Goblet of Fire, published in 2000
- `order_of_the_phoenix`: Harry Potter and the Order of the Phoenix, published in 2003
- `half_blood_prince`: Harry Potter and the Half-Blood Prince, published in 2005
- `deathly_hallows`: Harry Potter and the Deathly Hallows, published in 2007
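
The book order above can be used to process the series in sequence. The following Python sketch counts whitespace-separated words per book. It assumes each book is stored as `<name>.csv` in a single directory and has a `text` column; the file naming, the `BOOK_ORDER` constant, and the `word_counts` function are illustrative assumptions, not something this repository provides.

```python
import csv
from pathlib import Path

# Series order, using the names listed above.
BOOK_ORDER = [
    "philosophers_stone",
    "chamber_of_secrets",
    "prisoner_of_azkaban",
    "goblet_of_fire",
    "order_of_the_phoenix",
    "half_blood_prince",
    "deathly_hallows",
]

# Text fields may be long, so raise the csv module's per-field limit.
csv.field_size_limit(10_000_000)


def word_counts(data_dir):
    """Return {book name: word count} in series order.

    Assumes files are named "<name>.csv" and contain a "text" column.
    Books whose file is not found are skipped.
    """
    counts = {}
    for name in BOOK_ORDER:
        path = Path(data_dir) / f"{name}.csv"  # assumed file naming
        if not path.exists():
            continue
        with path.open(newline="", encoding="utf-8") as f:
            counts[name] = sum(
                len(row["text"].split()) for row in csv.DictReader(f)
            )
    return counts
```

Splitting on whitespace is a deliberately crude tokenizer; a real analysis would likely swap in a proper tokenizer from an NLP library.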