https://github.com/bpkaur/word-frequency-in-moby-dick
To find out the most frequent words in the novel Moby Dick using Python.
- Host: GitHub
- URL: https://github.com/bpkaur/word-frequency-in-moby-dick
- Owner: bpkaur
- Created: 2018-05-18T23:37:15.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-06-06T03:26:57.000Z (about 8 years ago)
- Last Synced: 2025-04-15T20:09:36.540Z (about 1 year ago)
- Topics: beautifulsoup, data-analysis, data-science, moby-dick, nltk, notebook-jupyter, python3
- Language: HTML
- Size: 1.43 MB
- Stars: 6
- Watchers: 0
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
What are the most frequent words in Herman Melville's novel Moby Dick, and how often do they occur? To answer this question, we first need the text of the book, which is freely available online as an HTML file from Project Gutenberg (which hosts a large corpus of books): https://www.gutenberg.org/files/2701/2701-h/2701-h.htm.
To fetch the HTML file containing Moby Dick, the Python package requests is used to make a GET request to the website. BeautifulSoup is then used to extract the words from this web data, and the distribution of words is analyzed with the Natural Language Toolkit (nltk).
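A minimal sketch of that pipeline follows. It is not the repository's code: to run without extra installs it swaps in standard-library stand-ins, namely `urllib.request` for `requests.get(url).text`, `html.parser.HTMLParser` for BeautifulSoup's `get_text()`, a regular expression for an nltk tokenizer such as `RegexpTokenizer(r"\w+")`, and `collections.Counter` for `nltk.FreqDist`. The function names are hypothetical.

```python
import re
import urllib.request
from collections import Counter
from html.parser import HTMLParser

MOBY_DICK_URL = "https://www.gutenberg.org/files/2701/2701-h/2701-h.htm"


def fetch_html(url: str = MOBY_DICK_URL) -> str:
    # GET request for the page (stand-in for requests.get(url).text).
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class _TextExtractor(HTMLParser):
    # Collects text nodes, skipping anything inside <script> or <style>.

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    # Roughly what BeautifulSoup(html, "html.parser").get_text() provides,
    # except that script and style contents are dropped here.
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens, similar to nltk's RegexpTokenizer(r"\w+")
    # followed by str.lower().
    return re.findall(r"\w+", text.lower())


def word_frequencies(html: str) -> Counter:
    # Frequency distribution of words in the page's text.
    return Counter(tokenize(html_to_text(html)))
```

For the whole book, `word_frequencies(fetch_html()).most_common(10)` would download the page and list the ten most frequent tokens. The downloaded page also includes Project Gutenberg's header and license text, which this sketch counts along with the novel.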
So, which word turned out to be the most common in Moby Dick?
The answer is "whale".
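In ordinary English prose, function words such as "the" occur far more often than any noun, so "whale" can only rank first once such words are set aside. The README does not say how this was done; a common approach, assumed here, is to drop stop words before ranking. nltk ships an English list via `nltk.corpus.stopwords.words("english")` (after `nltk.download("stopwords")`). This sketch uses a small hypothetical stop-word set so that it runs without any downloads.

```python
import re
from collections import Counter
from collections.abc import Container

# Hypothetical, deliberately small stop-word set for illustration only;
# nltk.corpus.stopwords.words("english") provides a fuller list.
STOP_WORDS = frozenset({
    "a", "an", "and", "as", "at", "but", "by", "for", "from", "he", "her",
    "his", "i", "in", "is", "it", "me", "my", "of", "on", "or", "so", "that",
    "the", "this", "to", "was", "with", "you",
})


def top_words(
    text: str, n: int = 10, stop_words: Container[str] = STOP_WORDS
) -> list[tuple[str, int]]:
    # Lowercase word tokens, then count only those not in the stop-word set.
    words = re.findall(r"\w+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)
```

Without the filter, "the" would top this kind of ranking for almost any English text; with it, content words like "whale" can surface.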