https://github.com/caritoramos/text-mining-project-in-python
- Host: GitHub
- URL: https://github.com/caritoramos/text-mining-project-in-python
- Owner: CaritoRamos
- Created: 2024-11-06T23:44:42.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-11-07T00:07:36.000Z (7 months ago)
- Last Synced: 2025-02-07T16:38:00.708Z (3 months ago)
- Language: Jupyter Notebook
- Size: 2.9 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
This project applies text mining to the analysis of a specific book. Using natural language processing (NLP) tools such as NLTK and spaCy, it examines several aspects of the book, from word frequency to sentiment analysis and entity recognition. The goal is to use computational language analysis to extract meaningful insights from literary works.
A large text corpus is first collected as the raw material for the analysis. The text is loaded through the NLTK library and prepared by converting all words to lowercase, removing punctuation, and splitting the text into word tokens.
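The three preparation steps can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the `preprocess` helper and the sample sentence are made up for the example, and `WhitespaceTokenizer` is chosen only because it needs no extra NLTK data download (the more common `nltk.word_tokenize` requires the Punkt tokenizer data).

```python
import string

from nltk.tokenize import WhitespaceTokenizer


def preprocess(text: str) -> list[str]:
    """Lowercase the text, strip ASCII punctuation, and split it into tokens."""
    lowered = text.lower()
    # Delete every character listed in string.punctuation.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # Split on runs of whitespace.
    return WhitespaceTokenizer().tokenize(no_punct)


# Made-up sample sentence standing in for the book's text.
sample = "The ship left at dawn; nobody, not even the captain, looked back."
tokens = preprocess(sample)
print(tokens[:5])  # ['the', 'ship', 'left', 'at', 'dawn']
```

Note that `string.punctuation` covers ASCII punctuation only, so typographic quotes or dashes in a real book would need separate handling.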
Next, word frequencies are computed with NLTK to identify the most common words in the text. A word cloud visualizes these frequencies, highlighting thematic trends and recurring motifs in the book.
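A frequency count of this kind might look like the sketch below, which uses NLTK's `FreqDist`. The `top_words` helper and the token list are hypothetical and stand in for the tokens produced during preprocessing.

```python
from nltk import FreqDist


def top_words(tokens, n=10):
    """Return the n most frequent tokens as (word, count) pairs."""
    return FreqDist(tokens).most_common(n)


# Made-up tokens standing in for the preprocessed book.
tokens = ["the", "ship", "the", "sea", "ship", "the"]
print(top_words(tokens, n=2))  # [('the', 3), ('ship', 2)]
```

The README does not say which library draws the word cloud. If the third-party `wordcloud` package is used, a `FreqDist` can be passed to `WordCloud().generate_from_frequencies(...)`, since it behaves like a dictionary of counts.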
The analysis then goes deeper with lemmatization and named entity recognition in spaCy, which reduce words to their base forms and identify the key entities within the literary universe of the book.
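As a rough sketch of this step, the snippet below runs a spaCy pipeline and collects lemmas and entities. It assumes the small English model `en_core_web_sm` is installed (the README does not name the model), and the `lemmas_and_entities` helper and sample sentence are made up for illustration.

```python
import spacy


def lemmas_and_entities(text, nlp):
    """Return (lemmas of alphabetic tokens, [(entity text, entity label), ...])."""
    doc = nlp(text)
    lemmas = [token.lemma_.lower() for token in doc if token.is_alpha]
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return lemmas, entities


try:
    # Assumed model; install with: python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")
except OSError:
    nlp = None  # model not installed, skip the demo

if nlp is not None:
    # Made-up sentence standing in for a passage from the book.
    lemmas, entities = lemmas_and_entities("Elizabeth visited London in 1811.", nlp)
    print(lemmas)
    print(entities)
```

The exact lemmas and entity labels depend on the model and its version, so no expected output is shown.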
Finally, sentiment analysis is performed with NLTK, calculating the polarity and subjectivity of the text. These scores describe the emotions and tone conveyed throughout the work. They support a closer reading of character feelings, plot evolution, and the emotional impact on readers, and a deeper interpretation of underlying themes and author intent.
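One way to compute polarity with NLTK is its VADER analyzer, sketched below. This is an assumption about the approach, not the notebook's confirmed method. VADER reports negative, neutral, positive, and compound polarity scores but no subjectivity score. Subjectivity is commonly obtained from TextBlob, which the README does not mention. The helper names and the 0.05 threshold (the cutoff conventionally used with VADER) are illustrative.

```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer


def label_polarity(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_passages(passages, analyzer):
    """Return (passage, compound score, label) for each passage."""
    results = []
    for passage in passages:
        compound = analyzer.polarity_scores(passage)["compound"]
        results.append((passage, compound, label_polarity(compound)))
    return results


try:
    analyzer = SentimentIntensityAnalyzer()
except LookupError:
    # The lexicon is a separate download: nltk.download("vader_lexicon")
    analyzer = None

if analyzer is not None:
    # Made-up passages standing in for chapters or paragraphs of the book.
    for row in score_passages(["What a wonderful morning!", "It was a terrible loss."], analyzer):
        print(row)
```

Scoring the book passage by passage, rather than as a single string, is what makes it possible to follow how tone shifts as the plot evolves.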