https://github.com/gurpreet0022/nlp_exploration
This repository explores various Natural Language Processing (NLP) techniques using the NLTK library in Python. It demonstrates these techniques on a sample dataset and performs sentiment analysis on movie reviews.
beginner-friendly nlp nlp-machine-learning nltk scikit-learn
- Host: GitHub
- URL: https://github.com/gurpreet0022/nlp_exploration
- Owner: Gurpreet0022
- License: mit
- Created: 2024-12-22T09:46:01.000Z (25 days ago)
- Default Branch: main
- Last Pushed: 2024-12-22T09:50:24.000Z (25 days ago)
- Last Synced: 2024-12-22T10:30:19.168Z (25 days ago)
- Topics: beginner-friendly, nlp, nlp-machine-learning, nltk, scikit-learn
- Language: Jupyter Notebook
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# NLP Techniques and Sentiment Analysis with NLTK
This notebook explores various Natural Language Processing (NLP) techniques using the NLTK library in Python. It demonstrates these techniques on a sample dataset and performs sentiment analysis on movie reviews.
## Tasks Performed
### NLP Techniques
1. **Data Loading and Preprocessing:** Loads a sample dataset for demonstrating NLP techniques.
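2. **Tokenization:** Splits text into individual tokens (words and punctuation).
3. **Stop Word Removal:** Removes common words (such as "the" and "is") that carry little meaning for the analysis.
4. **Stemming:** Reduces words to a root form by stripping suffixes; the result is not always a dictionary word.
5. **Part-of-Speech Tagging:** Assigns grammatical tags (noun, verb, adjective, and so on) to words.
6. **Named Entity Recognition (NER):** Identifies named entities, such as people, organizations, and locations, in the text.
7. **Lemmatization:** Like stemming, but maps each word to its dictionary form (lemma).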
8. **Corpora Exploration:** Accesses and explores text collections from NLTK's corpora.
9. **WordNet Exploration:** Analyzes word relationships using WordNet.
### Sentiment Analysis
1. **Movie Reviews Dataset:** Uses the NLTK movie reviews dataset for sentiment analysis.
2. **Feature Extraction:** Extracts relevant features (words) from the reviews.
3. **Naive Bayes Classifier:** Trains a Naive Bayes classifier to predict sentiment (positive/negative).
4. **Accuracy Evaluation:** Evaluates the classifier's accuracy on a test set.
5. **Saving the Classifier:** Saves the trained classifier using Pickle for later use.
## Libraries Used
- NLTK
- Pandas
- NumPy
- Pickle
## Datasets
- A sample dataset for demonstrating NLP techniques.
- The NLTK movie reviews dataset for sentiment analysis.
## Usage
1. Make sure the required libraries (NLTK, Pandas, NumPy) are installed; Pickle is part of the Python standard library.
2. Upload the sample dataset to your Google Colab environment (if needed).
3. Run the notebook cells sequentially.
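To try the main steps outside the notebook, the following is a minimal sketch, not the notebook's actual code. It chains tokenization, stop word removal, stemming, feature extraction, Naive Bayes training, accuracy evaluation, and a Pickle round trip. Several things here are assumptions made for illustration: the reviews are made-up toy sentences standing in for the NLTK movie reviews dataset, `STOP_WORDS` is a small hand-written set rather than NLTK's stop word corpus (so no NLTK data download is needed), and `extract_features` is a hypothetical helper name.

```python
import pickle

from nltk.classify import NaiveBayesClassifier, accuracy
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Assumption: a tiny hand-written stop word set, so no NLTK data download is needed.
STOP_WORDS = {"a", "an", "and", "i", "such", "the", "this", "what", "with"}

stemmer = PorterStemmer()


def extract_features(text):
    """Tokenize, drop punctuation and stop words, stem, and build word-presence features."""
    tokens = [tok.lower() for tok in wordpunct_tokenize(text)]
    words = [tok for tok in tokens if tok.isalpha() and tok not in STOP_WORDS]
    return {stemmer.stem(word): True for word in words}


# Assumption: toy reviews standing in for the NLTK movie reviews dataset.
train_reviews = [
    ("A wonderful film with a great cast", "pos"),
    ("I loved this movie, great acting", "pos"),
    ("Great story and wonderful acting", "pos"),
    ("A terrible film with a boring plot", "neg"),
    ("I hated this movie, terrible acting", "neg"),
    ("Boring story and awful acting", "neg"),
]
test_reviews = [
    ("What a great and wonderful movie", "pos"),
    ("Such a boring and terrible movie", "neg"),
]

train_set = [(extract_features(text), label) for text, label in train_reviews]
test_set = [(extract_features(text), label) for text, label in test_reviews]

# Train the classifier and measure accuracy on the held-out toy examples.
classifier = NaiveBayesClassifier.train(train_set)
test_accuracy = accuracy(classifier, test_set)
print(f"Accuracy on the toy test set: {test_accuracy:.2f}")

# Round-trip the classifier through Pickle in memory; pickle.dump and
# pickle.load with an open file would be the on-disk equivalent.
restored = pickle.loads(pickle.dumps(classifier))
print(restored.classify(extract_features("A great cast and a wonderful story")))
```

With only a handful of sentences the accuracy figure says nothing about real performance; the point is the shape of the pipeline. Swapping in the real corpus would mean replacing the toy lists with documents read from NLTK's movie reviews dataset, which has to be downloaded first.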