Ecosyste.ms: Awesome


https://github.com/gurpreet0022/nlp_exploration

This repository explores various Natural Language Processing (NLP) techniques using the NLTK library in Python. It demonstrates these techniques on a sample dataset and performs sentiment analysis on movie reviews.



Host: GitHub
URL: https://github.com/gurpreet0022/nlp_exploration
Owner: Gurpreet0022
License: mit
Created: 2024-12-22T09:46:01.000Z (2 months ago)
Default Branch: main
Last Pushed: 2024-12-22T09:50:24.000Z (2 months ago)
Last Synced: 2024-12-22T10:30:19.168Z (2 months ago)
Topics: beginner-friendly, nlp, nlp-machine-learning, nltk, scikit-learn
Language: Jupyter Notebook
Homepage:
Size: 0 Bytes
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# NLP Techniques and Sentiment Analysis with NLTK

This notebook explores various Natural Language Processing (NLP) techniques using the NLTK library in Python. It demonstrates these techniques on a sample dataset and performs sentiment analysis on movie reviews.

## Tasks Performed

### NLP Techniques

1. **Data Loading and Preprocessing:** Loads a sample dataset for demonstrating NLP techniques.
2. **Tokenization:** Splits text into individual words.
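3. **Stop Word Removal:** Removes common words (such as "the" and "is") that carry little meaning for the analysis.
4. **Stemming:** Reduces words to a root form by stripping suffixes (e.g., "running" → "run"); the result is not always a dictionary word.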
5. **Part-of-Speech Tagging:** Assigns grammatical tags to words.
6. **Named Entity Recognition (NER):** Identifies named entities in the text.
7. **Lemmatization:** Like stemming, but maps each word to its dictionary form (lemma), optionally guided by its part of speech.
8. **Corpora Exploration:** Accesses and explores text collections from NLTK's corpora.
9. **WordNet Exploration:** Analyzes word relationships using WordNet.
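The first few preprocessing steps above can be sketched with NLTK as follows. This is a minimal illustration, not the notebook's code: it uses `wordpunct_tokenize` and `PorterStemmer`, which run without downloaded data, and a small hypothetical `STOP_WORDS` set in place of `nltk.corpus.stopwords.words("english")`. The other steps (`nltk.word_tokenize`, `nltk.pos_tag`, `nltk.ne_chunk`, `WordNetLemmatizer`, `wordnet.synsets`) need their data packages fetched with `nltk.download()` first.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize  # regex-based, needs no nltk.download()

# Hypothetical, abbreviated stop list standing in for
# nltk.corpus.stopwords.words("english"), which requires nltk.download("stopwords").
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "was", "were"}

stemmer = PorterStemmer()


def preprocess(text):
    """Tokenize, drop punctuation and stop words, then Porter-stem each remaining token."""
    tokens = wordpunct_tokenize(text)
    # Keep alphabetic tokens only and compare case-insensitively against the stop list.
    content_words = [t.lower() for t in tokens if t.isalpha() and t.lower() not in STOP_WORDS]
    return [stemmer.stem(word) for word in content_words]
```

For example, `preprocess("The cats are running.")` returns `["cat", "run"]`: "The" and "are" are dropped as stop words, the period is dropped as punctuation, and the stemmer strips the plural "s" and the "-ing" suffix.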

### Sentiment Analysis

1. **Movie Reviews Dataset:** Uses the NLTK movie reviews dataset for sentiment analysis.
2. **Feature Extraction:** Extracts relevant features (words) from the reviews.
3. **Naive Bayes Classifier:** Trains a Naive Bayes classifier to predict sentiment (positive/negative).
4. **Accuracy Evaluation:** Evaluates the classifier's accuracy on a test set.
5. **Saving the Classifier:** Saves the trained classifier using Pickle for later use.
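A minimal sketch of this pipeline, assuming simple word-presence features (`{word: True}` for each word in a review). The toy reviews below are hypothetical stand-ins for `nltk.corpus.movie_reviews`, which needs `nltk.download("movie_reviews")`; with the real corpus, documents would come from `movie_reviews.fileids(category)` and `movie_reviews.words(fileid)`. The helper `extract_features` and the file name `sentiment_classifier.pickle` are also illustrative assumptions, not taken from the notebook.

```python
import os
import pickle
import tempfile

from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy
from nltk.tokenize import wordpunct_tokenize  # regex-based, needs no nltk.download()


def extract_features(text):
    """Hypothetical feature extractor: mark each lowercased alphabetic token as present."""
    return {tok.lower(): True for tok in wordpunct_tokenize(text) if tok.isalpha()}


# Hypothetical toy reviews standing in for nltk.corpus.movie_reviews.
train_reviews = [
    ("A great movie with a wonderful cast", "pos"),
    ("I loved this film, the acting was great", "pos"),
    ("A wonderful story and great acting", "pos"),
    ("A boring movie with a terrible plot", "neg"),
    ("I hated this film, the acting was awful", "neg"),
    ("An awful story and boring acting", "neg"),
]
test_reviews = [
    ("Great cast and a wonderful story", "pos"),
    ("Terrible plot and awful acting", "neg"),
]

# Convert (text, label) pairs into the (featureset, label) pairs NLTK expects.
train_set = [(extract_features(text), label) for text, label in train_reviews]
test_set = [(extract_features(text), label) for text, label in test_reviews]

classifier = NaiveBayesClassifier.train(train_set)
test_accuracy = accuracy(classifier, test_set)  # fraction of held-out reviews labeled correctly

# Save the trained classifier with pickle, then reload it to confirm the round trip.
model_path = os.path.join(tempfile.gettempdir(), "sentiment_classifier.pickle")
with open(model_path, "wb") as f:
    pickle.dump(classifier, f)
with open(model_path, "rb") as f:
    loaded_classifier = pickle.load(f)
```

NLTK's `NaiveBayesClassifier` smooths feature probabilities with the expected likelihood estimate by default, so a word never seen with a label lowers that label's score rather than ruling it out; words never seen in training are ignored at classification time. Only unpickle classifier files from sources you trust, since `pickle.load` can execute arbitrary code.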

## Libraries Used

- NLTK
- Pandas
- NumPy
- pickle (Python standard library)

## Datasets

- A sample dataset for demonstrating NLP techniques.
- The NLTK movie reviews dataset for sentiment analysis.

## Usage

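1. Install the required libraries (for example, `pip install nltk pandas numpy`) and download the NLTK data packages the notebook uses with `nltk.download()`.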
2. Upload the sample dataset to your Google Colab environment (if needed).
3. Run the notebook cells sequentially.