https://github.com/amir-tav/nlp-sentiment-analysis-
Sentiment analysis using NLP techniques on Amazon product reviews. It covers text pre-processing, visualization, and basic sentiment classification.
amazonreviews data-science machine-learning nlp python pytorch sentiment-analysis
- Host: GitHub
- URL: https://github.com/amir-tav/nlp-sentiment-analysis-
- Owner: Amir-Tav
- License: mit
- Created: 2024-10-14T18:49:45.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-11-18T15:12:10.000Z (2 months ago)
- Last Synced: 2024-11-18T16:53:16.126Z (2 months ago)
- Topics: amazonreviews, data-science, machine-learning, nlp, python, pytorch, sentiment-analysis
- Language: Jupyter Notebook
- Homepage:
- Size: 799 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Sentiment Analysis Using NLP 📊
Welcome to the world of **Natural Language Processing (NLP)**! In this project, we'll explore sentiment analysis from customer reviews using some powerful NLP techniques. Buckle up as we dive into the code, data, and some fascinating insights!
## Table of Contents
1. [Overview](#overview)
2. [Getting Started](#getting-started)
3. [Data Preprocessing](#data-preprocessing)
4. [Sentiment Analysis](#sentiment-analysis)
5. [Results](#results)
6. [Conclusion](#conclusion)

---
## Overview
This project aims to classify customer sentiments based on Amazon product reviews. We use **NLP** tools to preprocess the text data, analyze it, and eventually predict whether reviews are positive or negative.

### Libraries Used
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
```

---
## Getting Started
### Dataset
The dataset we are working with is the **Amazon Fine Food Reviews** dataset. You can find it [here](https://www.kaggle.com/datasets/snap/amazon-fine-food-reviews).

First, we load the dataset and take a subset of 500 reviews to keep things manageable.
```python
df = pd.read_csv('data/Reviews.csv') # Reading the reviews data
df = df.head(500) # Taking a subset of 500 reviews
print(df.shape) # Prints: (500, 10)
```

---
## Data Preprocessing
Before diving into analysis, we need to clean and preprocess the data. This includes tokenizing the text, removing stop words, and other common NLP tasks.
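As a rough illustration of what tokenizing and stop-word removal do to a single review, here is a dependency-free sketch. The `preprocess` helper, the regular expression, and the tiny `STOP_WORDS` set are assumptions made for this example; the project itself relies on NLTK, whose tokenizer and stop-word list behave differently (for instance, NLTK also emits punctuation tokens).

```python
import re

# Hypothetical mini stop-word list for illustration only; NLTK's English
# list is much larger.
STOP_WORDS = {"the", "is", "and", "a", "this", "it", "i", "of", "to"}


def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    # [a-z']+ keeps runs of letters and apostrophes and discards punctuation.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(preprocess("This coffee is great, and the price is fair!"))
# ['coffee', 'great', 'price', 'fair']
```

The output keeps only the words that carry meaning for sentiment, which is the goal of the NLTK-based steps in this section.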
### Tokenizing the Text
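We use `nltk` to tokenize the words and prepare them for analysis.

```python
from nltk.tokenize import word_tokenize

# word_tokenize needs NLTK's tokenizer data; if it is missing, run
# nltk.download('punkt') once (newer NLTK releases may ask for 'punkt_tab').
df['tokenized'] = df['Text'].apply(lambda x: word_tokenize(x.lower()))
```

### Removing Stop Words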
Stop words (common words like "the", "is", "and") don't contribute much meaning and can be removed.

```python
from nltk.corpus import stopwords

# The stop-word list is a separate one-time download: nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
df['filtered_tokens'] = df['tokenized'].apply(lambda x: [word for word in x if word not in stop_words])
```

---
## Sentiment Analysis
Now for the exciting part! We analyze the sentiment of reviews by looking at their textual data.
### Word Cloud Visualization
A quick look at the most frequent words in positive reviews (`Score` above 3). The same steps with the filter inverted give the negative-review cloud:
```python
from wordcloud import WordCloud

# Generate word clouds
positive_reviews = " ".join(df[df['Score'] > 3]['Text'])
wordcloud = WordCloud(width=800, height=400).generate(positive_reviews)

# Display the word cloud
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
```

### Sentiment Classification
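To classify sentiment, we can use basic techniques such as checking for positive or negative keywords. The code below takes a simpler route and derives the label from the star rating: a `Score` above 3 counts as positive, and everything else (including 3-star reviews) counts as negative.

```python
# Sample code to classify based on score (positive/negative sentiment)
df['sentiment'] = df['Score'].apply(lambda x: 'positive' if x > 3 else 'negative')
```

---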
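The keyword-checking idea mentioned under Sentiment Classification can be sketched as follows. This is a minimal illustration, not the notebook's method: the `keyword_sentiment` function and both word lists are hypothetical and far too small for real use.

```python
import re

# Hypothetical keyword lists chosen for illustration; a real lexicon
# (or a trained model) would be far more complete.
POSITIVE_WORDS = {"good", "great", "love", "delicious", "excellent", "best"}
NEGATIVE_WORDS = {"bad", "awful", "terrible", "stale", "worst", "disappointed"}


def keyword_sentiment(text):
    """Return 'positive', 'negative', or 'neutral' from keyword counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(tok in POSITIVE_WORDS for tok in tokens)
    neg = sum(tok in NEGATIVE_WORDS for tok in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # tie, or no keywords found


print(keyword_sentiment("Great taste, my dogs love it"))       # positive
print(keyword_sentiment("Arrived stale and tasted terrible"))  # negative
print(keyword_sentiment("It arrived on Tuesday"))              # neutral
```

Applied to the reviews with `df['Text'].apply(keyword_sentiment)`, the result could be compared against the score-based labels. Keep in mind that plain keyword counting misreads negation such as "not good".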
## Results
After analyzing the data, we found some interesting insights. For example, the majority of reviews in the dataset are positive, which is common for product reviews.
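One way to check a claim like this is to count the labels and compute each class's share. The sketch below uses a made-up list of star ratings in place of `df['Score']`, so the numbers are illustrative only and are not results from the dataset.

```python
from collections import Counter

# Made-up star ratings standing in for df['Score']; not real results.
scores = [5, 4, 5, 1, 3, 5, 2, 4, 5, 5]

# Same rule as the score-based labeling: above 3 stars is positive,
# everything else is negative.
labels = ["positive" if s > 3 else "negative" for s in scores]

counts = Counter(labels)
total = len(labels)
shares = {label: n / total for label, n in counts.items()}

print(counts["positive"], counts["negative"])  # 7 3
print(shares["positive"])                      # 0.7
```

On the actual DataFrame, `df['sentiment'].value_counts(normalize=True)` gives the same shares in one call.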
### Data Visualization
We also took a look at the distribution of review scores:
```python
sns.countplot(x='Score', data=df)
plt.title('Distribution of Review Scores')
plt.show()
```

---
## Conclusion
This project highlights the basics of sentiment analysis using NLP techniques. We used a simple dataset and some basic text-processing techniques to analyze and classify sentiment. While this is just scratching the surface of NLP, it demonstrates how powerful these techniques can be for understanding large-scale textual data.