https://github.com/lovenui/marketing_analysis-aws-spark-sql
aws aws-rds aws-s3 data-analysis machine-learning marketing-analytics spark
Last synced: 13 days ago
- Host: GitHub
- URL: https://github.com/lovenui/marketing_analysis-aws-spark-sql
- Owner: LoveNui
- Created: 2023-05-16T09:51:22.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-07-15T16:47:56.000Z (over 1 year ago)
- Last Synced: 2025-01-19T15:18:27.377Z (13 days ago)
- Topics: aws, aws-rds, aws-s3, data-analysis, machine-learning, marketing-analytics, spark
- Language: Jupyter Notebook
- Homepage:
- Size: 301 KB
- Stars: 14
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Marketing Analysis with Big Data
## Build a Data Pipeline with pgAdmin, AWS Cloud, and Apache Spark to Analyze and Determine Bias in Amazon Vine Reviews

Goals • Dataset • Tools Used • Analysis and Challenges • Results • Summary

# Goals

Companies pay a small fee to Amazon and provide products to Amazon Vine members, who are then required to publish a review. The Amazon Vine program is a service that allows manufacturers and publishers to receive reviews for their products. This project analyzes Amazon reviews written by members of the paid Amazon Vine program. Approximately 50 review datasets are available; each one contains reviews of a specific product category, from apparel to wireless products.

This project covers the TV review dataset. First, I'll use PySpark to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into a database managed with pgAdmin. Next, I'll use PySpark to determine whether there is any bias toward favorable reviews from Vine members in the dataset.
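
The extract and transform steps described above could look roughly like the sketch below. It is not the repository's code: the function names and the four-table layout are assumptions for illustration, and the column names follow the public Amazon review TSV schema. Both functions take an existing `SparkSession` or Spark DataFrame as an argument, so the block itself needs no imports.

```python
# Sketch only: function names and table layout are assumptions,
# not taken from the repository's notebooks.

def extract_reviews(spark, path):
    """Read a tab-separated review file into a Spark DataFrame.

    `spark` is an existing pyspark.sql.SparkSession; `path` points to one
    of the review TSV files (local path or S3 URI).
    """
    return (
        spark.read
        .option("header", True)       # first row holds column names
        .option("sep", "\t")          # the files are tab-separated
        .option("inferSchema", True)  # let Spark guess numeric columns
        .csv(path)
    )


def transform_reviews(df):
    """Split the raw reviews into four narrower tables (assumed layout)."""
    clean = df.dropna()  # drop rows with missing values before splitting
    return {
        "review_id_table": clean.select(
            "review_id", "customer_id", "product_id",
            "product_parent", "review_date",
        ),
        "products": clean.select("product_id", "product_title")
        .dropDuplicates(["product_id"]),
        "customers": clean.groupBy("customer_id").count()
        .withColumnRenamed("count", "customer_count"),
        "vine_table": clean.select(
            "review_id", "star_rating", "helpful_votes",
            "total_votes", "vine",
        ),
    }
```

In a notebook, this would typically be called as `transform_reviews(extract_reviews(spark, path))` after creating a session with `SparkSession.builder.getOrCreate()`.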
# Dataset

Amazon S3 bucket containing 50 review datasets.

- [Amazon Review Datasets:](https://s3.amazonaws.com/amazon-reviews-pds/tsv/index.txt) I'll be analyzing a TSV file with 22,930 rows of TV reviews
# Tools Used

- **Apache Spark:** A unified analytics engine for large-scale data processing
- **Google Colab:** Cloud-based developer notebooks, used for testing scripts and performing complex calculations
- **Amazon Web Services:** Cloud-based services that perform many functions, including hosting and data processing
- **AWS RDS:** Relational Database service used for querying data in the cloud
- **AWS S3:** Cloud file storage service
- **pgAdmin:** Software used to build databases and analyze data with SQL

# Analysis and Challenges

After the success of the SellBy project, our group is running an analysis of Amazon reviews written by members of the paid Amazon Vine program. I analyzed the TV review dataset, using PySpark to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into a database managed with pgAdmin. I then used PySpark to determine whether there was any bias toward favorable reviews from Vine members in the dataset.
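
The load step can be sketched as follows, assuming the RDS instance runs PostgreSQL (which is what pgAdmin manages) and that the PostgreSQL JDBC driver is available to Spark. The helper names, the `append` write mode, and the table names are illustrative assumptions; the host, database, and credentials are placeholders to be replaced with real values.

```python
# Sketch only: helper names and defaults are assumptions for illustration.

def build_jdbc_url(host, port, database):
    """Build a PostgreSQL JDBC URL for an RDS endpoint."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def load_tables(tables, jdbc_url, user, password, mode="append"):
    """Write each Spark DataFrame in `tables` to a table of the same name.

    `tables` maps table names to Spark DataFrames. The target tables are
    assumed to exist already (for example, created in pgAdmin).
    """
    properties = {
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }
    for name, table_df in tables.items():
        # DataFrameWriter.jdbc sends the rows to the database over JDBC
        table_df.write.jdbc(
            url=jdbc_url, table=name, mode=mode, properties=properties
        )
```

Using `append` keeps any schema defined in pgAdmin intact, whereas `overwrite` would drop and recreate the tables; which one fits depends on how the database was set up.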
Below you will see dataframes I used to analyze the TV review data.
### Review Data
![Review Data](images/review_data.png)

### Review ID Table
![Review ID Table](images/review_id_table.png)

### Customer Table
![Customer Table](images/customer_table.png)

### Product Table
![Product Table](images/products_table.png)

### Vine Table
![Vine Table](images/vine_df.png)

# Results

![Vine Reviews](images/vine_reviews.png)
### Unpaid Reviews
![Unpaid Reviews](images/unpaid_reviews.png)

- In total, there were 255 Vine reviews and 22,675 unpaid reviews
- Of the 255 Vine reviews, 103 were 5-star reviews (40%)
- Of the 22,675 unpaid reviews, 10,310 were 5-star reviews (45%)

# Summary

Based on the results of my analysis comparing Vine and unpaid reviews, I did not see evidence of positivity bias within the paid reviews: a higher percentage of unpaid reviews (45%) than Vine reviews (40%) were 5 stars.
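
The comparison behind this conclusion can be reproduced from the counts reported in the Results section. The sketch below is illustrative rather than the notebook's actual code: `five_star_share` is plain arithmetic, and `count_five_star` is a hypothetical helper showing how the same counts might be pulled from a Spark DataFrame that has `vine` (`'Y'` or `'N'`) and `star_rating` columns.

```python
# Sketch only: helper names are assumptions, not the repository's code.

def five_star_share(five_star_count, total_count):
    """Return the percentage of reviews that are 5 stars."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * five_star_count / total_count


def count_five_star(vine_df, vine_flag):
    """Return (five_star_count, total_count) for one group of reviews.

    `vine_df` is a Spark DataFrame; `vine_flag` is 'Y' for Vine reviews
    or 'N' for unpaid reviews.
    """
    subset = vine_df.filter(f"vine = '{vine_flag}'")
    return subset.filter("star_rating = 5").count(), subset.count()


# Counts as reported in the Results section.
vine_share = five_star_share(103, 255)
unpaid_share = five_star_share(10_310, 22_675)
print(f"Vine: {vine_share:.1f}%  Unpaid: {unpaid_share:.1f}%")
```

The gap of about five percentage points runs in the opposite direction from what a positivity bias in Vine reviews would produce, although the Vine sample is small (255 reviews).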
Here are some additional levels of analysis I am planning to apply to the current dataset:
- Compare the number of 1-star reviews between Vine and unpaid reviews to determine any additional patterns
- Filter the Vine and unpaid review datasets by verified purchase to add credibility to the review sample analysis

[Back to top](#marketing-analysis-with-big-data)