{"id":20909693,"url":"https://github.com/lovenui/marketing_analysis-aws-spark-sql","last_synced_at":"2025-06-22T14:08:54.452Z","repository":{"id":165934291,"uuid":"641355528","full_name":"LoveNui/Marketing_Analysis-AWS-Spark-SQL","owner":"LoveNui","description":null,"archived":false,"fork":false,"pushed_at":"2023-07-15T16:47:56.000Z","size":308,"stargazers_count":14,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-11T08:30:23.895Z","etag":null,"topics":["aws","aws-rds","aws-s3","data-analysis","machine-learning","marketing-analytics","spark"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LoveNui.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-16T09:51:22.000Z","updated_at":"2024-07-31T18:18:29.000Z","dependencies_parsed_at":"2024-11-18T14:36:02.031Z","dependency_job_id":"b0af5a41-a604-4ecc-bfa4-45825b4897e6","html_url":"https://github.com/LoveNui/Marketing_Analysis-AWS-Spark-SQL","commit_stats":null,"previous_names":["superstar512/gammaprotocol","lovenui/gammaprotocol","lovenui/marketing_analysis-aws-spark-sql"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/LoveNui/Marketing_Analysis-AWS-Spark-SQL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LoveNui%2FMarketing_Analysis-AWS-Spark-SQL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LoveNui%2FMarketing_Analysis-AWS-Spark-SQL/tags","releases_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/LoveNui%2FMarketing_Analysis-AWS-Spark-SQL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LoveNui%2FMarketing_Analysis-AWS-Spark-SQL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LoveNui","download_url":"https://codeload.github.com/LoveNui/Marketing_Analysis-AWS-Spark-SQL/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LoveNui%2FMarketing_Analysis-AWS-Spark-SQL/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261304267,"owners_count":23138301,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","aws-rds","aws-s3","data-analysis","machine-learning","marketing-analytics","spark"],"created_at":"2024-11-18T14:12:20.440Z","updated_at":"2025-06-22T14:08:49.439Z","avatar_url":"https://github.com/LoveNui.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Marketing Analysis with Big Data\n\n\u003cdiv align=\"center\"\u003e\n    \u003cimg src=images/cloud_etl.png\u003e\n\u003c/div\u003e\n\n## \u003cdiv align=\"center\"\u003eBuild Data Pipeline with pgAdmin, AWS Cloud and Apache Spark to Analyze and Determine Bias in Amazon Vine Reviews \u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"#goals\"\u003eGoals\u003c/a\u003e \u0026nbsp;\u0026bull;\u0026nbsp;\n\u003ca href=\"#dataset\"\u003eDataset\u003c/a\u003e \u0026nbsp;\u0026bull;\u0026nbsp;\n\u003ca href=\"#tools-used\"\u003eTools Used\u003c/a\u003e 
\u0026nbsp;\u0026bull;\u0026nbsp;\n\u003ca href=\"#analysis-and-challenges\"\u003eAnalysis and Challenges\u003c/a\u003e \u0026nbsp;\u0026bull;\u0026nbsp;\n\u003ca href=\"#results\"\u003eResults\u003c/a\u003e \u0026nbsp;\u0026bull;\u0026nbsp;\n\u003ca href=\"#summary\"\u003eSummary\u003c/a\u003e\n\u003c/p\u003e\n\n# \u003cdiv align=\"center\"\u003eGoals\u003c/div\u003e\n\nThe Amazon Vine program is a service that allows manufacturers and publishers to receive reviews for their products. Companies pay a small fee to Amazon and provide products to Amazon Vine members, who are then required to publish a review. This project analyzes Amazon reviews written by members of the paid Amazon Vine program. Approximately 50 datasets are available, each containing reviews of a specific product category, from apparel to wireless products.\n\nThis project covers the TV review dataset. First, I'll use PySpark to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into a PostgreSQL database accessed through pgAdmin. 
Next, I'll use PySpark to determine if there is any bias toward favorable reviews from Vine members in the dataset.\n\n# \u003cdiv align=\"center\"\u003eDataset\u003c/div\u003e\n\nAn Amazon S3 bucket containing 50 review datasets.\n\n- [Amazon Review Datasets:](https://s3.amazonaws.com/amazon-reviews-pds/tsv/index.txt) I'll be analyzing a TSV file with 22,930 rows of TV reviews\n\n# \u003cdiv align=\"center\"\u003eTools Used\u003c/div\u003e\n\n- **Apache Spark:** A unified analytics engine for large-scale data processing\n- **Google Colab:** Cloud-based developer notebooks, used for testing scripts and performing complex calculations\n- **Amazon Web Services:** Cloud-based services that perform many functions, including hosting and data processing\n    - **AWS RDS:** Relational database service used for querying data in the cloud\n    - **AWS S3:** Cloud file storage service\n- **pgAdmin:** Software used to build databases and analyze data with SQL\n\n# \u003cdiv align=\"center\"\u003eAnalysis and Challenges\u003c/div\u003e\n\nAfter the success of the SellBy project, our group is running an analysis of Amazon reviews written by members of the paid Amazon Vine program. I analyzed the TV review dataset and used PySpark to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into a PostgreSQL database accessed through pgAdmin. 
I then used PySpark to determine if there is any bias toward favorable reviews from Vine members in the dataset.\n\nBelow are the DataFrames I used to analyze the TV review data.\n\n### Review Data\n![Review Data](images/review_data.png)\n\n### Review ID Table\n![Review ID Table](images/review_id_table.png)\n\n### Customer Table\n![Customer Table](images/customer_table.png)\n\n### Product Table\n![Product Table](images/products_table.png)\n\n### Vine Table\n![Vine Table](images/vine_df.png)\n\n# \u003cdiv align=\"center\"\u003eResults\u003c/div\u003e\n\n### Vine Reviews\n![Vine Reviews](images/vine_reviews.png)\n\n### Unpaid Reviews\n![Unpaid Reviews](images/unpaid_reviews.png)\n\n- In total, there were 255 Vine reviews and 22,675 unpaid reviews\n- Of the 255 Vine reviews, 103 were 5-star reviews (40%)\n- Of the 22,675 unpaid reviews, 10,310 were 5-star reviews (45%)\n\n# \u003cdiv align=\"center\"\u003eSummary\u003c/div\u003e\n\nBased on the results of my analysis comparing Vine and unpaid reviews, I did not see evidence of positivity bias within the paid reviews. A higher percentage of unpaid reviews were 5 stars (45% versus 40% for Vine reviews).\n\nHere are some additional levels of analysis I am planning to apply to the current dataset:\n- Compare the number of 1-star reviews between Vine and unpaid reviews to determine any additional patterns\n- Filter the Vine and unpaid review datasets by verified purchase to add credibility to our review sample analysis\n\n[Back to top](#marketing-analysis-with-big-data)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flovenui%2Fmarketing_analysis-aws-spark-sql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flovenui%2Fmarketing_analysis-aws-spark-sql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flovenui%2Fmarketing_analysis-aws-spark-sql/lists"}