{"id":31661242,"url":"https://github.com/harshitwaldia/exploratory-data-analysis","last_synced_at":"2026-04-20T06:02:24.718Z","repository":{"id":316692262,"uuid":"1064458556","full_name":"HarshitWaldia/Exploratory-Data-Analysis","owner":"HarshitWaldia","description":"Exploratory Data Analysis with data cleaning, visualization, and insights discovery.","archived":false,"fork":false,"pushed_at":"2025-09-26T04:45:25.000Z","size":6346,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-26T06:20:44.393Z","etag":null,"topics":["exploratory-data-analysis","jypyternotebook","outlier-detection","python","sentiment-analysis","textblob","wordcloud-visualization"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HarshitWaldia.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-26T04:21:37.000Z","updated_at":"2025-09-26T04:47:58.000Z","dependencies_parsed_at":"2025-09-26T06:32:01.047Z","dependency_job_id":null,"html_url":"https://github.com/HarshitWaldia/Exploratory-Data-Analysis","commit_stats":null,"previous_names":["harshitwaldia/exploratory-data-analysis"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/HarshitWaldia/Exploratory-Data-Analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HarshitWaldia%2FExploratory-Data-Analysis","tags_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HarshitWaldia%2FExploratory-Data-Analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HarshitWaldia%2FExploratory-Data-Analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HarshitWaldia%2FExploratory-Data-Analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HarshitWaldia","download_url":"https://codeload.github.com/HarshitWaldia/Exploratory-Data-Analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HarshitWaldia%2FExploratory-Data-Analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32035276,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-20T00:18:06.643Z","status":"online","status_checked_at":"2026-04-20T02:00:06.527Z","response_time":94,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["exploratory-data-analysis","jypyternotebook","outlier-detection","python","sentiment-analysis","textblob","wordcloud-visualization"],"created_at":"2025-10-07T18:19:41.916Z","updated_at":"2026-04-20T06:02:24.713Z","avatar_url":"https://github.com/HarshitWaldia.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 📊 Amazon Product Reviews - Exploratory Data Analysis (EDA)\n\n## 📌 Overview\nThis project performs 
**Exploratory Data Analysis (EDA)** on an Amazon product dataset.  \nThe dataset contains product details, prices, discounts, ratings, reviews, and user information.  \n\nThe goal of this analysis is to:\n- Understand the structure and quality of the dataset.  \n- Identify trends in pricing, discounting, and ratings.  \n- Explore customer review patterns.  \n- Detect potential issues like missing values, duplicates, or imbalances.  \n\n---\n\n## 🗂️ Dataset Description\nThe dataset includes the following key columns:\n\n| Column | Description |\n|--------|-------------|\n| `product_id` | Unique identifier for each product |\n| `product_name` | Name/description of the product |\n| `category` | Product category (e.g., Electronics, Accessories) |\n| `discounted_price` | Selling price after discount |\n| `actual_price` | Original price before discount |\n| `discount_percentage` | Percentage discount offered |\n| `rating` | Customer rating (out of 5) |\n| `rating_count` | Number of ratings |\n| `about_product` | Short description/features |\n| `user_id` | Unique ID of reviewer |\n| `user_name` | Name of reviewer |\n| `review_id` | Unique ID of review |\n| `review_title` | Title of review |\n| `review_content` | Full review text |\n| `img_link` | Product image link |\n| `product_link` | Product page link |\n\n---\n\n## 🔍 Steps in EDA\n### 1. **Data Inspection**\n- Used `.info()` to check data types, null values, and dataset size.  \n- Found that most columns are complete, with very few missing values.  \n\n### 2. **Descriptive Statistics**\n- `.describe()` applied to both numeric and categorical columns.  \n- Found **mean ≈ median** in prices → data is fairly symmetric.  \n- Ratings cluster around **4.1**, showing positive bias.  \n\n### 3. **Correlation Analysis**\n- Computed correlation matrix for numeric features.  \n- Observed strong negative correlation between `discount_percentage` and `discounted_price`.  
\n- Weak/no correlation between `rating` and price → ratings do not appear to be price-driven.  \n\n### 4. **Visualizations**\n- **Bar Chart**: Average rating per category.  \n- **Boxplot**: Discount % distribution across categories.  \n- **Scatterplot**: Discounted price vs rating.  \n- **Word Cloud**: Most frequent terms in reviews.  \n- **Heatmap**: Correlations between numeric features.  \n\n### 5. **Data Quality Checks**\n- Found duplicate product IDs (same product reviewed multiple times).  \n- Prices and discounts stored as strings (`₹`, `%`) → cleaned and converted to numeric.  \n\n---\n\n## 📈 Insights\n- Many products receive **4★ or higher** → customer reviews skew positive.  \n- Discounts are widely offered; the most frequent discount is around 50%.  \n- Certain categories dominate the dataset (e.g., Electronics \u0026 Accessories).  \n- Some reviews and users appear multiple times → dataset contains duplicate/overlapping entries.  \n\n---\n\n## 🛠️ Tools \u0026 Libraries\n- **Python 3**  \n- **Pandas** → data cleaning \u0026 manipulation  \n- **NumPy** → numerical operations  \n- **Matplotlib / Seaborn** → data visualization  \n- **WordCloud** → review text analysis  \n\n---\n\n## 📌 How to Run\n1. Clone the repository:  \n   ```\n   git clone https://github.com/HarshitWaldia/Exploratory-Data-Analysis.git\n   cd Exploratory-Data-Analysis\n   ```\n2. Install required libraries:\n   ```\n   pip install -r requirements.txt\n   ```\n\n3. Open the Jupyter Notebook:\n   ```\n   jupyter notebook Amazon_EDA.ipynb\n   ```\n\n4. Run the cells step by step to reproduce the analysis.\n\n## 🚀 Future Work\n\n- **Build a recommendation system using ratings \u0026 categories.**\n\n- **Perform sentiment analysis on review text.**\n\n- **Use ML models to predict product ratings based on price \u0026 discount.**\n\n## 👨‍💻 Author\n\n**Harshit Waldia**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharshitwaldia%2Fexploratory-data-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fharshitwaldia%2Fexploratory-data-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharshitwaldia%2Fexploratory-data-analysis/lists"}