{"id":25726768,"url":"https://github.com/billy-enrizky/sales-analysis","last_synced_at":"2026-05-07T09:35:36.275Z","repository":{"id":203822077,"uuid":"710476656","full_name":"billy-enrizky/Sales-Analysis","owner":"billy-enrizky","description":"\"Sales Data Analysis Project: Analyzing sales data, cleaning, and exploring insights. Python and Pandas used for data analysis.\"","archived":false,"fork":false,"pushed_at":"2023-11-03T21:27:46.000Z","size":5787,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-26T09:36:53.893Z","etag":null,"topics":["dataanalysis","exploratory-data-analysis","jupyter-notebook","pandas","python"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/billy-enrizky.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-10-26T19:16:22.000Z","updated_at":"2024-06-17T14:00:57.000Z","dependencies_parsed_at":null,"dependency_job_id":"d43046b5-4c58-46e6-9a7f-cac654777f45","html_url":"https://github.com/billy-enrizky/Sales-Analysis","commit_stats":null,"previous_names":["billy-enrizky/sales-analysis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/billy-enrizky/Sales-Analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/billy-enrizky%2FSales-Analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/billy-enrizky%2FSales-Analysis/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/billy-enrizky%2FSales-Analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/billy-enrizky%2FSales-Analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/billy-enrizky","download_url":"https://codeload.github.com/billy-enrizky/Sales-Analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/billy-enrizky%2FSales-Analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32731754,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-07T02:14:30.463Z","status":"ssl_error","status_checked_at":"2026-05-07T02:14:29.405Z","response_time":62,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataanalysis","exploratory-data-analysis","jupyter-notebook","pandas","python"],"created_at":"2025-02-25T23:18:46.909Z","updated_at":"2026-05-07T09:35:36.254Z","avatar_url":"https://github.com/billy-enrizky.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sales Data Analysis Project\n\nThis project involves the analysis of sales data to gain insights into various aspects of the sales operation. 
The dataset used for this analysis includes information about sales orders, such as order details, products, quantities, prices, and more.\n\n## Table of Contents\n1. [Project Overview](#project-overview)\n2. [Getting Started](#getting-started)\n3. [Data Cleaning](#data-cleaning)\n4. [Data Exploration](#data-exploration)\n    - [Question 1: What was the best month for sales? How much was earned that month?](#question-1)\n    - [Question 2: What city sold the most products?](#question-2)\n    - [Question 3: What time should we display advertisements to maximize the likelihood of customers buying products?](#question-3)\n    - [Question 4: What products are most often sold together?](#question-4)\n    - [What product sold the most? Why do you think it sold the most?](#most-sold-product)\n\n\u003ca name=\"project-overview\"\u003e\u003c/a\u003e\n## Project Overview\n\nThe project merges the monthly sales files into one dataset, cleans it, explores it, and answers specific questions about sales performance.\n\n\u003ca name=\"getting-started\"\u003e\u003c/a\u003e\n## Getting Started\n\n### Import necessary libraries\n```python\nimport os\nimport pandas as pd\n```\n\n### Merge data from each month into one CSV\n```python\npath = \"./Sales_Data\"\nfiles = [file for file in os.listdir(path) if not file.startswith('.')]  # Ignore hidden files\n\n# Read every monthly file, then concatenate them in a single call\nall_months_data = pd.concat([pd.read_csv(os.path.join(path, file)) for file in files])\n\nall_months_data.to_csv(\"all_data_copy.csv\", index=False)\n```\n\n### Read in the updated dataframe\n```python\n# Load the merged file written in the previous step\nall_data = pd.read_csv(\"all_data_copy.csv\")\n```\n\n\u003ca name=\"data-cleaning\"\u003e\u003c/a\u003e\n## Data Cleaning\n\n### Drop rows of NaN\n```python\n# Rows with at least one NaN, kept only for inspection\nnan_df = all_data[all_data.isna().any(axis=1)]\n# Drop rows in which every column is NaN\nall_data = all_data.dropna(how='all')\n```\n\n### 
Get rid of text in the 'Order Date' column\n```python\n# Drop rows whose 'Order Date' starts with 'Or' (likely header rows repeated in the merged files)\nall_data = all_data[all_data['Order Date'].str[0:2] != 'Or']\n```\n\n### Make columns the correct type\n```python\nall_data['Quantity Ordered'] = pd.to_numeric(all_data['Quantity Ordered'])\nall_data['Price Each'] = pd.to_numeric(all_data['Price Each'])\n```\n\n### Augment data with additional columns\n\n#### Add month column\n```python\n# The first two characters of 'Order Date' are the month number\nall_data['Month'] = all_data['Order Date'].str[0:2]\nall_data['Month'] = all_data['Month'].astype('int32')\n```\n\n#### Add month column (alternative method)\n```python\nall_data['Month 2'] = pd.to_datetime(all_data['Order Date']).dt.month\n```\n\n#### Add city column\n```python\n# Both helpers assume addresses of the form 'street, city, state zip'\ndef get_city(address):\n    return address.split(\",\")[1].strip(\" \")\n\ndef get_state(address):\n    return address.split(\",\")[2].split(\" \")[1]\n\n# Keep the state next to the city so same-named cities in different states stay distinct\nall_data['City'] = all_data['Purchase Address'].apply(lambda x: f\"{get_city(x)} ({get_state(x)})\")\n```\n\n\u003ca name=\"data-exploration\"\u003e\u003c/a\u003e\n## Data Exploration\n\n\u003ca name=\"question-1\"\u003e\u003c/a\u003e\n### Question 1: What was the best month for sales? 
How much was earned that month?\n```python\nall_data['Sales'] = all_data['Quantity Ordered'].astype('int') * all_data['Price Each'].astype('float')\n# Sum numeric columns only, so text columns are left out of the aggregation\nsales_by_month = all_data.groupby('Month').sum(numeric_only=True)\n\nimport matplotlib.pyplot as plt\n\nmonths = range(1, 13)\n\nplt.bar(months, sales_by_month['Sales'])\nplt.xticks(months)\nplt.ylabel('Sales in USD ($)')\nplt.xlabel('Month number')\nplt.show()\n```\n\n\u003ca name=\"question-2\"\u003e\u003c/a\u003e\n### Question 2: What city sold the most products?\n```python\ncity_sales = all_data.groupby('City').sum(numeric_only=True)\n\n# Group by the column name (not a one-element list) so each key is a plain string\nkeys = [city for city, df in all_data.groupby('City')]\n\nplt.bar(keys, city_sales['Sales'])\nplt.ylabel('Sales in USD ($)')\nplt.xlabel('City')\nplt.xticks(keys, rotation='vertical', size=8)\nplt.show()\n```\n\n\u003ca name=\"question-3\"\u003e\u003c/a\u003e\n### Question 3: What time should we display advertisements to maximize the likelihood of customers buying a product?\n```python\nall_data['Hour'] = pd.to_datetime(all_data['Order Date']).dt.hour\n\nkeys = [hour for hour, df in all_data.groupby('Hour')]\n\n# size() counts the rows (order lines) in each hour\nplt.plot(keys, all_data.groupby('Hour').size())\nplt.xticks(keys)\nplt.grid()\nplt.show()\n```\n\n\u003ca name=\"question-4\"\u003e\u003c/a\u003e\n### Question 4: What products are most often sold together?\n```python\n# Keep only orders with more than one row; copy() avoids modifying a slice of all_data\ndf = all_data[all_data['Order ID'].duplicated(keep=False)].copy()\ndf['Grouped'] = df.groupby('Order ID')['Product'].transform(lambda x: ','.join(x))\ndf2 = df[['Order ID', 'Grouped']].drop_duplicates()\n\n# Count combinations\nfrom itertools import combinations\nfrom collections import Counter\n\ncount = Counter()\nfor row in df2['Grouped']:\n    row_list = row.split(',')\n    count.update(Counter(combinations(row_list, 2)))\n\n# Display the most common product combinations\nfor key, value in count.most_common(10):\n    print(key, value)\n```\n\n\u003ca name=\"most-sold-product\"\u003e\u003c/a\u003e\n### What product sold the most? 
Why do you think it sold the most?\n```python\nproduct_group = all_data.groupby('Product')\n# Select the column before aggregating so text columns are never summed or averaged\nquantity_ordered = product_group['Quantity Ordered'].sum()\n\nkeys = [product for product, df in product_group]\n\nplt.bar(keys, quantity_ordered)\nplt.xticks(keys, rotation='vertical', size=8)\nplt.show()\n\nprices = product_group['Price Each'].mean()\n\n# Overlay the mean price on the quantity chart using a second y-axis\nfig, ax1 = plt.subplots()\nax2 = ax1.twinx()\nax1.bar(keys, quantity_ordered, color='g')\nax2.plot(keys, prices, color='b')\nax1.set_xlabel('Product Name')\nax1.set_ylabel('Quantity Ordered', color='g')\nax2.set_ylabel('Price ($)', color='b')\nplt.show()\n```\n\nThis README provides an overview of the Sales Data Analysis project, including the code for data cleaning, data exploration, and answers to specific questions related to the sales data. The project aims to provide insights into sales performance, best-selling products, and sales trends over time.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbilly-enrizky%2Fsales-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbilly-enrizky%2Fsales-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbilly-enrizky%2Fsales-analysis/lists"}