{"id":26591876,"url":"https://github.com/macnianios/retail_sales_analysis","last_synced_at":"2026-04-17T15:31:50.991Z","repository":{"id":249381945,"uuid":"831362473","full_name":"macnianios/Retail_Sales_Analysis","owner":"macnianios","description":"final data science project on techpro academy data science stream","archived":false,"fork":false,"pushed_at":"2024-07-21T19:09:27.000Z","size":7541,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-23T14:38:15.647Z","etag":null,"topics":["anova","clustering","colab-notebook","data-analysis","data-science","data-science-projects","linear-regression","numpy","pandas","python"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/macnianios.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-20T10:34:07.000Z","updated_at":"2024-07-21T19:09:29.000Z","dependencies_parsed_at":"2025-03-23T14:42:32.042Z","dependency_job_id":null,"html_url":"https://github.com/macnianios/Retail_Sales_Analysis","commit_stats":null,"previous_names":["macnianios/retail_sales_analysis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/macnianios/Retail_Sales_Analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macnianios%2FRetail_Sales_Analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macnianios%2FRetail_Sales_Analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/macnianios%2FRetail_Sales_Analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macnianios%2FRetail_Sales_Analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/macnianios","download_url":"https://codeload.github.com/macnianios/Retail_Sales_Analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macnianios%2FRetail_Sales_Analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31934328,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-17T12:37:54.787Z","status":"ssl_error","status_checked_at":"2026-04-17T12:37:25.095Z","response_time":62,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anova","clustering","colab-notebook","data-analysis","data-science","data-science-projects","linear-regression","numpy","pandas","python"],"created_at":"2025-03-23T14:31:35.935Z","updated_at":"2026-04-17T15:31:50.974Z","avatar_url":"https://github.com/macnianios.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Retail_Sales_Analysis\nThis is the final project from the TechPro Academy Data Science stream\n\nFiles:\n  1. Retail_Sales_Analysis.ipynb : The python file with all the code, plots and comments in english\n  2. 
Retail_Sales_Analysis_presentation.pptx : The PowerPoint presentation\n  3. Retail_Sales_Analysis_greek.pdf  : PDF file with the analysis and conclusions in Greek, without the code\n  4. Retail_Sales_Analysis_dataset.csv : The dataset in CSV format\n--------------\n\nRunning the Notebook in Google Colab\n  1.\tOpen Google Colab: Go to https://colab.research.google.com/.\n  2.\tUpload the Notebook: Click on \"File\" -\u003e \"Upload notebook\" and select \"Retail_Sales_Analysis.ipynb\" from your local machine.\n  3.\tAlternatively, you can upload the entire project directory (including \"Retail_Sales_Analysis.ipynb\") to your Google Drive and open the notebook directly from there.\n  4.\tAlso upload the CSV file to your Google Drive and update the file path in the upload section of the code.\n--------------\n\nObjective\n\nConduct a comprehensive data analysis project on a fictional retail company's sales data,\nfocusing on foundational data science skills and project management.\n--------------\n\nProject Description\n\nUse the dummy dataset provided in the accompanying CSV file for your analysis (Retail_Sales_Analysis_dataset.csv).\n--------------\n\nAnalysis Goals\n\n  • Data cleaning and preparation.\n  \n  • Descriptive statistics: Calculate basic statistics like mean, median, mode, and\n    standard deviation to understand sales trends, customer demographics, and product\n    performance. Analyze patterns like average sales per product category, age\n    distribution of customers, and typical product ratings.\n    \n  • Data visualization: Utilize Python libraries (like Matplotlib, Seaborn) or Power BI to\n    create visualizations that highlight sales trends, customer demographics, and product\n    performance. 
You are strongly encouraged to use both methods of visualization,\n    using a non-enterprise solution and an enterprise solution.\n    \n  • Inferential statistics: Conduct simple hypothesis testing to draw conclusions from the\n    data. Test if average sales differ significantly between product categories. Test also if\n    average sales differ significantly between regions.\n    \n--------------\n\nAdvanced Components\n\n  • Predictive modeling: Build a linear regression model to predict sales amounts based\n    on factors like product category, customer demographics, and product ratings.\n    \n  • Advanced statistical analysis: Apply more complex statistical methods, like multivariable regression analysis, to understand how various factors jointly influence sales.\n  \n  • Advanced statistical analysis: Apply clustering methods to identify distinct customer\n    segments.\n    \n--------------\n\nMethods Used\n\n    •\tData Manipulation, Cleaning (pandas, numpy)\n    •\tData Visualization (seaborn, matplotlib)\n    •\tMachine Learning (Linear Regression)\n    •\tClustering (K-means)\n    •\tANOVA\n    \n--------------\n\nDATA\n\n  The dataset includes the following:\n\n  • SalesDate (date in YYYY-MM-DD format)\n  \n  • ProductCategory (Categorical variable – {Clothing, Electronics, Home Appliances} –\n    text)\n    \n  • SalesAmount (numeric in $)\n  \n  • CustomerAge (Categorical variable – text)\n  \n  • CustomerGender (Categorical variable – {Male, Female, Non-binary} – text)\n  \n  • CustomerLocation (Categorical variable – {Japan, Australia, India, USA, UK, Canada}\n    – text)\n    \n  • ProductRatings (Categorical variable – {1 = low to 5 = high rating} – text)\n  \n--------------\nResults\n\n  • All our findings from the statistical analysis, ANOVA, linear regression, and clustering suggest that our data vary widely, and all categories behave in a way that makes it difficult to predict purchase behavior based solely on customer demographics or product 
category. We were unable to successfully categorize customers based on their purchase characteristics.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmacnianios%2Fretail_sales_analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmacnianios%2Fretail_sales_analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmacnianios%2Fretail_sales_analysis/lists"}