{"id":21461115,"url":"https://github.com/harris-giki/e-comdataanalysis_ml","last_synced_at":"2026-04-14T00:02:37.063Z","repository":{"id":261435789,"uuid":"884301987","full_name":"Harris-giki/E-comDataAnalysis_ML","owner":"Harris-giki","description":" E-commerce Customer Analysis with Linear Regression: analyzes customer behavior within an e-commerce setting and predict yearly customer spending based on various features using a linear regression model.","archived":false,"fork":false,"pushed_at":"2024-11-11T18:47:24.000Z","size":885,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-17T01:34:05.031Z","etag":null,"topics":["development","ecommerce","linear-regression","machine-learning","model","prediction-model","python","scikit-learn"],"latest_commit_sha":null,"homepage":"https://ecom-data-analysis.streamlit.app/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Harris-giki.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-06T14:05:36.000Z","updated_at":"2024-11-11T18:47:27.000Z","dependencies_parsed_at":"2024-11-11T19:18:06.906Z","dependency_job_id":null,"html_url":"https://github.com/Harris-giki/E-comDataAnalysis_ML","commit_stats":null,"previous_names":["harris-giki/linearregressionproject_1","harris-giki/e-comdataanalysis_ml"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Harris-giki/E-comDataAnalysis_ML","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Harris-gi
ki%2FE-comDataAnalysis_ML","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Harris-giki%2FE-comDataAnalysis_ML/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Harris-giki%2FE-comDataAnalysis_ML/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Harris-giki%2FE-comDataAnalysis_ML/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Harris-giki","download_url":"https://codeload.github.com/Harris-giki/E-comDataAnalysis_ML/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Harris-giki%2FE-comDataAnalysis_ML/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270796216,"owners_count":24647319,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-16T02:00:11.002Z","response_time":91,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["development","ecommerce","linear-regression","machine-learning","model","prediction-model","python","scikit-learn"],"created_at":"2024-11-23T07:07:32.899Z","updated_at":"2026-04-14T00:02:37.014Z","avatar_url":"https://github.com/Harris-giki.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cbody\u003e\n    \u003ch1\u003eProject Name: E-commerce Customer Analysis with Linear Regression\u003c/h1\u003e\n    
\u003ch2\u003eREADME\u003c/h2\u003e\n    \u003cdiv class=\"section\"\u003e\n        \u003ch3\u003eProject Purpose\u003c/h3\u003e\n        \u003cp\u003eIn this model, we are predicting how much an e-commerce customer will spend in a year using data like their time spent on the website and how long they've been a member. We load and explore the data, select the most relevant factors (features), and build a linear regression model to make predictions. We then evaluate the model’s accuracy using error metrics, visualize the results, and interpret which features have the most impact on spending. The goal is to create a model that can predict future spending based on customer behavior.\u003c/p\u003e\n    \u003c/div\u003e\n    \u003cdiv class=\"section\"\u003e\n        \u003ch3\u003eData Requirements\u003c/h3\u003e\n        \u003cp\u003eEnsure that the dataset \u003ccode\u003eecommerce.csv\u003c/code\u003e is in the same directory as the code file. The dataset can be downloaded from the repository or from Kaggle if not already included.\u003c/p\u003e\n    \u003c/div\u003e\n    \u003cdiv class=\"section\"\u003e\n        \u003ch3\u003eProcedure Overview\u003c/h3\u003e\n        \u003col\u003e\n            \u003cli\u003e\u003cstrong\u003eData Loading \u0026 Exploration:\u003c/strong\u003e Load the dataset, examine the structure, and perform initial statistical analyses. Visualize key relationships between features and target variables to gain insights.\u003c/li\u003e\n            \u003cli\u003e\u003cstrong\u003eFeature Engineering and Model Selection:\u003c/strong\u003e Select relevant features based on correlation analysis and apply a linear regression model using scikit-learn to predict the target variable.\u003c/li\u003e\n            \u003cli\u003e\u003cstrong\u003eModel Evaluation:\u003c/strong\u003e Assess model performance using metrics like Mean Absolute Error, Mean Squared Error, and Root Mean Squared Error. 
Visualize predictions and residuals to analyze the model's performance.\u003c/li\u003e\n            \u003cli\u003e\u003cstrong\u003eInterpretation and Insights:\u003c/strong\u003e Interpret model coefficients to understand feature importance. Assess residual distribution to ensure model assumptions hold.\u003c/li\u003e\n        \u003c/ol\u003e\n    \u003c/div\u003e\n    \u003cdiv class=\"section\"\u003e\n        \u003ch3\u003eStep-by-Step Guide\u003c/h3\u003e\n        \u003ch4\u003eStep 1: Import Libraries\u003c/h4\u003e\n        \u003cul\u003e\n            \u003cli\u003e\u003cstrong\u003ePandas\u003c/strong\u003e - data handling\u003c/li\u003e\n            \u003cli\u003e\u003cstrong\u003eMatplotlib \u0026 Seaborn\u003c/strong\u003e - visualization\u003c/li\u003e\n            \u003cli\u003e\u003cstrong\u003eScikit-learn\u003c/strong\u003e - machine learning\u003c/li\u003e\n            \u003cli\u003e\u003cstrong\u003eSciPy\u003c/strong\u003e - statistical analysis\u003c/li\u003e\n        \u003c/ul\u003e\n        \u003cpre\u003e\u003ccode\u003eimport pandas as pd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.linear_model import LinearRegression\nimport scipy.stats as stats\u003c/code\u003e\u003c/pre\u003e\n        \u003ch4\u003eStep 2: Data Loading \u0026 Initial Exploration\u003c/h4\u003e\n        \u003cp\u003eLoad the data and check the structure:\u003c/p\u003e\n        \u003cpre\u003e\u003ccode\u003edf = pd.read_csv('ecommerce.csv')\ndf.head()\u003c/code\u003e\u003c/pre\u003e\n        \u003ch4\u003eStep 3: Exploratory Data Analysis (EDA)\u003c/h4\u003e\n        \u003cp\u003eVisualize relationships with joint plots and pair plots:\u003c/p\u003e\n        \u003cpre\u003e\u003ccode\u003esns.jointplot(x='Time on Website', y='Yearly Amount Spent', data=df, alpha=0.5)\nsns.pairplot(df, plot_kws={'alpha': 0.4})\u003c/code\u003e\u003c/pre\u003e\n        \u003ch4\u003eStep 4: Data Splitting \u0026 
Model Training\u003c/h4\u003e\n        \u003cp\u003eSplit the data into training and test sets, then train the model:\u003c/p\u003e\n        \u003cpre\u003e\u003ccode\u003ex = df[['Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']]\ny = df['Yearly Amount Spent']\nX_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)\nlm = LinearRegression()\nlm.fit(X_train, y_train)\u003c/code\u003e\u003c/pre\u003e\n        \u003ch4\u003eStep 5: Model Interpretation\u003c/h4\u003e\n        \u003cp\u003eView each feature's impact through the model coefficients:\u003c/p\u003e\n        \u003cpre\u003e\u003ccode\u003ecdf = pd.DataFrame(lm.coef_, x.columns, columns=['Coeff'])\nprint(cdf)\u003c/code\u003e\u003c/pre\u003e\n        \u003ch4\u003eStep 6: Predictions and Visualization\u003c/h4\u003e\n        \u003cp\u003ePlot predicted values against actual values:\u003c/p\u003e\n        \u003cpre\u003e\u003ccode\u003epredictions = lm.predict(X_test)\nsns.scatterplot(x=predictions, y=y_test)\u003c/code\u003e\u003c/pre\u003e\n        \u003ch4\u003eStep 7: Performance Metrics\u003c/h4\u003e\n        \u003cp\u003eEvaluate using MAE, MSE, and RMSE:\u003c/p\u003e\n        \u003cpre\u003e\u003ccode\u003efrom sklearn.metrics import mean_absolute_error, mean_squared_error\nimport math\nmse = mean_squared_error(y_test, predictions)\nprint(\"MAE:\", mean_absolute_error(y_test, predictions))\nprint(\"MSE:\", mse)\nprint(\"RMSE:\", math.sqrt(mse))\u003c/code\u003e\u003c/pre\u003e\n        \u003ch4\u003eStep 8: Residual Analysis\u003c/h4\u003e\n        \u003cp\u003eInspect the distribution of the residuals to assess model fit:\u003c/p\u003e\n        \u003cpre\u003e\u003ccode\u003eresiduals = y_test - predictions\nsns.histplot(residuals, bins=30)\u003c/code\u003e\u003c/pre\u003e\n    \u003c/div\u003e\n    \u003cdiv class=\"section\"\u003e\n        \u003ch3\u003eResults\u003c/h3\u003e\n        \u003cp\u003eThe model shows strong predictive performance with meaningful features. 
Residuals follow a near-normal distribution, supporting model fit.\u003c/p\u003e\n    \u003c/div\u003e\n    \u003cdiv class=\"section\"\u003e\n        \u003ch3\u003eApplications\u003c/h3\u003e\n        \u003cul\u003e\n            \u003cli\u003e\u003cstrong\u003eMarketing:\u003c/strong\u003e Predict spending for targeted campaigns.\u003c/li\u003e\n            \u003cli\u003e\u003cstrong\u003eCustomer Retention:\u003c/strong\u003e Identify high-value customer characteristics.\u003c/li\u003e\n            \u003cli\u003e\u003cstrong\u003eBusiness Decisions:\u003c/strong\u003e Data-driven insights for strategic planning.\u003c/li\u003e\n        \u003c/ul\u003e\n    \u003c/div\u003e\n    \u003cdiv class=\"section\"\u003e\n        \u003ch3\u003eInstructions to Run\u003c/h3\u003e\n        \u003col\u003e\n            \u003cli\u003eEnsure Python and libraries are installed.\u003c/li\u003e\n            \u003cli\u003eDownload \u003ccode\u003eecommerce.csv\u003c/code\u003e and place it in the project folder.\u003c/li\u003e\n            \u003cli\u003eRun each section in a Jupyter Notebook or compatible IDE to analyze results.\u003c/li\u003e\n        \u003c/ol\u003e\n    \u003c/div\u003e\n\u003c/body\u003e\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharris-giki%2Fe-comdataanalysis_ml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fharris-giki%2Fe-comdataanalysis_ml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharris-giki%2Fe-comdataanalysis_ml/lists"}