
https://github.com/harris-giki/e-comdataanalysis_ml

E-commerce Customer Analysis with Linear Regression: analyzes customer behavior in an e-commerce setting and predicts yearly customer spending from several features using a linear regression model.

development ecommerce linear-regression machine-learning model prediction-model python scikit-learn


README

        

Project Name: E-commerce Customer Analysis with Linear Regression





Project Purpose


This project predicts how much an e-commerce customer will spend in a year from behavioral data such as time spent on the website and length of membership. We load and explore the data, select the most relevant features, and fit a linear regression model. We then evaluate the model's accuracy with error metrics, visualize the results, and interpret which features have the most impact on spending. The goal is a model that can predict future spending from customer behavior.




Data Requirements


Ensure that the dataset ecommerce.csv is in the same directory as the code file. The dataset can be downloaded from the repository or from Kaggle if not already included.




Procedure Overview




  1. Data Loading & Exploration: Load the dataset, examine its structure, and compute initial summary statistics. Visualize key relationships between the features and the target variable to gain insights.


  2. Feature Engineering and Model Selection: Select relevant features based on correlation analysis and apply a linear regression model using scikit-learn to predict the target variable.


  3. Model Evaluation: Assess model performance using metrics like Mean Absolute Error, Mean Squared Error, and Root Mean Squared Error. Visualize predictions and residuals to analyze the model's performance.


  4. Interpretation and Insights: Interpret model coefficients to understand feature importance. Assess residual distribution to ensure model assumptions hold.




Step-by-Step Guide


Step 1: Import Libraries




  • Pandas - data handling


  • Matplotlib & Seaborn - visualization


  • Scikit-learn - machine learning


  • SciPy - statistical analysis


import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import scipy.stats as stats

Step 2: Data Loading & Initial Exploration


Load the data and check the structure:


df = pd.read_csv('ecommerce.csv')

df.head()
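Beyond `df.head()`, a few more calls help confirm the structure before modeling. The following is a minimal sketch, not the repository's code: since `ecommerce.csv` may not be at hand, it builds a small synthetic stand-in frame (the values are invented; only the column names are taken from Step 4), and the helper name `summarize` is hypothetical.

```python
import numpy as np
import pandas as pd

def summarize(df: pd.DataFrame) -> dict:
    """Collect basic structural facts about a DataFrame (hypothetical helper)."""
    return {
        "shape": df.shape,                        # (rows, columns)
        "missing": int(df.isna().sum().sum()),    # total number of missing cells
        "numeric_columns": df.select_dtypes("number").columns.tolist(),
    }

# Synthetic stand-in for ecommerce.csv; all values are invented for illustration.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Avg. Session Length": rng.normal(33, 1, 100),
    "Time on App": rng.normal(12, 1, 100),
    "Time on Website": rng.normal(37, 1, 100),
    "Length of Membership": rng.normal(3.5, 1, 100),
})
demo["Yearly Amount Spent"] = rng.normal(500, 80, 100)

summary = summarize(demo)
print(summary)
print(demo.describe())   # count, mean, std, and quartiles per numeric column
```

With the real data, the same helper could be called as `summarize(df)` right after `pd.read_csv`.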

Step 3: Exploratory Data Analysis (EDA)


Visualize relationships with joint plots and pair plots:


sns.jointplot(x='Time on Website', y='Yearly Amount Spent', data=df, alpha=0.5)

sns.pairplot(df, plot_kws={'alpha': 0.4})
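The procedure overview mentions selecting features by correlation analysis. One way this might look is sketched below, assuming Pearson correlation with the target is the criterion. The data is a synthetic stand-in with an invented relationship, and `rank_by_correlation` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def rank_by_correlation(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of each numeric column with the target, strongest first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)

# Synthetic stand-in: spending depends strongly on membership length,
# weakly on app time, and not at all on website time (invented relationship).
rng = np.random.default_rng(1)
n = 500
demo = pd.DataFrame({
    "Time on App": rng.normal(12, 1, n),
    "Time on Website": rng.normal(37, 1, n),
    "Length of Membership": rng.normal(3.5, 1, n),
})
demo["Yearly Amount Spent"] = (
    60 * demo["Length of Membership"]
    + 10 * demo["Time on App"]
    + rng.normal(0, 10, n)
)

ranking = rank_by_correlation(demo, "Yearly Amount Spent")
print(ranking)
```

Correlation only captures pairwise linear association, so a ranking like this is a starting point for feature selection rather than a final answer.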

Step 4: Data Splitting & Model Training


Split data and train the model:


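# Features (inputs) and target (value to predict)
x = df[['Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']]
y = df['Yearly Amount Spent']

# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)

# Fit ordinary least squares on the training set only
lm = LinearRegression()
lm.fit(X_train, y_train)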

Step 5: Model Interpretation


View feature impact with model coefficients:


# One coefficient per feature: the change in predicted spending per one-unit increase
cdf = pd.DataFrame(lm.coef_, x.columns, columns=['Coeff'])
print(cdf)
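To make the reading of a coefficient concrete, the sketch below fits a model on synthetic data with a known, invented ground truth and checks that raising one feature by one unit shifts the prediction by exactly that feature's coefficient. The data and the variable names `model`, `base`, and `bumped` are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 1000
X = pd.DataFrame({
    "Time on App": rng.normal(12, 1, n),
    "Length of Membership": rng.normal(3.5, 1, n),
})
# Invented ground truth: +40 per unit of app time, +60 per unit of membership length.
y = 40 * X["Time on App"] + 60 * X["Length of Membership"] + rng.normal(0, 5, n)

model = LinearRegression().fit(X, y)
coeffs = pd.DataFrame(model.coef_, X.columns, columns=["Coeff"])
print(coeffs)

# Raise one feature by 1 unit while holding the other fixed.
base = X.iloc[[0]].copy()
bumped = base.copy()
bumped["Time on App"] += 1
delta = model.predict(bumped)[0] - model.predict(base)[0]
print("Change in prediction:", delta)
```

Coefficients are expressed per unit of each feature, so their sizes are directly comparable only when the features share a similar scale.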

Step 6: Predictions and Visualization


Plot predicted values against actual values:


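predictions = lm.predict(X_test)

# Points close to the diagonal indicate accurate predictions
sns.scatterplot(x=predictions, y=y_test)
plt.xlabel('Predicted Yearly Amount Spent')
plt.ylabel('Actual Yearly Amount Spent')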

Step 7: Performance Metrics


Evaluate using MAE, MSE, and RMSE:


from sklearn.metrics import mean_absolute_error, mean_squared_error
import math

print("MAE:", mean_absolute_error(y_test, predictions))
print("MSE:", mean_squared_error(y_test, predictions))
print("RMSE:", math.sqrt(mean_squared_error(y_test, predictions)))
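For readers who want to see what these three metrics compute, here is a small sketch that derives them by hand with NumPy and compares the result with scikit-learn. The four actual and predicted values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([500.0, 450.0, 600.0, 550.0])   # made-up actual values
y_pred = np.array([510.0, 440.0, 590.0, 570.0])   # made-up predictions

errors = y_true - y_pred            # [-10, 10, 10, -20]
mae = np.mean(np.abs(errors))       # average error size: 12.5
mse = np.mean(errors ** 2)          # squaring penalizes large errors more: 175.0
rmse = np.sqrt(mse)                 # back in the units of the target, about 13.23

print("MAE:", mae, "MSE:", mse, "RMSE:", rmse)

# The hand-computed values agree with scikit-learn's implementations.
print(np.isclose(mae, mean_absolute_error(y_true, y_pred)))
print(np.isclose(mse, mean_squared_error(y_true, y_pred)))
```

Because RMSE weights large errors more heavily than MAE, a gap between the two hints at a few large misses rather than uniformly small ones.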

Step 8: Residual Analysis


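Check the residuals (actual minus predicted values) to assess model fit; a roughly bell-shaped histogram centered on zero is what a well-specified linear model should produce: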


residuals = y_test - predictions

sns.histplot(residuals, bins=30)
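Step 1 imports `scipy.stats`, which can complement the histogram with a numeric normality check. The sketch below is one possible approach, not necessarily what the original notebook does: it uses `stats.probplot` for Q-Q data and `stats.shapiro` for a Shapiro-Wilk test, applied to synthetic residuals that stand in for `y_test - predictions`.

```python
import numpy as np
import scipy.stats as stats

# Synthetic stand-in for the model residuals (invented values).
rng = np.random.default_rng(7)
residuals = rng.normal(loc=0.0, scale=10.0, size=150)

# Q-Q data: ordered residuals paired with theoretical normal quantiles.
# r is the correlation of the Q-Q points; values near 1 suggest normality.
(theoretical_q, ordered_resid), (slope, intercept, r) = stats.probplot(
    residuals, dist="norm"
)
print("Q-Q correlation:", r)

# Shapiro-Wilk test: a small p-value is evidence against normality.
statistic, p_value = stats.shapiro(residuals)
print("Shapiro-Wilk statistic:", statistic, "p-value:", p_value)
```

To draw the Q-Q plot itself, `stats.probplot(residuals, dist="norm", plot=plt)` can be used with Matplotlib's `pyplot` imported as `plt`.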



Results


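The model shows strong predictive performance on the held-out test set, and its coefficients indicate which features matter most for spending. The residuals follow a near-normal distribution, which is consistent with the assumptions of linear regression.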




Applications




  • Marketing: Predict spending for targeted campaigns.


  • Customer Retention: Identify high-value customer characteristics.


  • Business Decisions: Data-driven insights for strategic planning.




Instructions to Run



  1. Ensure Python and the required libraries (pandas, Matplotlib, Seaborn, scikit-learn, SciPy) are installed.

  2. Download ecommerce.csv and place it in the project folder.

  3. Run each section in a Jupyter Notebook or compatible IDE to analyze results.
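Before running the notebook, the first two steps can be checked with a short script. This is a sketch under assumptions: the package list mirrors the libraries imported in Step 1, the file name comes from the Data Requirements section, and `missing_packages` is a hypothetical helper name.

```python
import importlib.util
from pathlib import Path

# Import names (not pip names) of the libraries used in Step 1.
REQUIRED = ["pandas", "matplotlib", "seaborn", "sklearn", "scipy"]

def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")

# The dataset is expected next to the code file.
data_found = Path("ecommerce.csv").exists()
print("ecommerce.csv found:", data_found)
```

Note that scikit-learn is imported as `sklearn` but installed under the name `scikit-learn`.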