Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/harris-giki/e-comdataanalysis_ml

E-commerce Customer Analysis with Linear Regression: analyzes customer behavior in an e-commerce setting and predicts yearly customer spending from several behavioral features using a linear regression model.

development ecommerce linear-regression machine-learning model prediction-model python scikit-learn

Last synced: about 2 months ago

Host: GitHub
URL: https://github.com/harris-giki/e-comdataanalysis_ml
Owner: Harris-giki
Created: 2024-11-06T14:05:36.000Z (2 months ago)
Default Branch: main
Last Pushed: 2024-11-11T18:47:24.000Z (about 2 months ago)
Last Synced: 2024-11-23T07:07:30.935Z (about 2 months ago)
Topics: development, ecommerce, linear-regression, machine-learning, model, prediction-model, python, scikit-learn
Language: Jupyter Notebook
Homepage: https://ecom-data-analysis.streamlit.app/
Size: 864 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

        
    
Project Name: E-commerce Customer Analysis with Linear Regression


    

        Project Purpose

        In this project, we predict how much an e-commerce customer will spend in a year from behavioral data such as average session length, time spent on the app and website, and length of membership. We load and explore the data, select the most relevant features, and fit a linear regression model. We then evaluate the model's accuracy with error metrics, visualize the results, and interpret which features have the most impact on spending. The goal is a model that predicts future spending from customer behavior.
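For reference, the linear model can be written out explicitly. The notation here is introduced for illustration: $x_{i1}, \dots, x_{i4}$ stand for customer $i$'s Avg. Session Length, Time on App, Time on Website, and Length of Membership (the features used in Step 4), and $\hat{y}_i$ is the predicted Yearly Amount Spent. scikit-learn's `LinearRegression` estimates the coefficients by ordinary least squares, that is, by minimizing the second expression over the $n$ training rows:

```latex
\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4}

\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```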

    

    

        Data Requirements

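        Place the dataset ecommerce.csv in the same directory as the notebook or script that loads it. If it is not already included, download it from the repository or from Kaggle. The code expects the columns Avg. Session Length, Time on App, Time on Website, Length of Membership, and Yearly Amount Spent.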
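Because the rest of the guide depends on this file and these columns, it can help to fail early with a clear message. The sketch below is one way to do that; `load_ecommerce`, `FEATURES`, and `TARGET` are hypothetical names introduced here, and the column list is copied from Step 4.

```python
from pathlib import Path

import pandas as pd

# Column names used later in the guide (copied from Step 4); FEATURES/TARGET are hypothetical names.
FEATURES = ["Avg. Session Length", "Time on App", "Time on Website", "Length of Membership"]
TARGET = "Yearly Amount Spent"


def load_ecommerce(path="ecommerce.csv"):
    """Load the dataset, raising a clear error if the file or any expected column is missing."""
    csv_path = Path(path)
    if not csv_path.exists():
        raise FileNotFoundError(f"{csv_path} not found; place ecommerce.csv next to the notebook")
    df = pd.read_csv(csv_path)
    missing = [col for col in FEATURES + [TARGET] if col not in df.columns]
    if missing:
        raise ValueError(f"{csv_path} is missing expected columns: {missing}")
    return df
```

In the notebook, `df = load_ecommerce()` would then replace the bare `pd.read_csv` call.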

    

    

        Procedure Overview

        

            

Data Loading & Exploration: Load the dataset, examine its structure, and compute summary statistics. Visualize the relationships between the features and the target variable (Yearly Amount Spent) to gain insight.

            

Feature Engineering and Model Selection: Select relevant features based on correlation analysis, then fit a scikit-learn linear regression model to predict the target variable.

            

Model Evaluation: Measure performance on the held-out test set with Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). Plot predicted against actual values, and plot the residuals, to see where the model errs.

            

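Interpretation and Insights: Interpret the model coefficients to gauge each feature's influence on spending; raw coefficients are directly comparable only when the features share similar scales. Check the residual distribution to assess whether the model's assumptions, such as roughly normal errors, hold.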

        

    

    

        Step-by-Step Guide

        Step 1: Import Libraries

        

            

Pandas - data handling

            

Matplotlib & Seaborn - visualization

            

Scikit-learn - machine learning

            

SciPy - statistical analysis

        

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import scipy.stats as stats

        Step 2: Data Loading & Initial Exploration

        Load the data and check the structure:

df = pd.read_csv('ecommerce.csv')
df.head()
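`df.head()` shows only the first rows. A slightly fuller structural check is sketched below: column types, summary statistics, and missing values per column. To keep the snippet self-contained, it builds a hypothetical random DataFrame with the same column names instead of reading ecommerce.csv, and deliberately blanks one cell so the missing-value check has something to report.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ecommerce.csv (random numbers, not real customer data);
# in the notebook, use the df loaded in Step 2 instead.
rng = np.random.default_rng(0)
columns = ["Avg. Session Length", "Time on App", "Time on Website",
           "Length of Membership", "Yearly Amount Spent"]
df = pd.DataFrame(rng.normal(size=(100, 5)), columns=columns)
df.loc[3, "Time on App"] = np.nan  # one missing value, to show what the check reports

df.info()                  # dtypes and non-null counts per column
summary = df.describe()    # count, mean, std, min, quartiles and max per numeric column
missing = df.isna().sum()  # number of missing values per column
print(summary)
print(missing[missing > 0])
```

Note that `describe()` counts only non-missing values, so a column with gaps shows a lower `count` than the others.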

        Step 3: Exploratory Data Analysis (EDA)

        Visualize relationships with joint plots and pair plots:

sns.jointplot(x='Time on Website', y='Yearly Amount Spent', data=df, alpha=0.5)
sns.pairplot(df, plot_kws={'alpha': 0.4})
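The plots give a visual impression; the correlation analysis mentioned in the procedure overview can be made numeric by ranking each feature's Pearson correlation with the target. A minimal sketch follows, on hypothetical synthetic data whose spending is constructed (for illustration only) to depend strongly on membership length and app time, weakly on session length, and not at all on website time.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data, not ecommerce.csv: the coefficients below are invented for illustration.
rng = np.random.default_rng(42)
n = 500
features = ["Avg. Session Length", "Time on App", "Time on Website", "Length of Membership"]
df = pd.DataFrame(rng.normal(size=(n, 4)), columns=features)
df["Yearly Amount Spent"] = (
    500
    + 10 * df["Avg. Session Length"]
    + 40 * df["Time on App"]
    + 60 * df["Length of Membership"]
    + rng.normal(scale=10, size=n)
)

# Pearson correlation of each feature with the target, ordered by absolute strength.
corr_with_target = (
    df.corr()["Yearly Amount Spent"]
    .drop("Yearly Amount Spent")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(corr_with_target)
```

On the real data, `sns.heatmap(df.corr(), annot=True)` gives the same information visually. Correlation only captures pairwise linear association, so it is a screening tool rather than a substitute for the fitted coefficients.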

        Step 4: Data Splitting & Model Training

        Split data and train the model:

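x = df[['Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']]  # features
y = df['Yearly Amount Spent']  # target
# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
lm = LinearRegression()
lm.fit(X_train, y_train)  # estimate the coefficients by ordinary least squares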
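One way to sanity-check the split-and-fit pipeline is to run it on data where the true coefficients are known and confirm they are recovered. The sketch below does that with hypothetical synthetic data; `true_coef` and the intercept of 500 are invented for the check and are not values from ecommerce.csv.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data with known coefficients, used only to test the pipeline.
rng = np.random.default_rng(0)
n = 1000
cols = ["Avg. Session Length", "Time on App", "Time on Website", "Length of Membership"]
x = pd.DataFrame(rng.normal(size=(n, 4)), columns=cols)
true_coef = np.array([25.0, 40.0, 0.5, 60.0])
y = 500 + x.to_numpy() @ true_coef + rng.normal(scale=10, size=n)

# Same split and model as Step 4.
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
lm = LinearRegression()
lm.fit(X_train, y_train)

print(len(X_train), len(X_test))  # 700 300
print(lm.intercept_, lm.coef_)    # estimates should land near 500 and true_coef
```

With `test_size=0.3`, scikit-learn rounds the test size up, so 1,000 rows split into 700 for training and 300 for testing.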

        Step 5: Model Interpretation

        View feature impact with model coefficients:

cdf = pd.DataFrame(lm.coef_, index=x.columns, columns=['Coeff'])  # change in spend per one-unit increase, others held fixed
cdf
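The raw coefficients are in dollars per unit of each feature, so they rank "impact" fairly only when the features have similar spreads. A common complement is standardized coefficients (dollars per one standard deviation of the feature). The sketch below computes both on hypothetical synthetic data in which one feature is deliberately given ten times the spread of the other; the column names are reused only for readability.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data: "Time on Website" is given ten times the spread of "Time on App".
rng = np.random.default_rng(1)
n = 1000
x = pd.DataFrame({
    "Time on App": rng.normal(0, 1, n),
    "Time on Website": rng.normal(0, 10, n),
})
y = 5 * x["Time on App"] + 1 * x["Time on Website"] + rng.normal(scale=1, size=n)

raw = LinearRegression().fit(x, y)  # coefficients per raw unit
std_model = make_pipeline(StandardScaler(), LinearRegression()).fit(x, y)
std_coef = std_model.named_steps["linearregression"].coef_  # coefficients per standard deviation

cdf = pd.DataFrame({"Coeff": raw.coef_, "Std. Coeff": std_coef}, index=x.columns)
print(cdf)
```

Here the raw coefficients favour Time on App, while the standardized ones favour Time on Website, because a typical change in website time is much larger. Each standardized coefficient equals the raw coefficient times the feature's standard deviation.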

        Step 6: Predictions and Visualization

        Plot predicted values against actual values:

predictions = lm.predict(X_test)
sns.scatterplot(x=predictions, y=y_test)
plt.xlabel('Predicted Yearly Amount Spent')
plt.ylabel('Actual Yearly Amount Spent')
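The scatter plot can be summarised with a single number: R², the share of test-set variance in spending that the model explains (1.0 is a perfect fit). A minimal sketch using `sklearn.metrics.r2_score` follows; the training and test arrays are hypothetical synthetic stand-ins for the Step 4 split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical synthetic stand-ins for the Step 4 train/test split.
rng = np.random.default_rng(7)
coef = np.array([25.0, 40.0, 0.5, 60.0])
X_train, X_test = rng.normal(size=(700, 4)), rng.normal(size=(300, 4))
y_train = 500 + X_train @ coef + rng.normal(scale=10, size=700)
y_test = 500 + X_test @ coef + rng.normal(scale=10, size=300)

lm = LinearRegression().fit(X_train, y_train)
predictions = lm.predict(X_test)

r2 = r2_score(y_test, predictions)               # fraction of variance explained on the test set
r = np.corrcoef(predictions, y_test)[0, 1]        # how tightly the scatter hugs a straight line
print(f"R^2 = {r2:.3f}, Pearson r = {r:.3f}")
```

`lm.score(X_test, y_test)` returns the same R² value for a `LinearRegression` model.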

        Step 7: Performance Metrics

        Evaluate using MAE, MSE, and RMSE:

from sklearn.metrics import mean_absolute_error, mean_squared_error
import math
mse = mean_squared_error(y_test, predictions)
print("MAE:", mean_absolute_error(y_test, predictions))
print("MSE:", mse)
print("RMSE:", math.sqrt(mse))
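To show what these metrics actually compute, here is a standard-library sketch that evaluates them from their definitions. `regression_metrics` is a hypothetical helper, and the three spending values are made-up numbers chosen so the arithmetic is easy to follow.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, RMSE) computed directly from their definitions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n  # average absolute miss, in dollars
    mse = sum(e * e for e in errors) / n   # average squared miss, in dollars squared
    rmse = math.sqrt(mse)                  # back in dollars; weights large misses more than MAE
    return mae, mse, rmse


# Errors are -10, 10 and -30: MAE = 50/3, MSE = 1100/3, RMSE = sqrt(1100/3)
print(regression_metrics([500, 600, 450], [510, 590, 480]))  # roughly (16.67, 366.67, 19.15)
```

Because RMSE squares the errors before averaging, the single 30-dollar miss pulls it above the MAE; a large gap between the two suggests a few large errors rather than many small ones.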

        Step 8: Residual Analysis

        Inspect the residuals (actual minus predicted spending) to assess model fit:

residuals = y_test - predictions
sns.histplot(residuals, bins=30, kde=True)  # overlay a density curve to judge the shape
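The histogram can be complemented with the SciPy tools imported in Step 1: a Q-Q comparison against a normal distribution via `stats.probplot` and a Shapiro-Wilk test via `stats.shapiro`. The sketch below uses hypothetical normally distributed residuals in place of `y_test - predictions`.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical residuals standing in for y_test - predictions.
rng = np.random.default_rng(3)
residuals = rng.normal(loc=0, scale=10, size=300)

# Shapiro-Wilk test: a small p-value (e.g. below 0.05) is evidence against normality.
w_stat, p_value = stats.shapiro(residuals)

# Q-Q data against a normal distribution; r close to 1 means the points lie near a straight line.
(theoretical_q, ordered_resid), (slope, intercept, r) = stats.probplot(residuals, dist="norm")

print(f"mean = {residuals.mean():.2f}, Shapiro-Wilk p = {p_value:.3f}, Q-Q r = {r:.4f}")
```

In the notebook, `stats.probplot(residuals, dist="norm", plot=plt)` draws the Q-Q plot directly. With large samples, Shapiro-Wilk can flag small, practically harmless departures, so it is best read alongside the plots.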

    

    

        Results

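        On the held-out test set, the model shows strong predictive performance by the error metrics in Step 7, and its coefficients give interpretable effects for each feature. The residuals follow a near-normal distribution, which supports the linear model's fit.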

    

    

        Applications

        

            

Marketing: Predict spending for targeted campaigns.

            

Customer Retention: Identify high-value customer characteristics.

            

Business Decisions: Data-driven insights for strategic planning.

        

    

    

        Instructions to Run

        

            Install Python and the libraries listed in Step 1: pandas, Matplotlib, Seaborn, scikit-learn, and SciPy.

            Download ecommerce.csv and place it in the project folder.

            Run the notebook cells in order in Jupyter Notebook or a compatible IDE and review the results.
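Before running the notebook, a quick check that every required library can be imported may save a failed run. The sketch below is a standard-library helper; `missing_libraries` and `REQUIRED` are hypothetical names, and the entries are import names, so scikit-learn appears as `sklearn`.

```python
import importlib

# Import names of the libraries used in the guide (scikit-learn is imported as "sklearn").
REQUIRED = ("pandas", "matplotlib", "seaborn", "sklearn", "scipy")


def missing_libraries(modules=REQUIRED):
    """Return the names in `modules` that cannot be imported in this environment."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:  # also catches ModuleNotFoundError
            missing.append(name)
    return missing


print(missing_libraries() or "all libraries found")
```

Any names it reports can then be installed with pip, using the package name (for example `scikit-learn` rather than `sklearn`).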