Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/divithraju/divith-raju-data-mining
This project focuses on customer segmentation using data mining techniques, specifically K-Means clustering, to classify customers into distinct groups based on their purchasing behaviors. The goal is to analyze customer data and segment them into clusters for targeted marketing strategies and better customer relationship management.
Last synced: 7 days ago
- Host: GitHub
- URL: https://github.com/divithraju/divith-raju-data-mining
- Owner: divithraju
- Created: 2024-08-22T15:42:28.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-22T15:58:14.000Z (5 months ago)
- Last Synced: 2024-11-16T09:34:04.601Z (2 months ago)
- Topics: algorthims, analytics, apache, business, client, connector, data, dataarchitecture, database, dataengineering, datamining, datascience, hadoop, k-means-clustering, mysql, project, project-repository, pyspark, python3, spark
- Language: Python
- Homepage: https://linktr.ee/divithraju
- Size: 3.91 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Customer Segmentation Using K-Means Clustering with HDFS, MySQL, and PySpark Integration
## Overview
This project implements customer segmentation using K-Means clustering and stores the results in both HDFS and a MySQL database. It uses PySpark for processing and is intended for a big data environment.
## Project Structure
- **data/**: Contains the dataset `customer_data.csv`.
- **src/**: Contains the implementation code `customer_segmentation.py`.
- **README.md**: Project documentation.
## Installation
1. Clone the repository:
```bash
git clone https://github.com/divithraju/divith-raju-data-mining
```
2. Install the required packages:
```bash
pip install pandas scikit-learn matplotlib mysql-connector-python hdfs pyspark
```
## Usage
Run the `customer_segmentation.py` script to perform clustering and store results:
```bash
python src/customer_segmentation.py
```
# Key Features
- The HDFS output path is set to `hdfs://localhost:50000/customer segmentation reult.csv`.
- The code integrates with Hadoop and PySpark and is set up for an Ubuntu environment.
- The results are stored in both HDFS and MySQL.

Together, these give an end-to-end segmentation workflow that runs on your existing big data environment.
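
As a rough illustration of the clustering step this README describes, here is a minimal pure-Python sketch of K-Means (Lloyd's algorithm). It is not the code in `src/customer_segmentation.py`, and it deliberately leaves out PySpark, HDFS, and MySQL so that it runs with the standard library alone. The two features (annual spend and visit count), the sample values, and all function names are hypothetical and chosen only for the example.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(dim) / n for dim in zip(*points))


def kmeans(
    points: List[Point], k: int, max_iter: int = 100, seed: int = 0
) -> Tuple[List[int], List[Point]]:
    """Cluster `points` into `k` groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct input rows chosen at random.
    centroids = rng.sample(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # No assignment changed, so the result is stable.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[c] = mean_point(members)
    return labels, centroids


# Hypothetical customers as (annual_spend, visit_count) pairs.
customers: List[Point] = [
    (200.0, 2.0), (220.0, 3.0), (210.0, 2.0),
    (5000.0, 40.0), (5200.0, 42.0), (5100.0, 41.0),
]
labels, centroids = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> cluster", label)
```

In a real run the features would usually be scaled first, since a large-valued column such as spend otherwise dominates the distance. The cluster numbers themselves are arbitrary; only the grouping is meaningful.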
# License
This project is licensed under the MIT License.