https://github.com/divithraju/divith-raju-data-mining

This project segments customers with data mining techniques, specifically K-Means clustering, grouping them by purchasing behavior. The goal is to analyze customer data and assign each customer to a cluster that can support targeted marketing strategies and better customer relationship management.
Host: GitHub
URL: https://github.com/divithraju/divith-raju-data-mining
Owner: divithraju
Created: 2024-08-22T15:42:28.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-08-22T15:58:14.000Z (11 months ago)
Last Synced: 2025-03-06T17:15:44.851Z (4 months ago)
Topics: algorthims, analytics, apache, business, client, connector, data, dataarchitecture, database, dataengineering, datamining, datascience, hadoop, k-means-clustering, mysql, project, project-repository, pyspark, python3, spark
Language: Python
Homepage: https://linktr.ee/divithraju
Size: 3.91 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Customer Segmentation Using K-Means Clustering with HDFS, MySQL, and PySpark Integration

## Overview
This project implements customer segmentation using K-Means clustering and stores the results in both HDFS and a MySQL database. It uses PySpark for processing and is intended to run in a big data environment.
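To make the clustering step concrete, here is a minimal, dependency-free sketch of the K-Means idea (assign each point to its nearest centroid, then move each centroid to the mean of its members). It is an illustration only, not the code in `src/customer_segmentation.py`, which according to this README uses PySpark. The function name `kmeans`, the two features, and the sample values are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2-D points with a plain K-Means loop.

    Returns (centroids, labels), where labels[i] is the cluster index
    assigned to points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, labels


# Hypothetical features per customer: (annual_spend, purchases_per_year).
customers = [(200, 2), (220, 3), (250, 2), (5000, 40), (5200, 42), (4800, 38)]
centroids, labels = kmeans(customers, k=2)
print(labels)  # low spenders share one label, high spenders the other
```

Which numeric label each group receives depends on the random starting centroids, so only the grouping itself is meaningful. In practice, features on very different scales (such as spend versus purchase count) are usually standardized before clustering.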

## Project Structure
- **data/**: Contains the dataset `customer_data.csv`.
- **src/**: Contains the implementation code `customer_segmentation.py`.
- **README.md**: Project documentation.

## Installation
1. Clone the repository:
```bash
git clone https://github.com/divithraju/divith-raju-data-mining.git
```
2. Install the required packages:
```bash
pip install pandas scikit-learn matplotlib mysql-connector-python hdfs pyspark
```

## Usage
Run the `customer_segmentation.py` script to perform clustering and store results:
```bash
python src/customer_segmentation.py
```

## Key Features

- The HDFS output path is set to `hdfs://localhost:50000/customer segmentation reult.csv`.
- The code integrates with Hadoop and PySpark and targets an Ubuntu setup.
- The results are stored in both HDFS and MySQL.

The setup is intended to run on top of an existing Hadoop, Spark, and MySQL environment.

## License
This project is licensed under the MIT License.
