{"id":15208379,"url":"https://github.com/divithraju/divith-raju-data-mining","last_synced_at":"2026-03-06T23:33:01.560Z","repository":{"id":254307083,"uuid":"846133916","full_name":"divithraju/divith-raju-Data-Mining","owner":"divithraju","description":"This project focuses on customer segmentation using data mining techniques, specifically K-Means clustering, to classify customers into distinct groups based on their purchasing behaviors. The goal is to analyze customer data and segment them into clusters for targeted marketing strategies and better customer relationship management.","archived":false,"fork":false,"pushed_at":"2024-08-22T15:58:14.000Z","size":4,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-06T17:15:44.851Z","etag":null,"topics":["algorthims","analytics","apache","business","client","connector","data","dataarchitecture","database","dataengineering","datamining","datascience","hadoop","k-means-clustering","mysql","project","project-repository","pyspark","python3","spark"],"latest_commit_sha":null,"homepage":"https://linktr.ee/divithraju","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/divithraju.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-22T15:42:28.000Z","updated_at":"2024-08-25T11:30:25.000Z","dependencies_parsed_at":"2024-08-22T18:15:14.247Z","dependency_job_id":null,"html_url":"https://github.com/divithraju/divith-raju-Data-Mining","commit_stats":null,"previous_names":["divithraju/divith-raju-data-mining
"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divithraju%2Fdivith-raju-Data-Mining","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divithraju%2Fdivith-raju-Data-Mining/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divithraju%2Fdivith-raju-Data-Mining/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/divithraju%2Fdivith-raju-Data-Mining/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/divithraju","download_url":"https://codeload.github.com/divithraju/divith-raju-Data-Mining/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242250926,"owners_count":20096897,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorthims","analytics","apache","business","client","connector","data","dataarchitecture","database","dataengineering","datamining","datascience","hadoop","k-means-clustering","mysql","project","project-repository","pyspark","python3","spark"],"created_at":"2024-09-28T07:01:21.168Z","updated_at":"2026-03-06T23:33:01.526Z","avatar_url":"https://github.com/divithraju.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Customer Segmentation Using K-Means Clustering with HDFS, MySQL, and PySpark Integration\n\n## Overview\nThis project implements customer segmentation using K-Means clustering, with the results stored in both HDFS and MySQL databases. 
The solution leverages PySpark for efficient processing and is optimized for a big data environment.\n\n## Project Structure\n- **data/**: Contains the dataset `customer_data.csv`.\n- **src/**: Contains the implementation code `customer_segmentation.py`.\n- **README.md**: Project documentation.\n\n## Installation\n1. Clone the repository:\n    ```bash\n    git clone \u003crepository-link\u003e\n    ```\n2. Install the required packages:\n    ```bash\n    pip install pandas scikit-learn matplotlib mysql-connector-python hdfs pyspark\n    ```\n\n## Usage\nRun the `customer_segmentation.py` script to perform clustering and store results:\n```bash\npython src/customer_segmentation.py\n```\n\n## Key Features\n\n- The HDFS output path is set to `hdfs://localhost:50000/customer segmentation reult.csv`.\n- The code integrates with Hadoop and PySpark and targets an Ubuntu setup.\n- The results are stored in both HDFS and MySQL.\n\nTogether, these let the clustering results be used within an existing big data environment.\n\n## License\nThis project is licensed under the MIT License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdivithraju%2Fdivith-raju-data-mining","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdivithraju%2Fdivith-raju-data-mining","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdivithraju%2Fdivith-raju-data-mining/lists"}