{"id":18603971,"url":"https://github.com/farzeennimran/kmeans-clustering","last_synced_at":"2026-04-10T07:13:54.335Z","repository":{"id":244635206,"uuid":"815814891","full_name":"farzeennimran/Kmeans-Clustering","owner":"farzeennimran","description":null,"archived":false,"fork":false,"pushed_at":"2024-06-16T08:54:53.000Z","size":3100,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-02T09:06:30.911Z","etag":null,"topics":["clustering","clustering-algorithm","data-science","dataanalysis","datapreprocessing","kmeans-clustering","machine-learning","numpy","pandas","python","sklearn","visualization"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/farzeennimran.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-16T08:42:05.000Z","updated_at":"2024-06-22T06:18:02.000Z","dependencies_parsed_at":"2024-06-16T09:45:54.625Z","dependency_job_id":"8b2345f4-d4d0-4e3d-b046-bcca15431419","html_url":"https://github.com/farzeennimran/Kmeans-Clustering","commit_stats":null,"previous_names":["farzeennimran/kmeans-clustering"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/farzeennimran/Kmeans-Clustering","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farzeennimran%2FKmeans-Clustering","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farzeennimran%2FKmeans-Clustering/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/farzeennimran%2FKmeans-Clustering/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farzeennimran%2FKmeans-Clustering/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/farzeennimran","download_url":"https://codeload.github.com/farzeennimran/Kmeans-Clustering/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farzeennimran%2FKmeans-Clustering/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262981086,"owners_count":23394461,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering","clustering-algorithm","data-science","dataanalysis","datapreprocessing","kmeans-clustering","machine-learning","numpy","pandas","python","sklearn","visualization"],"created_at":"2024-11-07T02:16:06.228Z","updated_at":"2026-04-10T07:13:54.308Z","avatar_url":"https://github.com/farzeennimran.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Kmeans Clustering\n\n## Introduction\n\nThis repository contains a K-means clustering project aimed at grouping similar data points based on their features. The main goal of this project is to demonstrate the implementation of the K-means clustering algorithm using a real-world dataset. 
The project includes data preprocessing, visualization, and clustering steps to provide a comprehensive understanding of the clustering process.\n\n## What is K-means Clustering?\n\nK-means clustering is an unsupervised machine learning algorithm used to partition a dataset into K distinct, non-overlapping clusters. Each cluster is defined by its centroid, which is the mean of the data points in that cluster. The algorithm aims to minimize the within-cluster variance, making the clusters as compact and distinct as possible.\n\n## How K-means Clustering Works\n\n1. **Initialization**: Select K initial centroids randomly from the dataset.\n2. **Assignment**: Assign each data point to the nearest centroid, forming K clusters.\n3. **Update**: Recalculate the centroids of the clusters by taking the mean of all data points in each cluster.\n4. **Repeat**: Repeat the assignment and update steps until the centroids no longer change significantly or a maximum number of iterations is reached.\n\n## Explanation of the Code\n\n### Importing Libraries\n\nWe start by importing the necessary libraries: `pandas` for data manipulation, `matplotlib` and `seaborn` for data visualization, and `scikit-learn` for machine learning algorithms.\n\n### Loading the Dataset\n\nThe dataset is loaded into a DataFrame using `pandas`. The dataset contains various features that will be used for clustering.\n\n### Data Preprocessing\n\nWe perform data preprocessing by dropping columns that are not needed for clustering. This helps in reducing noise and improving the clustering results.\n\n### Data Normalization\n\nWe normalize the data using Min-Max scaling to ensure that all features contribute equally to the clustering process.\n\n### Handling Missing Values\n\nWe fill any remaining missing values with the median of each column to maintain data integrity.\n\n### Data Visualization\n\nWe visualize the data to understand its distribution. 
Here, we plot the latitude and longitude ranges to see the geographical distribution of data points.\n\n### K-means Clustering\n\nWe apply the K-means clustering algorithm to cluster the data based on latitude and longitude ranges. The results are visualized using a scatter plot, where different colors represent different clusters.\n\n### Activity Distribution Visualization\n\nFinally, we visualize the distribution of various activities in the dataset using a pie chart.\n\n## Conclusion\n\nThis project demonstrates the implementation of K-means clustering on a real-world dataset. The process includes data preprocessing, normalization, clustering, and visualization to provide insights into the data. The code can be further extended and customized for different datasets and clustering requirements.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffarzeennimran%2Fkmeans-clustering","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffarzeennimran%2Fkmeans-clustering","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffarzeennimran%2Fkmeans-clustering/lists"}