https://github.com/abhinav330/unsupervised-learning-groping-of-schools
This Python notebook demonstrates an exploratory data analysis (EDA) and clustering exercise using the pandas, seaborn, and matplotlib libraries. The code works with a dataset called 'College_Data' and explores college-related attributes, including 'Private' status, graduation rates, and enrollment data.
data-science kmeans-algorithm kmeans-clustering kmeans-clustering-algorithm machine-learning python unsupervised-learning
- Host: GitHub
- URL: https://github.com/abhinav330/unsupervised-learning-groping-of-schools
- Owner: Abhinav330
- Created: 2024-08-25T22:56:02.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-31T20:48:49.000Z (about 1 year ago)
- Last Synced: 2025-03-03T08:16:35.336Z (8 months ago)
- Topics: data-science, kmeans-algorithm, kmeans-clustering, kmeans-clustering-algorithm, machine-learning, python, unsupervised-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 6.56 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Unsupervised-learning-Grouping-of-schools
# Code Summary
This Python notebook demonstrates exploratory data analysis (EDA) and K-means clustering using pandas, seaborn, matplotlib, and scikit-learn. The code works with a dataset called 'College_Data' and explores college-related attributes, including 'Private' status, graduation rates, and enrollment data.
## Data Exploration and Visualization
The code begins by importing the necessary libraries and loading the dataset 'College_Data' using pandas. It then conducts several data exploration and visualization tasks:
- Displays the first few rows of the dataset using `df.head()`.
- Provides dataset information using `df.info()`.
- Generates summary statistics of the dataset using `df.describe()`.
- Creates scatterplots and pairplots to visualize relationships between variables, with points color-coded by 'Private' status.
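The exploration steps above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the rows are made-up stand-ins for 'College_Data', the college names are hypothetical, and only columns named in this README are used. The notebook itself loads the real file with pandas.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in rows (assumption); the notebook loads 'College_Data' instead.
df = pd.DataFrame(
    {
        "Private": ["Yes", "Yes", "No", "No", "Yes", "No"],
        "Enroll": [721, 512, 2500, 3100, 336, 1800],
        "F.Undergrad": [2885, 2683, 12000, 15000, 1036, 9000],
        "Grad.Rate": [60, 56, 54, 59, 72, 48],
    },
    index=[f"College {c}" for c in "ABCDEF"],  # hypothetical names
)

print(df.head())      # first few rows
df.info()             # column dtypes and non-null counts
print(df.describe())  # summary statistics for numeric columns

# Scatterplot with points colored by 'Private' status.
ax = sns.scatterplot(data=df, x="Grad.Rate", y="F.Undergrad", hue="Private")
ax.set_title("F.Undergrad vs. Grad.Rate by Private status")
plt.close(ax.get_figure())

# A pairplot follows the same pattern: sns.pairplot(df, hue="Private")
```

In a Jupyter notebook the backend line and `plt.close` call are unnecessary, since figures render inline.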
## Data Preprocessing
The code performs some data preprocessing steps, including:
- Filtering and visualizing data points where 'Grad.Rate' is greater than 100. A graduation rate above 100% is not possible, so this is most likely a data-entry error; the code appears to set the 'Grad.Rate' of 'Cazenovia College' to 100.
- Using the 'KMeans' clustering algorithm from scikit-learn to cluster colleges into two groups based on their features, excluding the 'Private' column.
- Displaying a scatterplot of 'Enroll' vs. 'F.Undergrad' with points colored by cluster labels.
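The 'Grad.Rate' correction in the first bullet might look something like the sketch below. It assumes college names are used as the DataFrame index, which this README does not confirm, and the values shown are illustrative placeholders rather than figures taken from the dataset.

```python
import pandas as pd

# Illustrative placeholder rows (assumption); names other than
# 'Cazenovia College' are hypothetical.
df = pd.DataFrame(
    {
        "Private": ["Yes", "Yes", "No"],
        "Grad.Rate": [118, 60, 54],
    },
    index=["Cazenovia College", "College A", "College B"],
)

# Find rows with an impossible graduation rate (above 100%).
over_100 = df[df["Grad.Rate"] > 100]
print(over_100)

# Set the offending value to 100 for the affected college.
df.loc["Cazenovia College", "Grad.Rate"] = 100
```

Assigning through `df.loc[row, column]` modifies the DataFrame in place and avoids the chained-assignment warning that `df["Grad.Rate"]["Cazenovia College"] = 100` can trigger.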
## Clustering
The code uses the K-means clustering algorithm to cluster colleges into two groups based on their features. It computes cluster centers and assigns each college to a cluster label. The scatterplot at the end visualizes the clustering results.
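A minimal sketch of this clustering step is shown below, under stated assumptions: the data are made-up stand-ins for 'College_Data', and the `n_init` and `random_state` settings are choices made here for reproducibility, not values taken from the notebook.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up stand-in rows (assumption): three small and three large colleges.
df = pd.DataFrame(
    {
        "Private": ["Yes", "Yes", "Yes", "No", "No", "No"],
        "Enroll": [721, 512, 336, 2500, 3100, 1800],
        "F.Undergrad": [2885, 2683, 1036, 12000, 15000, 9000],
        "Grad.Rate": [60, 56, 72, 54, 59, 48],
    }
)

# Cluster on every feature except the 'Private' label.
features = df.drop(columns="Private")

# Two clusters, as described above; n_init and random_state are assumptions.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(features)

print(kmeans.cluster_centers_)  # one row of feature means per cluster
df["Cluster"] = kmeans.labels_  # cluster label assigned to each college
print(df[["Private", "Cluster"]])
```

The resulting `Cluster` column can then be passed as `hue` to `sns.scatterplot(data=df, x="Enroll", y="F.Undergrad", hue="Cluster")` to draw the final plot. Note that without feature scaling, columns with large magnitudes such as 'F.Undergrad' dominate the distance computation.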
Overall, the notebook shows how EDA and K-means clustering can be used to analyze college data, with a focus on distinguishing private from non-private institutions based on their features.