Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/niteshchawla/clustering-ml
Analyzing learner data at scale can uncover patterns in learners' professional backgrounds and preferences, allowing Scaler to make tailored content recommendations and provide specialized mentorship.
cluster-analysis clustering hierarchical-clustering k-means-clustering machine-learning numpy pca-analysis visualisation
- Host: GitHub
- URL: https://github.com/niteshchawla/clustering-ml
- Owner: Niteshchawla
- Created: 2024-07-16T17:09:05.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-07-16T17:11:24.000Z (4 months ago)
- Last Synced: 2024-07-16T21:08:13.111Z (4 months ago)
- Topics: cluster-analysis, clustering, hierarchical-clustering, k-means-clustering, machine-learning, numpy, pca-analysis, visualisation
- Language: Jupyter Notebook
- Homepage:
- Size: 12.7 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Clustering-ML
**Problem Statement:**
Scaler is an online tech-versity offering intensive computer science and data science courses through live classes delivered by tech leaders and subject matter experts. Its structured program builds the skills of software professionals through a modern curriculum with exposure to the latest technologies. Scaler is a product of InterviewBit.
You are working as a data scientist in the analytics vertical of Scaler, focused on profiling the best companies and job positions to work for, based on the Scaler database. You are given information on a segment of learners and tasked with clustering them by job profile, company, and other features. Ideally, learners within each cluster should share similar characteristics.
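Grouping learners by company and job profile, as described above, could be sketched roughly as follows. This is only an illustration, not the notebook's actual code: the rows are made up, the job titles are hypothetical, and the idea of labelling each learner against the mean CTC of their peer group is an assumption about how such a profile-based grouping might be done. Only the column names come from the data dictionary.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows; only the column names come from the data dictionary.
learners = [
    {"Email_hash": "e1", "Company_hash": "c1", "Job_position": "Backend Engineer", "CTC": 1_200_000},
    {"Email_hash": "e2", "Company_hash": "c1", "Job_position": "Backend Engineer", "CTC": 800_000},
    {"Email_hash": "e3", "Company_hash": "c1", "Job_position": "Data Scientist", "CTC": 1_500_000},
    {"Email_hash": "e4", "Company_hash": "c2", "Job_position": "Backend Engineer", "CTC": 600_000},
    {"Email_hash": "e5", "Company_hash": "c2", "Job_position": "Backend Engineer", "CTC": 600_000},
]


def flag_against_peers(rows):
    """Label each learner relative to the mean CTC of their (company, job position) group."""
    # Collect the CTC values of every (company, job position) group.
    groups = defaultdict(list)
    for row in rows:
        groups[(row["Company_hash"], row["Job_position"])].append(row["CTC"])
    group_mean = {key: mean(values) for key, values in groups.items()}

    # Compare each learner with the mean of their own group.
    flags = {}
    for row in rows:
        avg = group_mean[(row["Company_hash"], row["Job_position"])]
        if row["CTC"] > avg:
            flags[row["Email_hash"]] = "above"
        elif row["CTC"] < avg:
            flags[row["Email_hash"]] = "below"
        else:
            flags[row["Email_hash"]] = "at"
    return flags


flags = flag_against_peers(learners)
print(flags)
```

The same comparison could be repeated at coarser levels (company only, or job position only) to see where a learner stands against wider peer groups.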
**Dataset:**
Dataset Link: scaler_kmeans.csv
**Data Dictionary:**
‘Unnamed 0’ - Index of the dataset
Email_hash - Anonymised Personally Identifiable Information (PII)
Company_hash - This represents an anonymized identifier for the company, which is the current employer of the learner.
orgyear - Employment start date
CTC - Current CTC
Job_position - Job profile in the company
CTC_updated_year - Year in which CTC got updated (Yearly increments, Promotions)
**Concept Used:**
Manual Clustering
Unsupervised Clustering - K-means, Hierarchical Clustering
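To make the K-means idea concrete, here is a minimal sketch of Lloyd's algorithm using only the Python standard library. It is not the notebook's implementation, and a real analysis would more likely rely on a library such as scikit-learn. The feature choice (years of experience and CTC), the sample values, and the helper names `standardize` and `kmeans` are all assumptions made for illustration. Hierarchical clustering is not covered here.

```python
import math
import random


def standardize(rows):
    """Z-score each column so a large-valued feature such as CTC does not dominate distances."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid updates until labels stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move either
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Made-up learners: (years of experience, CTC in lakhs per annum).
raw = [(1, 6), (2, 8), (1.5, 7), (9, 40), (10, 45), (11, 42)]
labels, centroids = kmeans(standardize(raw), k=2)
print(labels)
```

On this toy data the first three learners should land in one cluster and the last three in the other. Which cluster is numbered 0 depends on the random initialisation, so the label values themselves carry no meaning.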