https://github.com/josepablodmg/python---unsupervised-learning-for-breast-cancer-diagnosis-patterns
This project explores unsupervised learning techniques on the Breast Cancer dataset. It uses K-means and K-medoids clustering to detect patterns in tumor features, applies PCA for dimensionality reduction and visualization, compares the clustering methods using silhouette scores, and interprets the clusters in the context of tumor malignancy.
- Host: GitHub
- URL: https://github.com/josepablodmg/python---unsupervised-learning-for-breast-cancer-diagnosis-patterns
- Owner: josepablodmg
- Created: 2025-09-24T12:00:14.000Z
- Default Branch: main
- Last Pushed: 2025-09-24T12:05:36.000Z
- Last Synced: 2025-09-24T14:12:08.762Z
- Topics: breast-cancer, clustering, dimensionality-reduction, k-means-clustering, k-medoids, machine-learning, pattern-detection, pca-analysis, python, sklearn, unsupervised-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 329 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Unsupervised Learning on Breast Cancer Dataset
## Overview
This project demonstrates the application of unsupervised machine learning techniques on the Breast Cancer dataset. The goals are to explore patterns in tumor features, reduce dimensionality for visualization, and compare clustering methods.
## Techniques Used
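- **K-means Clustering**: Partitions the data into *k* clusters by repeatedly assigning each point to the nearest centroid and recomputing each centroid as the mean of its assigned points.
- **K-medoids Clustering**: Uses actual data points (medoids) as cluster centers, which makes it more robust to outliers than K-means.
- **PCA (Principal Component Analysis)**: Projects the features onto a small number of principal components, reducing dimensionality so that the high-dimensional data can be plotted and inspected visually.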
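The practical difference between the two clustering methods is where the centers live: K-means centroids are means and need not coincide with any sample, while K-medoids centers are always actual rows of the data. The sketch below is a minimal NumPy K-medoids, assuming Euclidean distance, a greedy initialization in the spirit of PAM's BUILD step, and the simpler alternating (Voronoi-iteration) update instead of the full PAM swap search. `k_medoids` is a hypothetical helper for illustration, not the notebook's implementation.

```python
import numpy as np


def k_medoids(X, k, max_iter=100):
    """Alternating K-medoids (hypothetical helper): centers are always indices into X."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all points (n x n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Greedy initialization: start with the most central point, then repeatedly
    # add the point that most reduces the total distance to the nearest medoid.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    for _ in range(1, k):
        nearest = dist[:, medoids].min(axis=1)
        cost = np.minimum(nearest[:, None], dist).sum(axis=0)
        cost[medoids] = np.inf  # never pick an existing medoid twice
        medoids.append(int(np.argmin(cost)))
    medoids = np.array(medoids)

    for _ in range(max_iter):
        # Assign every point to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid if a cluster becomes empty
            # New medoid: the member with the smallest total distance to the other members.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # converged: medoids no longer change
        medoids = new_medoids

    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels


# Usage on two well-separated synthetic blobs of 20 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(5.0, 0.5, (20, 2))])
medoids, labels = k_medoids(X, k=2)
print("medoid rows:", X[medoids])
print("labels:", labels)
```

Because it builds the full n × n distance matrix, this version is only suitable for datasets of up to a few thousand points.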
## Dataset
The dataset contains features extracted from breast cancer cell nuclei, including radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension. Each sample is labeled with one of two classes: **Malignant (M)** or **Benign (B)**.
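The features listed above match the Wisconsin Diagnostic Breast Cancer data, a copy of which ships with scikit-learn. The sketch below loads that bundled copy as a stand-in for the repository's `data.csv`. This is an assumption: the project's file may use a different layout (for example, column names or an M/B string encoding of the diagnosis).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target  # in this bundled copy, target 0 = malignant, 1 = benign

# 30 columns: mean, standard error and "worst" value of each of the 10 base features.
print(X.shape)
print(data.feature_names[:3])

# Class balance, with plain Python types for readable printing.
counts = {str(name): int(c) for name, c in zip(data.target_names, np.bincount(y))}
print(counts)  # {'malignant': 212, 'benign': 357}
```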
## Project Structure
- `project.ipynb`: Jupyter notebook with code and analysis.
- `data.csv`: Dataset used in the project.
## Key Findings
- Optimal number of clusters for both K-means and K-medoids is **2**, aligning with the two tumor types.
- K-means achieved a higher silhouette score than K-medoids, indicating more cohesive and better-separated clusters on this dataset.
- PCA visualization shows clear separation of clusters corresponding to tumor malignancy.
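The analysis behind these findings can be outlined with scikit-learn. `silhouette_score` averages, over all samples, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from sample i to the other members of its cluster and b(i) is the smallest mean distance from i to the members of any other cluster. Values near 1 indicate compact, well-separated clusters. The sketch below is a minimal outline under stated assumptions: it uses scikit-learn's bundled copy of the dataset instead of the repository's `data.csv`, it standardizes the features, and the k range 2 to 6, `n_init=10` and `random_state=0` are illustrative choices, not settings taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
# Standardize so that large-scale features (e.g. area) do not dominate the distances.
X = StandardScaler().fit_transform(data.data)

# 1) Choose k by the mean silhouette coefficient of K-means clusterings.
scores = {}
for k in range(2, 7):
    km_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, km_labels)
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k by silhouette:", best_k)

# 2) Cluster with k=2 and cross-tabulate cluster IDs against the diagnosis labels.
#    Cluster IDs are arbitrary, so read each row as "which diagnoses fell in this cluster".
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
table = np.zeros((2, 2), dtype=int)
for cluster, target in zip(labels, data.target):
    table[cluster, target] += 1
print("rows = cluster, columns =", [str(n) for n in data.target_names])
print(table)

# 3) Project onto the first two principal components for a 2-D plot.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print("explained variance ratio (PC1, PC2):", pca.explained_variance_ratio_.round(3))

# Optional scatter plot (requires matplotlib):
# import matplotlib.pyplot as plt
# plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, s=10)
# plt.xlabel("PC1"); plt.ylabel("PC2"); plt.show()
```

Any other labeling of the same standardized `X`, such as a K-medoids result, can be scored with the same `silhouette_score(X, labels)` call to reproduce the comparison between the two methods.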