https://github.com/josepablodmg/python---unsupervised-learning-for-breast-cancer-diagnosis-patterns

This project explores unsupervised learning techniques on the Breast Cancer dataset. It uses K-means and K-medoids clustering to detect patterns in tumor features and applies PCA for dimensionality reduction and visualization. It compares the clustering methods using silhouette scores and interprets the clusters in the context of tumor malignancy.

breast-cancer clustering dimensionality-reduction k-means-clustering k-medoids machine-learning pattern-detection pca-analysis python sklearn unsupervised-learning


Host: GitHub
URL: https://github.com/josepablodmg/python---unsupervised-learning-for-breast-cancer-diagnosis-patterns
Owner: josepablodmg
Created: 2025-09-24T12:00:14.000Z (10 months ago)
Default Branch: main
Last Pushed: 2025-09-24T12:05:36.000Z (10 months ago)
Last Synced: 2025-09-24T14:12:08.762Z (10 months ago)
Topics: breast-cancer, clustering, dimensionality-reduction, k-means-clustering, k-medoids, machine-learning, pattern-detection, pca-analysis, python, sklearn, unsupervised-learning
Language: Jupyter Notebook
Homepage:
Size: 329 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Unsupervised Learning on Breast Cancer Dataset

## Overview
This project applies unsupervised machine learning techniques to the Breast Cancer dataset. The goals are to explore patterns in tumor features, reduce dimensionality for visualization, and compare clustering methods.

## Techniques Used
- **K-means Clustering**: Partitions data into clusters around centroids (the mean of each cluster).
- **K-medoids Clustering**: Uses actual data points (medoids) as cluster centers, which makes it more robust to outliers than K-means.
- **PCA (Principal Component Analysis)**: Projects the data onto a few principal components, reducing dimensionality so that high-dimensional data can be visualized.
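
The K-means and PCA steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it assumes scikit-learn's bundled Wisconsin breast cancer data as a stand-in for `data.csv`, and the choices of standardization, `n_init=10`, and `random_state=0` are assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled dataset stands in for the repository's data.csv.
X, y = load_breast_cancer(return_X_y=True)

# Standardize so large-scale features (such as area) do not dominate distances.
X_scaled = StandardScaler().fit_transform(X)

# Cluster into two groups; the true labels y are not used for fitting.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Project onto the first two principal components for a 2-D scatter plot.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()

print(f"Share of variance kept by 2 components: {explained:.2f}")
```

Plotting `X_2d[:, 0]` against `X_2d[:, 1]`, colored by `labels`, gives the kind of cluster visualization the README describes.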

## Dataset
The dataset contains features computed from breast cancer cell nuclei, including radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension. Each sample belongs to one of two classes: **Malignant (M)** or **Benign (B)**.
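
For a quick look at data of this shape, the sketch below inspects scikit-learn's bundled Wisconsin Diagnostic Breast Cancer dataset, whose feature list matches the one described here. Whether `data.csv` is the same file, and how its columns are named, is an assumption that should be checked against the notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Assumption: the bundled dataset stands in for the repository's data.csv.
data = load_breast_cancer()
X, y = data.data, data.target

# Each of the ten measurements appears as a mean, a standard error,
# and a "worst" value, giving 30 feature columns in total.
print(X.shape)
print(list(data.feature_names[:10]))

# Count the samples in each class.
counts = {str(name): int(n) for name, n in zip(data.target_names, np.bincount(y))}
print(counts)
```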

## Project Structure
- `project.ipynb`: Jupyter notebook with code and analysis.
- `data.csv`: Dataset used in the project.

## Key Findings
- The optimal number of clusters for both K-means and K-medoids is **2**, aligning with the two tumor types.
- K-means achieved a higher silhouette score than K-medoids, indicating more cohesive and better-separated clusters.
- PCA visualization shows clear separation of clusters corresponding to tumor malignancy.
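
A silhouette comparison of this kind could be sketched as below. Everything here is illustrative: `k_medoids` is a hypothetical helper implementing a simple alternating (assign, then update medoid) variant rather than whatever the notebook uses, the bundled scikit-learn dataset stands in for `data.csv`, and the range of `k` values is an assumption. The scores it prints are not guaranteed to reproduce the findings above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import pairwise_distances, silhouette_score
from sklearn.preprocessing import StandardScaler


def k_medoids(X, k, n_iter=100, seed=0):
    """Hypothetical helper: alternating k-medoids on Euclidean distances."""
    rng = np.random.default_rng(seed)
    D = pairwise_distances(X)  # full (n, n) distance matrix
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # Update step: pick the member with the smallest total
            # distance to the other members of its cluster.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # converged
        medoids = new_medoids
    return medoids, np.argmin(D[:, medoids], axis=1)


# Assumption: the bundled dataset stands in for the repository's data.csv.
X, _ = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Silhouette score for each candidate k, for both methods.
scores = {}
for k in range(2, 6):
    km_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    _, kmed_labels = k_medoids(X_scaled, k)
    scores[k] = (
        silhouette_score(X_scaled, km_labels),
        silhouette_score(X_scaled, kmed_labels),
    )

for k, (km, kmed) in scores.items():
    print(f"k={k}: K-means={km:.3f}, K-medoids={kmed:.3f}")
```

The silhouette score ranges from -1 to 1, and higher values mean points sit closer to their own cluster than to the nearest other cluster, so the `k` with the highest score is the usual candidate for the optimal number of clusters.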
