https://github.com/mdh266/kmeans

Creating A Scikit-Learn Compatible Clustering Algorithm

algorithms clustering data-science machine-learning machine-learning-algorithms scikit-learn unsupervised-learning



# Writing A Scikit-Learn Compatible Clustering Algorithm
-----------------------

## About
---------
In this post, I will go over how to write a K-means clustering algorithm from scratch using [NumPy](https://numpy.org/). The algorithm is explained in the next section and, while seemingly simple, it can be tricky to implement efficiently! As an added bonus, I will go over how to make the implementation [Scikit-Learn](https://scikit-learn.org/stable/) compatible so that we can use Scikit-Learn's framework, including [Pipelines](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) and [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).
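
To give a flavour of what "Scikit-Learn compatible" can mean in practice, here is a minimal sketch, not the notebook's actual implementation. It assumes the standard Lloyd iteration for K-means and uses a hypothetical class name, `SimpleKMeans`, with hypothetical hyperparameter names chosen to mirror Scikit-Learn's own conventions. The key ideas are that `__init__` only stores hyperparameters, learned attributes end with an underscore, and `fit` returns `self`.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClusterMixin


class SimpleKMeans(ClusterMixin, BaseEstimator):
    """Hypothetical minimal K-means estimator (Lloyd's algorithm)."""

    def __init__(self, n_clusters=3, max_iter=100, tol=1e-4, random_state=None):
        # Scikit-Learn convention: __init__ only stores hyperparameters.
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.tol = tol
        self.random_state = random_state

    def _distances(self, X):
        # Squared Euclidean distance from every point to every centroid,
        # via broadcasting; the result has shape (n_samples, n_clusters).
        diff = X[:, np.newaxis, :] - self.cluster_centers_[np.newaxis, :, :]
        return (diff ** 2).sum(axis=2)

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.random_state)
        # Initialise the centroids as distinct, randomly chosen data points.
        idx = rng.choice(X.shape[0], size=self.n_clusters, replace=False)
        self.cluster_centers_ = X[idx].copy()

        for _ in range(self.max_iter):
            # Assignment step: each point goes to its nearest centroid.
            labels = self._distances(X).argmin(axis=1)
            # Update step: each centroid moves to the mean of its members.
            new_centers = self.cluster_centers_.copy()
            for k in range(self.n_clusters):
                members = X[labels == k]
                if len(members) > 0:  # keep the old centroid if a cluster is empty
                    new_centers[k] = members.mean(axis=0)
            shift = np.linalg.norm(new_centers - self.cluster_centers_)
            self.cluster_centers_ = new_centers
            if shift < self.tol:
                break

        # Learned attributes end with an underscore by convention.
        dist = self._distances(X)
        self.labels_ = dist.argmin(axis=1)
        self.inertia_ = float(dist.min(axis=1).sum())
        return self

    def predict(self, X):
        # Label new points by their nearest learned centroid.
        return self._distances(np.asarray(X, dtype=float)).argmin(axis=1)
```

Inheriting from `BaseEstimator` provides `get_params` and `set_params`, and `ClusterMixin` provides `fit_predict`, so an estimator like this can be cloned and used as the last step of a Pipeline. To use it with GridSearchCV, you would additionally need a `score` method or an explicit `scoring` argument, which this sketch leaves out.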

## Using The Notebook
----------
You can install the dependencies and access the notebook using Docker by building the Docker image with the following:

    docker build -t kmeans .

Then run the container with:

    docker run -ip 8888:8888 -v `pwd`:/home/jovyan -t kmeans

See here for more info.

Without Docker, make sure to use Python 3.9 and install the libraries listed in requirements.txt with the command:

    pip install -r requirements.txt