https://github.com/datamine/mnist-k-means-clustering

K-Means Clustering to Identify Handwritten Digits

Host: GitHub
URL: https://github.com/datamine/mnist-k-means-clustering
Owner: Datamine
Created: 2016-02-26T04:44:26.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2017-01-04T09:39:40.000Z (over 9 years ago)
Last Synced: 2024-12-30T01:41:55.862Z (over 1 year ago)
Language: Jupyter Notebook
Homepage:
Size: 22.3 MB
Stars: 46
Watchers: 6
Forks: 27
Open Issues: 1
Metadata Files:
- Readme: README.md

README

# MNIST-K-Means-Clustering
Using K-Means Clustering to Identify Handwritten Digits

Uncompress the `.tar.gz` archive to get the `digits.base64.json` dataset, which you'll need: `tar -xzvf digits.base64.json.tar.gz`

Design decision: the clustering algorithm is designed to train on labelled data. However, I've written it so that it's easy to adapt to unlabelled data. I considered making it modular for labelled/unlabelled data, but the more I think about it, the less convinced I am of the utility of a k-means clustering algorithm for unlabelled training data. (If your data is unlabelled, you can just place a dummy label on every datapoint.)
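
To make the labelled/unlabelled distinction concrete, here is a minimal sketch of k-means on labelled data, written with NumPy. It is not the notebook's actual code: the function names `kmeans_labelled` and `predict`, the parameters `n_iter` and `seed`, and the majority-vote rule for naming clusters are all assumptions made for illustration. Labels play no part in the clustering itself; they are only used afterwards to give each cluster a name, which is why a dummy label works for unlabelled data.

```python
from collections import Counter

import numpy as np


def kmeans_labelled(points, labels, k, n_iter=20, seed=0):
    """Hypothetical helper: cluster `points` with k-means, then name each
    cluster by the most common label among its members."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)

    # Assumption: initialise centroids from k distinct random datapoints.
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = points[assign == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)

    # Final assignment against the converged centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    assign = dists.argmin(axis=1)

    # Majority vote: the labels are consulted only here, after clustering.
    cluster_labels = []
    for j in range(k):
        member_labels = [labels[i] for i in np.flatnonzero(assign == j)]
        if member_labels:
            cluster_labels.append(Counter(member_labels).most_common(1)[0][0])
        else:
            cluster_labels.append(None)
    return centroids, cluster_labels


def predict(point, centroids, cluster_labels):
    """Return the label of the centroid nearest to `point`."""
    dists = np.linalg.norm(centroids - np.asarray(point, dtype=float), axis=1)
    return cluster_labels[int(dists.argmin())]
```

For unlabelled data, passing the same dummy value for every label (for example `labels = ["unlabelled"] * len(points)`) leaves the centroids unaffected; every cluster simply ends up with that dummy name.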

Inspired by a homework assignment in John Lafferty's [Large-Scale Data Analysis](https://galton.uchicago.edu/~lafferty/37601-syllabus.pdf) course that I took at UChicago in the Spring of 2015. I collaborated with Elliott Ding on that assignment. In the class, we used distributed systems via AWS and Apache Spark, parallelized code, and did most analysis using map-reduce. To make the computational statistics more accessible, I've rewritten this notebook to not use distributed techniques.

-----

See my blog post on this project [here](http://johnloeber.com/docs/kmeans.html).
