https://github.com/sethuiyer/document-clusterer

Document clustering using PCA from scratch using numpy and scipy.

# Document-Clusterer

A simple document clusterer that applies singular value decomposition (SVD) to a corpus of CNN stories.
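
As a rough illustration of the idea (not the code from this repository), PCA can be computed by taking the SVD of a mean-centered document-term matrix and projecting onto the leading right singular vectors. The function name `pca_svd` and the toy count matrix below are assumptions made up for this sketch.

```python
import numpy as np


def pca_svd(X, n_components):
    """Project the rows of X onto the top principal components via SVD."""
    X = np.asarray(X, dtype=float)
    # PCA works on centered features, so subtract each column's mean.
    X_centered = X - X.mean(axis=0)
    # Thin SVD: X_centered = U @ diag(S) @ Vt; the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured by each kept component (sample variance, n - 1 denominator).
    explained_variance = (S[:n_components] ** 2) / (X.shape[0] - 1)
    projected = X_centered @ components.T
    return projected, components, explained_variance


# Toy document-term count matrix: 4 documents x 5 terms (made-up numbers).
counts = np.array([
    [3, 2, 0, 0, 1],
    [2, 3, 0, 1, 0],
    [0, 0, 4, 3, 1],
    [0, 1, 3, 4, 0],
])

projected, components, explained_variance = pca_svd(counts, n_components=2)
print(projected.shape)
```

In this toy matrix the first two documents share one vocabulary and the last two share another, so they should land on opposite sides of the first principal axis.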

*cleaning.py*: Processes the directory of CNN stories and writes the result to a JSON file.

*model.py*: Main program that performs the clustering.
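
The two-step flow described above might look roughly like the following sketch. It is an assumption-laden illustration, not the contents of `cleaning.py` or `model.py`: the story texts, the JSON layout (file name mapped to a token list), the `clean` helper, and the choice of Ward hierarchical clustering from scipy are all hypothetical.

```python
import json
import re

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def clean(text):
    """Lowercase the text and keep alphabetic tokens only (hypothetical helper)."""
    return re.findall(r"[a-z]+", text.lower())


# Stand-in for the story files; the real corpus and JSON schema may differ.
stories = {
    "a.story": "The team won the match after a late goal.",
    "b.story": "A late goal gave the team the match.",
    "c.story": "Markets fell as investors sold shares.",
    "d.story": "Investors sold shares and markets fell sharply.",
}

# Step 1 (cleaning): serialize the tokenized documents as JSON.
corpus_json = json.dumps({name: clean(text) for name, text in stories.items()})

# Step 2 (model): load the JSON and build a document-term count matrix.
corpus = json.loads(corpus_json)
names = sorted(corpus)
vocab = sorted({tok for toks in corpus.values() for tok in toks})
index = {tok: j for j, tok in enumerate(vocab)}
X = np.zeros((len(names), len(vocab)))
for i, name in enumerate(names):
    for tok in corpus[name]:
        X[i, index[tok]] += 1

# Reduce to two dimensions with SVD on the mean-centered matrix.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
reduced = U[:, :2] * S[:2]

# Group the reduced points: Ward linkage, cut into two clusters.
labels = fcluster(linkage(reduced, method="ward"), t=2, criterion="maxclust")
clusters = dict(zip(names, labels.tolist()))
print(clusters)
```

The sports-like and finance-like stories share no vocabulary in this toy corpus, so the two pairs should end up in different clusters; which cluster receives which numeric label is arbitrary.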

# TODO
Write a blog post explaining the approach.