https://github.com/subhadarship/text-clustering

Clustering text data (data mining fall 2019)
bert clustering glove-embeddings lda nlp roberta topic-modeling visualization


# Text Clustering

![Alt Text](https://media.giphy.com/media/VSYyrK0MzKSyI/giphy.gif)

Clustering text data (data mining, fall 2019)

## Notes
- [`present.html`](https://github.com/subhadarship/text-clustering/blob/master/present.html) was created with reveal.js
- [`topic_modeling_big.ipynb`](https://github.com/subhadarship/text-clustering/blob/master/topic_modeling_big.ipynb) uses more data for LDA than [`topic_modeling.ipynb`](https://github.com/subhadarship/text-clustering/blob/master/topic_modeling.ipynb), but it still cannot handle very large datasets (e.g. 1M samples)

## TODO
- [x] big data LDA (still less than a million samples)
- [x] visualize results LDA
- [x] big data neural models (check k-means time)
- [x] find optimum number of clusters for neural models
- [x] visualize results neural models
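
The "find optimum number of clusters" item can be illustrated with a minimal sketch. It assumes the silhouette score as the selection criterion, which is one common choice and not necessarily the one used in the notebooks. The helper names (`farthest_point_init`, `kmeans`, `silhouette`, `best_k`) are hypothetical, the k-means seeding is a simplified deterministic variant, and the 2-D toy points stand in for real embedding vectors.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=50):
    """Lloyd's k-means on a list of equal-length tuples; returns labels."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centers[c])  # keep the center of an empty cluster
        if updated == centers:
            break
        centers = updated
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient in [-1, 1]; higher is better."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; 0.0 is a convention here
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def best_k(points, candidates):
    """Return the candidate k with the highest silhouette, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Toy data: three small, well-separated 2-D blobs (placeholders for embeddings).
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5), (0.0, 0.0)]
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

best, scores = best_k(points, range(2, 6))
print(best)  # the three-blob toy data favours k = 3
```

The silhouette computation here is quadratic in the number of points, so on large datasets it would typically be evaluated on a sample rather than on every point.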

## Write-up
See [`text_clustering.pdf`](https://github.com/subhadarship/text-clustering/blob/master/text_clustering.pdf)

## LICENSE

[MIT](https://github.com/subhadarship/text-clustering/tree/master/LICENSE)