An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with document-clustering

A curated list of projects in awesome lists tagged with document-clustering.

https://github.com/taki0112/vector_similarity

Python and Java implementations of TS-SS, from the paper "A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering"

document-clustering vector-similarity

Last synced: 17 Jun 2025

https://github.com/maxoodf/tgnews

Telegram Data Clustering Contest (Bossy Gnu's submission)

cpp document-clustering document-embedding document-similarity nlp nlp-machine-learning telegram word2vec

Last synced: 03 Apr 2025

https://github.com/sethuiyer/document-clusterer

Document clustering with PCA implemented from scratch using NumPy and SciPy.

corpus document-clustering

Last synced: 30 Apr 2025

https://github.com/sidmishraw/scp

A data processing pipeline for text-mining on contents extracted from PDFs using Apriori and Simplicial Complex algorithms

apriori-algorithm association-rules docpruner document-clustering pdf-processor simplicial-complex simplicialcomplex text-mining

Last synced: 19 Apr 2026

https://github.com/surajiyer/multi-view-clustering-ensemble

Multi-view document clustering via ensemble method [https://link.springer.com/article/10.1007/s10844-014-0307-6]

clustering document-clustering ensemble multiview-clustering

Last synced: 29 Mar 2025

https://github.com/adhiiisetiawan/document-clustering

Document clustering system for thesis documents using the Self-Organizing Map algorithm

document-clustering neural-network self-organizing-map

Last synced: 11 Jul 2025

https://github.com/tdiprima/digital_vault

RAG-powered file organizer using sentence-transformers and KMeans clustering with a Gradio chatbot for semantic document search

document-clustering gradio retrieval-augmented-generation semantic-search-ai sentence-transformers

Last synced: 28 May 2026

https://github.com/hazim-hf/unstructured-data-analysis

This repository focuses on methods for compiling, summarizing, and analyzing unstructured and semi-structured data, including text, images, and audio. The course covers algorithms and techniques for mining and exploring unstructured data using suitable tools and packages. Applications such as sentiment analysis, document clustering, and information

document-clustering sentiment-analysis

Last synced: 16 Feb 2026

https://github.com/rohanag03/document-clustering-topic-modeling

This project applies K-means and LDA to the Twenty Newsgroups dataset to group similar documents and discover underlying topics. Explore clustering and topic modeling techniques for organizing and understanding text data.

data-science document-clustering k-means-clustering lda twenty-newsgroup

Last synced: 20 Aug 2025