
https://github.com/aryanxxvii/lda-document-topic-modelling

Unsupervised clustering of articles with real-time similarity search using Flask and Redis.


LDA-Based Unsupervised Document Topic Clustering

Project Overview


This project uses Latent Dirichlet Allocation (LDA) to cluster over 90,000 CNN news articles into topics. A Flask backend API provides real-time similarity search: given an input article, it returns the top-K most similar articles. Redis caches results to speed up searches, and the project is Dockerized for easy deployment.
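
The similarity search described above can be illustrated with a minimal sketch. It assumes each article is represented by its LDA topic distribution and that articles are ranked by cosine similarity. The README does not state which similarity measure the project uses, so cosine is an assumption here, and the article IDs, topic vectors, and function names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An all-zero vector is treated as dissimilar to everything.
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float],
    corpus: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in corpus.items())
    best = heapq.nlargest(k, scored)  # Highest scores first.
    return [(doc_id, score) for score, doc_id in best]


# Hypothetical topic distributions over three topics (assumed data).
corpus_topics = {
    "article_a": [0.80, 0.10, 0.10],
    "article_b": [0.10, 0.80, 0.10],
    "article_c": [0.60, 0.30, 0.10],
}
query_topics = [0.75, 0.15, 0.10]

results = top_k_similar(query_topics, corpus_topics, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

In a real deployment the query vector would come from running the trained LDA model on the input article. A linear scan like this is only a starting point for a corpus of 90,000 articles.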

Tech Stack




  • Python, Flask - Backend web framework


  • Gensim, spaCy - Topic modeling with LDA


  • Redis - Caching for faster similarity search


  • Docker - Containerization for deployment

Key Features




  • LDA Topic Clustering: Clusters documents into topics using Gensim's LDA.


  • Similarity Search: Real-time search for top-K most similar articles based on LDA topics.


  • Caching: Redis caching for fast search results.
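
The caching feature can be sketched as a cache-aside lookup: check the cache first, and run the search only on a miss. This is an illustration under assumptions, not the project's actual code. The key format, the TTL, and the helper names are hypothetical, and `InMemoryCache` is a dict-backed stand-in that mirrors the `get` and `setex` call shapes of a redis-py client so the example runs without a Redis server.

```python
import hashlib
import json
from typing import Callable, Dict, List, Optional


class InMemoryCache:
    """Dict-backed stand-in for a Redis client (assumed; ignores expiry)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get(self, name: str) -> Optional[str]:
        return self._store.get(name)

    def setex(self, name: str, time: int, value: str) -> None:
        # A real Redis server would expire the key after `time` seconds.
        self._store[name] = value


def cache_key(article_text: str, k: int) -> str:
    """Derive a fixed-length key from the article text and k (hypothetical format)."""
    digest = hashlib.sha256(article_text.encode("utf-8")).hexdigest()
    return f"similarity:{k}:{digest}"


def cached_search(
    cache: InMemoryCache,
    article_text: str,
    k: int,
    search_fn: Callable[[str, int], List[str]],
    ttl_seconds: int = 3600,
) -> List[str]:
    """Return cached results when present; otherwise search and store them."""
    key = cache_key(article_text, k)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    results = search_fn(article_text, k)
    cache.setex(key, ttl_seconds, json.dumps(results))
    return results


calls: List[str] = []


def fake_search(text: str, k: int) -> List[str]:
    """Placeholder for the real similarity search; records each invocation."""
    calls.append(text)
    return [f"article_{i}" for i in range(k)]


cache = InMemoryCache()
first = cached_search(cache, "some article text", 3, fake_search)
second = cached_search(cache, "some article text", 3, fake_search)
print(first == second, len(calls))
```

Hashing the article text keeps keys short regardless of article length, and including `k` in the key prevents a top-3 result from being served for a top-10 request.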

How to Run



  1. Clone the repository.

  2. Navigate to the project directory.

  3. Run the project using Docker:
    docker-compose up --build


API Endpoint




  • POST /api/similarity_search:

    • Input: Article text.

    • Output: Top-K most similar articles.
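
A client call to this endpoint might look like the sketch below. The README specifies only that the input is article text, so the JSON field names `text` and `top_k` are assumptions, as is the use of a JSON body. Check the Flask route in the repository for the actual request schema. The base URL is the local address given in the Installation section.

```python
import json
import urllib.request


def build_similarity_request(
    base_url: str, article_text: str, top_k: int = 5
) -> urllib.request.Request:
    """Build a POST request for the similarity search endpoint.

    The payload field names ("text", "top_k") are hypothetical.
    """
    payload = json.dumps({"text": article_text, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/similarity_search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_similarity_request(
    "http://127.0.0.1:5000", "Example article text.", top_k=3
)

# To send the request once the server is running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```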



Installation


To run without Docker:


pip install -r requirements.txt

flask run

Access the API at http://127.0.0.1:5000.