https://github.com/aryanxxvii/lda-document-topic-modelling
Unsupervised clustering of articles with real-time similarity search using Flask and Redis.
- Host: GitHub
- URL: https://github.com/aryanxxvii/lda-document-topic-modelling
- Owner: aryanxxvii
- Created: 2024-08-18T17:35:10.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-06T13:22:54.000Z (12 months ago)
- Last Synced: 2025-01-30T17:08:13.058Z (9 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 4.1 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
LDA-Based Unsupervised Document Topic Clustering
Project Overview
This project uses Latent Dirichlet Allocation (LDA) to cluster over 90,000 CNN news articles into topics. It features a Flask backend API for real-time similarity search, allowing users to find the top-K most similar articles based on an input article. Redis is used for caching to improve search speed. The project is dockerized for easy deployment.
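The similarity search described above can be sketched as a ranking over per-article topic distributions. This is a minimal illustration, not the repository's code: it assumes each article has already been mapped to an LDA topic-probability vector, and it uses cosine similarity as the metric, which the README does not specify. The names `cosine_similarity`, `top_k_similar`, and the sample vectors are hypothetical.

```python
import heapq
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic-probability vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_p * norm_q)


def top_k_similar(query_topics, corpus_topics, k=3):
    """Return the k (article_id, score) pairs most similar to the query."""
    scored = (
        (article_id, cosine_similarity(query_topics, topics))
        for article_id, topics in corpus_topics.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical topic distributions over 3 topics (made-up values).
corpus_topics = {
    "article_a": [0.80, 0.10, 0.10],
    "article_b": [0.10, 0.80, 0.10],
    "article_c": [0.70, 0.20, 0.10],
}
query_topics = [0.75, 0.15, 0.10]

results = top_k_similar(query_topics, corpus_topics, k=2)
for article_id, score in results:
    print(article_id, round(score, 4))
```

In a real pipeline the vectors would come from the trained LDA model rather than being hard-coded, and other measures such as Hellinger distance are also common for comparing topic distributions.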
Tech Stack
- Python - Flask - Backend web framework
- Gensim, spaCy - Topic modeling with LDA
- Redis - Caching for faster similarity search
- Docker - Containerization for deployment
Key Features
- LDA Topic Clustering: Clusters documents into topics using Gensim's LDA.
- Similarity Search: Real-time search for top-K most similar articles based on LDA topics.
- Caching: Redis caching for fast search results.
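The caching feature could follow a cache-aside pattern along these lines. This is a sketch under stated assumptions, not the project's implementation: the key scheme, the TTL, and the helper names `cache_key` and `cached_search` are hypothetical. `InMemoryCache` is a stand-in that mimics only the `get` and `setex(name, time, value)` methods of a redis-py client so the example runs without a Redis server.

```python
import hashlib
import json


class InMemoryCache:
    """Stand-in for a Redis client exposing only get() and setex()."""

    def __init__(self):
        self._store = {}

    def get(self, name):
        return self._store.get(name)

    def setex(self, name, time, value):
        # A real Redis server expires the key after `time` seconds;
        # this stand-in ignores the TTL.
        self._store[name] = value


def cache_key(article_text, k):
    """Build a key from a hash of the input text (assumed key scheme)."""
    digest = hashlib.sha256(article_text.encode("utf-8")).hexdigest()
    return f"similarity:{k}:{digest}"


def cached_search(cache, article_text, k, search_fn, ttl_seconds=3600):
    """Return cached results if present; otherwise compute and store them."""
    key = cache_key(article_text, k)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    results = search_fn(article_text, k)
    cache.setex(key, ttl_seconds, json.dumps(results))
    return results


calls = []


def fake_search(article_text, k):
    """Hypothetical search function that records how often it is called."""
    calls.append(article_text)
    return [["article_a", 0.99], ["article_c", 0.98]][:k]


cache = InMemoryCache()
first = cached_search(cache, "some article text", 2, fake_search)
second = cached_search(cache, "some article text", 2, fake_search)
print(len(calls))  # the second lookup is served from the cache
```

Hashing the article text keeps keys short and uniform regardless of input length; swapping `InMemoryCache()` for a real Redis client should work as long as it offers the same two methods.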
How to Run
- Clone the repository.
- Navigate to the project directory.
- Run the project using Docker:
docker-compose up --build
API Endpoint
- POST /api/similarity_search:
  - Input: Article text.
  - Output: Top-K most similar articles.
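A request to this endpoint might be built as follows. The README does not document the request schema, so the JSON field names `text` and `k` are assumptions, as is the helper name `build_similarity_request`; check the Flask route in the repository for the actual format. The base URL is the local address given in the Installation section.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # local address from the Installation section


def build_similarity_request(article_text, k=5, base_url=BASE_URL):
    """Build a POST request; the 'text' and 'k' fields are assumed names."""
    payload = json.dumps({"text": article_text, "k": k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/similarity_search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_similarity_request("Example article text.", k=3)

# To send it (requires the server to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```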
Installation
To run without Docker:
pip install -r requirements.txt
flask run
Access the API at http://127.0.0.1:5000.