https://github.com/khushneetsingh/datasanity
DataSanity is an AI-powered web application for dataset cleaning, synthetic data generation, vectorization, and data enrichment using natural language prompts.
cerebrus exa faiss-vector-database llm nextjs numpy pandas serperdev sqlite tailwindcss
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/khushneetsingh/datasanity
- Owner: KhushneetSingh
- License: apache-2.0
- Created: 2025-10-05T20:48:20.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-10-05T22:15:58.000Z (4 months ago)
- Last Synced: 2025-10-05T23:28:34.613Z (4 months ago)
- Topics: cerebrus, exa, faiss-vector-database, llm, nextjs, numpy, pandas, serperdev, sqlite, tailwindcss
- Language: JavaScript
- Homepage:
- Size: 10.3 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# DataSanity
An AI-powered web application for dataset cleaning, synthetic data generation, vectorization, and data enrichment using natural language prompts.
## Features
- Dataset cleaning, with an LLM detecting noisy, missing, or duplicate values
- Synthetic data generation based on schema or prompt
- Vectorization for RAG pipelines
- Data enrichment using web search APIs
- Natural language prompt-based workflow
- Support for CSV uploads and downloads
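The cleaning and CSV round-trip features above can be illustrated with a minimal sketch. It only covers the rule-based part (counting missing values and dropping exact duplicate rows) that could run before any LLM call; it is not taken from the repository. The function name `profile_and_clean`, the report keys, and the sample data are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample standing in for an uploaded CSV file.
SAMPLE_CSV = """name,age,city
Ada,36,London
Ada,36,London
Grace,,New York
Linus,54,
"""


def profile_and_clean(csv_text: str) -> tuple[pd.DataFrame, dict]:
    """Hypothetical helper: report missing/duplicate values, then drop exact duplicates."""
    df = pd.read_csv(io.StringIO(csv_text))
    report = {
        "rows_in": len(df),
        # Number of empty cells in each column.
        "missing_per_column": {col: int(n) for col, n in df.isna().sum().items()},
        # Rows that exactly repeat an earlier row.
        "duplicate_rows": int(df.duplicated().sum()),
    }
    cleaned = df.drop_duplicates().reset_index(drop=True)
    report["rows_out"] = len(cleaned)
    return cleaned, report


cleaned, report = profile_and_clean(SAMPLE_CSV)
print(report)
# The cleaned frame can be serialized back to CSV text for download.
download_text = cleaned.to_csv(index=False)
```

A report like this could also be passed to the LLM as context, so that the model only has to judge the values that simple rules cannot classify, such as noisy or inconsistent entries.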
## Tech Stack
- Frontend: Next.js with Tailwind CSS
- Backend: FastAPI (Python)
- LLM Inference: Cerebras API
- Data Processing: pandas, numpy
- Embedding: sentence-transformers
- Vector Store: FAISS
- Web Search: Exa or Serper.dev
- Storage: SQLite + local filesystem
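To show how the embedding and vector store entries fit together for RAG-style retrieval, here is a minimal sketch under stated assumptions. It does not use sentence-transformers or FAISS: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model, and the NumPy matrix product stands in for an exact inner-product index such as FAISS's `IndexFlatIP`. With unit-length vectors, the inner product equals cosine similarity.

```python
import zlib

import numpy as np

DIM = 64  # toy embedding size; real sentence embeddings are much larger


def toy_embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words, unit length."""
    vec = np.zeros(DIM, dtype=np.float32)
    for token in text.lower().split():
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def build_index(rows: list[str]) -> np.ndarray:
    """Stack one unit vector per row; plays the role of the vector store."""
    return np.stack([toy_embed(r) for r in rows])


def search(index: np.ndarray, query: str, k: int = 2) -> list[tuple[int, float]]:
    """Exact inner-product search: return (row index, score) for the k best rows."""
    scores = index @ toy_embed(query)
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]


rows = [
    "customer churn by region",
    "monthly revenue totals",
    "support ticket response times",
]
index = build_index(rows)
hits = search(index, "monthly revenue totals")
print(hits)
```

In a real pipeline the stand-ins would be replaced by an actual embedding model and index, and the returned row indices would be used to look up the original records, for example in SQLite.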