https://github.com/atomworkplace/gitrag
This project is a RAG-based AI chat application using LangChain and OpenAI, featuring codebase analysis and a hierarchical file structure visualization for GitHub repositories.
code-analysis docker langchain pineconedb postgresql python rag-chatbot reactjs solo-project
- Host: GitHub
- URL: https://github.com/atomworkplace/gitrag
- Owner: ATOMworkplace
- Created: 2025-07-08T12:41:27.000Z (3 months ago)
- Default Branch: master
- Last Pushed: 2025-07-10T16:54:17.000Z (3 months ago)
- Last Synced: 2025-07-10T18:54:17.609Z (3 months ago)
- Topics: code-analysis, docker, langchain, pineconedb, postgresql, python, rag-chatbot, reactjs, solo-project
- Language: JavaScript
- Homepage: https://git-rag.vercel.app
- Size: 6.86 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# gitRAG
**RAG-based GitHub Repo Analysis Platform**
*Analyse any public GitHub repository with LLM-powered chat and advanced semantic search.*

---
https://github.com/user-attachments/assets/99065742-a793-4ec5-8bb5-231f37d3d50e
## ⭐ Overview
### **Situation**
As a participant in open-source competitions and project exhibitions (EPICS, university projects), I often struggled to deeply understand large codebases, especially when onboarding new repositories from group members or exploring unfamiliar open-source projects. Sifting through thousands of files, dependencies, and scattered documentation was **tedious and overwhelming**, making it hard to answer even basic questions like "Where is X implemented?" or "How does this module work?"

### **Task**
I needed a platform that would let me:
- Instantly chat with any GitHub repo to ask questions about code, architecture, or logic.
- Quickly visualize and explore repo structure, file contents, and metadata.
- Perform semantic code search (not just by filename/text).
- Support multiple users and projects securely for my team and in competitions.

### **Action**
I independently designed and built **gitRAG**, an end-to-end, multi-tenant platform that ingests any public GitHub repo, chunks and indexes its code using embeddings and vector search, and lets users interactively chat with, search, and analyse codebases using a modern LLM (via LangChain and the OpenAI API).

- **Built a secure, scalable backend** using FastAPI, PostgreSQL (Aiven), PineconeDB, and LangChain.
- **Developed a modern React frontend** with a hierarchical file explorer, real-time AI chat, and repo analytics.
- **Integrated Google/GitHub OAuth2** for authentication and added per-user encrypted API key management for privacy.
- **Engineered ingestion pipelines** to chunk, embed, and index 50MB+ codebases with 10,000+ files.
- **Tested and deployed** the platform on multiple real-world repos for open-source events and university project groups.

### **Result**
- Significantly reduced onboarding time for new repositories: I now get context, explanations, and code Q&A in seconds.
- Enabled my team and myself to confidently tackle larger, more complex projects in hackathons and coursework.
- gitRAG is now a robust, reusable tool for anyone needing rapid understanding of unfamiliar codebases.

---
## 🚀 Features
- **LLM-powered code chat:** Ask questions about repo structure, functions, or files—get contextual, AI-driven answers.
- **Semantic code search:** Find relevant code snippets using meaning, not just keywords.
- **Hierarchical file explorer:** Browse and preview the full repo tree with metadata and analytics.
- **Multi-user & multi-repo support:** Secure, per-user data isolation with Google/GitHub OAuth2.
- **Repo analytics:** Visualize language breakdown, file types, contributors, and more.
- **Encrypted API key management:** User API keys are encrypted and never exposed.
- **Fast retrieval:** Sub-second vector search and retrieval for each query.
- **Modern UI:** Built with React, TailwindCSS, and Three.js (for 3D hero effect).

---
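The hierarchical file explorer and language analytics listed above both start from a flat list of repository paths. The following is a minimal sketch, assuming that list comes from GitHub's Git Trees API (called with `recursive=1`, which returns each blob's `path` and `size`). `build_tree`, `render`, `language_breakdown`, and the `EXT_TO_LANG` map are hypothetical illustrations, not gitRAG's actual code.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension-to-language map; a real analytics panel would use a fuller table.
EXT_TO_LANG = {
    ".py": "Python",
    ".js": "JavaScript",
    ".jsx": "JavaScript",
    ".ts": "TypeScript",
    ".css": "CSS",
    ".md": "Markdown",
}


def build_tree(paths: list[str]) -> dict:
    """Nest flat 'a/b/c.py' paths into dicts; directories map to dicts, files to None."""
    root: dict = {}
    for path in paths:
        *dirs, filename = PurePosixPath(path).parts
        node = root
        for d in dirs:
            node = node.setdefault(d, {})
        node[filename] = None
    return root


def render(tree: dict, indent: int = 0) -> list[str]:
    """Depth-first text rendering: directories before files, each group alphabetical."""
    lines = []
    for name in sorted(tree, key=lambda k: (tree[k] is None, k)):
        child = tree[name]
        lines.append("  " * indent + (name + "/" if child is not None else name))
        if child is not None:
            lines.extend(render(child, indent + 1))
    return lines


def language_breakdown(files: list[tuple[str, int]]) -> dict[str, float]:
    """Percentage of total bytes per language, inferred from file extensions."""
    totals: Counter = Counter()
    for path, size in files:
        lang = EXT_TO_LANG.get(PurePosixPath(path).suffix.lower(), "Other")
        totals[lang] += size
    grand_total = sum(totals.values()) or 1  # avoid dividing by zero for an empty repo
    return {lang: round(100 * n / grand_total, 1) for lang, n in totals.most_common()}


if __name__ == "__main__":
    files = [
        ("backend/app/main.py", 4000),
        ("backend/app/ingest.py", 6000),
        ("frontend/src/App.jsx", 8000),
        ("README.md", 2000),
    ]
    print("\n".join(render(build_tree([p for p, _ in files]))))
    print(language_breakdown(files))
```

Keeping the tree as plain nested dicts makes it trivial to serialize as JSON for a React tree component; the byte-weighted breakdown mirrors how GitHub's own language bar is usually presented.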
## 🛠️ Tech Stack
- **Frontend:** React.js, TailwindCSS, Vite, Three.js
- **Backend:** FastAPI (Python), LangChain, PostgreSQL (Aiven), PineconeDB
- **AI/Vector Search:** OpenAI API, PineconeDB, LangChain
- **Auth:** Google OAuth2, GitHub OAuth2
- **Integrations:** GitHub API (repo fetching, metadata), Node.js (utility scripts)

---
## 📷 Demo
---
## ⚡ How it Works (RAG Pipeline)
1. **Login** with Google or GitHub OAuth2 (secure, per-user).
2. **Paste any public GitHub repo URL** and your OpenAI API key (encrypted).
3. **Ingestion:**
   - Fetches repo files via GitHub API
   - Chunks code using custom logic (by file type/size)
   - Generates vector embeddings (LangChain + OpenAI API)
   - Stores chunks and metadata in PineconeDB and PostgreSQL
4. **Analysis & Chat:**
   - Use AI chat to ask any question about the repo (“What does X function do?”, “Show me auth logic”)
   - Semantic search finds and retrieves the most relevant code chunks
   - LLM (via LangChain) generates contextual, accurate answers using retrieved code
5. **Explore:**
   - Hierarchical explorer shows the real file tree and lets you preview content and metadata
   - Repo analytics panel for high-level insights

---
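Step 3 chunks code "by file type/size" without spelling out the rule. One plausible reading, sketched here purely as an assumption, is fixed-size line windows with overlap, sized per extension, that skip binaries and very large files. `chunk_file`, `Chunk`, `CHUNK_LINES`, and all thresholds are hypothetical; in gitRAG each chunk would next be embedded and stored in Pinecone/PostgreSQL, which this sketch leaves out.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical per-extension window sizes, in lines.
CHUNK_LINES = {".py": 60, ".js": 60, ".jsx": 60, ".md": 40}
DEFAULT_LINES = 80
OVERLAP = 10  # lines shared by consecutive windows so a function is less likely to be cut off
SKIP_SUFFIXES = (".png", ".jpg", ".ico", ".lock")  # hypothetical: files not worth embedding
MAX_FILE_BYTES = 500_000  # hypothetical size cap


@dataclass
class Chunk:
    path: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str


def chunk_file(path: str, content: str) -> list[Chunk]:
    """Split one file into overlapping line windows sized by its extension."""
    if path.lower().endswith(SKIP_SUFFIXES) or len(content.encode()) > MAX_FILE_BYTES:
        return []
    window = CHUNK_LINES.get(PurePosixPath(path).suffix.lower(), DEFAULT_LINES)
    step = window - OVERLAP
    lines = content.splitlines()
    chunks = []
    for start in range(0, len(lines), step):
        piece = lines[start:start + window]
        chunks.append(Chunk(path, start + 1, start + len(piece), "\n".join(piece)))
        if start + window >= len(lines):  # this window already reached the end of the file
            break
    return chunks
```

For a 100-line Python file this yields two chunks covering lines 1-60 and 51-100. Storing `path` and the line range with each chunk lets an answer point back to the exact source location.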
## 🧩 Architecture
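The query path in step 4 follows the usual retrieve-then-generate shape. The following self-contained sketch rests on stated assumptions: a toy bag-of-words vector stands in for OpenAI embeddings, and an in-memory dict keyed by `(user_id, repo)` stands in for per-user Pinecone namespaces. How gitRAG actually isolates users is not documented here. `embed`, `cosine`, `InMemoryIndex`, and `build_prompt` are hypothetical names, and the final LLM call is omitted.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Vectors stored per (user_id, repo) namespace, so one user's query never sees another's chunks."""

    def __init__(self) -> None:
        self._spaces: dict[tuple[str, str], dict[str, tuple[Counter, str]]] = {}

    def upsert(self, user_id: str, repo: str, chunk_id: str, text: str) -> None:
        # Re-upserting the same chunk_id overwrites the old vector instead of duplicating it.
        self._spaces.setdefault((user_id, repo), {})[chunk_id] = (embed(text), text)

    def query(self, user_id: str, repo: str, question: str, top_k: int = 3) -> list[tuple[str, float, str]]:
        q = embed(question)
        space = self._spaces.get((user_id, repo), {})
        scored = [(cid, cosine(q, vec), text) for cid, (vec, text) in space.items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]


def build_prompt(question: str, hits: list[tuple[str, float, str]]) -> str:
    """Ground the LLM in the retrieved chunks; sending the prompt to a model is out of scope."""
    context = "\n\n".join(f"[{cid}]\n{text}" for cid, _, text in hits)
    return (
        "Answer using only the repository excerpts below. Cite chunk ids.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

With a real embedding model, only `embed` changes. Namespacing by user and repo at the index level keeps isolation independent of the prompt, which is why the filter lives in `query` rather than in `build_prompt`.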
## ✨ Example Use Cases
- **Hackathons/open-source events:** Instantly understand any team repo or competition project.
- **University coursework:** Quickly get up to speed on group project submissions and analyse them.
- **Personal learning:** Explore popular open-source projects by chatting and searching their code.
- **Team code reviews:** Get instant explanations and context for PRs and legacy code.