https://github.com/atomworkplace/gitrag

This project is a RAG-based AI chat application using LangChain and OpenAI, featuring codebase analysis and a hierarchical file structure visualization for GitHub repositories.

Host: GitHub
URL: https://github.com/atomworkplace/gitrag
Owner: ATOMworkplace
Created: 2025-07-08T12:41:27.000Z (3 months ago)
Default Branch: master
Last Pushed: 2025-07-10T16:54:17.000Z (3 months ago)
Last Synced: 2025-07-10T18:54:17.609Z (3 months ago)
Topics: code-analysis, docker, langchain, pineconedb, postgresql, python, rag-chatbot, reactjs, solo-project
Language: JavaScript
Homepage: https://git-rag.vercel.app
Size: 6.86 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# gitRAG

**RAG-based GitHub Repo Analysis Platform**  

*Analyse any public GitHub repository with LLM-powered chat and advanced semantic search.*

---

https://github.com/user-attachments/assets/99065742-a793-4ec5-8bb5-231f37d3d50e

## ⭐ Overview

### **Situation**

As a participant in open-source competitions and project exhibitions (EPICS, university projects), I often struggled to understand large codebases in depth, especially when onboarding onto repositories from group members or exploring unfamiliar open-source projects. Sifting through thousands of files, dependencies, and scattered documentation was **tedious and overwhelming**, making it hard to answer even basic questions like "Where is X implemented?" or "How does this module work?"

### **Task**

I needed a platform that would let me:

- Instantly chat with any GitHub repo to ask questions about code, architecture, or logic.

- Quickly visualize and explore repo structure, file contents, and metadata.

- Perform semantic code search (not just by filename/text).

- Support multiple users and projects securely for my team and in competitions.

### **Action**

I independently designed and built **gitRAG**, an end-to-end, multi-tenant platform that ingests any public GitHub repo, chunks and indexes its code using embeddings and vector search, and lets users chat with, search, and analyse codebases through an LLM (via LangChain and the OpenAI API).

- **Built a secure, scalable backend** using FastAPI, PostgreSQL (Aiven), PineconeDB, and LangChain.

- **Developed a modern React frontend** with hierarchical file explorer, real-time AI chat, and repo analytics.

- **Integrated Google/GitHub OAuth2** for authentication and added per-user encrypted API key management for privacy.

- **Engineered ingestion pipelines** to chunk, embed, and index 50MB+ codebases with 10,000+ files.

- **Tested and deployed** the platform on multiple real-world repos for open-source events and university project groups.

### **Result**

- Reduced onboarding time for new repositories: context, explanations, and code Q&A are now available in seconds.

- Enabled my team and me to confidently tackle larger, more complex projects in hackathons and coursework.

- gitRAG is now a robust, reusable tool for anyone needing rapid understanding of unfamiliar codebases.

---

## 🚀 Features

- **LLM-powered code chat:** Ask questions about repo structure, functions, or files and get contextual, AI-driven answers.

- **Semantic code search:** Find relevant code snippets using meaning, not just keywords.

- **Hierarchical file explorer:** Browse and preview the full repo tree with metadata and analytics.

- **Multi-user & multi-repo support:** Secure, per-user data isolation with Google/GitHub OAuth2.

- **Repo analytics:** Visualize language breakdown, file types, contributors, and more.

- **Encrypted API key management:** User API keys are encrypted and never exposed.

- **Fast retrieval:** Sub-second query responses for vector search and retrieval.

- **Modern UI:** Built with React, TailwindCSS, and Three.js (for 3D hero effect).
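
To make the file explorer and analytics features above more concrete, here is a minimal Python sketch of how a flat list of repo paths could be nested into a tree and summarised by file extension. It is an illustration under stated assumptions, not gitRAG's actual code: the function names `build_tree`, `render`, and `extension_breakdown` are hypothetical, and counting extensions is only a rough proxy for a language breakdown.

```python
from collections import Counter
from pathlib import PurePosixPath


def build_tree(paths):
    """Nest flat repo paths into a dict of dicts; files map to None."""
    tree = {}
    for path in paths:
        parts = PurePosixPath(path).parts
        node = tree
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        node[parts[-1]] = None  # a None leaf marks a file
    return tree


def render(tree, indent=0):
    """Return the tree as indented lines, directories before files."""
    lines = []
    # Sort key puts directories (child is a dict) ahead of files (child is None).
    for name, child in sorted(tree.items(), key=lambda kv: (kv[1] is None, kv[0])):
        lines.append("  " * indent + name + ("/" if child is not None else ""))
        if child is not None:
            lines.extend(render(child, indent + 1))
    return lines


def extension_breakdown(paths):
    """Count files per extension (hypothetical stand-in for repo analytics)."""
    return Counter(PurePosixPath(p).suffix or "(none)" for p in paths)


# Example paths are made up for illustration.
paths = ["README.md", "backend/main.py", "backend/rag/chunker.py", "frontend/src/App.jsx"]
print("\n".join(render(build_tree(paths))))
print(extension_breakdown(paths).most_common())
```

A nested dict keeps the sketch short and serialises directly to JSON for a frontend tree component; a real explorer would likely attach size and other metadata to each node.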

---

## 🛠️ Tech Stack

- **Frontend:** React.js, TailwindCSS, Vite, Three.js

- **Backend:** FastAPI (Python), LangChain, PostgreSQL (Aiven), PineconeDB

- **AI/Vector Search:** OpenAI API, PineconeDB, LangChain

- **Auth:** Google OAuth2, GitHub OAuth2

- **Integrations:** GitHub API (repo fetching, metadata), Node.js (utility scripts)

---

## 📷 Demo











---

## ⚡ How it Works (RAG Pipeline)

1. **Login** with Google or GitHub OAuth2 (secure, per-user).

2. **Paste any public GitHub repo URL** and your OpenAI API key (the key is stored encrypted).

3. **Ingestion:**  

   - Fetches repo files via GitHub API

   - Chunks code using custom logic (by file type/size)

   - Generates vector embeddings (LangChain + OpenAI API)

   - Stores chunks and metadata in PineconeDB and PostgreSQL

4. **Analysis & Chat:**  

   - Use AI chat to ask any question about the repo (“What does X function do?” “Show me auth logic”)

   - Semantic search finds and retrieves the most relevant code chunks

   - LLM (via LangChain) generates contextual, accurate answers using retrieved code

5. **Explore:**  

   - The hierarchical explorer shows the real file tree and lets you preview content and metadata

   - The repo analytics panel provides high-level insights
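
The ingestion and chat steps above follow the usual chunk, embed, retrieve, and prompt flow. The sketch below shows that flow in plain Python so it runs without API keys. It rests on loud assumptions: `embed` is a toy token-count embedding standing in for the OpenAI embedding call, the in-memory list returned by `build_index` stands in for PineconeDB, and fixed line windows stand in for the project's custom chunking by file type and size. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_lines(text, max_lines=40, overlap=5):
    """Split source text into overlapping line windows (requires overlap < max_lines)."""
    lines = text.splitlines()
    step = max_lines - overlap
    chunks, start = [], 0
    while start < len(lines):
        chunks.append("\n".join(lines[start:start + max_lines]))
        if start + max_lines >= len(lines):
            break  # the last window already reaches the end of the file
        start += step
    return chunks


def embed(text):
    """Toy embedding: lowercase token counts. ASSUMPTION: stands in for a real embedding model."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(files):
    """files maps path -> source text. Returns records; an in-memory stand-in for a vector store."""
    return [
        {"path": path, "text": chunk, "vector": embed(chunk)}
        for path, text in files.items()
        for chunk in chunk_lines(text)
    ]


def retrieve(index, question, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(index, key=lambda r: cosine(q, r["vector"]), reverse=True)[:k]


def build_prompt(question, hits):
    """Assemble retrieved chunks into the context an LLM would be asked to answer from."""
    context = "\n\n".join(f"# {h['path']}\n{h['text']}" for h in hits)
    return f"Answer using only this code:\n\n{context}\n\nQuestion: {question}"


# Made-up two-file repo for illustration.
files = {
    "auth.py": "def login(user, password):\n    return check_password(user, password)",
    "db.py": "def connect(url):\n    return open_connection(url)",
}
index = build_index(files)
hits = retrieve(index, "where is login with password handled?", k=1)
print(build_prompt("where is login with password handled?", hits))
```

In a deployment like the one described, the prompt would be sent to the LLM and the records would live in a hosted vector index, with one namespace or metadata filter per user and repo being one plausible way to keep tenants separate.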

---

## 🧩 Architecture



## ✨ Example Use Cases

- **Hackathons/open-source events:** Instantly understand any team repo or competition project.

- **University coursework:** Quickly onboard and analyse group project submissions.

- **Personal learning:** Explore popular open-source projects by chatting and searching their code.

- **Team code reviews:** Get instant explanations and context for PRs and legacy code.
