https://github.com/aimaster-dev/smartrag
SmartRAG is a terminal-based RAG system using LangGraph. It processes queries by retrieving relevant content from markdown or PDFs, then responds using OpenAI GPT. Supports webpage-to-PDF conversion, vector DB search, and modular flow control.
ai automation chatbot cli document-search gpt knowledge-base langchain langgraph markdown nlp openai pdf python query rag retrieval terminal vector-database web-scraping
- Host: GitHub
- URL: https://github.com/aimaster-dev/smartrag
- Owner: aimaster-dev
- License: mit
- Created: 2025-05-30T10:14:54.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-06-17T11:58:24.000Z (7 months ago)
- Last Synced: 2025-10-20T09:08:56.304Z (3 months ago)
- Topics: ai, automation, chatbot, cli, document-search, gpt, knowledge-base, langchain, langgraph, markdown, nlp, openai, pdf, python, query, rag, retrieval, terminal, vector-database, web-scraping
- Language: Python
- Homepage:
- Size: 51.8 MB
- Stars: 5
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# 🚀 SmartRAG
**SmartRAG** is a terminal-based Retrieval-Augmented Generation (RAG) system built using [LangGraph](https://github.com/langchain-ai/langgraph). It routes user queries through a custom flow that includes message history, query transformation, and document retrieval from a vector store.
> 🔗 GitHub: [https://github.com/aimaster-dev/SmartRAG](https://github.com/aimaster-dev/SmartRAG)
---
## 🧠 Features
* LangGraph-powered RAG pipeline
* Smart routing of user queries
* PDF and Markdown ingestion support
* Optional webpage-to-PDF and PDF-to-Markdown conversion
* OpenAI GPT integration for natural language responses
---
## 🗂️ Project Structure
```
SmartRAG/
├── architecture/ # LangGraph RAG workflow logic
├── data/ # Processed markdown or PDF content
├── modules/ # Core logic for query handling & doc processing
├── main.py # Entry point
└── processDocs.py # Document preprocessing script
```
---
## ⚙️ Setup
Follow the steps below to get SmartRAG up and running:
### 1. Clone the Repo
```bash
git clone https://github.com/aimaster-dev/SmartRAG.git
cd SmartRAG
```
### 2. Create Virtual Environment
```bash
python3.12 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
```
### 3. Install Dependencies
```bash
pip install -r requirements.txt
choco install wkhtmltopdf  # needed for HTML-to-PDF conversion; this Chocolatey command is Windows only
```
### 4. Environment Setup
Copy `.env.example` to `.env`:
```bash
cp .env.example .env
```
Edit `.env` to include:
```env
OPENAI_API_KEY=your_openai_key
URLS=url1,url2 # Optional: URLs to fetch as PDF
GET_WEB_PAGES_TO_PDF=True
CONVERT_PDF_TO_MD=True
INTERMEDIATE_PDF_DIR=./pdfs
DATA_DIR=./data
```
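The settings above are plain strings, so the list and boolean values need parsing before use. The following is a minimal sketch of one way to do that, not the project's actual loader: the `Settings` class, `parse_bool`, `load_settings`, and the fallback defaults are all hypothetical names and values chosen for illustration, and it assumes the `.env` contents have already been loaded into the process environment.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


def parse_bool(value: str) -> bool:
    """Interpret common truthy spellings such as 'True', 'true', '1', 'yes'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class Settings:
    # Hypothetical container; field names mirror the keys listed in `.env`.
    openai_api_key: str
    urls: List[str]
    get_web_pages_to_pdf: bool
    convert_pdf_to_md: bool
    intermediate_pdf_dir: str
    data_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build a Settings object from environment-style key/value pairs."""
    raw_urls = env.get("URLS", "")
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        # URLS is comma-separated; drop empty entries and stray whitespace.
        urls=[u.strip() for u in raw_urls.split(",") if u.strip()],
        # Defaults below are assumptions, not values taken from the project.
        get_web_pages_to_pdf=parse_bool(env.get("GET_WEB_PAGES_TO_PDF", "False")),
        convert_pdf_to_md=parse_bool(env.get("CONVERT_PDF_TO_MD", "False")),
        intermediate_pdf_dir=env.get("INTERMEDIATE_PDF_DIR", "./pdfs"),
        data_dir=env.get("DATA_DIR", "./data"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check in isolation, for example `load_settings({"URLS": "https://a.example, https://b.example"})`.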
### 5. Process Your Documents
```bash
python modules/processDocs.py
```
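> ⚠️ Update the `.env` parameters for your use case before running this step. The project tree above lists `processDocs.py` at the repository root; if the path shown here does not exist in your checkout, try `python processDocs.py` instead.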
### 6. Run SmartRAG
```bash
python main.py
```
---
## 🔍 How It Works
1. **User query** is passed into a LangGraph workflow.
2. **Message history** is cached and contextually enriched.
3. If needed, the query is rewritten to improve retrieval.
4. Documents are pulled from a **vector store** using similarity search.
5. An OpenAI GPT model generates a context-aware answer from the retrieved documents.
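The control flow of these steps can be sketched in plain Python. This is an illustration only and does not use LangGraph or reproduce the project's graph: `ChatState`, `needs_transform`, `transform_query`, and `run_turn` are hypothetical names, the pronoun check is a toy stand-in for whatever routing rule the real workflow applies, and the retriever and generator are passed in as callables so that no vector store or OpenAI call is required.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ChatState:
    """State carried between steps: prior messages plus the working query."""
    history: List[str] = field(default_factory=list)
    query: str = ""
    documents: List[str] = field(default_factory=list)
    answer: str = ""


def needs_transform(state: ChatState) -> bool:
    # Toy heuristic (assumption): follow-ups that lean on pronouns need history.
    words = {w.strip("?.,!").lower() for w in state.query.split()}
    return bool(state.history) and bool(words & {"it", "that", "this", "they"})


def transform_query(state: ChatState) -> ChatState:
    # Fold the previous user message into the query so retrieval has context.
    state.query = f"{state.history[-1]} {state.query}"
    return state


def run_turn(
    state: ChatState,
    user_query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> ChatState:
    """Run one query through: route -> (transform) -> retrieve -> generate."""
    state.query = user_query
    if needs_transform(state):           # step 3: conditional rewrite
        state = transform_query(state)
    state.documents = retrieve(state.query)                 # step 4
    state.answer = generate(state.query, state.documents)   # step 5
    state.history.append(user_query)     # step 2: keep history for next turn
    return state
```

With stub callables, a follow-up such as "How do I install it?" is sent to the retriever with the previous question prepended, while a standalone first question is passed through unchanged.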
---
## 🖼️ Architecture Overview
### 📄 Vector Store Creation
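As a rough illustration of what building and querying a vector store involves, here is a self-contained toy version. It is a sketch under loud assumptions: `ToyVectorStore`, `embed`, and `cosine` are hypothetical names, and word counts stand in for the embedding model and vector database that the real project uses, neither of which is specified here.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory store: keeps each text next to its vector."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # "Creation" step: embed each document chunk once, at ingestion time.
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> List[str]:
        # Similarity search: rank stored texts by cosine score against the query.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

In a real setup, `embed` would call an embedding model and the list would be replaced by a vector database, but the add-then-search shape stays the same.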

### 🧠 RAG Pipeline

---
## 🤝 Contributing
We welcome contributions!
* Fork the repo
* Create a feature branch
* Submit a pull request
> Got a big idea? Open an issue to discuss it first.
---
## 📬 Contact
For questions, feedback, or collaboration ideas, feel free to open an issue or reach out through GitHub.
---