https://github.com/emredeveloper/quick-rag

Production-ready RAG for JavaScript & React. Built on Ollama & LM Studio SDKs with hybrid search, caching, conversation management & evaluation.

# Quick RAG ⚡

[![npm version](https://img.shields.io/npm/v/quick-rag.svg)](https://www.npmjs.com/package/quick-rag)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

🚀 **Production-ready RAG (Retrieval-Augmented Generation) for JavaScript & React**
Built on official [Ollama](https://github.com/ollama/ollama-js) & [LM Studio](https://github.com/lmstudio-ai/lmstudio-js) SDKs.

> **🎉 v2.5.2 Released!** React export fixes, deterministic init tests, and Ollama base model alignment (`granite4:3b`). See [CHANGELOG.md](CHANGELOG.md) for details.

## ✨ Features

### 🆕 v2.5.2 - Stability & Compatibility
- ✅ **React Export Fix** - `quick-rag/react` now resolves correctly to `useRAG`
- ✅ **Deterministic initRAG Tests** - core test suite no longer depends on external Ollama availability
- ✅ **Ollama Base Model Alignment** - examples and tests standardized to `granite4:3b`
- 🐛 **Critical Bug Fix** - `ConversationManager.addAssistantMessage()` now correctly passes content
- 🌐 **Browser Compatibility** - Cross-platform UUID generation (Node.js + Browser)
- 📦 **Cleaner Dependencies** - Removed invalid self-referencing dependency
- 🤖 **Updated Default Models** - `qwen3-embedding:0.6b` (Ollama) & `google/gemma-3-4b` (LM Studio)
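
Cross-platform UUID generation usually means feature-detecting the Web Crypto API, which both browsers and recent Node.js versions expose on `globalThis.crypto`. The sketch below illustrates that pattern; `generateId` is a hypothetical name and the fallback order is an assumption, not quick-rag's actual code.

```javascript
// Hypothetical helper illustrating cross-runtime UUIDs (not quick-rag's code).
function generateId() {
  const c = globalThis.crypto;
  // Web Crypto randomUUID(): modern browsers and recent Node.js versions
  if (c && typeof c.randomUUID === 'function') {
    return c.randomUUID();
  }
  const bytes = new Uint8Array(16);
  if (c && typeof c.getRandomValues === 'function') {
    c.getRandomValues(bytes);
  } else {
    // Last resort: Math.random() is NOT cryptographically secure
    for (let i = 0; i < bytes.length; i++) bytes[i] = Math.floor(Math.random() * 256);
  }
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // version 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 4122 variant bits (10xx)
  const hex = Array.from(bytes, b => b.toString(16).padStart(2, '0')).join('');
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}
```

Checking `typeof c.randomUUID` instead of sniffing the runtime keeps one code path working in Node.js, browsers, and bundlers.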

### v2.4.0 - Robustness & Explainability
- 🔪 **Robust Chunking** - Abbreviation-aware sentence splitting & word-safe text chunking
- 🔍 **Rich Explainability** - Detailed retrieval snippets, keyword density & term match metrics
- 🚀 **BM25 Optimization** - Min-Heap based top-K selection for fast retrieval in large datasets
- 🌐 **Environment Stability** - Universal UUID support for Node.js and Browser (globalThis.crypto)

### v2.3.0 - Performance & Evaluation
- 🚀 **Caching Layer** - LRU cache, embedding cache, query cache for 10x speedup
- 💬 **Conversation Manager** - Context window management & auto-summarization
- 📊 **RAG Evaluation** - Precision@K, Recall, MRR, NDCG metrics
- 🗄️ **Vector DB Connectors** - ChromaDB & Qdrant adapters

### 🔍 v2.2.0 - Advanced Search
- 🔍 **BM25 Sparse Search** - Pure JS keyword-based retrieval (no dependencies!)
- 🔀 **Hybrid Search** - Combines BM25 + Vector with RRF fusion (20-30% better retrieval)
- 📊 **Reranking** - Multi-signal scoring (keyword, semantic, coverage, coherence)
- 🔄 **Query Transformation** - Expansion, decomposition, multi-query, HyDE

### Core Features
- 🎯 **Official SDKs** - Built on `ollama` and `@lmstudio/sdk` packages
- 💾 **Embedded Persistence** - SQLite-based vector store (No server required!)
- 🛡️ **Robust Error Handling** - 7 custom error classes with recovery suggestions
- 📊 **Telemetry & Metrics** - Track performance, latency, and usage
- 📝 **Structured Logging** - JSON logging with Pino integration
- ⚡ **5x Faster** - Parallel batch embedding
- 📄 **Document Loaders** - PDF, Word, Excel, Text, Markdown, URLs
- 🔪 **Robust Chunking** - Intelligent splitting that respects abbreviations (Dr., Prof.) and avoids word cutting
- 🏷️ **Metadata Filtering** - Filter by document properties
- 🔍 **Rich Query Explainability** - See WHY docs were retrieved with snippets and density metrics (unique!)
- 🎨 **Dynamic Prompts** - 10 built-in templates + full customization
- 🧠 **Weighted Decision Making** - Multi-criteria document scoring
- 🎯 **Heuristic Reasoning** - Pattern learning and query optimization
- 🔄 **CRUD Operations** - Add, update, delete documents on the fly
- 🌊 **Streaming Support** - Real-time AI responses
- 🔧 **Zero Config** - Works with React, Next.js, Vite, Node.js
- 💪 **Type Safe** - Full TypeScript support

## 📦 Installation

```bash
npm install quick-rag
```

**Default Ollama models (examples/docs):**
```bash
ollama pull granite4:3b
ollama pull qwen3-embedding:0.6b
```

**Optional Dependencies:**
```bash
# For embedded persistence
npm install better-sqlite3

# For vector databases (optional)
npm install chromadb @qdrant/js-client-rest
```

## 🆕 What's New in v2.3.0

### 🚀 Caching Layer
Speed up repeated operations with intelligent caching:

```javascript
import { CacheManager, EmbeddingCache } from 'quick-rag';

// Unified cache manager
const cache = new CacheManager({
  embeddings: { maxSize: 5000, ttl: 3600000 }, // 1 hour
  queries: { maxSize: 500, ttl: 1800000 }      // 30 min
});

// Wrap embedding function for automatic caching
// (embedFn: your embedding function, e.g. from createOllamaRAGEmbedding)
const cachedEmbed = cache.wrapEmbedding(embedFn);

// Check statistics
console.log(cache.getStats());
// { embeddings: { size: 100, cacheHits: 450, cacheMisses: 50, hitRate: 0.9 } }
```
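
One common way to build such a cache in plain JavaScript relies on `Map` preserving insertion order. The sketch below illustrates the idea under that assumption; `LRUCache` and `wrapEmbedding` here are hypothetical stand-ins, not quick-rag's `CacheManager` API, and the injectable `now` clock exists only to make TTL behavior easy to test.

```javascript
// Minimal LRU cache with per-entry TTL. Re-inserting a key on every hit
// moves it to the end of the Map, so the first key is always the LRU one.
class LRUCache {
  constructor({ maxSize = 1000, ttl = 0, now = () => Date.now() } = {}) {
    this.maxSize = maxSize;
    this.ttl = ttl; // milliseconds; 0 disables expiry
    this.now = now;
    this.map = new Map();
    this.hits = 0;
    this.misses = 0;
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry || (this.ttl && this.now() - entry.time > this.ttl)) {
      if (entry) this.map.delete(key); // drop the expired entry
      this.misses++;
      return undefined;
    }
    this.map.delete(key); // refresh recency
    this.map.set(key, entry);
    this.hits++;
    return entry.value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, { value, time: this.now() });
    if (this.map.size > this.maxSize) {
      this.map.delete(this.map.keys().next().value); // evict least recently used
    }
  }

  stats() {
    const total = this.hits + this.misses;
    return { size: this.map.size, hits: this.hits, misses: this.misses, hitRate: total ? this.hits / total : 0 };
  }
}

// Wrap an async embedding function so repeated texts skip the model call
function wrapEmbedding(embedFn, cache = new LRUCache({ maxSize: 5000 })) {
  return async text => {
    const cached = cache.get(text);
    if (cached !== undefined) return cached;
    const vector = await embedFn(text);
    cache.set(text, vector);
    return vector;
  };
}
```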

### 💬 Conversation Manager
Manage chat history with context window limits:

```javascript
import { ConversationManager, getContextLimit } from 'quick-rag';

const conversation = new ConversationManager({
  maxTokens: getContextLimit('llama3'), // 8192
  autoSummarize: true,
  systemPrompt: 'You are a helpful assistant.'
});

conversation.addMessage('user', 'What is RAG?');
conversation.addMessage('assistant', 'RAG stands for...');

// Get context for LLM (respects token limits)
const context = conversation.getContext();

// Fork, export, or summarize
const forked = conversation.fork();
const json = conversation.toJSON();
```
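
A context window limit is typically enforced by keeping the newest messages that fit a token budget. The sketch below shows that idea with a rough 4-characters-per-token estimate (an assumption; real models need their own tokenizer). `buildContext` is a hypothetical helper, not the `ConversationManager` implementation.

```javascript
// Rough token estimate: ~4 characters per token (heuristic, model-dependent)
const estimateTokens = text => Math.ceil(text.length / 4);

// Keep the system prompt plus as many of the most recent messages as fit
function buildContext(messages, { maxTokens, systemPrompt = '' }) {
  const system = systemPrompt ? [{ role: 'system', content: systemPrompt }] : [];
  let budget = maxTokens - system.reduce((n, m) => n + estimateTokens(m.content), 0);

  const kept = [];
  // Walk backwards so the newest turns are kept first
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (cost > budget) break; // older turns are dropped (or summarized)
    kept.unshift(messages[i]);
    budget -= cost;
  }
  return [...system, ...kept];
}
```

Messages that no longer fit are where an `autoSummarize`-style feature would step in, replacing the dropped turns with a short summary.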

### 📊 RAG Evaluation
Measure retrieval quality with standard metrics:

```javascript
import { precisionAtK, meanReciprocalRank, RAGEvaluator } from 'quick-rag';

// Individual metrics
const retrieved = ['doc1', 'doc4', 'doc2'];
const relevant = ['doc1', 'doc2', 'doc3'];

console.log(precisionAtK(retrieved, relevant, 3)); // 0.667
console.log(meanReciprocalRank(retrieved, relevant)); // 1.0

// Full evaluation
const evaluator = new RAGEvaluator(retriever);
const results = await evaluator.evaluate(testQueries);
console.log(results.metrics); // { precision, recall, mrr, ndcg }
```
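
For reference, the metrics above follow standard definitions. A self-contained sketch with binary relevance (function names are illustrative; quick-rag's versions may treat edge cases such as empty inputs differently):

```javascript
// Fraction of the top-k results that are relevant
function precisionAtK(retrieved, relevant, k) {
  const rel = new Set(relevant);
  return k > 0 ? retrieved.slice(0, k).filter(id => rel.has(id)).length / k : 0;
}

// Fraction of all relevant documents found in the top-k
function recallAtK(retrieved, relevant, k) {
  const rel = new Set(relevant);
  if (rel.size === 0) return 0;
  return retrieved.slice(0, k).filter(id => rel.has(id)).length / rel.size;
}

// 1 / rank of the first relevant result (0 if none)
function reciprocalRank(retrieved, relevant) {
  const rel = new Set(relevant);
  const idx = retrieved.findIndex(id => rel.has(id));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

// NDCG@k with binary gains: DCG = sum over hits of 1 / log2(rank + 1)
function ndcgAtK(retrieved, relevant, k) {
  const rel = new Set(relevant);
  const dcg = retrieved.slice(0, k)
    .reduce((sum, id, i) => sum + (rel.has(id) ? 1 / Math.log2(i + 2) : 0), 0);
  let idcg = 0;
  for (let i = 0; i < Math.min(rel.size, k); i++) idcg += 1 / Math.log2(i + 2);
  return idcg === 0 ? 0 : dcg / idcg;
}
```

With the example inputs above, `precisionAtK` gives 2/3 and `reciprocalRank` gives 1, matching the values shown; MRR is the mean of `reciprocalRank` across a set of queries.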

### 🗄️ Vector Database Connectors
Connect to external vector databases:

```javascript
import { createVectorStore, ChromaVectorStore, QdrantVectorStore } from 'quick-rag';

// Factory pattern
const store = await createVectorStore('chroma', embedFn, {
  collectionName: 'my-docs',
  host: 'localhost',
  port: 8000
});

// Or direct usage
const qdrant = new QdrantVectorStore(embedFn, {
  url: 'http://localhost:6333',
  collectionName: 'documents'
});
```

## 🆕 What's New in v2.4.0

### 🔪 Robust Chunking
Intelligent text splitting that handles abbreviations and prevents word splitting:

```javascript
import { chunkBySentences, chunkText } from 'quick-rag';

// Handles Dr., Prof., LTD., approx., etc.
const chunks = chunkBySentences(text, {
  sentencesPerChunk: 3,
  overlapSentences: 1
});

// Avoids cutting words in half
const textChunks = chunkText(text, {
  chunkSize: 500,
  overlap: 50,
  separator: ' ' // Word-safe splitting
});
```
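
Both chunking ideas can be approximated in a few lines. This sketch is hypothetical (`splitSentences`, `chunkWords`, and the abbreviation list are assumptions, not quick-rag's implementation): sentences end at `.`, `!`, or `?` followed by whitespace unless the word before a period is a known abbreviation, and chunks grow word by word up to `chunkSize` characters.

```javascript
const ABBREVIATIONS = new Set(['dr', 'prof', 'mr', 'mrs', 'ms', 'ltd', 'approx', 'etc', 'e.g', 'i.e']);

// Split on sentence punctuation, skipping periods that end an abbreviation
function splitSentences(text) {
  const sentences = [];
  const re = /[.!?]+(?=\s|$)/g;
  let start = 0;
  let m;
  while ((m = re.exec(text)) !== null) {
    const before = text.slice(start, m.index).split(/\s+/).pop().toLowerCase();
    if (m[0] === '.' && ABBREVIATIONS.has(before)) continue; // e.g. "Dr."
    const end = m.index + m[0].length;
    const sentence = text.slice(start, end).trim();
    if (sentence) sentences.push(sentence);
    start = end;
  }
  const rest = text.slice(start).trim();
  if (rest) sentences.push(rest);
  return sentences;
}

// Build chunks of whole words, repeating about `overlap` characters of words
function chunkWords(text, { chunkSize = 500, overlap = 50 } = {}) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  let i = 0;
  while (i < words.length) {
    let len = 0;
    let j = i;
    while (j < words.length && (len === 0 || len + 1 + words[j].length <= chunkSize)) {
      len += (len === 0 ? 0 : 1) + words[j].length;
      j++;
    }
    chunks.push(words.slice(i, j).join(' '));
    if (j >= words.length) break;
    // Step back over trailing words that fit in the overlap budget
    let back = j;
    let overlapLen = 0;
    while (back > i + 1 && overlapLen + words[back - 1].length + 1 <= overlap) {
      overlapLen += words[back - 1].length + 1;
      back--;
    }
    i = back; // always advances by at least one word
  }
  return chunks;
}
```

A single word longer than `chunkSize` becomes its own chunk rather than being split, which is the trade-off word-safe chunking makes.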

### 🔍 Rich Query Explainability
Get deep insights into why a document was retrieved:

```javascript
const results = await retriever.getRelevant(query, 3, { explain: true });

console.log(results[0].explanation);
/*
{
  score: 0.88,
  snippet: "...context surrounding the match...",
  relevanceFactors: {
    semanticScore: 0.88,
    termMatch: 0.75, // 3/4 terms matched
    density: 0.15    // concentration of keywords
  }
}
*/
```
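
Metrics like `termMatch` and `density` can be computed from simple token overlap. The sketch below uses assumed definitions (term match = matched unique query terms / unique query terms; density = matching tokens / document tokens); quick-rag's own tokenization and scoring may differ, and `explainMatch` is a hypothetical name.

```javascript
function explainMatch(query, text, windowChars = 40) {
  const tokenize = s => s.toLowerCase().match(/[a-z0-9]+/g) || [];
  const queryTerms = [...new Set(tokenize(query))];
  const docTokens = tokenize(text);
  const docSet = new Set(docTokens);

  const matchedTerms = queryTerms.filter(t => docSet.has(t));
  const termMatch = queryTerms.length ? matchedTerms.length / queryTerms.length : 0;

  // Density: share of document tokens that are matched query terms
  const hits = docTokens.filter(t => matchedTerms.includes(t)).length;
  const density = docTokens.length ? hits / docTokens.length : 0;

  // Snippet around the first occurrence (substring search) of a matched term
  const pos = matchedTerms.length ? text.toLowerCase().indexOf(matchedTerms[0]) : -1;
  const snippet = pos === -1
    ? text.slice(0, 2 * windowChars)
    : '...' + text.slice(Math.max(0, pos - windowChars), pos + windowChars) + '...';

  return { matchedTerms, termMatch, density, snippet };
}
```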

## 🔍 What's in v2.2.0

### 🔍 BM25 Sparse Search
Pure JavaScript implementation - no external dependencies!

```javascript
import { BM25 } from 'quick-rag';

const bm25 = new BM25({ k1: 1.2, b: 0.75 });
bm25.addDocuments([
  { id: '1', text: 'Machine learning is a subset of AI' },
  { id: '2', text: 'Deep learning uses neural networks' },
  { id: '3', text: 'Natural language processing handles text' }
]);

const results = bm25.search('neural networks AI', 2);
// Fast keyword-based retrieval with BM25 scoring (a TF-IDF-style ranking function)
```
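
To show what `k1` and `b` control, here is a compact Okapi BM25 scorer. It is a standalone sketch, not quick-rag's `BM25` class; it uses the common non-negative IDF variant `ln(1 + (N - df + 0.5) / (df + 0.5))` and a simple lowercase alphanumeric tokenizer.

```javascript
function bm25Search(docs, query, topN, { k1 = 1.2, b = 0.75 } = {}) {
  const tokenize = s => s.toLowerCase().match(/[a-z0-9]+/g) || [];
  const tokenized = docs.map(d => tokenize(d.text));
  const N = docs.length;
  const avgdl = tokenized.reduce((n, t) => n + t.length, 0) / (N || 1);

  // Document frequency: in how many documents each term appears
  const df = new Map();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) df.set(term, (df.get(term) || 0) + 1);
  }

  const queryTerms = tokenize(query);
  const scored = docs.map((doc, i) => {
    const tokens = tokenized[i];
    let score = 0;
    for (const term of queryTerms) {
      const f = tokens.filter(t => t === term).length; // term frequency
      if (f === 0) continue;
      const n = df.get(term);
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      // k1 saturates repeated terms; b controls document-length normalization
      score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * tokens.length / avgdl));
    }
    return { id: doc.id, score };
  });

  return scored.filter(r => r.score > 0).sort((x, y) => y.score - x.score).slice(0, topN);
}
```

Raising `k1` lets repeated terms keep adding score for longer; raising `b` toward 1 penalizes long documents more strongly.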

### 🔀 Hybrid Search (BM25 + Vector)
Combine sparse and dense retrieval for 20-30% better results!

```javascript
import { HybridRetriever, InMemoryVectorStore } from 'quick-rag';

const vectorStore = new InMemoryVectorStore(embedFn);
await vectorStore.addDocuments(docs);

const hybrid = new HybridRetriever(vectorStore, {
  alpha: 0.5,          // Balance: 0 = sparse only, 1 = dense only
  fusionMethod: 'rrf', // Reciprocal Rank Fusion
  rrfK: 60
});

const results = await hybrid.search('query', 5, { explain: true });
// Results include both dense and sparse scores
```
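
Reciprocal Rank Fusion scores each document by summing `1 / (k + rank)` over the ranked lists it appears in, so it needs only ranks, not comparable scores. The sketch below implements plain RRF; the `weights` option is a hypothetical way to apply an `alpha`-style balance and may not match how `HybridRetriever` uses `alpha`.

```javascript
// rankedLists: arrays of document IDs, best first (rank starts at 1)
function reciprocalRankFusion(rankedLists, { k = 60, weights } = {}) {
  const scores = new Map();
  rankedLists.forEach((list, listIdx) => {
    const w = weights ? weights[listIdx] : 1;
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) || 0) + w / (k + rank + 1));
    });
  });
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((x, y) => y.score - x.score);
}

// alpha = 0.5 gives the dense and sparse rankings equal weight
const alpha = 0.5;
const fused = reciprocalRankFusion(
  [['d1', 'd2', 'd3'], ['d3', 'd1', 'd4']], // dense ranking, sparse ranking
  { k: 60, weights: [alpha, 1 - alpha] }
);
```

Here `d1` wins because it ranks near the top of both lists, even though `d3` is first in the sparse list; the constant `k = 60` damps the advantage of a single first place.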

### 📊 Reranking
Multi-signal scoring to improve top-K precision:

```javascript
import { Reranker, createRerankedRetriever } from 'quick-rag';

const reranker = new Reranker({
  keywordWeight: 0.35,  // Keyword overlap
  semanticWeight: 0.35, // Semantic similarity
  coverageWeight: 0.20, // Query term coverage
  coherenceWeight: 0.10 // Text coherence
});

// Rerank any retriever's results
const reranked = reranker.rerank(query, initialResults, { explain: true });

// Or wrap a retriever for automatic reranking
const smartRetriever = createRerankedRetriever(hybridRetriever, rerankerOptions);
```

### 🔄 Query Transformation
Advanced query processing techniques:

```javascript
import { QueryExpander, QueryDecomposer, MultiQueryGenerator } from 'quick-rag';

// 1. Query Expansion - Add synonyms
const expander = new QueryExpander();
expander.addSynonyms('ml', ['machine learning', 'AI']);
const expanded = expander.expand('ml models');
// "ml models machine learning AI"

// 2. Query Decomposition - Split complex queries
const decomposer = new QueryDecomposer();
const parts = decomposer.decompose('Compare BM25 with vector search and explain differences');
// ["Compare BM25 with vector search", "explain differences"]

// 3. Multi-Query - Generate variations
const generator = new MultiQueryGenerator();
const variations = generator.generate('How does RAG work?');
// ["How does RAG work?", "What is RAG?", "RAG explanation"]
```
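
Rule-based versions of expansion and decomposition are small. This sketch (hypothetical `expandQuery` and `decomposeQuery`, not the library classes) reproduces the outputs shown above for those two inputs:

```javascript
// Append synonyms for every query word found in the synonym map
function expandQuery(query, synonyms) {
  const extra = [];
  for (const word of query.toLowerCase().split(/\s+/)) {
    for (const syn of synonyms.get(word) || []) {
      if (!extra.includes(syn)) extra.push(syn);
    }
  }
  return [query, ...extra].join(' ');
}

// Split on " and " / " then " / ";" and drop empty parts
function decomposeQuery(query) {
  return query.split(/\s+(?:and|then)\s+|;\s*/i).map(s => s.trim()).filter(Boolean);
}

const synonyms = new Map([['ml', ['machine learning', 'AI']]]);
```

Splitting on a bare "and" is naive: a query such as "compare cats and dogs" would be split too, so real decomposers usually need extra rules or an LLM.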

### 🎯 Full Pipeline Example
Combine all features for maximum retrieval quality:

```javascript
import {
  OllamaRAGClient,
  createOllamaRAGEmbedding,
  InMemoryVectorStore,
  HybridRetriever,
  createRerankedRetriever,
  QueryExpander,
  generateWithRAG
} from 'quick-rag';

// Setup
const client = new OllamaRAGClient();
const embed = createOllamaRAGEmbedding(client, 'qwen3-embedding:0.6b');
const store = new InMemoryVectorStore(embed);
await store.addDocuments(documents);

// Create hybrid + reranked retriever
const hybrid = new HybridRetriever(store, { alpha: 0.5, fusionMethod: 'rrf' });
const retriever = createRerankedRetriever(hybrid, { keywordWeight: 0.3 });

// Expand query and retrieve
const expander = new QueryExpander();
const { expanded } = expander.expand(userQuery);
const results = await retriever.getRelevant(expanded, 5);

// Generate response
const response = await generateWithRAG(client, 'granite4:3b', userQuery, results);
```

---

## 📚 Previous Features

### 💾 Embedded Persistence (v2.1.0)
Store your vectors locally without setting up a complex database server!
- **Zero Setup:** Just provide a file path (`./rag.db`)
- **Fast:** Built on `better-sqlite3`
- **Full Features:** Batch insert, metadata filtering, CRUD

### 🛡️ Advanced Error Handling
Never crash without knowing why. New error system provides:
- **Specific Error Types:** `RAGError`, `EmbeddingError`, `RetrievalError`, etc.
- **Error Codes:** Programmatic handling
- **Recovery Hints:** Actionable suggestions in error messages

### 📊 Metrics & Logging
Monitor your RAG pipeline in production:
- **Performance Tracking:** Embedding time, search latency, generation speed
- **Structured Logs:** JSON format for easy parsing
- **Prometheus Support:** Export metrics for monitoring dashboards

### 🏷️ Advanced Filtering
Filter documents with custom logic by passing a JavaScript function that receives each document's metadata:
```javascript
const results = await retriever.getRelevant('latest AI news', 5, {
  filter: (meta) => {
    // Optional chaining avoids a TypeError on documents without tags
    return meta.year === 2024 &&
      meta.tags?.includes('AI') &&
      meta.difficulty !== 'beginner';
  }
});
```

### 📽️ PowerPoint Support
Load .pptx and .ppt files with `officeparser`:
```javascript
import { loadDocument } from 'quick-rag';
const pptDoc = await loadDocument('./presentation.pptx');
```

### 📁 Organized Examples
12 comprehensive examples covering all features:
- Basic Usage (Ollama & LM Studio)
- Document Loading (PDF, Word, Excel)
- Metadata Filtering
- Streaming Responses
- Advanced Filtering
- Query Explainability
- Prompt Management
- Decision Engine (Simple & Real-World)
- Conversation History & Export
- New `examples/` folder for direct `npm i quick-rag` usage

---

## 🆕 Previous Features (v1.1.x)

### 📝 Internationalization Update
- Translated all example files to English for better international accessibility
- `examples/10-decision-engine.js` - Smart Document Selection example
- `examples/11-loaders.js` - Document loaders example

### 🧠 Decision Engine (v1.1.0)

Quick RAG includes a **Decision Engine** for multi-criteria retrieval that goes beyond plain cosine similarity. It combines:
- 🎯 **Multi-Criteria Weighted Scoring** - 5 factors evaluated together
- 🧠 **Heuristic Reasoning** - Pattern-based query optimization
- **Adaptive Learning** - Learns from user feedback
- 🔍 **Full Transparency** - See exactly why each document was selected

#### Multi-Criteria Scoring

**5 weighted factors beyond similarity:**

1. **📊 Semantic Similarity** (50%) - Cosine similarity score
2. **🔀 Keyword Match** (20%) - Term matching in document
3. **📅 Recency** (15%) - Document freshness with exponential decay
4. **⭐ Source Quality** (10%) - Source reliability (official=1.0, research=0.9, blog=0.7, forum=0.6)
5. **🎯 Context Relevance** (5%) - Contextual fit

```javascript
import { SmartRetriever, DEFAULT_WEIGHTS } from 'quick-rag';

// Create smart retriever with default weights
// (basicRetriever: an existing retriever, e.g. a Retriever instance)
const smartRetriever = new SmartRetriever(basicRetriever);

// Or customize weights for your use case
const newsTunedRetriever = new SmartRetriever(basicRetriever, {
  weights: {
    semanticSimilarity: 0.35,
    keywordMatch: 0.20,
    recency: 0.30, // Higher for news sites
    sourceQuality: 0.10,
    contextRelevance: 0.05
  }
});

// Get results with decision transparency (using the custom weights)
const response = await newsTunedRetriever.getRelevant('latest AI news', 3);

// See scoring breakdown for each document
console.log(response.results[0]);
// {
//   text: "...",
//   weightedScore: 0.857, // sum of the contributions below
//   scoreBreakdown: {
//     semanticSimilarity: { score: 0.85, weight: 0.35, contribution: 0.298 },
//     keywordMatch: { score: 0.67, weight: 0.20, contribution: 0.134 },
//     recency: { score: 0.95, weight: 0.30, contribution: 0.285 },
//     sourceQuality: { score: 0.90, weight: 0.10, contribution: 0.090 },
//     contextRelevance: { score: 1.00, weight: 0.05, contribution: 0.050 }
//   }
// }

// Decision context shows WHY these results
console.log(response.decisions);
// {
//   weights: { ... },
//   appliedRules: ["boost-recent-for-news"],
//   suggestions: [
//     "Time-sensitive query detected. Prioritizing recent documents.",
//     "Consider using filters if you need older historical content."
//   ]
// }
```
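
The scoring described above reduces to a weighted sum of normalized signals. The sketch below is illustrative: the default weights and source-quality values come from the list above, while the 30-day half-life used for the exponential recency decay is an assumption, and `weightedScore`/`recencyScore` are hypothetical names.

```javascript
const WEIGHTS = { semanticSimilarity: 0.5, keywordMatch: 0.2, recency: 0.15, sourceQuality: 0.1, contextRelevance: 0.05 };
const SOURCE_QUALITY = { official: 1.0, research: 0.9, blog: 0.7, forum: 0.6 };

// Exponential decay: recency halves every `halfLifeDays` (assumed 30)
function recencyScore(ageDays, halfLifeDays = 30) {
  return Math.pow(0.5, Math.max(0, ageDays) / halfLifeDays);
}

// Weighted sum of signals in [0, 1], with a per-factor breakdown
function weightedScore(signals, weights = WEIGHTS) {
  const scoreBreakdown = {};
  let total = 0;
  for (const [name, weight] of Object.entries(weights)) {
    const score = signals[name] ?? 0;
    const contribution = score * weight;
    scoreBreakdown[name] = { score, weight, contribution };
    total += contribution;
  }
  return { weightedScore: total, scoreBreakdown };
}
```

Because each contribution is `score * weight`, the `weightedScore` always equals the sum of the `contribution` fields in the breakdown, which is what makes the result explainable.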

#### Heuristic Reasoning

**Pattern-based optimization that learns:**

```javascript
// Enable learning mode
const smartRetriever = new SmartRetriever(basicRetriever, {
  enableLearning: true,
  enableHeuristics: true
});

// Add custom rules
smartRetriever.heuristicEngine.addRule(
  'boost-documentation',
  (query, context) => query.includes('documentation'),
  (query, context) => {
    context.adjustWeight('sourceQuality', 0.15); // Increase quality weight
    return { adjusted: true, reason: 'Documentation query prioritizes quality' };
  },
  5 // Priority
);

// Provide feedback to enable learning
smartRetriever.provideFeedback(query, results, {
  rating: 5,        // 1-5 rating
  hasFilters: true, // User applied filters
  comment: 'Perfect results!'
});

// System learns successful patterns
const insights = smartRetriever.getInsights();
console.log(insights.heuristics.successfulPatterns);
// ["latest", "documentation", "official release"]

// Export learned knowledge
const knowledge = smartRetriever.exportKnowledge();

// Import to another instance
newRetriever.importKnowledge(knowledge);
```

#### Scenario Customization

**Different weights for different use cases:**

```javascript
// News Platform - Recency Priority
const newsRetriever = new SmartRetriever(basicRetriever, {
  weights: {
    semanticSimilarity: 0.30,
    keywordMatch: 0.20,
    recency: 0.40, // 🔥 High recency
    sourceQuality: 0.05,
    contextRelevance: 0.05
  }
});

// Documentation Site - Quality Priority
const docsRetriever = new SmartRetriever(basicRetriever, {
  weights: {
    semanticSimilarity: 0.35,
    keywordMatch: 0.20,
    recency: 0.10,
    sourceQuality: 0.30, // 🔥 High quality
    contextRelevance: 0.05
  }
});

// Research Platform - Balanced
const researchRetriever = new SmartRetriever(basicRetriever, {
  weights: DEFAULT_WEIGHTS // Balanced approach
});
```

#### Real-World Example

See `examples/11-loaders.js` for a complete example with:
- PDF document loading
- Multiple source types (official, blog, research, forum)
- 3 different scenarios (news, documentation, research)
- RAG generation with quality metrics
- Decision transparency and explanations

**Benefits:**
- ✅ More accurate retrieval than pure similarity
- ✅ Adapts to different content types automatically
- ✅ Learns from user interactions
- ✅ Fully explainable decisions
- ✅ Customizable for any use case
- ✅ Production-ready with proven patterns

### 🔍 Query Explainability (v1.1.0)
**Understand WHY documents were retrieved** - A first-of-its-kind feature!

```javascript
const results = await retriever.getRelevant('What is Ollama?', 3, {
  explain: true
});

// Each result includes detailed explanation:
console.log(results[0].explanation);
// {
//   queryTerms: ["ollama", "local", "ai"],
//   matchedTerms: ["ollama", "local"],
//   matchCount: 2,
//   matchRatio: 0.67,
//   cosineSimilarity: 0.856,
//   relevanceFactors: {
//     termMatches: 2,
//     semanticSimilarity: 0.856,
//     coverage: "67%"
//   }
// }
```

**Use cases:** Debug searches, optimize queries, validate accuracy, explain to users

### 🎨 Dynamic Prompt Management (v1.1.0)
**10 built-in templates + full customization**

```javascript
import { generateWithRAG, createPromptManager } from 'quick-rag';

// Quick template selection
await generateWithRAG(client, model, query, docs, {
  template: 'conversational' // or: technical, academic, code, etc.
});

// System prompts for role definition
await generateWithRAG(client, model, query, docs, {
  systemPrompt: 'You are a helpful programming tutor',
  template: 'instructional'
});

// Advanced: Reusable PromptManager
const promptMgr = createPromptManager({
  systemPrompt: 'You are an expert engineer',
  template: 'technical'
});

await generateWithRAG(client, model, query, docs, {
  promptManager: promptMgr
});
```

**Templates:** `default`, `conversational`, `technical`, `academic`, `code`, `concise`, `detailed`, `qa`, `instructional`, `creative`

---

## 🚀 Quick Start

### Option 1: With Official Ollama SDK (Recommended)

```javascript
import {
  OllamaRAGClient,
  createOllamaRAGEmbedding,
  InMemoryVectorStore,
  Retriever
} from 'quick-rag';

// 1. Initialize client (official SDK)
const client = new OllamaRAGClient({
  host: 'http://127.0.0.1:11434'
});

// 2. Setup embedding
const embed = createOllamaRAGEmbedding(client, 'qwen3-embedding:0.6b');

// 3. Create vector store and retriever
const vectorStore = new InMemoryVectorStore(embed);
const retriever = new Retriever(vectorStore);

// 4. Add documents
await vectorStore.addDocument({
  text: 'Ollama provides local LLM hosting.'
});

// 5. Retrieve context, then generate with streaming (official SDK feature!)
const results = await retriever.getRelevant('What is Ollama?', 2);
const context = results.map(d => d.text).join('\n');

const response = await client.chat({
  model: 'granite4:3b',
  messages: [{
    role: 'user',
    content: `Context: ${context}\n\nQuestion: What is Ollama?`
  }],
  stream: true, // Official SDK streaming!
});

// Stream response
for await (const part of response) {
  process.stdout.write(part.message?.content || '');
}
```

---

### Option 2: React with Vite

> **💡 Starting from scratch?** Check out the detailed step-by-step guide in [QUICKSTART_REACT.md](./QUICKSTART_REACT.md)!

**Step 1:** Create your project

```bash
npm create vite@latest my-rag-app -- --template react
cd my-rag-app
npm install quick-rag express concurrently
```

**Step 2:** Create backend proxy (`server.js` in project root)

```javascript
import express from 'express';
import { OllamaRAGClient } from 'quick-rag';

const app = express();
app.use(express.json());

const client = new OllamaRAGClient({ host: 'http://127.0.0.1:11434' });

app.post('/api/generate', async (req, res) => {
  const { model = 'granite4:3b', messages } = req.body;
  const response = await client.chat({ model, messages, stream: false });
  res.json({ response: response.message.content });
});

app.post('/api/embed', async (req, res) => {
  const { model = 'qwen3-embedding:0.6b', input } = req.body;
  const response = await client.embed(model, input);
  res.json(response);
});

app.listen(3001, () => console.log('🚀 Server: http://127.0.0.1:3001'));
```

**Step 3:** Configure Vite proxy (`vite.config.js`)

```javascript
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';

export default defineConfig({
  plugins: [react()],
  server: {
    proxy: {
      '/api': {
        target: 'http://127.0.0.1:3001',
        changeOrigin: true
      }
    }
  }
});
```

**Step 4:** Update `package.json` scripts

```json
{
  "scripts": {
    "dev": "concurrently \"npm:server\" \"npm:client\"",
    "server": "node server.js",
    "client": "vite"
  }
}
```

**Step 5:** Use in your React component (`src/App.jsx`)

```jsx
import { useState, useEffect } from 'react';
import { useRAG, initRAG, createBrowserModelClient } from 'quick-rag';

const docs = [
  { id: '1', text: 'React is a JavaScript library for building user interfaces.' },
  { id: '2', text: 'Ollama provides local LLM hosting.' },
  { id: '3', text: 'RAG combines retrieval with AI generation.' }
];

export default function App() {
  const [rag, setRAG] = useState(null);
  const [query, setQuery] = useState('');

  const { run, loading, response, docs: results } = useRAG({
    retriever: rag?.retriever,
    modelClient: createBrowserModelClient(),
    model: 'granite4:3b'
  });

  useEffect(() => {
    initRAG(docs, {
      baseEmbeddingOptions: {
        useBrowser: true,
        baseUrl: '/api/embed',
        model: 'qwen3-embedding:0.6b'
      }
    }).then(core => setRAG(core));
  }, []);

  return (
    <div>
      <h1>🤖 RAG Demo</h1>

      <input
        value={query}
        onChange={e => setQuery(e.target.value)}
        placeholder="Ask something..."
        style={{ width: 300, padding: 10 }}
      />
      {/* Disabled until initRAG has built the retriever */}
      <button onClick={() => run(query)} disabled={loading || !rag}>
        {loading ? 'Thinking...' : 'Ask AI'}
      </button>

      {results && (
        <div>
          <h3>📚 Retrieved:</h3>
          <ul>
            {results.map(d => (
              <li key={d.id}>{d.text}</li>
            ))}
          </ul>
        </div>
      )}

      {response && (
        <div>
          <h3>✨ Answer:</h3>
          <p>{response}</p>
        </div>
      )}
    </div>
  );
}
```

**Step 6:** Run your app

```bash
npm run dev
```

Open `http://localhost:5173` 🎉

---

### Option 2b: Next.js (Pages Router)

**Step 1:** Create API routes

```javascript
// pages/api/generate.js
import { OllamaClient } from 'quick-rag';

export default async function handler(req, res) {
  const client = new OllamaClient();
  const { model = 'granite4:3b', prompt } = req.body;
  const response = await client.generate(model, prompt);
  res.json({ response });
}
```

```javascript
// pages/api/embed.js
import { OllamaClient } from 'quick-rag';

export default async function handler(req, res) {
  const client = new OllamaClient();
  const { model = 'qwen3-embedding:0.6b', input } = req.body;
  const response = await client.embed(model, input);
  res.json(response);
}
```

**Step 2:** Use in your page (same React component as above)

---

### Option 3: Vanilla JavaScript (Node.js)

**Simple approach with official Ollama SDK:**

```javascript
import {
  OllamaRAGClient,
  createOllamaRAGEmbedding,
  InMemoryVectorStore,
  Retriever
} from 'quick-rag';

// 1. Initialize client
const client = new OllamaRAGClient();

// 2. Setup embedding
const embed = createOllamaRAGEmbedding(client, 'qwen3-embedding:0.6b');

// 3. Create vector store and retriever
const vectorStore = new InMemoryVectorStore(embed);
const retriever = new Retriever(vectorStore);

// 4. Add documents
await vectorStore.addDocuments([
  { text: 'JavaScript is a programming language.' },
  { text: 'Python is great for data science.' },
  { text: 'Rust is a systems programming language.' }
]);

// 5. Query
const query = 'What is JavaScript?';
const results = await retriever.getRelevant(query, 2);

// 6. Generate answer
const context = results.map(d => d.text).join('\n');
const response = await client.chat({
  model: 'granite4:3b',
  messages: [{
    role: 'user',
    content: `Context:\n${context}\n\nQuestion: ${query}\n\nAnswer:`
  }]
});

// Clean output
console.log('📚 Retrieved:', results.map(d => d.text));
console.log('🤖 Answer:', response.message.content);
```

**Output:**
```
📚 Retrieved: [
  'JavaScript is a programming language.',
  'Python is great for data science.'
]
🤖 Answer: JavaScript is a programming language that allows developers
to write code and implement functionality in web browsers...
```

---

### Option 4: LM Studio 🎨

Use LM Studio instead of Ollama, through the official `@lmstudio/sdk` client:

```javascript
import {
  LMStudioRAGClient,
  createLMStudioRAGEmbedding,
  InMemoryVectorStore,
  Retriever,
  generateWithRAG
} from 'quick-rag';

// 1. Initialize LM Studio client
const client = new LMStudioRAGClient();

// 2. Setup embedding (use your embedding model from LM Studio)
const embed = createLMStudioRAGEmbedding(client, 'text-embedding-embeddinggemma-300m');

// 3. Create vector store and retriever
const vectorStore = new InMemoryVectorStore(embed);
const retriever = new Retriever(vectorStore);

// 4. Add documents
await vectorStore.addDocuments([
  { text: 'LM Studio is a desktop app for running LLMs locally.' },
  { text: 'It provides an OpenAI-compatible API.' },
  { text: 'You can use models like Llama, Mistral, and more.' }
]);

// 5. Query with RAG
const results = await retriever.getRelevant('What is LM Studio?', 2);
const answer = await generateWithRAG(
  client,
  'google/gemma-3-4b', // or your model name
  'What is LM Studio?',
  results
);

console.log('Answer:', answer);
```

**Prerequisites for LM Studio:**

1. Download and install [LM Studio](https://lmstudio.ai)
2. Download a language model (e.g., Llama 3.2, Mistral)
3. Download an embedding model (e.g., text-embedding-embeddinggemma-300m)
4. Start the local server: `Developer > Local Server` (default: `http://localhost:1234`)

**For React projects:** Import from `'quick-rag/react'` to use hooks:

```javascript
import { useRAG } from 'quick-rag/react';
// or
import { useRAG } from 'quick-rag'; // Also works in React projects
```

---

## 📖 API Reference

### React Hook: `useRAG`

```javascript
const { run, loading, response, docs, streaming, error } = useRAG({
  retriever,   // Retriever instance
  modelClient, // Model client (OllamaClient or BrowserModelClient)
  model        // Model name (e.g., 'granite4:3b')
});

// Ask a question
await run('What is React?');

// With options
await run('What is React?', {
  topK: 5,      // Number of documents to retrieve
  stream: true, // Enable streaming
  onDelta: (chunk, fullText) => console.log(chunk)
});
```

### Core Functions

**Initialize RAG**

```javascript
const { retriever, store, mrl } = await initRAG(documents, {
  defaultDim: 128, // Embedding dimension
  k: 2,            // Default number of results
  mrlBaseDim: 768, // Base embedding dimension
  baseEmbeddingOptions: {
    useBrowser: true,             // Use browser-safe fetch
    baseUrl: '/api/embed',        // Embedding endpoint
    model: 'qwen3-embedding:0.6b' // Embedding model
  }
});
```

**Generate with RAG**

```javascript
const result = await generateWithRAG({
  retriever,
  modelClient,
  model,
  query: 'Your question',
  topK: 3 // Optional: override default k
});

// Returns: { docs, response, prompt }
```

### VectorStore API

```javascript
const store = new InMemoryVectorStore(embeddingFn, { defaultDim: 128 });

// Add documents
await store.addDocument({ id: '1', text: 'Document text' });

// Add multiple documents with batch processing (v2.0.3!)
await store.addDocuments([{ id: '1', text: '...' }], {
  dim: 128,
  batchSize: 20,    // Process 20 chunks at a time
  maxConcurrent: 5, // Max 5 concurrent requests
  onProgress: (current, total) => {
    console.log(`Progress: ${current}/${total}`);
  }
});

// Query
const results = await store.similaritySearch('query', k, queryDim);

// CRUD
const doc = store.getDocument('id');
const all = store.getAllDocuments();
await store.updateDocument('id', 'new text', { meta: 'data' });
store.deleteDocument('id');
store.clear();
```

**Batch Processing for Large Documents (v2.0.3):**

```javascript
import { chunkDocuments } from 'quick-rag';

// Process large PDFs efficiently (largePDF: a loaded document, e.g. from loadPDF())
const chunks = chunkDocuments([largePDF], { chunkSize: 1000, overlap: 100 });

await store.addDocuments(chunks, {
  batchSize: 20,    // Process 20 chunks per batch
  maxConcurrent: 5, // Max 5 concurrent embedding requests
  onProgress: (current, total) => {
    console.log(`Embedding progress: ${current}/${total} (${Math.round(current / total * 100)}%)`);
  }
});
```
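
The `batchSize` / `maxConcurrent` options describe a common pattern: walk the chunks batch by batch and, inside each batch, keep at most `maxConcurrent` embedding calls in flight. A sketch under that interpretation, assuming an async `embed(text)` function (`embedInBatches` is hypothetical, not quick-rag's `addDocuments`):

```javascript
async function embedInBatches(texts, embed, { batchSize = 20, maxConcurrent = 5, onProgress } = {}) {
  const vectors = new Array(texts.length);
  let done = 0;

  for (let start = 0; start < texts.length; start += batchSize) {
    const end = Math.min(start + batchSize, texts.length);
    let next = start;
    // Each worker claims the next unprocessed index in this batch
    const worker = async () => {
      while (next < end) {
        const i = next++;
        vectors[i] = await embed(texts[i]);
        done++;
        if (onProgress) onProgress(done, texts.length);
      }
    };
    // At most `maxConcurrent` workers, so at most that many requests in flight
    const poolSize = Math.min(maxConcurrent, end - start);
    await Promise.all(Array.from({ length: poolSize }, () => worker()));
  }
  return vectors;
}
```

The worker pool caps in-flight requests without a semaphore library: each worker claims the next index synchronously, so no two workers embed the same chunk, and results land at their original positions.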

### Model Clients

**Browser (with proxy)**

```javascript
const client = createBrowserModelClient({
  endpoint: '/api/generate' // Your proxy endpoint
});
```

**Node.js (direct)**

```javascript
const client = new OllamaClient({
  baseUrl: 'http://127.0.0.1:11434/api'
});
```

---

## 💡 Examples

### CRUD Operations

```javascript
// Add document dynamically
await store.addDocument({
  id: 'new-doc',
  text: 'TypeScript adds types to JavaScript.'
});

// Add multiple documents with batch processing (v2.0.3!)
await store.addDocuments([
  { id: 'doc1', text: 'First document' },
  { id: 'doc2', text: 'Second document' }
], {
  batchSize: 10,    // Process in batches
  maxConcurrent: 5, // Rate limiting
  onProgress: (current, total) => {
    console.log(`Added ${current}/${total} documents`);
  }
});

// Update existing
await store.updateDocument('1', 'React 19 is the latest version.', {
  version: '19',
  updated: Date.now()
});

// Delete
store.deleteDocument('2');

// Query all
const allDocs = store.getAllDocuments();
console.log(`Total documents: ${allDocs.length}`);
```

### Dynamic Retrieval

```javascript
// Ask with different topK values
const result1 = await run('What is JavaScript?', { topK: 1 }); // Get 1 doc
const result2 = await run('What is JavaScript?', { topK: 5 }); // Get 5 docs
```

### Streaming Responses

```javascript
await run('Explain React hooks', {
  stream: true,
  onDelta: (chunk, fullText) => {
    console.log('New chunk:', chunk);
    // Update UI in real-time
  }
});
```

### Custom Embedding Models

```javascript
// Use different embedding models
const rag = await initRAG(docs, {
  baseEmbeddingOptions: {
    useBrowser: true,
    baseUrl: '/api/embed',
    model: 'qwen3-embedding:0.6b' // or another compatible embedding model
  }
});
```

**More examples:** Check the [`examples/`](./examples) folder for complete demos.

---

## 📄 Document Loaders (v0.7.4+)

Load documents from various formats and use them with RAG!

### Supported Formats

| Format | Function | Requires |
|--------|----------|----------|
| PDF | `loadPDF()` | `npm install pdf-parse` |
| Word (.docx) | `loadWord()` | `npm install mammoth` |
| Excel (.xlsx) | `loadExcel()` | `npm install xlsx` |
| Text (.txt) | `loadText()` | Built-in ✅ |
| JSON | `loadJSON()` | Built-in ✅ |
| Markdown | `loadMarkdown()` | Built-in ✅ |
| Web URLs | `loadURL()` | Built-in ✅ |

### Quick Start

**Load PDF:**
```javascript
import { loadPDF, chunkDocuments } from 'quick-rag';

// Load PDF
const pdf = await loadPDF('./document.pdf');
console.log(`Loaded ${pdf.meta.pages} pages`);

// Chunk and add to RAG
const chunks = chunkDocuments([pdf], {
  chunkSize: 500,
  overlap: 50
});
await store.addDocuments(chunks);
```
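`chunkSize` and `overlap` describe a sliding window: each chunk holds up to `chunkSize` units of text, and consecutive chunks share `overlap` units so that text near a boundary appears in both neighbours. A simplified character-based sketch follows. `chunkText` is hypothetical, the units are assumed to be characters, and `chunkDocuments` is described as word-safe and sentence-aware, so its boundaries will differ.

```javascript
// Hypothetical sliding-window chunker over characters.
// Each window is `chunkSize` long and starts `chunkSize - overlap`
// characters after the previous one.
function chunkText(text, { chunkSize = 500, overlap = 50 } = {}) {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize');
  }
  const step = chunkSize - overlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(chunkText('abcdefghij', { chunkSize: 4, overlap: 1 }));
// → [ 'abcd', 'defg', 'ghij' ]
```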

**Load from URL:**
```javascript
import { loadURL } from 'quick-rag';

const doc = await loadURL('https://example.com', {
extractText: true // Convert HTML to plain text
});
await store.addDocuments([doc]);
```
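`extractText: true` converts the fetched HTML to plain text before it is embedded. A deliberately naive regex-based sketch of that idea follows. `htmlToText` is hypothetical and decodes only a handful of entities; a real implementation should use a proper HTML parser, because regular expressions cannot handle arbitrary HTML.

```javascript
// Hypothetical, deliberately naive HTML-to-text conversion.
function htmlToText(html) {
  return html
    // Drop script and style blocks entirely, including their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    // Replace every remaining tag with a space so adjacent words stay separated.
    .replace(/<[^>]+>/g, ' ')
    // Decode a few common entities (&amp; last, so "&amp;lt;" becomes "&lt;").
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&')
    // Collapse runs of whitespace and trim the ends.
    .replace(/\s+/g, ' ')
    .trim();
}

console.log(htmlToText('<h1>Title</h1><p>Fish &amp; chips</p><script>track()</script>'));
// → Title Fish & chips
```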

**Load Directory:**
```javascript
import { loadDirectory, chunkDocuments } from 'quick-rag';

// Load all supported documents from a folder
const docs = await loadDirectory('./documents', {
extensions: ['.pdf', '.docx', '.txt', '.md'],
recursive: true
});

console.log(`Loaded ${docs.length} documents`);

// Chunk and add to vector store
const chunks = chunkDocuments(docs, { chunkSize: 500 });
await store.addDocuments(chunks);
```
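The `extensions` and `recursive` options describe a filtered directory walk. The sketch below collects matching file paths with Node's `readdir` from `node:fs/promises` (using `withFileTypes: true`). `listFiles` is hypothetical and returns paths only, whereas `loadDirectory` also parses each file.

```javascript
import { readdir } from 'node:fs/promises';
import path from 'node:path';

// Hypothetical helper: list files under `dir` whose extension is in `extensions`.
// Extension matching is case-insensitive (".PDF" matches ".pdf").
async function listFiles(dir, { extensions = [], recursive = false } = {}) {
  const wanted = new Set(extensions.map(ext => ext.toLowerCase()));
  const found = [];
  for (const entry of await readdir(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // Only descend into subfolders when recursive mode is on.
      if (recursive) {
        found.push(...await listFiles(fullPath, { extensions, recursive }));
      }
    } else if (entry.isFile() && wanted.has(path.extname(entry.name).toLowerCase())) {
      found.push(fullPath);
    }
  }
  return found;
}
```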

**Auto-Detect Format:**
```javascript
import { loadDocument } from 'quick-rag';

// Automatically detects file type
const doc = await loadDocument('./file.pdf');
// Works with: .pdf, .docx, .xlsx, .txt, .md, .json
```
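The simplest form of format auto-detection is a lookup on the file extension. A sketch of that dispatch step follows. `detectFormat` and its format labels are hypothetical, and this section does not say whether `loadDocument` also inspects file contents.

```javascript
import path from 'node:path';

// Hypothetical extension → format table covering the extensions listed above.
const FORMAT_BY_EXTENSION = new Map([
  ['.pdf', 'pdf'],
  ['.docx', 'word'],
  ['.xlsx', 'excel'],
  ['.txt', 'text'],
  ['.md', 'markdown'],
  ['.json', 'json']
]);

// Returns a format label, or throws for unsupported or missing extensions.
function detectFormat(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  const format = FORMAT_BY_EXTENSION.get(ext);
  if (!format) {
    throw new Error(`Unsupported file type: ${ext || '(none)'}`);
  }
  return format;
}

console.log(detectFormat('./file.pdf')); // → pdf
console.log(detectFormat('notes.MD'));   // → markdown
```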

### Installation

```bash
# Core package (includes text, JSON, markdown, URL loaders)
npm install quick-rag

# Optional: PDF support
npm install pdf-parse

# Optional: Word support
npm install mammoth

# Optional: Excel support
npm install xlsx

# Or install all at once:
npm install quick-rag pdf-parse mammoth xlsx
```

### Complete Example

```javascript
import {
loadPDF,
loadDirectory,
chunkDocuments,
InMemoryVectorStore,
Retriever,
OllamaRAGClient,
createOllamaRAGEmbedding,
generateWithRAG
} from 'quick-rag';

// Load documents
const pdf = await loadPDF('./research.pdf');
const docs = await loadDirectory('./articles');

// Combine and chunk
const allDocs = [pdf, ...docs];
const chunks = chunkDocuments(allDocs, {
chunkSize: 500,
overlap: 50
});

// Setup RAG
const client = new OllamaRAGClient();
const embed = createOllamaRAGEmbedding(client, 'qwen3-embedding:0.6b');
const store = new InMemoryVectorStore(embed);
const retriever = new Retriever(store);

// Add to vector store
await store.addDocuments(chunks);

// Query
const results = await retriever.getRelevant('What is the main topic?', 3);
const answer = await generateWithRAG(client, 'granite4:3b',
'What is the main topic?', results);

console.log(answer);
```

**See full example:** [`examples/11-loaders.js`](./examples/11-loaders.js)
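Between retrieval and generation, the retrieved chunks have to be turned into model context. The sketch below shows one common way to build such a prompt. `buildRagPrompt`, its template wording, and the `text` field on each result are assumptions for illustration, not what `generateWithRAG` actually sends.

```javascript
// Hypothetical prompt builder: numbers each retrieved chunk so the model
// (and the reader) can refer back to individual sources.
function buildRagPrompt(question, results) {
  const context = results
    .map((doc, i) => `[${i + 1}] ${doc.text}`)
    .join('\n');
  return [
    'Answer the question using only the context below.',
    'If the context does not contain the answer, say so.',
    '',
    'Context:',
    context,
    '',
    `Question: ${question}`
  ].join('\n');
}

const prompt = buildRagPrompt('What is the main topic?', [
  { text: 'The report studies solar panel efficiency.' },
  { text: 'Efficiency drops at high temperatures.' }
]);
console.log(prompt);
```

Telling the model to rely only on the supplied context is a common way to discourage answers that ignore the retrieved documents.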

---

## ❓ Troubleshooting

| Problem | Solution |
|---------|----------|
| 🚫 **CORS errors** | Use a proxy server (Express/Next.js API routes) |
| 🔌 **Connection refused** | Ensure Ollama is running: `ollama serve` |
| 📦 **Models not found** | Pull models: `ollama pull granite4:3b && ollama pull qwen3-embedding:0.6b` |
| 🌐 **404 on `/api/embed`** | Check your proxy configuration in `vite.config.js` or API routes |
| 💻 **Windows IPv6 issues** | Use `127.0.0.1` instead of `localhost` |
| 📦 **Module not found** | Import core APIs from `'quick-rag'` and `useRAG` from `'quick-rag/react'`; avoid other `'quick-rag/...'` subpaths |

> **Note:** v0.6.5+ automatically detects and uses the correct API (generate or chat) for any model.
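For the CORS and `404 on /api/embed` rows, the usual fix is a route that forwards embedding requests to Ollama. Below is a dependency-free sketch using Node's built-in `http` module and global `fetch` (Node 18+). `createEmbedProxy` is hypothetical. It assumes Ollama listens on its default port 11434, addressed as `127.0.0.1` per the Windows row above. To avoid CORS entirely, serve it from the same origin as your app or point your dev server's proxy at it.

```javascript
import http from 'node:http';

// Hypothetical proxy: forwards POST /api/embed to an Ollama server.
function createEmbedProxy({ upstream = 'http://127.0.0.1:11434' } = {}) {
  return http.createServer(async (req, res) => {
    if (req.method !== 'POST' || req.url !== '/api/embed') {
      res.writeHead(404, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'not found' }));
      return;
    }
    try {
      // Collect the JSON body sent by the browser.
      const chunks = [];
      for await (const chunk of req) chunks.push(chunk);
      const body = Buffer.concat(chunks).toString('utf8');

      // Relay it unchanged to Ollama's embedding endpoint.
      const upstreamRes = await fetch(`${upstream}/api/embed`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body
      });
      // Read the full response before writing headers, so a failure here
      // can still be reported as a 502 below.
      const text = await upstreamRes.text();
      res.writeHead(upstreamRes.status, { 'Content-Type': 'application/json' });
      res.end(text);
    } catch (err) {
      // Typically "connection refused" when Ollama is not running.
      res.writeHead(502, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: String(err) }));
    }
  });
}

// Usage: createEmbedProxy().listen(3001);
```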

---

## 📚 Documentation

- **📖 [API Reference](./docs/API_REFERENCE.md)** - Complete API documentation
- **🛡️ [Error Handling](./docs/ERROR_HANDLING.md)** - Error handling best practices
- **💾 [SQLite Persistence](./docs/SQLITE_PERSISTENCE.md)** - Embedded storage guide
- **📊 [Metrics & Telemetry](./docs/METRICS_TELEMETRY.md)** - Monitoring and logging
- **🤝 [Contributing](./CONTRIBUTING.md)** - Contribution guidelines
- **📝 [Changelog](./CHANGELOG.md)** - Version history
- **💡 [Examples](./examples)** - Working code examples
- **🚀 [Quickstart](./quickstart)** - Quick start guides

## 🔗 Resources

- **Ollama Models:** [ollama.ai/library](https://ollama.ai/library)
- **LM Studio:** [lmstudio.ai](https://lmstudio.ai)
- **Issues:** [GitHub Issues](https://github.com/emredeveloper/quick-rag/issues)
- **Discussions:** [GitHub Discussions](https://github.com/emredeveloper/quick-rag/discussions)
- **NPM Package:** [npmjs.com/package/quick-rag](https://www.npmjs.com/package/quick-rag)

---

## 📄 License

MIT © [Cihat Emre Karataş](https://github.com/emredeveloper)

---

## πŸ™ Acknowledgments

Built with:
- [Ollama JS SDK](https://github.com/ollama/ollama-js)
- [LM Studio SDK](https://github.com/lmstudio-ai/lmstudio-js)
- [Pino](https://github.com/pinojs/pino) - Fast logging
- [Better SQLite3](https://github.com/WiseLibs/better-sqlite3) - Embedded database

Special thanks to all contributors and the open-source community!

---

**Made with ❤️ for the JavaScript & AI community**