{"id":26928359,"url":"https://github.com/aemal/rag-killer","last_synced_at":"2026-04-24T16:36:58.223Z","repository":{"id":284616033,"uuid":"955491690","full_name":"aemal/rag-killer","owner":"aemal","description":"A tool that analyzes your content to determine if you need a RAG pipeline or if modern language models can handle your text directly. It compares your content's token requirements against model context windows to help you make an informed architectural decision.","archived":false,"fork":false,"pushed_at":"2025-03-26T19:03:54.000Z","size":2638,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-26T20:22:52.153Z","etag":null,"topics":["ai","ai-agents","aiagents","context-window","llm","rag"],"latest_commit_sha":null,"homepage":"https://aemalsayer.com","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aemal.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-26T18:11:28.000Z","updated_at":"2025-03-26T19:03:57.000Z","dependencies_parsed_at":"2025-03-26T20:32:55.823Z","dependency_job_id":null,"html_url":"https://github.com/aemal/rag-killer","commit_stats":null,"previous_names":["aemal/rag-killer"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aemal/rag-killer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aemal%2Frag-killer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aemal%2Frag-killer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/aemal%2Frag-killer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aemal%2Frag-killer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aemal","download_url":"https://codeload.github.com/aemal/rag-killer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aemal%2Frag-killer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32231164,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-24T13:21:15.438Z","status":"ssl_error","status_checked_at":"2026-04-24T13:21:15.005Z","response_time":64,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-agents","aiagents","context-window","llm","rag"],"created_at":"2025-04-02T04:19:19.208Z","updated_at":"2026-04-24T16:36:58.194Z","avatar_url":"https://github.com/aemal.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RAG-Killer\n\nA text analysis tool that helps you determine whether you need a RAG (Retrieval Augmented Generation) pipeline or if your content can be processed directly by modern language models. 
By analyzing your content's token requirements and comparing them against model context windows, this tool provides insights to help you make an informed architectural decision.\n\n## Features\n\n- Analyzes text statistics (words, characters, lines)\n- Estimates token usage for different language models\n- Calculates size requirements (character- and token-based)\n- Provides context window utilization warnings\n- Supports multiple model specifications via JSON configuration\n\n## License\n\nMIT License - see [LICENSE](LICENSE) for details\n\n## Real-World Example: Processing Pride and Prejudice\n\nHere's a real example of processing the entire \"Pride and Prejudice\" novel (255 pages) using o3-mini:\n\n```\nAnalysis for o3-mini (o3-mini):\n==================================================\nModel Description: Fast, flexible, intelligent reasoning model with 200k token context window\nContext Window: 200,000 tokens\n\nPricing (per 1M tokens):\n- Input: $1.1\n- Output: $4.4\n- Cached Input: $0.55\n\nText Statistics:\n- Words: 127,377\n- Characters: 728,841\n- Lines: 14,533\n- Estimated Pages: 255\n\nToken Analysis:\n- Estimated Tokens: 182,211\n- Context Window Utilization: 91.11%\n\nSize Analysis:\n- Character Size: 1.39 MB\n- Token Size: 711.76 KB\n- Total Size: 2.09 MB\n\nCost Analysis:\n- Input (182,211 tokens): $0.2004\n- Cached Input: $0.1002\n- Potential Output Cost: $0.8017\n- Total Cost (input only): $0.2004\n\nRecommendations:\n⚠️  Warning: Content is close to context window limit!\n\nContent Analysis:\n==================================================\nModel: o3-mini (o3-mini)\nPrice per 1M tokens:\n- Input: $1.1\n- Output: $4.4\n- Cached Input: $0.55\n\nText Statistics:\nWords: 127,377\nCharacters: 728,841\nEstimated Tokens: 182,211\nContext Window Utilization: 91.11%\nTotal Size: 2.09 MB\n\nGenerating Summary...\n\nSummary Statistics:\n==================================================\nTime taken: 30.25 seconds\nSummary saved to: result.md\nSummary 
length: 3,021 characters\nSummary tokens: ~756\nSummary size: 5.9 KB\n\nCost Analysis:\nInput cost (182,211 tokens): $0.2004\nOutput cost (756 tokens): $0.0033\nTotal cost: $0.2038\nSummary tokens: 756\nSummary size: 8.85 KB\n\nCompression Ratio: 99.6%\n```\n\n\u003e NOTE: This cost estimate does not match reality. The OpenAI billing dashboard shows only $0.09, while the estimate above is $0.2038; the reason for the gap is not yet clear.\n\n**Before Processing:**\n![OpenAI billing before processing](./images/o3-mini-before.png)\n\n**After Processing:**\n![OpenAI billing after processing](./images/o3-mini-after.png)\n\n\n### Cost Analysis\n- Total tokens used: ~96,000 tokens\n- Input cost: $1.10 per 1M tokens\n- Output cost: $4.40 per 1M tokens\n- Input cost for 96k tokens: ($1.10/1M) × 96k = $0.1056\n- Output cost for 356 tokens: ($4.40/1M) × 356 = $0.0016\n- Total cost: ~$0.11\n- Processing time: 47 seconds\n\nThis shows that processing an entire 255-page novel in a single request is cost-effective, at roughly 11 cents. A traditional RAG system would instead require:\n1. Vector database storage costs\n2. Chunking the text, plus embedding API calls for every chunk\n3. 
Additional API calls for retrieval and generation\n\n### Model Features\n- 200,000-token context window\n- 100,000 max output tokens\n- Fast, flexible reasoning capabilities\n- Supports structured outputs\n- Supports function calling\n- Supports streaming\n- Batch API support (50% cost reduction for batch processing)\n- Knowledge cutoff: Oct 01, 2023\n\n### Performance Characteristics\n- **Reasoning**: Higher\n- **Speed**: Medium\n- **Input**: Text only\n- **Output**: Text only\n- **Rate Limits**:\n  - Tier 1: 1,000 RPM, 100,000 TPM\n  - Tier 2: 2,000 RPM, 200,000 TPM\n  - Tier 3: 5,000 RPM, 4,000,000 TPM\n  - Higher tiers available for enterprise use\n\n### Actual Cost Verification\n\nThe OpenAI billing screenshots above (**Before Processing** and **After Processing**) show the actual cost of processing Pride and Prejudice: a difference of $0.20 ($46.59 - $46.39), which is close to the tool's $0.2038 estimate and demonstrates the real-world cost-effectiveness of processing large documents directly.\n\n## FAQ\n\n### What is RAG-Killer?\nRAG-Killer is a tool that helps you determine whether you need a RAG (Retrieval Augmented Generation) pipeline or whether your content can be processed directly by modern language models. It analyzes your content's size and token requirements to inform this architectural decision.\n\n### How does it work?\nThe tool analyzes your input text to:\n1. Calculate basic statistics (words, characters, lines)\n2. Estimate token usage based on the selected model\n3. Compare against the model's context window\n4. 
Provide recommendations on whether RAG is necessary\n\n### When do I NOT need RAG?\nYou don't need RAG when:\n- Your content fits within the model's context window (e.g., o3-mini's 200k tokens)\n- You're processing static documents that don't need frequent updates\n- Your use case involves analyzing or summarizing complete documents\n- You need to maintain context across the entire document\n- You want to avoid the complexity of managing a vector database\n\n### When do I still need RAG?\nYou still need RAG when:\n- Your content exceeds the model's context window\n- You need to frequently update your knowledge base\n- You're building a search system that requires semantic search\n- You need to combine information from multiple sources dynamically\n- You're building a system that needs to reference external data in real time\n\n### What models are supported?\nThe tool supports any model defined in the `models.json` configuration file. It is currently configured for o3-mini, and you can add more models by updating that file.\n\n### How accurate are the token estimates?\nToken estimates are based on average word-to-token ratios, so they can differ noticeably from actual token counts: the two example runs in this README estimate 182,211 and 95,533 tokens for the same text. Treat them as a rough approximation when making architectural decisions.\n\n### Can I customize the analysis?\nYes, you can:\n- Modify the model specifications in `models.json`\n- Adjust the input/output file paths in `config.ts`\n- Change the analysis parameters in the code\n\n### What's the compression ratio?\nThe compression ratio shows how much shorter the summary is than the original text. In the example outputs it matches the reduction in character count, e.g. 1 - 3,021 / 728,841 ≈ 99.6%.\n\n## Sample Content\n\nThis project uses \"Pride and Prejudice\" by Jane Austen as a sample text to demonstrate the capabilities of large context windows. 
The text is available in two formats:\n- `book.txt`: The actual text file used for analysis and processing\n- `book.pdf`: A PDF version of the same text, included only for reference to show the physical page count (not sent to the model)\n\n## Current Model Support\n\nThis PoC currently focuses on OpenAI models, but the framework is designed to be extensible. Future versions will add support for other LLM providers and models with large context windows.\n\n## Context Window Capabilities\n\nThe OpenAI o3-mini model has a 200,000-token context window. To put this into perspective:\n\n- **Token to Word Conversion**: 1,000 tokens ≈ 750 words\n- **Total Words**: 200,000 tokens × 0.75 words/token = 150,000 words\n- **Page Count**: 150,000 words ÷ 500 words/page = 300 A4 pages\n\nThis means the model can process the equivalent of a 300-page document in a single context window, which is approximately:\n- 150,000 words\n- 860,000 characters (including spaces and punctuation), using the ~5.7 characters per word measured for Pride and Prejudice (728,841 ÷ 127,377)\n\n## Payload Size and Traffic Considerations\n\nWhen considering the practical implementation of this large context window, it's important to analyze the payload size:\n\n- **Character Size**: 860,000 characters × 1 byte/character (UTF-8, mostly ASCII English text) ≈ 0.86 MB; at 2 bytes/character (UTF-16), which is what the tool's Character Size figure appears to assume, it would be ≈ 1.72 MB\n- **Token Size**: 200,000 tokens × ~4 bytes/token ≈ 800 KB (a rough estimate only; the API receives text and tokenizes it server-side, so tokens are not part of the request body)\n- **Total Payload Size**: Approximately 1-2 MB per request (including metadata and formatting)\n\n### Traffic Impact Analysis\n\nWhile the payload size is significant, it's important to consider:\n\n1. **Modern Network Capabilities**:\n   - Most modern networks can handle 1-2 MB requests efficiently\n   - Average broadband speeds (25+ Mbps) can transfer this in under a second\n   - A fast 5G connection (100+ Mbps) can transfer it in under 200 ms\n\n2. 
**Cost-Benefit Trade-off**:\n   - The increased payload size is offset by:\n     - Eliminating the need for multiple API calls in RAG systems\n     - Reducing database queries and storage costs\n     - Simplifying the overall architecture\n\n3. **Practical Considerations**:\n   - For most use cases, you won't need the full 200,000 tokens\n   - The context window is an upper limit, not a size you need to fill\n   - You can still implement chunking for very large documents if needed\n\n## Content Analysis Tool\n\nThe project includes a content analysis tool (`stats.ts`) that helps you understand how your content fits within different model context windows.\n\n### Features\n- Text statistics (words, characters, lines)\n- Token estimation\n- Size analysis in bytes/KB/MB\n- Context window utilization percentage\n- Page estimation\n- Recommendations on whether the content fits or needs chunking\n\n### Currently Supported Models\n- OpenAI o3-mini (200k token context window)\n- OpenAI GPT-4o mini (128k token context window)\n\n### Future Model Support\nThe tool is designed to be easily extensible to support additional LLM providers and models. Future versions will include:\n- Claude 3.5 Sonnet (200k tokens)\n- Gemini 1.5 Pro (1M tokens)\n- Other emerging models with large context windows\n\n### Usage\n1. Place your content in `book.txt`\n2. 
Run the analysis:\n```bash\nbun run stats.ts\n```\n\n### Example Output\n```\nbun run index.ts\n\nAnalysis for o3-mini (o3-mini):\n==================================================\nModel Description: Fast, flexible, intelligent reasoning model with 200k token context window\nContext Window: 200,000 tokens\n\nPricing (per 1M tokens):\n- Input: $1.1\n- Output: $4.4\n- Cached Input: $0.55\n\nText Statistics:\n- Words: 127,377\n- Characters: 728,841\n- Lines: 14,533\n- Estimated Pages: 255\n\nToken Analysis:\n- Estimated Tokens: 95,533\n- Context Window Utilization: 47.77%\n\nSize Analysis:\n- Character Size: 1.39 MB\n- Token Size: 373.18 KB\n- Total Size: 1.75 MB\n\nCost Analysis:\n- Input (95,533 tokens): $0.1051\n- Cached Input: $0.0525\n- Potential Output Cost: $0.4203\n- Total Cost (input only): $0.1051\n\nRecommendations:\n✅ Content fits well within context window.\n\nContent Analysis:\n==================================================\nModel: o3-mini (o3-mini)\nPrice per 1M tokens:\n- Input: $1.1\n- Output: $4.4\n- Cached Input: $0.55\n\nText Statistics:\nWords: 127,377\nCharacters: 728,841\nEstimated Tokens: 95,533\nContext Window Utilization: 47.77%\nTotal Size: 1.75 MB\n\nGenerating Summary...\n\nSummary Statistics:\n==================================================\nTime taken: 19.98 seconds\nSummary saved to: result.md\nSummary length: 2,078 characters\nSummary tokens: ~1,559\nSummary size: 4.06 KB\n\nCost Analysis:\nInput cost (95,533 tokens): $0.1051\nOutput cost (1,559 tokens): $0.0069\nTotal cost: $0.1119\nSummary tokens: 221\nSummary size: 4.92 KB\n\nCompression Ratio: 99.7%\n```\n\n## Implications\n\nThis large context window suggests that for many use cases:\n- Complex RAG implementations might be unnecessary\n- Direct processing of 
documents is possible without chunking\n- Substantial documents can be analyzed in a single request (about 20 to 50 seconds in the examples above)\n- Multiple documents can be processed in one request, as long as together they fit in the context window\n\n\n## RAG Approaches: Large Context vs. Vector Database\n\n### 1. Using a Large Context Window for Everything\n\n#### Pros\n- **Direct Access**: All reference material is right there in the context, so the model can attend to any part of it directly.\n- **Simpler Architecture**: You might not need an external database since all text is in the prompt.\n\n#### Cons\n- **Context-Size Limits**: Even 100k–200k tokens can be used up quickly if your dataset is large.\n- **Cost and Latency**: Larger contexts mean higher pay-per-token costs (if applicable) and slower inference times.\n- **Attention Overhead**: A massive context can introduce noise and reduce performance if relevant information is buried among irrelevant text.\n\n---\n\n### 2. Using a Vector Database (Traditional RAG)\n\n#### Pros\n- **Scalability**: Store massive amounts of text. Embeddings and semantic search pull only what's needed.\n- **Efficiency**: Each query includes only the most relevant chunks in the prompt, minimizing token usage and cost.\n- **Better Retrieval for Large Corpora**: Even if the dataset is millions of tokens, the vector DB scales.\n\n#### Cons\n- **More Moving Parts**: You must maintain the vector index, manage chunking, and ensure retrieval is well-tuned.\n- **Potential Retrieval Mismatch**: Poorly configured embeddings or chunking can lead to suboptimal context retrieval.\n\n---\n\n### Is Putting Everything in the Context Window Always Higher-Quality?\n\nNot necessarily. It can work well if:\n- Your entire reference text fits comfortably into the context window.\n- You accept the potentially higher cost and latency.\n\nBut if you have a large dataset and need speed or cost-effectiveness, a vector database that retrieves only the relevant passages is usually the better choice.\n\n---\n\n### Bottom Line\n\n1. 
**Small Dataset (Fits in Context)**:\n   - Loading everything into the prompt is simpler and can be effective.\n2. **Large Dataset (Beyond Context Limits)**:\n   - Vector database retrieval is often more practical, scalable, and cost-effective.\n\nThere's no single \"best\" method for all scenarios; choose based on your dataset size, budget, and performance needs.\n\n\n## Setup and Usage\n\nTo install dependencies:\n\n```bash\nbun install\n```\n\nTo run:\n\n```bash\nbun run index.ts\n```\n\nThis project was created using `bun init` in bun v1.2.1. [Bun](https://bun.sh) is a fast all-in-one JavaScript runtime.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faemal%2Frag-killer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faemal%2Frag-killer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faemal%2Frag-killer/lists"}