{"id":20207855,"url":"https://github.com/osllmai/indox","last_synced_at":"2025-04-10T12:54:31.570Z","repository":{"id":233485833,"uuid":"776598834","full_name":"osllmai/inDox","owner":"osllmai","description":"Indox is an advanced search and retrieval technique that efficiently extracts data from diverse document types, including PDFs and HTML, using online or offline large language models such as Openai, Hugging Face , etc. ","archived":false,"fork":false,"pushed_at":"2025-03-08T09:58:22.000Z","size":104432,"stargazers_count":18,"open_issues_count":0,"forks_count":2,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-03-10T02:48:32.801Z","etag":null,"topics":["ai","document","index","llm","ml","rag","structured-data","unstructured-data"],"latest_commit_sha":null,"homepage":"https://docs.osllm.ai/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/osllmai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-24T00:09:28.000Z","updated_at":"2025-03-08T09:58:28.000Z","dependencies_parsed_at":"2024-04-21T20:54:59.379Z","dependency_job_id":"bf3a9e6a-a550-4348-89f2-cfa5b722d14b","html_url":"https://github.com/osllmai/inDox","commit_stats":null,"previous_names":["osllmai/indox"],"tags_count":50,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/osllmai%2FinDox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/osllmai%2FinDox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/osllmai%2FinDox/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/osllmai%2FinDox/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/osllmai","download_url":"https://codeload.github.com/osllmai/inDox/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248221483,"owners_count":21067560,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","document","index","llm","ml","rag","structured-data","unstructured-data"],"created_at":"2024-11-14T05:32:50.017Z","updated_at":"2025-04-10T12:54:31.563Z","avatar_url":"https://github.com/osllmai.png","language":"Jupyter Notebook","readme":"\u003cdiv align=\"center\"\u003e\n  \u003ch1\u003eIndox Ecosystem\u003c/h1\u003e\n  \u003ca href=\"https://github.com/osllmai/indoxArcg\"\u003e\n    \u003cimg src=\"https://readme-typing-svg.demolab.com?font=Georgia\u0026size=16\u0026duration=3000\u0026pause=500\u0026multiline=true\u0026width=700\u0026height=100\u0026lines=Indox+Ecosystem;Advanced+Search+%7C+Data+Mining+%7C+LLM+Evaluation+%7C+Synthetic+Data;Copyright+©️+OSLLAM.ai\" alt=\"Typing SVG\"/\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/osllmai/inDox/blob/master/docs/indoxArcg/assets/lite-logo%201.png\" alt=\"inDox Lite 
Logo\"\u003e\n\u003c/p\u003e\n\u003c/br\u003e\n\n[![License](https://img.shields.io/github/license/osllmai/inDox)](https://github.com/osllmai/inDox/blob/master/LICENSE)\n[![Discord](https://img.shields.io/discord/1223867382460579961?label=Discord\u0026logo=Discord\u0026style=social)](https://discord.com/invite/ossllmai)\n\n\u003c!-- [![GitHub stars](https://img.shields.io/github/stars/osllmai/indoxArcg?style=social)](https://github.com/osllmai/inDox) --\u003e\n\n[Official Website](https://osllm.ai) • [Documentation](https://docs.osllm.ai/index.html) • [Discord](https://discord.gg/xGz5tQYaeq)\n\n**NEW:** [Subscribe to our mailing list](https://docs.google.com/forms/d/1CQXJvxLUqLBSXnjqQmRpOyZqD6nrKubLz2WTcIJ37fU/prefill) for updates and news!\n\n\u003c/div\u003e\n\n## 🌟 The Indox Ecosystem\n\nThe Indox Ecosystem is a comprehensive suite of tools designed to revolutionize your AI and data workflows. Our ecosystem consists of four powerful components:\n\n### 1. 🔍 [IndoxArcg](https://github.com/osllmai/indoxArcg)\n\nAdvanced **Retrieval-Augmented Generation (RAG)** and **Cache-Augmented Generation (CAG)** system for intelligent information extraction and processing.\n\n## Key Features:\n\n- **Multi-format document support**: Handles PDF, HTML, Markdown, LaTeX, and more.\n- **Intelligent clustering and chunk processing**: Organizes and processes documents for efficient retrieval.\n- **Support for major LLM providers**: Compatible with OpenAI, Google, Mistral, HuggingFace, Ollama, and others.\n- **Advanced RAG features**:\n  - Semantic caching for faster retrieval.\n  - Multi-query retrieval for improved context extraction.\n  - Reranking and relevance scoring for high-quality results.\n- **Cache-Augmented Generation (CAG)**:\n  - Preloading and caching of documents for faster inference.\n  - Smart retrieval with validation and hallucination detection.\n  - Web search fallback for missing or insufficient context.\n- **Customizable similarity search**: Supports TF-IDF, 
BM25, and Jaccard similarity algorithms.\n- **Robust error handling**: Includes fallback mechanisms for retrieval failures and hallucination detection.\n\n### 2. ⛏️ [IndoxMiner](https://github.com/osllmai/indoxMiner)\n\nPowerful data extraction and mining tool leveraging LLMs.\n\n- Schema-based structured data extraction\n- Multi-format support with OCR capabilities\n- Flexible validation and type safety\n- Async processing for scalability\n- High-resolution PDF support\n\n### 3. 📊 [IndoxJudge](https://github.com/osllmai/indoxJudge)\n\nComprehensive LLM and RAG evaluation framework.\n\n- Multiple evaluation metrics (Faithfulness, Toxicity, BERTScore, etc.)\n- Safety and bias assessment\n- Multi-model comparison capabilities\n- RAG-specific evaluation metrics\n- Extensible framework for custom metrics\n\n### 4. 🔄 [IndoxGen](https://github.com/osllmai/indoxGen)\n\nAdvanced synthetic data generation suite with three specialized components:\n\n- **IndoxGen Core**: LLM-powered synthetic data generation\n- **IndoxGen-Tensor**: TensorFlow-based GAN data generation\n- **IndoxGen-Torch**: PyTorch-based GAN data generation\n\n## 📦 Quick Installation\n\nInstall the entire ecosystem:\n\n```bash\npip install indoxArcg indoxminer indoxjudge indoxgen indoxgen-tensor indoxgen-torch\n```\n\nOr install components separately:\n\n```bash\npip install indoxArcg       # Core RAG or CAG functionality\npip install indoxminer     # Data extraction\npip install indoxjudge     # LLM evaluation\npip install indoxgen       # Synthetic data generation\n```\n\n## 🚀 Model Support\n\n| Model Provider | indoxArcg | IndoxJudge | IndoxGen |\n| -------------- | --------- | ---------- | -------- |\n| OpenAI         | ✅        | ✅         | ✅       |\n| Google         | ✅        | ✅         | ✅       |\n| Mistral        | ✅        | ✅         | ✅       |\n| HuggingFace    | ✅        | ✅         | ✅       |\n| Ollama         | ✅        | ✅         | ❌       |\n| Anthropic      | ❌        | ❌         | ❌  
     |\n\n## 💡 Getting Started\n\nCheck out our example notebooks:\n\n- [indoxArcg Pipeline](https://colab.research.google.com/github/osllmai/indoxArcg/blob/master/Demo/indox_api_openai.ipynb)\n- [IndoxJudge Evaluation](https://colab.research.google.com/github/osllmai/indoxArcg/blob/master/Demo/indoxJudge_evaluation.ipynb)\n- [IndoxMiner Extraction](examples/indoxminer_extraction.ipynb)\n- [IndoxGen Data Generation](examples/indoxgen_synthetic.ipynb)\n\n## 🛣️ Roadmap\n\n- [ ] Unified web interface for all components\n- [ ] Docker support across the ecosystem\n- [ ] Enhanced integration between components\n- [ ] Advanced privacy and security features\n- [ ] Multi-language support expansion\n- [ ] Additional model provider integrations\n\n## 🤝 Contributing\n\nWe welcome contributions to any component of the Indox ecosystem! Please check our [Contributing Guidelines](CONTRIBUTING.md) for more information.\n\n## 📄 License\n\nThis project is licensed under the AGPL License - see the [LICENSE](https://github.com/osllmai/inDox/blob/master/LICENSE) file for details.\n\n\u003c!--\n## 🌟 Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=osllmai/indoxArcg,osllmai/indoxMiner,osllmai/indoxJudge,osllmai/indoxGen\u0026type=Date)](https://star-history.com/#osllmai/indoxArcg\u0026osllmai/indoxMiner\u0026osllmai/indoxJudge\u0026osllmai/indoxGen) --\u003e\n\n---\n\n\n\n```txt\n  .----------------.  .-----------------. .----------------.  .----------------.  .----------------.\n| .--------------. || .--------------. || .--------------. || .--------------. || .--------------. |\n| |     _____    | || | ____  _____  | || |  ________    | || |     ____     | || |  ____  ____  | |\n| |    |_   _|   | || ||_   \\|_   _| | || | |_   ___ `.  | || |   .'    `.   | || | |_  _||_  _| | |\n| |      | |     | || |  |   \\ | |   | || |   | |   `. \\ | || |  /  .--.  
\\  | || |   \\ \\  / /   | |\n| |      | |     | || |  | |\\ \\| |   | || |   | |    | | | || |  | |    | |  | || |    \u003e `' \u003c    | |\n| |     _| |_    | || | _| |_\\   |_  | || |  _| |___.' / | || |  \\  `--'  /  | || |  _/ /'`\\ \\_  | |\n| |    |_____|   | || ||_____|\\____| | || | |________.'  | || |   `.____.'   | || | |____||____| | |\n| |              | || |              | || |              | || |              | || |              | |\n| '--------------' || '--------------' || '--------------' || '--------------' || '--------------' |\n  '----------------'  '----------------'  '----------------'  '----------------'  '----------------'\n```\n\n\u003cdiv align=\"center\"\u003e\n  Made with ❤️ by OSLLM.ai\n\u003c/div\u003e\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fosllmai%2Findox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fosllmai%2Findox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fosllmai%2Findox/lists"}