https://github.com/osllmai/indox
Indox is an advanced search and retrieval technique that efficiently extracts data from diverse document types, including PDFs and HTML, using online or offline large language models from providers such as OpenAI and Hugging Face.
- Host: GitHub
- URL: https://github.com/osllmai/indox
- Owner: osllmai
- License: agpl-3.0
- Created: 2024-03-24T00:09:28.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2025-03-08T09:58:22.000Z (about 1 year ago)
- Last Synced: 2025-03-10T02:48:32.801Z (about 1 year ago)
- Topics: ai, document, index, llm, ml, rag, structured-data, unstructured-data
- Language: Jupyter Notebook
- Homepage: https://docs.osllm.ai/
- Size: 99.6 MB
- Stars: 18
- Watchers: 0
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
[Official Website](https://osllm.ai) • [Documentation](https://docs.osllm.ai/index.html) • [Discord](https://discord.gg/xGz5tQYaeq)
**NEW:** [Subscribe to our mailing list](https://docs.google.com/forms/d/1CQXJvxLUqLBSXnjqQmRpOyZqD6nrKubLz2WTcIJ37fU/prefill) for updates and news!
## 🌟 The Indox Ecosystem
The Indox Ecosystem is a comprehensive suite of tools designed to revolutionize your AI and data workflows. Our ecosystem consists of four powerful components:
### 1. 🔍 [IndoxArcg](https://github.com/osllmai/indoxArcg)
Advanced **Retrieval-Augmented Generation (RAG)** and **Cache-Augmented Generation (CAG)** system for intelligent information extraction and processing.
#### Key Features
- **Multi-format document support**: Handles PDF, HTML, Markdown, LaTeX, and more.
- **Intelligent clustering and chunk processing**: Organizes and processes documents for efficient retrieval.
- **Support for major LLM providers**: Compatible with OpenAI, Google, Mistral, HuggingFace, Ollama, and others.
- **Advanced RAG features**:
  - Semantic caching for faster retrieval.
  - Multi-query retrieval for improved context extraction.
  - Reranking and relevance scoring for high-quality results.
- **Cache-Augmented Generation (CAG)**:
  - Preloading and caching of documents for faster inference.
  - Smart retrieval with validation and hallucination detection.
  - Web search fallback for missing or insufficient context.
- **Customizable similarity search**: Supports TF-IDF, BM25, and Jaccard similarity algorithms.
- **Robust error handling**: Includes fallback mechanisms for retrieval failures and hallucination detection.
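To make the similarity-search options above more concrete, here is a minimal, dependency-free sketch of Jaccard similarity and Okapi BM25 scoring. It is not IndoxArcg's implementation or API: the function names (`tokenize`, `jaccard`, `bm25_scores`), the whitespace tokenizer, and the default `k1`/`b` values are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer (an assumption; real pipelines do more)."""
    return text.lower().split()


def jaccard(query: str, doc: str) -> float:
    """Jaccard similarity: shared tokens divided by all distinct tokens."""
    a, b = set(tokenize(query)), set(tokenize(doc))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization penalizes documents longer than average.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "indox extracts data from pdf documents",
    "synthetic data generation with gans",
    "evaluating llm output for toxicity",
]
query = "extract data from pdf"
# Highest-scoring document first.
ranked = sorted(zip(bm25_scores(query, docs), docs), reverse=True)
```

Jaccard ignores term frequency and document length, so it suits short, keyword-like queries. BM25 weights rare terms more heavily and normalizes for length, which usually ranks longer passages more sensibly.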
### 2. ⛏️ [IndoxMiner](https://github.com/osllmai/indoxMiner)
Powerful data extraction and mining tool leveraging LLMs.
- Schema-based structured data extraction
- Multi-format support with OCR capabilities
- Flexible validation and type safety
- Async processing for scalability
- High-resolution PDF support
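As a rough illustration of schema-based extraction with validation, the sketch below checks a JSON string (standing in for raw LLM output) against a dataclass schema and coerces each field to its declared type. The `Invoice` schema and the `parse_into_schema` helper are hypothetical and use only the Python standard library; they are not part of IndoxMiner's API.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Invoice:
    """Hypothetical target schema; field names are illustrative only."""
    invoice_number: str
    total: float
    currency: str


def parse_into_schema(raw_json: str, schema):
    """Validate a JSON object against a dataclass schema and coerce field types."""
    data = json.loads(raw_json)
    values = {}
    for f in fields(schema):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        try:
            # f.type is the annotated class (str, float, ...) for simple schemas.
            values[f.name] = f.type(data[f.name])
        except (TypeError, ValueError) as exc:
            raise ValueError(f"field {f.name!r} is not a valid {f.type.__name__}") from exc
    return schema(**values)


# Example: a numeric string is coerced to float by the schema.
invoice = parse_into_schema(
    '{"invoice_number": "A-17", "total": "12.50", "currency": "EUR"}', Invoice
)
```

Failing loudly on a missing or malformed field, rather than silently filling a default, keeps extraction errors visible before they reach downstream code.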
### 3. 📊 [IndoxJudge](https://github.com/osllmai/indoxJudge)
Comprehensive LLM and RAG evaluation framework.
- Multiple evaluation metrics (Faithfulness, Toxicity, BertScore, etc.)
- Safety and bias assessment
- Multi-model comparison capabilities
- RAG-specific evaluation metrics
- Extensible framework for custom metrics
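The idea of pluggable metrics and multi-model comparison can be sketched with a small registry, shown below. This is a hypothetical design, not IndoxJudge's interface: `register`, `METRICS`, `evaluate`, and the `context_overlap` metric are invented for illustration, and the overlap score is a crude lexical proxy rather than the Faithfulness metric named above.

```python
from typing import Callable

# A metric takes (answer, context) and returns a score in [0, 1].
Metric = Callable[[str, str], float]
METRICS: dict[str, Metric] = {}


def register(name: str):
    """Decorator that adds a scoring function to the registry under a name."""
    def decorator(fn: Metric) -> Metric:
        METRICS[name] = fn
        return fn
    return decorator


@register("context_overlap")
def context_overlap(answer: str, context: str) -> float:
    """Fraction of answer tokens that also appear in the retrieved context."""
    answer_tokens = answer.lower().split()
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return sum(t in context_tokens for t in answer_tokens) / len(answer_tokens)


def evaluate(answers: dict[str, str], context: str) -> dict[str, dict[str, float]]:
    """Score every model's answer with every registered metric."""
    return {
        model: {name: fn(answer, context) for name, fn in METRICS.items()}
        for model, answer in answers.items()
    }


report = evaluate(
    {"model_a": "paris is the capital", "model_b": "berlin is big"},
    context="paris is the capital of france",
)
```

Adding a new metric then means writing one decorated function; the comparison loop does not change.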
### 4. 🔄 [IndoxGen](https://github.com/osllmai/indoxGen)
Advanced synthetic data generation suite with three specialized components:
- **IndoxGen Core**: LLM-powered synthetic data generation
- **IndoxGen-Tensor**: TensorFlow-based GAN data generation
- **IndoxGen-Torch**: PyTorch-based GAN data generation
## 📦 Quick Installation
Install the entire ecosystem:
```bash
pip install indoxArcg indoxminer indoxjudge indoxgen indoxgen-tensor indoxgen-torch
```
Or install components separately:
```bash
pip install indoxArcg # Core RAG and CAG functionality
pip install indoxminer # Data extraction
pip install indoxjudge # LLM evaluation
pip install indoxgen # Synthetic data generation
```
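After installing, one way to confirm which components are present is to query the installed distribution metadata from the standard library. This is a small sketch; the package names are taken from the install commands above, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the pip commands above.
PACKAGES = [
    "indoxArcg",
    "indoxminer",
    "indoxjudge",
    "indoxgen",
    "indoxgen-tensor",
    "indoxgen-torch",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```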
## 🚀 Model Support
| Model Provider | IndoxArcg | IndoxJudge | IndoxGen |
| -------------- | --------- | ---------- | -------- |
| OpenAI | ✅ | ✅ | ✅ |
| Google | ✅ | ✅ | ✅ |
| Mistral | ✅ | ✅ | ✅ |
| HuggingFace | ✅ | ✅ | ✅ |
| Ollama | ✅ | ✅ | ❌ |
| Anthropic | ❌ | ❌ | ❌ |
## 💡 Getting Started
Check out our example notebooks:
- [IndoxArcg Pipeline](https://colab.research.google.com/github/osllmai/indoxArcg/blob/master/Demo/indox_api_openai.ipynb)
- [IndoxJudge Evaluation](https://colab.research.google.com/github/osllmai/indoxArcg/blob/master/Demo/indoxJudge_evaluation.ipynb)
- [IndoxMiner Extraction](examples/indoxminer_extraction.ipynb)
- [IndoxGen Data Generation](examples/indoxgen_synthetic.ipynb)
## 🛣️ Roadmap
- [ ] Unified web interface for all components
- [ ] Docker support across the ecosystem
- [ ] Enhanced integration between components
- [ ] Advanced privacy and security features
- [ ] Multi-language support expansion
- [ ] Additional model provider integrations
## 🤝 Contributing
We welcome contributions to any component of the Indox ecosystem! Please check our [Contributing Guidelines](CONTRIBUTING.md) for more information.
## 📄 License
This project is licensed under the AGPL-3.0 License - see the [LICENSE](https://github.com/osllmai/inDox/blob/master/LICENSE) file for details.
---
```txt
.----------------. .-----------------. .----------------. .----------------. .----------------.
| .--------------. || .--------------. || .--------------. || .--------------. || .--------------. |
| | _____ | || | ____ _____ | || | ________ | || | ____ | || | ____ ____ | |
| | |_ _| | || ||_ \|_ _| | || | |_ ___ `. | || | .' `. | || | |_ _||_ _| | |
| | | | | || | | \ | | | || | | | `. \ | || | / .--. \ | || | \ \ / / | |
| | | | | || | | |\ \| | | || | | | | | | || | | | | | | || | > `' < | |
| | _| |_ | || | _| |_\ |_ | || | _| |___.' / | || | \ `--' / | || | _/ /'`\ \_ | |
| | |_____| | || ||_____|\____| | || | |________.' | || | `.____.' | || | |____||____| | |
| | | || | | || | | || | | || | | |
| '--------------' || '--------------' || '--------------' || '--------------' || '--------------' |
'----------------' '----------------' '----------------' '----------------' '----------------'
```
Made with ❤️ by OSLLM.ai