https://github.com/kimtth/rag-multimodal-semantic-chunking
🖼️📄E2E Multi-modal Document Preprocessing for Search Indexing with Azure Document Intelligence
azure-document-intelligence chunking image-understanding rag-preparation workshop
Last synced: 10 months ago
- Host: GitHub
- URL: https://github.com/kimtth/rag-multimodal-semantic-chunking
- Owner: kimtth
- Created: 2025-07-14T20:00:30.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-07-31T04:01:07.000Z (10 months ago)
- Last Synced: 2025-07-31T06:33:23.949Z (10 months ago)
- Topics: azure-document-intelligence, chunking, image-understanding, rag-preparation, workshop
- Language: Python
- Homepage:
- Size: 3.24 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## 📄 Multi-modal Document Preprocessing with Azure Document Intelligence
### ✨ Features
1. 📝 Parse a document with Document Intelligence and output the result in Markdown format. > [output](./output/contoso_raw.md)
1. 🖼️ Extract figures from documents and save them as PNG images. > [output](./output/contoso_figure_2_region_1.png)
1. 🤖 Generate figure descriptions using an Azure OpenAI multimodal model.
1. 📝 Update markdown outputs with generated descriptions. > [output](./output/contoso_updated.md)
1. 📊 Extract tables and convert them into Excel files. > [output](./output/contoso_tables.xlsx)
1. 📖 Chunk the Markdown outputs using `MarkdownHeaderTextSplitter`, `RecursiveContentChunker`, and `SemanticContentChunker` (TBD). > [markdown chunk output](./output/chunks_contents.json) | [recursive chunk output](./output/chunks_recursive.json)
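
The chunking step in the last feature can be illustrated with a minimal, standard-library-only sketch. It is not this repository's `RecursiveContentChunker` and not LangChain's `MarkdownHeaderTextSplitter`. The function names (`split_by_headers`, `split_recursive`, `chunk_markdown`), the `h1`/`h2` metadata keys, and the `max_chars` limit are all hypothetical and chosen only for illustration. The idea is to split the Markdown at headers first so each chunk carries its header path, then split any oversize section on progressively finer separators.

```python
import re

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headers(markdown: str) -> list[dict]:
    """Split Markdown into sections, recording the header path of each one."""
    sections = []
    path = []    # (level, title) pairs for the headers currently in effect
    buffer = []  # body lines of the section being built
    in_fence = False

    def flush():
        text = "\n".join(buffer).strip()
        if text:
            metadata = {f"h{level}": title for level, title in path}
            sections.append({"metadata": metadata, "content": text})
        buffer.clear()

    for line in markdown.splitlines():
        # Do not treat '#' lines inside fenced code blocks as headers.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new header replaces any header at the same or a deeper level.
            path = [(l, t) for l, t in path if l < level]
            path.append((level, match.group(2).strip()))
        else:
            buffer.append(line)
    flush()
    return sections


def split_recursive(text: str, max_chars: int,
                    separators=("\n\n", "\n", " ")) -> list[str]:
    """Split on the coarsest separator first, recursing into oversize pieces."""
    if len(text) <= max_chars:
        return [text]
    if not separators:
        # No separator left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= max_chars:
            current = piece
        else:
            chunks.extend(split_recursive(piece, max_chars, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


def chunk_markdown(markdown: str, max_chars: int = 500) -> list[dict]:
    """Header-aware chunking: every chunk keeps the metadata of its section."""
    chunks = []
    for section in split_by_headers(markdown):
        for part in split_recursive(section["content"], max_chars):
            chunks.append({"metadata": section["metadata"], "content": part})
    return chunks
```

Calling `chunk_markdown(text, max_chars=500)` on the updated Markdown output would return a list of dictionaries that can be serialized with `json.dump`. The exact schema of the repository's JSON outputs may differ.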
### 🚀 Usage
```
python doc_intelli_workflow.py
```
### 📚 Learn More
- [📘 Document Intelligence Official Samples](https://github.com/Azure-Samples/document-intelligence-code-samples): Python (v4.0) / RAG samples / Figure understanding.
- [Build Intelligent RAG for Multimodality and Complex Document Structures](https://techcommunity.microsoft.com/blog/azure-ai-services-blog/build-intelligent-rag-for-multimodality-and-complex-document-structure/4118184)
- [nohanaga: Document Intelligence Samples](https://github.com/nohanaga/document-intelligence-samples) | [🔗 Output](https://qiita.com/nohanaga/items/1263f4a6bc909b6524c8): Article in Japanese
- [📖 7 Chunking Strategies for Langchain](https://medium.com/@anixlynch/7-chunking-strategies-for-langchain-b50dac194813#6da7)
- [MarkdownHeaderTextSplitter](https://python.langchain.com/docs/how_to/markdown_header_metadata_splitter/)
- [⚡ Azure Functions vs. Indexers: AI Data Ingestion](https://devblogs.microsoft.com/ise/unlock-ai-search-potential-the-case-for-azure-functions-in-data-ingestion/)
- [RAG Time: Mastering RAG](https://github.com/microsoft/rag-time)