https://github.com/nvidia-ai-blueprints/multimodal-pdf-data-extraction

NIM Agent Blueprint for multimodal PDF data extraction for enterprise RAG
Host: GitHub
URL: https://github.com/nvidia-ai-blueprints/multimodal-pdf-data-extraction
Owner: NVIDIA-AI-Blueprints
Created: 2024-08-22T16:46:31.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-09-24T20:40:48.000Z (about 1 year ago)
Last Synced: 2024-11-12T20:41:50.183Z (11 months ago)
Homepage: https://build.nvidia.com/nvidia/multimodal-pdf-data-extraction-for-enterprise-rag
Size: 2.16 MB
Stars: 44
Watchers: 3
Forks: 9
Open Issues: 1
Metadata Files:
- Readme: README.md

README

# NVIDIA AI Blueprint: Multimodal PDF Data Extraction

![Multimodal PDF Data Extraction for Enterprise RAG-r2](https://github.com/user-attachments/assets/3f33a00b-0d72-4221-a250-04771cb703cc)

Rapidly ingest massive volumes of PDF documents. Extract text, graphs, charts, and tables for highly accurate retrieval.

## Introduction

This blueprint is based on [NVIDIA-Ingest](https://github.com/NVIDIA/nv-ingest), a scalable, performance-oriented microservice for extracting document content and metadata. NVIDIA-Ingest parses PDF, Word, and PowerPoint documents and uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts, and images for use in downstream generative applications.

NVIDIA-Ingest splits documents in parallel, which lets it extract data from many documents at the same time.
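
To make the idea of parallel splitting concrete, here is a minimal Python sketch that uses only the standard library. It does not call the NVIDIA-Ingest API and is not how the microservice is implemented. The names `DOCUMENTS`, `split_into_pages`, `extract_page`, and `ingest` are hypothetical, and plain strings stand in for real PDF pages.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a document set: file name -> list of page strings.
# A real pipeline would read PDF pages; plain strings keep the sketch runnable.
DOCUMENTS = {
    "report.pdf": ["Revenue table", "Growth chart", "Summary text"],
    "manual.pdf": ["Install steps", "Wiring diagram"],
}


def split_into_pages(documents):
    """Flatten documents into (document name, page number, content) tasks."""
    return [
        (name, page_number, content)
        for name, pages in documents.items()
        for page_number, content in enumerate(pages, start=1)
    ]


def extract_page(task):
    """Hypothetical per-page extraction step; here it only lowercases text."""
    name, page_number, content = task
    return {"document": name, "page": page_number, "text": content.lower()}


def ingest(documents, max_workers=4):
    """Split all documents into pages, then process the pages concurrently."""
    tasks = split_into_pages(documents)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input tasks.
        return list(pool.map(extract_page, tasks))


if __name__ == "__main__":
    for record in ingest(DOCUMENTS):
        print(record)
```

Splitting first and then fanning out per page means pages from different documents share one worker pool, so a single long document does not hold up the rest.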

## Get Started

1. Apply for [Early Access](https://developer.nvidia.com/nemo-microservices).
2. Follow the [getting started documentation](https://github.com/NVIDIA/nv-ingest) in the NVIDIA-Ingest repository.

**NOTE** -- the downloadable blueprint deploys the document ingestion pipeline. It does not include a retrieval pipeline.
