https://github.com/awslabs/rhubarb
A Python framework for multi-modal document understanding with Amazon Bedrock
- Host: GitHub
- URL: https://github.com/awslabs/rhubarb
- Owner: awslabs
- License: apache-2.0
- Created: 2024-04-17T01:26:01.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2026-02-11T15:35:18.000Z (about 2 months ago)
- Last Synced: 2026-02-19T07:12:38.875Z (about 2 months ago)
- Topics: amazon-bedrock, document-processing, generative-ai, intelligent-document-processing, multi-modal
- Language: Python
- Size: 32.7 MB
- Stars: 102
- Watchers: 5
- Forks: 14
- Open Issues: 27
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Notice: NOTICE
# Rhubarb
Rhubarb is a lightweight Python framework for building document and video understanding applications with multi-modal Large Language Models (LLMs) and embedding models. It is built from the ground up for Amazon Bedrock and supports multiple foundation models, including the Anthropic Claude multi-modal models and Amazon Nova models for document and video processing, as well as the Amazon Titan Multi-modal Embeddings model for embeddings.
## What can I do with Rhubarb?
Visit Rhubarb [documentation](https://awslabs.github.io/rhubarb/index.html#).
Rhubarb can perform a variety of document processing tasks, such as:
- ✅ Document Q&A
- ✅ Streaming chat with documents (Q&A)
- ✅ Document Summarization
- 🚀 Page level summaries
- 🚀 Full summaries
- 🚀 Summaries of specific pages
- 🚀 Streaming Summaries
- ✅ Structured data extraction
- ✅ Extraction Schema creation assistance
- ✅ Named entity recognition (NER)
- 🚀 With 50 built-in common entities
- ✅ PII recognition with built-in entities
- ✅ Figure and image understanding from documents
- 🚀 Explain charts, graphs, and figures
- 🚀 Perform table reasoning (as figures)
- ✅ Large document processing with sliding window approach
- ✅ Document Classification with vector sampling using multi-modal embedding models
- ✅ Token usage logging to help track costs
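The classification-with-vector-sampling item above works by comparing a page's embedding against embeddings of labeled sample pages and picking the closest class. A minimal, framework-free sketch of that idea (the vectors below are made-up toy values standing in for Titan multi-modal embedding outputs, not real ones):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify(page_vec, class_samples):
    """Return the label whose sample vectors are, on average, closest."""
    scores = {
        label: sum(cosine(page_vec, s) for s in samples) / len(samples)
        for label, samples in class_samples.items()
    }
    return max(scores, key=scores.get)

# Toy embeddings standing in for real embedding-model outputs.
samples = {
    "invoice": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "resume":  [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
print(classify([0.85, 0.15, 0.05], samples))  # invoice
```

Rhubarb handles the embedding calls and sample management for you; this only illustrates the nearest-class comparison at the core of the technique.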
### Video Analysis (New!)
- ✅ Video summarization
- ✅ Entity extraction from videos
- ✅ Action and movement analysis
- ✅ Text extraction from video frames
- ✅ Streaming video analysis responses
Rhubarb comes with built-in system prompts that make it easy to use for a number of different document understanding use cases. You can customize Rhubarb by passing in your own system prompts. It supports exact JSON schema-based output generation, which makes it easy to integrate into downstream applications.
- Supports PDF, TIFF, PNG, JPG, and DOCX files (support for Excel, PowerPoint, CSV, WebP, and EML files coming soon)
- Supports MP4, AVI, MOV, and other common video formats for video analysis (S3 storage required)
- Performs document to image conversion internally to work with the multi-modal models
- Works on local files or files stored in S3
- Supports specifying page numbers for multi-page documents
- Supports chat-history based chat for documents
- Supports streaming and non-streaming mode
- Supports Converse API
- Supports Cross-Region Inference
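Schema-based output generation means the model's response must conform to a JSON Schema you supply when calling the analysis (Rhubarb's documentation describes passing such a schema to `DocAnalysis.run`; treat the exact parameter shape as an assumption). Independent of any Bedrock call, here is a sketch of what such a schema looks like and how a response can be checked against it, using a hypothetical employment-document schema and a toy validator:

```python
# Hypothetical extraction schema for an employment document.
schema = {
    "type": "object",
    "properties": {
        "employee_name": {"type": "string"},
        "employer_name": {"type": "string"},
        "start_date": {"type": "string"},
    },
    "required": ["employee_name", "employer_name"],
}

def conforms(doc, schema):
    """Toy conformance check: required keys plus primitive types only;
    a real application would use a full JSON Schema validator."""
    if schema["type"] != "object" or not isinstance(doc, dict):
        return False
    if any(key not in doc for key in schema.get("required", [])):
        return False
    types = {"string": str, "number": (int, float), "object": dict}
    return all(
        isinstance(doc[key], types[prop["type"]])
        for key, prop in schema["properties"].items()
        if key in doc
    )

# A model response shaped by the schema above:
response = {"employee_name": "Jane Doe", "employer_name": "Acme Corp"}
print(conforms(response, schema))  # True
```

Because the output is guaranteed to match the schema, downstream code can consume it without defensive parsing.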
## MCP Server Integration
Rhubarb now includes a built-in **FastMCP server** that exposes all document and video understanding capabilities through the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/). This allows seamless integration with MCP-compatible AI assistants like Cline, Claude Desktop, and other MCP clients.
### MCP Features
- **8 Tools**: Complete access to all Rhubarb capabilities including document analysis, video processing, entity extraction, and document classification
- **4 Resources**: Built-in discovery for entities, models, schemas, and classification samples
- **Native Python**: Direct integration without external dependencies
- **Conversation Memory**: Maintains chat history across interactions
- **Flexible Authentication**: Support for AWS profiles, access keys, and environment variables
### Quick Start with MCP
1. **No installation required** - The MCP server auto-installs when first used
2. **Configure in your MCP client** (example for Cline):
```json
{
"rhubarb": {
"command": "uvx",
"args": [
"pyrhubarb-mcp@latest",
"--aws-profile", "my-profile",
"--default-model", "claude-sonnet"
]
}
}
```
3. **Alternative configurations**:
```json
{
"rhubarb": {
"command": "uvx",
"args": [
"pyrhubarb-mcp@latest",
"--aws-access-key-id", "AKIA...",
"--aws-secret-access-key", "your-secret",
"--aws-region", "us-west-2"
]
}
}
```
For detailed MCP server documentation, see [README_MCP.md](README_MCP.md).
## Installation
Start by installing Rhubarb using `pip`.
```bash
pip install pyrhubarb
```
### Usage
Create a `boto3` session.
```python
import boto3
session = boto3.Session()
```
#### Call Rhubarb
Local file
```python
from rhubarb import DocAnalysis

da = DocAnalysis(
    file_path="./path/to/doc/doc.pdf",
    boto3_session=session,
)
resp = da.run(message="What is the employee's name?")
print(resp)
```
With file in Amazon S3
```python
from rhubarb import DocAnalysis

da = DocAnalysis(
    file_path="s3://path/to/doc/doc.pdf",
    boto3_session=session,
)
resp = da.run(message="What is the employee's name?")
print(resp)
```
#### Video Analysis
```python
from rhubarb import VideoAnalysis
import boto3
session = boto3.Session()
# Initialize video analysis with a video in S3
va = VideoAnalysis(
file_path="s3://my-bucket/my-video.mp4",
boto3_session=session
)
# Ask questions about the video
response = va.run(message="What is happening in this video?")
print(response)
```
#### Large Document Processing
Rhubarb supports processing documents with more than 20 pages using a sliding window approach. This feature is particularly useful when working with Claude models, which have a limitation of processing only 20 pages at a time.
To enable this feature, set `sliding_window_overlap` to a value between 1 and 10 when creating a `DocAnalysis` object:
```python
doc_analysis = DocAnalysis(
file_path="path/to/large-document.pdf",
boto3_session=session,
sliding_window_overlap=2 # Number of pages to overlap between windows (1-10)
)
```
When the sliding window approach is enabled, Rhubarb will:
1. Break the document into chunks of 20 pages
2. Process each chunk separately
3. Combine the results from all chunks
Note: The sliding window technique is not yet supported for document classification. When using classification with large documents, only the first 20 pages will be considered.
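The chunking in steps 1–3 can be sketched as follows; the window size and overlap semantics here are illustrative, and Rhubarb's internal implementation may differ:

```python
def sliding_windows(total_pages, window_size=20, overlap=2):
    """Split pages 1..total_pages into windows of at most `window_size`
    pages, where consecutive windows share `overlap` pages of context."""
    windows, start = [], 1
    step = window_size - overlap
    while start <= total_pages:
        end = min(start + window_size - 1, total_pages)
        windows.append((start, end))
        if end == total_pages:
            break
        start += step
    return windows

# A 45-page document with a 2-page overlap between 20-page windows:
print(sliding_windows(45))  # [(1, 20), (19, 38), (37, 45)]
```

The overlapping pages give each window some context from the previous one, which helps the model keep continuity (for example, a table or section that spans a chunk boundary) when the per-chunk results are combined.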
For more details, see the [Large Document Processing Cookbook](cookbooks/2-large-document-processing.ipynb).
For more usage examples see [cookbooks](./cookbooks/).
## Security
See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
## License
This project is licensed under the Apache-2.0 License.