{"id":34063128,"url":"https://github.com/awslabs/rhubarb","last_synced_at":"2026-04-02T10:59:43.293Z","repository":{"id":234344842,"uuid":"787680875","full_name":"awslabs/rhubarb","owner":"awslabs","description":"A Python framework for multi-modal document understanding with Amazon Bedrock","archived":false,"fork":false,"pushed_at":"2026-02-11T15:35:18.000Z","size":34328,"stargazers_count":102,"open_issues_count":27,"forks_count":14,"subscribers_count":5,"default_branch":"main","last_synced_at":"2026-02-19T07:12:38.875Z","etag":null,"topics":["amazon-bedrock","document-processing","generative-ai","intelligent-document-processing","multi-modal"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/awslabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-04-17T01:26:01.000Z","updated_at":"2026-02-12T19:10:54.000Z","dependencies_parsed_at":"2025-09-09T12:12:53.848Z","dependency_job_id":null,"html_url":"https://github.com/awslabs/rhubarb","commit_stats":null,"previous_names":["awslabs/rhubarb"],"tags_count":9,"template":false,"template_full_name":"amazon-archives/__template_Apache-2.0","purl":"pkg:github/awslabs/rhubarb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awslabs%2Frhubarb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awslabs%2Frhubarb/tags","releases_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/awslabs%2Frhubarb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awslabs%2Frhubarb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/awslabs","download_url":"https://codeload.github.com/awslabs/rhubarb/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awslabs%2Frhubarb/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30635280,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-17T22:38:22.569Z","status":"ssl_error","status_checked_at":"2026-03-17T22:38:11.804Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["amazon-bedrock","document-processing","generative-ai","intelligent-document-processing","multi-modal"],"created_at":"2025-12-14T05:06:44.050Z","updated_at":"2026-04-02T10:59:43.278Z","avatar_url":"https://github.com/awslabs.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./assets/Rhubarb@0.5x.png\" alt=\"Rhubarb\" width=\"400\"/\u003e\n\u003c/p\u003e\n\n\n\u003cdiv align=\"center\"\u003e\n\n[![Amazon 
Bedrock](https://img.shields.io/badge/Amazon%20Bedrock-8A2BE2?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAALUAAAC1BAMAAADrfaOaAAAAD1BMVEUAAAD///////////////+PQt5oAAAABHRSTlMAgL9ARyeO/QAABIpJREFUeNrt22GSmyAUwHGJPUBsOEC6cgBicoCq3P9MrRryT5CAGJxuZ3yftqbz61t44QlMiz322GOPPfb4XlHqYqsojamKbaJUxpj2WGwRA222SV2O8iapN4aoNqB7ITdIXYzi3x+q7KmXA9eN1f1IXWerPkuTep+x+qYhJvUuX/XVz08q125Ut2oGGk+WQr08udmpTqaxCfn85GDGOKZXX9y2pbOq+lTQFuYeek31yaD9w5juKpQx5zXVF7blONSCkk+pvoitppFWSSUvJjpmG/N7KsQuMe2+iNnlYE/Dnma3xRL7vMrWC2w7JsttgPh4tzzKZVOoepz4Nr99G1U5DE12+zAtC8Yc89ulYT1bb1+vemaPgzLG+SObIcUegiaX3xaSXp3PZjqPxW7/D/alOm1h29jt3f52tvl6CpVsK2fhLJ9tJxJt6b5Ty2x2Y1/YeRCydRr9iguyc+zUTcPB2Dg9vebXBXb1Ejrx9bjTklwZfUroo51w+cCb6cF6G9qSYqoBBvtj29IWb68MNvbnm2xBJSy2r6dgiZDkZaI7HbcZ0e647Pso72O+1FaBepd8Al4XS+0DQxjY5pAJUtyWoUVAzqUEW9g3/XPQbvQK+4fdobQhuzHnFbacpt2YLmCz+ffa5ssfY8LsNue2rdKA/T7a+9Bov22XFJ1s2/5xeGuzWq2wezq/z2ZJWWv/eDvejaUDduWP+/dMvrGZkhU1eLMnTP769jX4NmqznHSnYbr6t3annY5/WmiXwRcW+UoL/m7KWtW9/RCoVCQSt8ml9tszursojtOxA4kzP8GzVvp8G7SJSnW1jtn0zgnvozYRt8Vj5H4y4jlsDkT5puW0b1SjbDPbQvPjOatNxPvOL53Nnkef3ya2tM8b2m02u3KC1p1gl78W/PsLD3vl7D1eezaIna/hRO3brKOdZ7TPGTb1sUu7w8vFqnAnyW4QPTlNUUeL4EhvwOaRxxY0qDhO85zT2IRaUovCsGUlj1ivZZ+uo3h7b2X1/AyhdmuQS7sLX9AgzmA7tFPfaZd2TEynPRf1ftssvbSTTPrsDMGx0y7twGsf7dhJl1/grac2Z33nlHZpx2XWfB7na6xOzJtf3KFdm5rjtjPJpkQCthoTFolNgnkM2Fza9Uk29LzvPOzD8ItxabfQpkT8d7qpl3YAzGPIZqTq5TZ02ObSbrFNicRtoZiUBTbzGLRtXK5pL50COmAnBQDH8vntG/OY3T5A57SJf2H/ZFeb2WYXnN8W7Eey24rWk9sWLOEf2+WRj+yrlKYdfGKXqucj54T0M5sn2JyQfmg390njIxU4IU3a+Ag7adiBE9KkjQ9vtdjBE9Loxufo7fSMSb9iTGju/AEBmxPS9XhzH2xsTkjL1Bpktzfgwn/6exiGqWTSk/GWeXRs54R0Fc48zr9EUxRr4qfnRhibE9JVIT0rXanSe0N84zM/K6iM+dLFetw5WXZSXS2zSybpWRf7GCdpiiYbTtKmzkNTie0j6Wv+/4KQP2lwks6Nk3T2EFskDU7S+fFrsccee+yxxx7fP/4AG81mLMegln0AAAAASUVORK5CYII=)](https://aws.amazon.com/bedrock/)\n[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425
f.svg)](https://www.python.org/)\n[![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org/downloads/release/python-311/)\n[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n\n\u003c/div\u003e\n\n\u003c!-- [![PyPI pyversions](https://img.shields.io/pypi/pyversions/ansicolortags.svg)](https://pypi.python.org/pypi/ansicolortags/) --\u003e\n\n# Rhubarb\n\nRhubarb is a light-weight Python framework that makes it easy to build document and video understanding applications using Multi-modal Large Language Models (LLMs) and Embedding models. Rhubarb is created from the ground up to work with Amazon Bedrock and supports multiple foundation models including Anthropic Claude Multi-modal Language Models and Amazon Nova models for document and video processing, along with Amazon Titan Multi-modal Embedding model for embeddings.\n\n\n## What can I do with Rhubarb?\n\nVisit Rhubarb [documentation](https://awslabs.github.io/rhubarb/index.html#).\n\nRhubarb can do multiple document processing tasks such as\n\n- ✅ Document Q\u0026A\n- ✅ Streaming chat with documents (Q\u0026A)\n- ✅ Document Summarization\n  - 🚀 Page level summaries\n  - 🚀 Full summaries\n  - 🚀 Summaries of specific pages\n  - 🚀 Streaming Summaries\n- ✅ Structured data extraction\n- ✅ Extraction Schema creation assistance\n- ✅ Named entity recognition (NER) \n  - 🚀 With 50 built-in common entities\n- ✅ PII recognition with built-in entities\n- ✅ Figure and image understanding from documents\n  - 🚀 Explain charts, graphs, and figures\n  - 🚀 Perform table reasoning (as figures)\n- ✅ Large document processing with sliding window approach\n- ✅ Document Classification with vector sampling using multi-modal embedding models\n- ✅ Logs token usage to help keep track of costs\n\n### Video Analysis (New!)\n- ✅ Video summarization\n- ✅ Entity extraction from videos\n- ✅ Action and 
movement analysis\n- ✅ Text extraction from video frames\n- ✅ Streaming video analysis responses\n\nRhubarb comes with built-in system prompts that makes it easy to use it for a number of different document understanding use-cases. You can customize Rhubarb by passing in your own system prompts. It supports exact JSON schema based output generation which makes it easy to integrate into downstream applications.\n\n- Supports PDF, TIFF, PNG, JPG, DOCX files (support for Excel, PowerPoint, CSV, Webp, eml files coming soon)\n- Supports MP4, AVI, MOV, and other common video formats for video analysis (S3 storage required)\n- Performs document to image conversion internally to work with the multi-modal models\n- Works on local files or files stored in S3\n- Supports specifying page numbers for multi-page documents\n- Supports chat-history based chat for documents\n- Supports streaming and non-streaming mode\n- Supports Converse API \n- Supports Cross-Region Inference\n\n## MCP Server Integration\n\nRhubarb now includes a built-in **FastMCP server** that exposes all document and video understanding capabilities through the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/). This allows seamless integration with MCP-compatible AI assistants like Cline, Claude Desktop, and other MCP clients.\n\n### MCP Features\n- **8 Tools**: Complete access to all Rhubarb capabilities including document analysis, video processing, entity extraction, and document classification\n- **4 Resources**: Built-in discovery for entities, models, schemas, and classification samples  \n- **Native Python**: Direct integration without external dependencies\n- **Conversation Memory**: Maintains chat history across interactions\n- **Flexible Authentication**: Support for AWS profiles, access keys, and environment variables\n\n### Quick Start with MCP\n\n1. **No installation required** - The MCP server auto-installs when first used\n\n2. 
**Configure in your MCP client** (example for Cline):\n   ```json\n   {\n     \"rhubarb\": {\n       \"command\": \"uvx\",\n       \"args\": [\n         \"pyrhubarb-mcp@latest\",\n         \"--aws-profile\", \"my-profile\",\n         \"--default-model\", \"claude-sonnet\"\n       ]\n     }\n   }\n   ```\n\n3. **Alternative configurations**:\n   ```json\n   {\n     \"rhubarb\": {\n       \"command\": \"uvx\", \n       \"args\": [\n         \"pyrhubarb-mcp@latest\",\n         \"--aws-access-key-id\", \"AKIA...\",\n         \"--aws-secret-access-key\", \"your-secret\",\n         \"--aws-region\", \"us-west-2\"\n       ]\n     }\n   }\n   ```\n\nFor detailed MCP server documentation, see [README_MCP.md](README_MCP.md).\n\n## Installation\n\nStart by installing Rhubarb using `pip`.\n\n```\npip install pyrhubarb\n```\n\n### Usage\n\nCreate a `boto3` session.\n\n```python\nimport boto3\nsession = boto3.Session()\n```\n\n#### Call Rhubarb\n\nLocal file\n\n```python\nfrom rhubarb import DocAnalysis\n\nda = DocAnalysis(file_path=\"./path/to/doc/doc.pdf\", \n                 boto3_session=session)\nresp = da.run(message=\"What is the employee's name?\")\nresp\n```\n\nWith file in Amazon S3\n\n```python\nfrom rhubarb import DocAnalysis\n\nda = DocAnalysis(file_path=\"s3://path/to/doc/doc.pdf\", \n                 boto3_session=session)\nresp = da.run(message=\"What is the employee's name?\")\nresp\n```\n\n#### Video Analysis\n\n\n```python\nfrom rhubarb import VideoAnalysis\nimport boto3\n\nsession = boto3.Session()\n\n# Initialize video analysis with a video in S3\nva = VideoAnalysis(\n    file_path=\"s3://my-bucket/my-video.mp4\",\n    boto3_session=session\n)\n\n# Ask questions about the video\nresponse = va.run(message=\"What is happening in this video?\")\nprint(response)\n```\n\n#### Large Document Processing\n\nRhubarb supports processing documents with more than 20 pages using a sliding window approach. 
This feature is particularly useful when working with Claude models, which can process at most 20 pages at a time.\n\nTo enable this feature, set `sliding_window_overlap` to a value between 1 and 10 when creating a `DocAnalysis` object:\n\n```python\ndoc_analysis = DocAnalysis(\n    file_path=\"path/to/large-document.pdf\",\n    boto3_session=session,\n    sliding_window_overlap=2     # Number of pages to overlap between windows (1-10)\n)\n```\n\nWhen the sliding window approach is enabled, Rhubarb will:\n1. Break the document into chunks of 20 pages\n2. Process each chunk separately\n3. Combine the results from all chunks\n\nNote: The sliding window technique is not yet supported for document classification. When using classification with large documents, only the first 20 pages will be considered.\n\nFor more details, see the [Large Document Processing Cookbook](cookbooks/2-large-document-processing.ipynb).\n\nFor more usage examples, see the [cookbooks](./cookbooks/).\n\n## Security\n\nSee [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.\n\n## License\n\nThis project is licensed under the Apache-2.0 License.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fawslabs%2Frhubarb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fawslabs%2Frhubarb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fawslabs%2Frhubarb/lists"}