{"id":29736128,"url":"https://github.com/margato/aws-bedrock-document-entity-extractor","last_synced_at":"2026-04-14T06:33:45.810Z","repository":{"id":304911938,"uuid":"1020499642","full_name":"margato/aws-bedrock-document-entity-extractor","owner":"margato","description":"Document Entity Extraction using AWS Bedrock","archived":false,"fork":false,"pushed_at":"2025-07-16T01:54:58.000Z","size":443,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-17T02:53:17.115Z","etag":null,"topics":["aws","aws-bedrock","aws-bedrock-agents","genai","poc","proof-of-concept","streamlit"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/margato.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-16T01:16:03.000Z","updated_at":"2025-07-16T01:55:01.000Z","dependencies_parsed_at":"2025-07-17T08:35:29.070Z","dependency_job_id":"66f9f393-c531-48d6-b0b4-8231b14a26d5","html_url":"https://github.com/margato/aws-bedrock-document-entity-extractor","commit_stats":null,"previous_names":["margato/aws-bedrock-document-entity-extractor"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/margato/aws-bedrock-document-entity-extractor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/margato%2Faws-bedrock-document-entity-extractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/margato%2Faws-bedrock-document-entity-extractor/tags","release
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/margato%2Faws-bedrock-document-entity-extractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/margato%2Faws-bedrock-document-entity-extractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/margato","download_url":"https://codeload.github.com/margato/aws-bedrock-document-entity-extractor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/margato%2Faws-bedrock-document-entity-extractor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31785677,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-14T02:24:21.117Z","status":"ssl_error","status_checked_at":"2026-04-14T02:24:20.627Z","response_time":153,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","aws-bedrock","aws-bedrock-agents","genai","poc","proof-of-concept","streamlit"],"created_at":"2025-07-25T14:01:39.953Z","updated_at":"2026-04-14T06:33:45.805Z","avatar_url":"https://github.com/margato.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Document Entity Extractor (AWS Bedrock PoC)\n\n## Table of Contents\n\n- [Overview](#overview)\n    - [AWS Bedrock](#aws-bedrock)\n    - [Features](#features)\n- [Prerequisites](#prerequisites)\n- [Installation](#installation)\n- 
[Running Locally](#running-locally)\n\n## Overview\nThis project is a Streamlit web application for extracting structured entities from uploaded documents (PDF, PNG, JPG, etc.) using AWS Bedrock Agents.\n\n![Architecture](docs/architecture-dark.png#gh-dark-mode-only)\n![Architecture](docs/architecture.png#gh-light-mode-only)\n\nUsers can define custom extraction fields and view the results. This enables automation of manual processes, reducing back-office costs.\n\n![UI](docs/ui-dark.png#gh-dark-mode-only)\n![UI](docs/ui.png#gh-light-mode-only)\n\n### AWS Bedrock\nThis proof of concept was tested on an AWS Bedrock agent using the Nova Lite model.\n\nThis repository is not intended to demonstrate how to implement an agent in AWS Bedrock. However, if you want to reproduce the setup, the prompt used to create the agent is provided below.\n\n**Agent Prompt:**\n```\nYou are a document extraction agent.\n\nYou will receive:\n- A document in PDF, PNG, or other file extension.\n- A list of extraction objects, each with:\n    - `key`: the entity key to extract.\n    - `description`: a clear description of what the entity represents.\n\nYour task:\n- For each object, extract the corresponding value from the text.\n- Return a JSON with each `key` mapped to the extracted value.\n- If the value is not found, return `null` for that key.\n- Do not include any explanations or extra text, only the JSON output.\n\nExample Input:\n{\n  \"fields\": [\n    {\n      \"key\": \"name\",\n      \"description\": \"Candidate's name\"\n    },\n    {\n      \"key\": \"candidate_resume_summary\",\n      \"description\": \"Make a summary about the candidate resume\"\n    },\n    {\n      \"key\": \"current_job_role\",\n      \"description\": \"Candidate's current job role\"\n    }\n  ]\n}\n\nExpected Output:\n{\n    \"name\": \"John Doe\",\n    \"candidate_resume_summary\": \"Experienced Java Developer. 
Builds scalable enterprise solutions.\",\n    \"current_job_role\": \"Java Developer\"\n}\n\nIf you cannot find an entity based on the description, return `null` for that key.\n```\n\n### Features\n\n- Upload documents for entity extraction\n- Results displayed in a user-friendly format\n- Uses AWS Bedrock via boto3\n\n## Prerequisites\n\n- AWS credentials and Bedrock agent implementation\n- The following `.env` file in `app/.env` (example):\n\n  ```\n  AWS_REGION=your-region\n  AWS_ACCESS_KEY_ID=your-access-key-id\n  AWS_SECRET_ACCESS_KEY=your-secret-access-key\n  AWS_BEDROCK_AGENT_ID=your-bedrock-agent-id\n  AWS_BEDROCK_AGENT_ALIAS_ID=your-bedrock-agent-alias-id\n  ```\n\n## Installation\n\n1. Clone this repository.\n2. Create and activate a Python virtual environment:\n\n   ```bash\n   python3 -m venv venv\n   source venv/bin/activate\n   ```\n\n3. Install dependencies:\n\n   ```bash\n   pip install -r app/requirements.txt\n   ```\n\n4. Create your `.env` file in `app/.env` as shown above.\n\n## Running Locally\n\nStart the Streamlit app:\n\n```bash\nstreamlit run app/main.py\n```\n\nOpen the provided local URL in your browser to use the application.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmargato%2Faws-bedrock-document-entity-extractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmargato%2Faws-bedrock-document-entity-extractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmargato%2Faws-bedrock-document-entity-extractor/lists"}