Projects in Awesome Lists tagged with textract
A curated list of projects in awesome lists tagged with textract.
https://github.com/srcecde/aws-tutorial-code
AWS tutorial code.
amazon-web-services api-gateway aws aws-lambda aws-lambda-python cloudformation comprehend dynamodb ecs glue lambda s3-bucket s3-website textract tutorial
Last synced: 16 Jan 2026
https://github.com/aeksco/aws-pdf-textract-pipeline
Data pipeline for crawling PDFs from the web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript.
aws aws-cdk aws-textract cdk cloudformation data-pipeline dynamodb jest lambda pdf puppeteer s3 serverless sns textract typescript webscraping
Last synced: 06 Oct 2025
https://github.com/likerrr/code4goal-resume-parser
Solution for the Code4Goal challenge.
cvs js node nodejs resume-parser textract
Last synced: 25 Sep 2025
https://github.com/simonw/s3-ocr
Tools for running OCR against files stored in S3
Last synced: 21 Aug 2025
https://github.com/fourdigits/wagtail_textract
Text extraction for Wagtail document search
django search tesseract text-extraction textract wagtail
Last synced: 06 Oct 2025
https://github.com/sergiocorreia/quipucamayoc
Development repository for an article.
ocr ocr-post-processing ocr-python poppler table-extraction table-ocr textract
Last synced: 12 Apr 2025
https://github.com/muhimasri/aws-textract-helper
AWS Textract Helper
aws aws-textract javascript nodejs textract
Last synced: 23 Apr 2025
https://github.com/onify/blueprint-aws-textract-pdf-to-form
Onify Blueprint: Amazon AWS Textract - PDF to form example
agent ai amazon amazon-textract blueprint bpmn flow form machine-learning nodejs ocr onify onify-blueprint onify-blueprints pdf rpa s3 s3-bucket textract
Last synced: 13 Apr 2025
https://github.com/t04glovern/aws-textract-adoption-forms
Using Serverless to consume and process WA Animals adoption forms with Amazon Textract and store the extracted data in DynamoDB.
aws aws-textract charity lambda ocr-recognition serverless serverless-framework textract
Last synced: 24 Jun 2025
https://github.com/slub/textract2page
Convert AWS Textract JSON to PRImA PAGE XML
Last synced: 28 Jul 2025
https://github.com/manuel-lang/autonomous-semantic-search-engine
Submission for HackDataKIBots 2018: a web crawler combined with document analysis.
crawler hackathon machine-learning mannheim microsoft natural-language-processing natural-language-understanding nextiteration rnv semantic-search textract
Last synced: 03 May 2025
https://github.com/hupe1980/go-textractor
📄 Amazon Textract response parser written in Go.
amazon aws golang parser textract unstructured-data
Last synced: 16 Apr 2025
https://github.com/edelgm6/ledger
Personal accounting tool with Django backend, HTMX+Alpine frontend, and AWS Textract
accounting alpinejs django finance htmx textract
Last synced: 28 Jan 2026
https://github.com/muhimasri/aws-textract-app
Convert an image to an HTML form using Amazon Textract and NodeJS
aws aws-textract javascript nodejs textract
Last synced: 16 Mar 2026
https://github.com/build-on-aws/aiml-like-api-in-your-app
Sample code for adding AI/ML services to your app
aws polly rekognition textract transcribe
Last synced: 04 Mar 2026
https://github.com/briancullen/aws-textract-parser
Library for converting AWS Textract responses into a more usable structure.
aws aws-textract-parser textract tree
Last synced: 13 Sep 2025
https://github.com/devanshu-17/hackscript-hackathon
AI-powered Invoice and Form Label-Fields Extraction for Document Management using OpenAI & Hugging Face Transformers
hackathon huggingface invoice-management openapi textract
Last synced: 29 Apr 2026
https://github.com/iann0036/textract-demo
Demonstration of Amazon Textract using the Boto3 library
Last synced: 29 Apr 2026
https://github.com/moindalvs/resume_classification
Business objective: the document classification solution should significantly reduce manual human effort in HRM (human resource management). It should achieve a higher level of accuracy and automation with minimal human intervention.
classification classification-algorithm data-science docx docx2txt ensemble-machine-learning pdfplumber resume-app resume-parser text-analysis text-classification text-mining text-processing textract
Last synced: 24 Apr 2026
https://github.com/pxkundu/ai-logbook-analysis
The project involves developing an AI-powered system to extract and analyze data from handwritten logbooks and service records using AWS native services. The goal is to create a scalable, secure, and efficient solution with a robust project structure, a modern tech stack, and DevSecOps best practices.
ai aws-lambda bedrock sagemaker text-to-sql textract
Last synced: 18 Feb 2026
https://github.com/atsuyaw/textlint-launcher4ja
Tool for setting up and running textlint
batch proofing-tools textlint textlint-config textract
Last synced: 13 Apr 2025
https://github.com/joshmenden/bartleby
A simple AWS Lambda function that extracts Key-Value pairs from an image using AWS Textract for people who would "prefer not to."
Last synced: 26 Jan 2026
https://github.com/gv3n/pdf_file_scanner
A PDF file scanner for bulk-scanning PDFs for automation, using the PyPDF2, textract, and NLTK libraries in Python.
bulk nltk pypdf2 python3 scan-resumes scanner textract
Last synced: 14 Oct 2025
https://github.com/josephgoksu/document-analysis-api
Open Source Document Analyzer
angularjs daa flask flask-restful python textract
Last synced: 05 May 2026
https://github.com/mycielski/textract_study
Analysing expense reports/invoices with AWS Textract and boto3.
aws aws-cli boto3 document-understanding expenses invoices script shell textract
Last synced: 06 May 2026
https://github.com/pxkundu/ai-financial-fraud-detection-solution
This project implements an AI-powered financial fraud detection platform using AWS Bedrock, Textract, Dataiku, and OpenAI APIs, deployed on EKS with Terraform. It processes transaction data to detect anomalies and potential fraud.
aws bedrock data-platform-apps data-science dataiku eks-cluster openai-api python3 textract
Last synced: 08 Oct 2025
https://github.com/akshar-raaj/document-processing
A fast, flexible API for extracting text from PDFs and images using smart file detection and OCR, suited to automating document workflows.
ai artificial-intelligence document-processing-pipeline ocr optical-character-recognition tesseract textract
Last synced: 30 Jun 2025
https://github.com/drew138/p2
Transcript IA is software for the Julian Leon Ramirez Zuluaga practice that converts paper tables into Excel files using AI.
aws grid-recognition machine-learning text-recognition textract
Last synced: 20 May 2026
https://github.com/saifrehman100/expense-tracker-aws
AI-powered expense tracking microservices on AWS. Upload receipts, auto-extract data with OCR (Textract), smart categorization (Comprehend), budget alerts, and spending reports. Serverless architecture using Lambda, DynamoDB, S3, SES, and SAM.
api-gateway aws aws-lambda budget-tracker cognito comprehend dynamodb expense-tracker fastapi microservices ocr python3 receipt-scanner s3 sam serverless ses textract
Last synced: 01 May 2026
https://github.com/fapulito/vercel_textract
Deploy to Vercel - Python Client for AWS Textract | OCR SaaS with Development Roadmap
agenticai flask-application kiro-ide microsaas neon-postgres ocr-python textract
Last synced: 18 Jan 2026
https://github.com/lrasata/serverless-docu-chat-ai
ai api-gateway aws cloudfront cognito dynamodb lambda opensearch s3-bucket sagemaker terraform textract
Last synced: 08 Jan 2026
https://github.com/anmol111pal/billify-invoice-processor
apigateway aws aws-sdk awscdk dynamodb lambda s3 ses textract typescript
Last synced: 29 Apr 2026
https://github.com/abinashsahoo007/project-resume-classification
The document classification solution should significantly reduce manual human effort in HRM (human resource management). It should achieve a higher level of accuracy and automation with minimal human intervention.
corpus count-vectorizer label-encoding lemmitization machine-learning nltk part-of-speech-tagging resume-classification spacy stemming text-mining text-preprocessing textract tfidf-vectorizer tokenization wordcloud
Last synced: 02 Feb 2026
https://github.com/paulo-freitas-junior/dio-bootcamp-nexa-aws-ia
NEXA bootcamp on advanced text and image analysis using AI on AWS.
amazon-aws aws rekognition sagemaker textract transcribe
Last synced: 05 Feb 2026
https://github.com/kevindellapiazza/vat-compliance-monitor
Serverless AWS invoice validation pipeline
athena aws aws-ses dataengineering datascience dynamodb glue invoice-management lambda ocr portfolio s3 serverless textract
Last synced: 01 Mar 2026
https://github.com/monnus/serverlessinvoicescanner
Returns extracted data from any invoice.
api aws-lambda dynamodb s3-bucket textract
Last synced: 21 Apr 2026
https://github.com/roberto-a-cardenas/intellidoc-engine
Serverless OCR pipeline on AWS using Lambda, API Gateway, S3, and Textract. Accepts base64 PDFs and returns extracted text via API. Built with Terraform.
api-gateway aws aws-lambda cloud-engineering document-processing ocr s3 serverless terraform textract
Last synced: 22 Apr 2026
https://github.com/miozilla/ct3p
ct3p: AI Global Consulting Service (Amazon Comprehend, Textract, Translate, Transcribe, Polly, SageMaker AI, S3)
ai amazon audio boto3 comprehend polly sagemaker speech text textract transcribe translate
Last synced: 29 Aug 2025