An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with textract

A curated list of projects in awesome lists tagged with textract.

https://github.com/aeksco/aws-pdf-textract-pipeline

:mag: Data pipeline for crawling PDFs from the web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript.

aws aws-cdk aws-textract cdk cloudformation data-pipeline dynamodb jest lambda pdf puppeteer s3 serverless sns textract typescript webscraping

Last synced: 06 Oct 2025

https://github.com/likerrr/code4goal-resume-parser

Solution for Code4Goal challenge

cvs js node nodejs resume-parser textract

Last synced: 25 Sep 2025

https://github.com/simonw/s3-ocr

Tools for running OCR against files stored in S3

ocr s3 textract

Last synced: 21 Aug 2025

https://github.com/fourdigits/wagtail_textract

Text extraction for Wagtail document search

django search tesseract text-extraction textract wagtail

Last synced: 06 Oct 2025

https://github.com/t04glovern/aws-textract-adoption-forms

Using Serverless to consume and process WA Animals adoption forms with Amazon Textract and place the extracted data in DynamoDB

aws aws-textract charity lambda ocr-recognition serverless serverless-framework textract

Last synced: 24 Jun 2025

https://github.com/slub/textract2page

Convert AWS Textract JSON to PRImA PAGE XML

ocr page-xml python textract

Last synced: 28 Jul 2025

https://github.com/hupe1980/go-textractor

📄 Amazon Textract response parser written in Go.

amazon aws golang parser textract unstructured-data

Last synced: 16 Apr 2025

https://github.com/edelgm6/ledger

Personal accounting tool with Django backend, HTMX+Alpine frontend, and AWS Textract

accounting alpinejs django finance htmx textract

Last synced: 28 Jan 2026

https://github.com/muhimasri/aws-textract-app

Convert an image to an HTML form using Amazon Textract and NodeJS

aws aws-textract javascript nodejs textract

Last synced: 16 Mar 2026

https://github.com/build-on-aws/aiml-like-api-in-your-app

Sample code for adding AI/ML services to your app

aws polly rekognition textract transcribe

Last synced: 04 Mar 2026

https://github.com/briancullen/aws-textract-parser

Library for converting AWS Textract responses into a more usable structure.

aws aws-textract-parser textract tree

Last synced: 13 Sep 2025

https://github.com/devanshu-17/hackscript-hackathon

AI-powered Invoice and Form Label-Fields Extraction for Document Management using OpenAI & Hugging Face Transformers

hackathon huggingface invoice-management openapi textract

Last synced: 29 Apr 2026

https://github.com/iann0036/textract-demo

Demonstration of Amazon Textract using the Boto3 library

aws textract

Last synced: 29 Apr 2026

https://github.com/moindalvs/resume_classification

Business objective: the document classification solution should significantly reduce manual effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention.

classification classification-algorithm data-science docx docx2txt ensemble-machine-learning pdfplumber resume-app resume-parser text-analysis text-classification text-mining text-processing textract

Last synced: 24 Apr 2026

https://github.com/pxkundu/ai-logbook-analysis

The project involves developing an AI-powered system to extract and analyze data from handwritten logbooks and service records using AWS native services. The goal is to create a scalable, secure, and efficient solution with a robust project structure, modern tech stack, and DevSecOps best practices.

ai aws-lambda bedrock sagemaker text-to-sql textract

Last synced: 18 Feb 2026

https://github.com/atsuyaw/textlint-launcher4ja

Tool for installing and running textlint

batch proofing-tools textlint textlint-config textract

Last synced: 13 Apr 2025

https://github.com/joshmenden/bartleby

A simple AWS Lambda function that extracts Key-Value pairs from an image using AWS Textract for people who would "prefer not to."

aws lambda textract

Last synced: 26 Jan 2026

https://github.com/gv3n/pdf_file_scanner

A PDF file scanner for scanning PDFs in bulk for automation, using the PyPDF2, textract, and nltk libraries in Python.

bulk nltk pypdf2 python3 scan-resumes scanner textract

Last synced: 14 Oct 2025

https://github.com/mycielski/textract_study

Analysing expense reports/invoices with AWS Textract and boto3.

aws aws-cli boto3 document-understanding expenses invoices script shell textract

Last synced: 06 May 2026

https://github.com/pxkundu/ai-financial-fraud-detection-solution

This project implements an AI-powered financial fraud detection platform using AWS Bedrock, Textract, Dataiku, and OpenAI APIs, deployed on EKS with Terraform. It processes transaction data to detect anomalies and potential fraud.

aws bedrock data-platform-apps data-science dataiku eks-cluster openai-api python3 textract

Last synced: 08 Oct 2025

https://github.com/akshar-raaj/document-processing

A fast, flexible API for extracting text from PDFs and images using smart file detection and OCR—perfect for automating your document workflows.

ai artificial-intelligence document-processing-pipeline ocr optical-character-recognition tesseract textract

Last synced: 30 Jun 2025

https://github.com/drew138/p2

Transcript IA is software for the Julian Leon Ramirez Zuluaga office that converts paper tables into Excel files using AI

aws grid-recognition machine-learning text-recognition textract

Last synced: 20 May 2026

https://github.com/saifrehman100/expense-tracker-aws

AI-powered expense tracking microservices on AWS. Upload receipts, auto-extract data with OCR (Textract), smart categorization (Comprehend), budget alerts, and spending reports. Serverless architecture using Lambda, DynamoDB, S3, SES, and SAM.

api-gateway aws aws-lambda budget-tracker cognito comprehend dynamodb expense-tracker fastapi microservices ocr python3 receipt-scanner s3 sam serverless ses textract

Last synced: 01 May 2026

https://github.com/fapulito/vercel_textract

Deploy to Vercel - Python Client for AWS Textract | OCR SaaS with Development Roadmap

agenticai flask-application kiro-ide microsaas neon-postgres ocr-python textract

Last synced: 18 Jan 2026

https://github.com/abinashsahoo007/project-resume-classification

The document classification solution should significantly reduce manual effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention.

corpus count-vectorizer label-encoding lemmitization machine-learning nltk part-of-speech-tagging resume-classification spacy stemming text-mining text-preprocessing textract tfidf-vectorizer tokenization wordcloud

Last synced: 02 Feb 2026

https://github.com/paulo-freitas-junior/dio-bootcamp-nexa-aws-ia

NEXA bootcamp for advanced analysis of text and images using AI on AWS.

amazon-aws aws rekognition sagemaker textract transcribe

Last synced: 05 Feb 2026

https://github.com/monnus/serverlessinvoicescanner

Return extracted data from any invoice

api aws-lambda dynamodb s3-bucket textract

Last synced: 21 Apr 2026

https://github.com/roberto-a-cardenas/intellidoc-engine

Serverless OCR pipeline on AWS using Lambda, API Gateway, S3, and Textract. Accepts base64 PDFs and returns extracted text via API. Built with Terraform.

api-gateway aws aws-lambda cloud-engineering document-processing ocr s3 serverless terraform textract

Last synced: 22 Apr 2026

https://github.com/miozilla/ct3p

ct3p :leaves::sheep: AI Global Consulting Service: Amazon Comprehend, Textract, Translate, Transcribe, Polly, SageMaker AI, S3

ai amazon audio boto3 comprehend polly sagemaker speech text textract transcribe translate

Last synced: 29 Aug 2025