{"id":13539662,"url":"https://github.com/shelfio/aws-lambda-tesseract","last_synced_at":"2025-04-04T11:09:18.598Z","repository":{"id":33152931,"uuid":"142484747","full_name":"shelfio/aws-lambda-tesseract","owner":"shelfio","description":"6 MB Tesseract (with English training data) to fit inside AWS Lambda","archived":false,"fork":false,"pushed_at":"2025-03-27T01:55:33.000Z","size":42995,"stargazers_count":90,"open_issues_count":12,"forks_count":16,"subscribers_count":24,"default_branch":"master","last_synced_at":"2025-03-28T19:44:53.163Z","etag":null,"topics":["aws-lambda","node-module","nodejs","npm-package","ocr","optical-character-recognition","serverless","tesseract"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shelfio.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"license","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-07-26T19:24:37.000Z","updated_at":"2025-03-23T05:33:11.000Z","dependencies_parsed_at":"2023-10-17T15:45:30.337Z","dependency_job_id":"3d82d5c0-9d5f-443e-9065-99a4b82d46fa","html_url":"https://github.com/shelfio/aws-lambda-tesseract","commit_stats":{"total_commits":617,"total_committers":7,"mean_commits":88.14285714285714,"dds":0.5721231766612642,"last_synced_commit":"5c04409066376069f4ec936b6910ff20d48d2e0d"},"previous_names":[],"tags_count":21,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shelfio%2Faws-lambda-tesseract","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shelfio%2Faws-lambda-tesseract/tags",
"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shelfio%2Faws-lambda-tesseract/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shelfio%2Faws-lambda-tesseract/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shelfio","download_url":"https://codeload.github.com/shelfio/aws-lambda-tesseract/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247166168,"owners_count":20894654,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws-lambda","node-module","nodejs","npm-package","ocr","optical-character-recognition","serverless","tesseract"],"created_at":"2024-08-01T09:01:30.146Z","updated_at":"2025-04-04T11:09:18.580Z","avatar_url":"https://github.com/shelfio.png","language":"Shell","readme":"# aws-lambda-tesseract [![CircleCI](https://circleci.com/gh/shelfio/aws-lambda-tesseract/tree/master.svg?style=svg)](https://circleci.com/gh/shelfio/aws-lambda-tesseract/tree/master) ![](https://img.shields.io/badge/code_style-prettier-ff69b4.svg) [![Tesseract](https://img.shields.io/badge/tesserract-6_MB-brightgreen.svg)](bin/)\n\n\u003e 6 MB Tesseract 5.3.3 (with English training data) to fit inside AWS Lambda\n\nInspired by [chrome-aws-lambda](https://github.com/alixaxel/chrome-aws-lambda) \u0026 [lambda-scanner-ocr](https://github.com/philippkeller/lambda-scanner-ocr)\n\n## Install\n\n```\n$ yarn add @shelf/aws-lambda-tesseract\n```\n\n`1.x` versions of this library were compiled for Node 8.10.\n\n`2.x` was compiled for Node 10.x 
runtime.\n\n`3.x` works for Node 12.x runtime.\n\n`4.x` works for Node 16.x runtime and is compiled with **Tesseract 5.1.0**. For now, it works only with x86_64 CPUs.\n\n`5.x` works for Node 18.x runtime and is compiled with **Tesseract 5.3.3**. It works with arm64 CPUs.\n\n`6.x` works for Node 22.x runtime and is compiled with **Tesseract 5.3.3**. It works with arm64 CPUs.\n\n## How does it work?\n\nThis package contains an archive with [Tesseract 5.3.3](https://github.com/tesseract-ocr/tesseract) compiled for use in the AWS Lambda environment.\n\nWhen a Lambda starts, the package unpacks the archive containing the binary into the `/tmp` folder, and it does so only once per Lambda cold start.\n\n## Usage\n\n```js\nconst {getTextFromImage, isSupportedFile} = require('@shelf/aws-lambda-tesseract');\n\nmodule.exports.handler = async event =\u003e {\n  // assuming there is a photo.jpg inside /tmp dir\n  // original file will be deleted afterwards\n\n  if (!isSupportedFile('/tmp/photo.jpg')) {\n    return false;\n  }\n\n  return getTextFromImage('/tmp/photo.jpg');\n};\n```\n\n`isSupportedFile` checks that the file has an image-like extension that is not in the list of\nfile extensions unsupported by Tesseract.\n\n## Compile It Yourself\n\nSee [compile-tesseract.sh](compile-tesseract.sh)\n\nSmoke-test the build by running the `test.sh` script.\n\n## See Also\n\n- [aws-lambda-libreoffice](https://github.com/shelfio/aws-lambda-libreoffice)\n- [chrome-aws-lambda-layer](https://github.com/shelfio/chrome-aws-lambda-layer)\n- [ghostscript-lambda-layer](https://github.com/shelfio/ghostscript-lambda-layer)\n\n## Publish\n\n```sh\n$ git checkout master\n$ yarn version\n$ yarn publish\n$ git push origin master --tags\n```\n\n## License\n\nMIT © 
[Shelf](https://shelf.io)\n","funding_links":[],"categories":["Shell"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshelfio%2Faws-lambda-tesseract","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshelfio%2Faws-lambda-tesseract","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshelfio%2Faws-lambda-tesseract/lists"}