{"id":17449355,"url":"https://github.com/briancullen/aws-textract-parser","last_synced_at":"2025-09-13T21:30:48.522Z","repository":{"id":35446623,"uuid":"217703010","full_name":"briancullen/aws-textract-parser","owner":"briancullen","description":"Library for converting AWS Textract responses into a more usable structure.","archived":false,"fork":false,"pushed_at":"2023-01-04T23:29:41.000Z","size":1307,"stargazers_count":2,"open_issues_count":17,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-08-31T06:26:19.179Z","etag":null,"topics":["aws","aws-textract-parser","textract","tree"],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/briancullen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-10-26T12:06:40.000Z","updated_at":"2023-08-16T10:20:26.000Z","dependencies_parsed_at":"2023-01-15T21:25:30.365Z","dependency_job_id":null,"html_url":"https://github.com/briancullen/aws-textract-parser","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/briancullen/aws-textract-parser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/briancullen%2Faws-textract-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/briancullen%2Faws-textract-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/briancullen%2Faws-textract-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/briancullen%2Faws-textract-parser/manifests","owner_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/owners/briancullen","download_url":"https://codeload.github.com/briancullen/aws-textract-parser/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/briancullen%2Faws-textract-parser/sbom","scorecard":{"id":253061,"data":{"date":"2025-08-11","repo":{"name":"github.com/briancullen/aws-textract-parser","commit":"a34d953c8fea3c5380ee9ce10263c05aa6265efb"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":1.3,"checks":[{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous 
patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"License","score":0,"reason":"license file not detected","details":["Warn: 
project does not have a license file"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 9 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Vulnerabilities","score":0,"reason":"61 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GHSA-968p-4wvh-cqc8","Warn: Project is vulnerable to: GHSA-67hx-6x53-jw92","Warn: Project is vulnerable to: GHSA-6chw-6frg-f759","Warn: Project is vulnerable to: GHSA-v88g-cgmw-v5xw","Warn: Project is vulnerable to: GHSA-93q8-gq69-wqmw","Warn: Project is vulnerable 
to: GHSA-rrc9-gqf8-8rwg","Warn: Project is vulnerable to: GHSA-v6h2-p8h4-qcjw","Warn: Project is vulnerable to: GHSA-grv7-fg5c-xmjg","Warn: Project is vulnerable to: GHSA-3xgq-45jj-v275","Warn: Project is vulnerable to: GHSA-gxpj-cx7g-858c","Warn: Project is vulnerable to: GHSA-w573-4hg7-7wgq","Warn: Project is vulnerable to: GHSA-fjxv-7rqg-78g4","Warn: Project is vulnerable to: GHSA-8r6j-v8pm-fqw3","Warn: Project is vulnerable to: MAL-2023-462","Warn: Project is vulnerable to: GHSA-ww39-953v-wcq6","Warn: Project is vulnerable to: GHSA-2cf5-4w76-r9qv","Warn: Project is vulnerable to: GHSA-3cqr-58rm-57f8","Warn: Project is vulnerable to: GHSA-g9r4-xpmj-mj65","Warn: Project is vulnerable to: GHSA-q2c6-c6pm-g3gh","Warn: Project is vulnerable to: GHSA-765h-qjxv-5f44","Warn: Project is vulnerable to: GHSA-f2jv-r9rf-7988","Warn: Project is vulnerable to: GHSA-vfrc-7r7c-w9mx","Warn: Project is vulnerable to: GHSA-7wwv-vh3v-89cq","Warn: Project is vulnerable to: GHSA-43f8-2h32-f4cj","Warn: Project is vulnerable to: GHSA-qqgx-2p2h-9c37","Warn: Project is vulnerable to: GHSA-gxr4-xjj5-5px2","Warn: Project is vulnerable to: GHSA-jpcq-cgw6-v4j6","Warn: Project is vulnerable to: GHSA-896r-f27r-55mw","Warn: Project is vulnerable to: GHSA-9c47-m6qq-7p4h","Warn: Project is vulnerable to: GHSA-6c8f-qphg-qjgp","Warn: Project is vulnerable to: GHSA-p6mc-m468-83gw","Warn: Project is vulnerable to: GHSA-29mw-wpgm-hmr9","Warn: Project is vulnerable to: GHSA-35jh-r3h4-6jhm","Warn: Project is vulnerable to: GHSA-5v2h-r2cx-5xgj","Warn: Project is vulnerable to: GHSA-rrrm-qjm4-v8hf","Warn: Project is vulnerable to: GHSA-952p-6rrq-rcjv","Warn: Project is vulnerable to: GHSA-f8q6-p94x-37v3","Warn: Project is vulnerable to: GHSA-vh95-rmgr-6w4m","Warn: Project is vulnerable to: GHSA-xvch-5gv4-984h","Warn: Project is vulnerable to: GHSA-5fw9-fq32-wv5p","Warn: Project is vulnerable to: GHSA-hj48-42vr-x3v9","Warn: Project is vulnerable to: GHSA-hrpp-h998-j3pp","Warn: Project is vulnerable to: 
GHSA-p8p7-x288-28g6","Warn: Project is vulnerable to: GHSA-c2qf-rxjj-qqgw","Warn: Project is vulnerable to: GHSA-4rq4-32rv-6wp6","Warn: Project is vulnerable to: GHSA-64g7-mvw6-v9qj","Warn: Project is vulnerable to: GHSA-3jfq-g458-7qm9","Warn: Project is vulnerable to: GHSA-r628-mhmh-qjhw","Warn: Project is vulnerable to: GHSA-9r2w-394v-53qc","Warn: Project is vulnerable to: GHSA-5955-9wpr-37jh","Warn: Project is vulnerable to: GHSA-qq89-hq3f-393p","Warn: Project is vulnerable to: GHSA-f5x3-32g6-xq36","Warn: Project is vulnerable to: GHSA-52f5-9888-hmc6","Warn: Project is vulnerable to: GHSA-jgrx-mgxx-jf9v","Warn: Project is vulnerable to: GHSA-72xf-g2v4-qvf3","Warn: Project is vulnerable to: GHSA-cf4h-3jhx-xvhq","Warn: Project is vulnerable to: GHSA-6fc8-4gx4-v693","Warn: Project is vulnerable to: GHSA-3h5v-q93c-6h6q","Warn: Project is vulnerable to: GHSA-776f-qx25-q3cc","Warn: Project is vulnerable to: GHSA-c4w7-xm78-47vh","Warn: Project is vulnerable to: GHSA-p9pc-299p-vxgp"],"documentation":{"short":"Determines if the project has open, known unfixed 
vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-17T08:57:36.313Z","repository_id":35446623,"created_at":"2025-08-17T08:57:36.314Z","updated_at":"2025-08-17T08:57:36.314Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275029883,"owners_count":25393391,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-13T02:00:10.085Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","aws-textract-parser","textract","tree"],"created_at":"2024-10-17T21:36:04.502Z","updated_at":"2025-09-13T21:30:48.211Z","avatar_url":"https://github.com/briancullen.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AWS Textract Parser\n[![Build Status](https://travis-ci.org/briancullen/aws-textract-parser.svg?branch=master)](https://travis-ci.org/briancullen/aws-textract-parser)\n[![Maintainability](https://api.codeclimate.com/v1/badges/566b704c4b4d35be1ea9/maintainability)](https://codeclimate.com/github/briancullen/aws-textract-parser/maintainability)\n[![Test Coverage](https://api.codeclimate.com/v1/badges/566b704c4b4d35be1ea9/test_coverage)](https://codeclimate.com/github/briancullen/aws-textract-parser/test_coverage)\n\nTextract is an AWS service that lets you extract text from pictures or PDF 
documents. This library was created to process the response from that service and transform it into something a little more manageable.\n\n\u003e **NOTE**: Currently this library is only set up to deal with responses from the DetectDocumentText calls, either synchronous or asynchronous. Parsing the responses from calls that analyse documents may be added at a later date.\n\n## Rationale\nTextract returns JSON representing the pages, lines and words it has detected in the input. Below is a simplified example of the data you could expect for a single line of text consisting of two words. As you can see, the data describes a tree where the line is identified as a child of the page and the words as children of the line.\n\n```json\n{\n  \"DocumentMetadata\": {\n    \"Pages\": 1\n  },\n  \"Blocks\": [\n    {\n      \"Id\": \"1\",\n      \"BlockType\": \"PAGE\",\n      \"Relationships\": [{\n        \"Type\": \"CHILD\",\n        \"Ids\": [ \"2\" ]\n      }]\n    },\n    {\n      \"Id\": \"2\",\n      \"BlockType\": \"LINE\",\n      \"Relationships\": [{\n        \"Type\": \"CHILD\",\n        \"Ids\": [ \"3\", \"4\" ]\n      }]\n    },\n    { \"Id\": \"3\", \"BlockType\": \"WORD\" },\n    { \"Id\": \"4\", \"BlockType\": \"WORD\" }\n  ]\n}\n```\n\nUnfortunately this tree structure is flattened into an array, which makes navigating it more awkward than it should be. The purpose of this library is to process this flattened JSON to provide the tree structure described by it.\n\n\u003e In some tests the order of the words related to a line did not match that of the text. This is not what you would expect from processing a document. 
To address this, the library sorts the words into left-to-right order (based on their position on the page).\n\n## Usage\n\nThe default export from the module is a parser instance that supports three methods: `handleDetectTextCallback`, `handleDetectTextResponse`, and `parseGetTextDetection`.\n\n`handleDetectTextCallback` is a helper method that can be passed in as the standard callback to the Textract method. In turn it will call another callback with the processed tree. An example of this type of usage is shown below.\n\n```typescript\nimport { Textract } from 'aws-sdk'\nimport textractParser from '\u003cTBD\u003e'\n\nconst textract = new Textract()\nconst myCallback = (err, data) =\u003e {\n  if (err) {\n    console.log(err)\n  } else {\n    console.log(data)\n  }\n}\n\nconst request = {\n  Document: {\n    S3Object: {\n      Bucket: \"your-s3-bucket\",\n      Name: \"your-object-key\"\n    }\n  }\n}\n\ntextract.detectDocumentText(request,\n  textractParser.handleDetectTextCallback(myCallback))\n```\n\n`handleDetectTextResponse` takes a value of type `Textract.DetectDocumentTextResponse` and processes it synchronously. This can be used with the promises provided by the AWS SDK. An example of how to use it in this manner is shown below.\n\n```typescript\ntextract.detectDocumentText(request).promise()\n  .then(data =\u003e textractParser.parseDetectTextResponse(data))\n  .then(parsedData =\u003e console.log(parsedData))\n  .catch(err =\u003e console.log(err))\n```\n\n`parseGetTextDetection` is a helper method to be used with the GetDocumentTextDetection operation. This operation can return the processed information over multiple requests, which causes a problem when trying to construct the complete tree. If all the results are returned in a single response then `handleDetectTextResponse` can be used as shown above.\n\nHowever, if that is not the case, then this call can be used to retrieve all the data and construct the tree as shown below. 
To allow the SDK to be configured differently in different environments, an instantiated Textract client must be provided to this method.\n\n```typescript\nconst jobId = 'your-job-id'\nconst client = new Textract()\n\ntextractParser.parseGetTextDetection(client, jobId)\n  .then(parsedData =\u003e console.log(parsedData))\n  .catch(err =\u003e console.log(err))\n```\n\n**NOTE** This method will load the entire set of results into memory, which may cause issues for really large documents. To give some context, for a 10-page document of text the results returned from Textract were in the region of 7MB.\n\n## API\nSee the [API Docs](https://briancullen.github.io/aws-textract-parser/) for more information.\n\nIn particular, refer to the API for the Document class, as this forms the root of the tree that is returned.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbriancullen%2Faws-textract-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbriancullen%2Faws-textract-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbriancullen%2Faws-textract-parser/lists"}