{"id":24542711,"url":"https://github.com/urbanclap-engg/smart-docs-parser","last_synced_at":"2025-10-05T00:53:52.366Z","repository":{"id":42449853,"uuid":"238626973","full_name":"urbanclap-engg/smart-docs-parser","owner":"urbanclap-engg","description":"An OCR based document parser to extract information from identity document images","archived":false,"fork":false,"pushed_at":"2022-08-25T13:06:52.000Z","size":66,"stargazers_count":21,"open_issues_count":0,"forks_count":7,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-08-29T20:22:34.029Z","etag":null,"topics":["aadhaar","auto-fill","document-parser","google-vision","nodejs","ocr","pancard","typescript","user-onboarding"],"latest_commit_sha":null,"homepage":"https://medium.com/urbanclap-engineering/document-details-parsing-using-ocr-170bf6ad8a97","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/urbanclap-engg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-06T06:58:36.000Z","updated_at":"2024-10-28T08:23:45.000Z","dependencies_parsed_at":"2022-09-16T16:52:55.297Z","dependency_job_id":null,"html_url":"https://github.com/urbanclap-engg/smart-docs-parser","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/urbanclap-engg/smart-docs-parser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/urbanclap-engg%2Fsmart-docs-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/urbanclap-engg%2Fsmart-docs-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/urbanclap-engg%2Fsmart-doc
s-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/urbanclap-engg%2Fsmart-docs-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/urbanclap-engg","download_url":"https://codeload.github.com/urbanclap-engg/smart-docs-parser/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/urbanclap-engg%2Fsmart-docs-parser/sbom","scorecard":{"id":911785,"data":{"date":"2025-08-11","repo":{"name":"github.com/urbanclap-engg/smart-docs-parser","commit":"427c7bdf35c99e571e461bfa1056d35006325d1f"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.5,"checks":[{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Code-Review","score":0,"reason":"Found 1/28 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least 
privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 3 are checked with a 
SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-24T19:28:56.647Z","repository_id":42449853,"created_at":"2025-08-24T19:28:56.647Z","updated_at":"2025-08-24T19:28:56.647Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":277583953,"owners_count":25843219,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-29T02:00:09.175Z","response_time":84,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aadhaar","auto-fill","document-parser","google-vision","nodejs","ocr","pancard","typescript","user-onboarding"],"created_at":"2025-01-22T19:17:26.128Z","updated_at":"2025-10-05T00:53:52.336Z","avatar_url":"https://github.com/urbanclap-engg.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# smart-docs-parser\n\n```smart-docs-parser``` is a NodeJs library to parse details from ID images.\n\nhttps://medium.com/urbanclap-engineering/document-details-parsing-using-ocr-170bf6ad8a97\n\n## How does it work?\n\n```smart-docs-parser``` works in three steps:\n- Extraction of raw text from document image using OCR\n- Validation of document image based on passed document type and extracted raw text\n- Parsing relevant information from raw text using document 
parser\n\n## Installation\n```\n$ npm install smart-docs-parser\n```\n\n## Usage\n### Configuration\nCreate a _config_ folder at the root of your project. Add _default.json_ file to the _config_ folder.\n#### config/default.json\n```Javascript\n{\n  \"smart-docs-parser\": {\n    \"api_keys\": {\n      \"google-vision\": \"YOUR_API_KEY\"\n    }\n  }\n}\n```\n### Code\n```\n// ES6 import statement\nimport SmartDocuments from 'smart-docs-parser';\n\n// Sample Request\nconst extractedDocumentDetails = await SmartDocuments.extractDocumentDetailsFromImage({\n    document_url: 'https://avatars2.githubusercontent.com/u/20634933?s=40\u0026v=4',\n    document_type: 'PAN_CARD',\n    ocr_library: 'google-vision'\n});\n\n// Sample Response\n{ raw_text: \n   [ 'INCOME TAX DEPARTMENT',\n     'GOVT. OF INDIA',\n     'Permanent Account Number Card',\n     'PANAM8144G',\n     '/Name',\n     'ID NAME',\n     'frar TT /Father\\'s Name',\n     'FATHER NAME',\n     'ae of Birth',\n     '13/02/1994',\n     'SIGN',\n     'at / Signature',\n     '' ],\n  is_document_valid: true,\n  document_details: \n   { document_type: 'PAN_CARD',\n     identification_number: 'PANAM8144G',\n     name: 'ID NAME',\n     date_of_birth: '1994-02-13T00:00:00.000Z',\n     fathers_name: 'FATHER NAME' \n   } \n}\n```\n\n## Interfaces\n### Request\n```Javascript\nexport interface ExtractDocumentDetailsFromImageRequest {\n  document_url: string;\n  document_type: string;\n  ocr_library: string;\n  custom_parser?: object; // Only for custom parsers\n  custom_ocr?: object; // Only for custom OCRs\n  timeout?: number; //Optional request timeout parameter, defaults to 30 secs\n}\n```\n### Response\n```Javascript\nexport interface ExtractDocumentDetailsFromImageResponse {\n  raw_text: Array\u003cstring\u003e;\n  is_document_valid: boolean;\n  document_details: DocumentDetails | object;\n}\ninterface DocumentDetails {\n  document_type?: string;\n  identification_number?: string;\n  name?: string;\n  fathers_name?: 
string;\n  date_of_birth?: string;\n  gender?: 'M'|'F';\n  address?: string;\n}\n```\n**raw_text** is the text extracted by the OCR\n\n**is_document_valid** denotes whether the document is valid based on input *document_type* and extracted *raw_text*\n\n**document_details** is the document information parsed using the specific document parser\n\n## Supported Request Parameters\n### Document Type\n* PAN CARD\n``` Javascript\n    document_type: 'PAN_CARD'\n```\n* AADHAAR CARD\n``` Javascript\n    document_type: 'AADHAAR_CARD'\n```\n### OCR Library\n* Google Vision\n``` Javascript\n    ocr_library: 'google-vision'\n```\n\n## Current limitations\n### Address parsing\nThe library can parse the state name and PIN code, but parsing of the complete address text is not yet accurate enough because of the noise introduced by multilingual text.\n\n## Contributions\nContributions are welcome. Please create a pull request if you want to add more document parsers, OCR libraries, or test support, or to enhance the existing code.\n\n## Extending smart-docs-parser\n### For specific use cases\n* [Parsing more documents](https://github.com/urbanclap-engg/smart-docs-parser/blob/master/docs/custom_parser.md)\n* [Adding more OCR libraries](https://github.com/urbanclap-engg/smart-docs-parser/blob/master/docs/custom_ocr.md)\n\n### Contributing to the library\n* [Parsing more documents](https://github.com/urbanclap-engg/smart-docs-parser/blob/master/docs/document_parser.md)\n* [Adding more OCR libraries](https://github.com/urbanclap-engg/smart-docs-parser/blob/master/docs/ocr_library.md)\n\n## 
License\n[MIT](https://github.com/urbanclap-engg/smart-docs-parser/blob/master/LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Furbanclap-engg%2Fsmart-docs-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Furbanclap-engg%2Fsmart-docs-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Furbanclap-engg%2Fsmart-docs-parser/lists"}