{"id":13682293,"url":"https://github.com/maxent-ai/ocrpy","last_synced_at":"2025-04-05T20:05:23.901Z","repository":{"id":39599347,"uuid":"305105028","full_name":"maxent-ai/ocrpy","owner":"maxent-ai","description":"OCR, Archive, Index and Search: Implementation agnostic OCR framework.","archived":false,"fork":false,"pushed_at":"2023-11-03T05:03:49.000Z","size":33983,"stargazers_count":221,"open_issues_count":3,"forks_count":11,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-03-29T19:01:49.373Z","etag":null,"topics":["aws","azure","computer-vision","cv","deep-learning","google-vision-api","image-processing","information-retrieval","nlp","ocr","ocr-python","python","semantic-search","tesseract-ocr","transformers"],"latest_commit_sha":null,"homepage":"https://maxentlabs.com/ocrpy","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maxent-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-10-18T13:13:36.000Z","updated_at":"2025-03-16T13:19:27.000Z","dependencies_parsed_at":"2024-01-14T16:12:00.339Z","dependency_job_id":null,"html_url":"https://github.com/maxent-ai/ocrpy","commit_stats":null,"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxent-ai%2Focrpy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxent-ai%2Focrpy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxent-ai%2Focrpy
/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxent-ai%2Focrpy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maxent-ai","download_url":"https://codeload.github.com/maxent-ai/ocrpy/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247393568,"owners_count":20931812,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","azure","computer-vision","cv","deep-learning","google-vision-api","image-processing","information-retrieval","nlp","ocr","ocr-python","python","semantic-search","tesseract-ocr","transformers"],"created_at":"2024-08-02T13:01:43.595Z","updated_at":"2025-04-05T20:05:23.884Z","avatar_url":"https://github.com/maxent-ai.png","language":"Jupyter Notebook","readme":"# ocrpy\n\n[![Downloads](https://static.pepy.tech/personalized-badge/ocrpy?period=total\u0026units=abbreviation\u0026left_color=black\u0026right_color=blue\u0026left_text=Downloads)](https://pepy.tech/project/ocrpy)\n![contributors](https://img.shields.io/github/contributors/maxent-ai/ocrpy?color=blue)\n![PyPi](https://img.shields.io/pypi/v/ocrpy?color=blue)\n![tag](https://img.shields.io/github/v/tag/maxent-ai/ocrpy)\n![mit-license](https://img.shields.io/github/license/maxent-ai/ocrpy?color=blue)\n\n__Unified interface to google vision, aws textract, azure, tesseract and other OCR tools__\n\nThe core objective of `ocrpy` is to let users perform OCR, archive, index and search any document with ease, providing an intuitive interface and a powerful Pipeline API to solve 
common OCR-based tasks.\n\n`ocrpy` achieves this by wrapping around the most popular OCR engines like [Tesseract OCR](https://tesseract-ocr.github.io/), [AWS Textract](https://aws.amazon.com/textract/), [Google Cloud Vision](https://cloud.google.com/vision/docs/ocr) and [Azure Computer Vision](https://azure.microsoft.com/en-in/services/cognitive-services/computer-vision/#features). It unifies the multitude of interfaces provided by a wide range of cloud tools \u0026 other open-source libraries under a common and easy-to-use interface for the user.\n\n![](docs/_static/ocrpy-workflow.png)\n\n## Getting Started\n\n`ocrpy` is a Python-only package hosted on [PyPI](https://pypi.org/project/ocrpy/).\nThe recommended installation method is [pip](https://pip.pypa.io/en/stable/).\n\n```bash\npip install ocrpy\n```\n\n## Day-to-Day Usage\n\n`ocrpy` provides various levels of abstraction for the user to perform OCR on different types of documents. The recommended way to use `ocrpy` is through its `pipeline` API as shown below.\n\nThe Pipeline API can be invoked in two ways. 
The first is to define the pipeline configuration in a YAML file and then load it to run the pipeline:\n\n```python\nfrom ocrpy import TextOcrPipeline\n\nocr_pipeline = TextOcrPipeline.from_config(\"ocrpy_config.yaml\")\nocr_pipeline.process()\n```\n\nAlternatively, you can run a pipeline by instantiating the pipeline class directly:\n\n```python\nfrom ocrpy import TextOcrPipeline\n\npipeline = TextOcrPipeline(\n    source_dir=\"s3://document_bucket/\",\n    destination_dir=\"gs://processed_document_bucket/outputs/\",\n    parser_backend=\"aws-textract\",\n    credentials_config={\n        \"AWS\": \"path/to/aws-credentials.env/file\",\n        \"GCP\": \"path/to/gcp-credentials.json/file\",\n    },\n)\npipeline.process()\n```\n\n\u003e :memo: A more detailed set of examples and tutorials on how to use ocrpy for your use case can be found in the [ocrpy documentation](https://maxentlabs.com/ocrpy/).\n\n## Support and Documentation\n\n* For an in-depth reference of the `ocrpy` API, refer to our [API docs](https://maxentlabs.com/ocrpy/api-reference.html).\n* For inspiration on how to use ocrpy for your use case, check out our [tutorials](https://maxentlabs.com/ocrpy/tutorials.html) or our [examples](https://maxentlabs.com/ocrpy/examples.html).\n* If you're interested in understanding how ocrpy works, check out our [Ocrpy Overview](https://maxentlabs.com/ocrpy/overview.html).\n\n## Feedback and Contributions\n\n* If you have any questions or feedback, or notice something wrong, please open an issue on [GitHub Issues](https://github.com/maxent-ai/ocrpy/issues/).\n* If you are interested in contributing to the project, please open a PR on [GitHub Pull Requests](https://github.com/maxent-ai/ocrpy/pulls).\n* Or if you just want to say hi, feel free to [contact us](info@maxentlabs.com).\n\n## Citation\n\nIf you wish to 
cite this project, feel free to use this [BibTeX](http://www.bibtex.org/) reference:\n\n```bibtex\n@misc{ocrpy,\n    title={Ocrpy: OCR, Archive, Index and Search any documents with ease},\n    author={maxentlabs},\n    year={2022},\n    publisher = {GitHub},\n    howpublished = {\\url{https://github.com/maxent-ai/ocrpy}}\n}\n```\n\n## License and Credits\n\n* `ocrpy` is licensed under the [MIT](https://choosealicense.com/licenses/mit/) license.\nThe full license text can also be found in the [source code repository](https://github.com/maxent-ai/ocrpy/blob/main/LICENSE).\n* `ocrpy` is written and maintained by [Bharath G.S](https://github.com/bharathgs) and [Rita Anjana](https://github.com/AnjanaRita).\n* A full list of contributors can be found in [GitHub's overview](https://github.com/maxent-ai/ocrpy/graphs/contributors).\n","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxent-ai%2Focrpy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxent-ai%2Focrpy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxent-ai%2Focrpy/lists"}