{"id":13415568,"url":"https://github.com/mindee/doctr","last_synced_at":"2025-05-14T09:03:27.776Z","repository":{"id":36956387,"uuid":"327949189","full_name":"mindee/doctr","owner":"mindee","description":"docTR (Document Text Recognition) - a seamless, high-performing \u0026 accessible library for OCR-related tasks powered by Deep Learning.","archived":false,"fork":false,"pushed_at":"2025-04-29T16:05:34.000Z","size":99554,"stargazers_count":4632,"open_issues_count":31,"forks_count":502,"subscribers_count":42,"default_branch":"main","last_synced_at":"2025-05-07T08:02:36.501Z","etag":null,"topics":["deep-learning","document-recognition","ocr","optical-character-recognition","pytorch","tensorflow2","text-detection","text-detection-recognition","text-recognition"],"latest_commit_sha":null,"homepage":"https://mindee.github.io/doctr/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mindee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-01-08T16:05:12.000Z","updated_at":"2025-05-07T01:17:54.000Z","dependencies_parsed_at":"2023-02-10T08:45:41.810Z","dependency_job_id":"67e06602-23c9-4c51-8f94-29b84e344759","html_url":"https://github.com/mindee/doctr","commit_stats":null,"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mindee%2Fdoctr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mindee%2Fdoctr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mindee%2Fdoctr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mindee%2Fdoctr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mindee","download_url":"https://codeload.github.com/mindee/doctr/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254110372,"owners_count":22016391,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","document-recognition","ocr","optical-character-recognition","pytorch","tensorflow2","text-detection","text-detection-recognition","text-recognition"],"created_at":"2024-07-30T21:00:50.378Z","updated_at":"2025-05-14T09:03:27.712Z","avatar_url":"https://github.com/mindee.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/mindee/doctr/raw/main/docs/images/Logo_doctr.gif\" width=\"40%\"\u003e\n\u003c/p\u003e\n\n[![Slack Icon](https://img.shields.io/badge/Slack-Community-4A154B?style=flat-square\u0026logo=slack\u0026logoColor=white)](https://slack.mindee.com) 
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE) ![Build Status](https://github.com/mindee/doctr/workflows/builds/badge.svg) [![Docker Images](https://img.shields.io/badge/Docker-4287f5?style=flat&logo=docker&logoColor=white)](https://github.com/mindee/doctr/pkgs/container/doctr) [![codecov](https://codecov.io/gh/mindee/doctr/branch/main/graph/badge.svg?token=577MO567NM)](https://codecov.io/gh/mindee/doctr) [![CodeFactor](https://www.codefactor.io/repository/github/mindee/doctr/badge?s=bae07db86bb079ce9d6542315b8c6e70fa708a7e)](https://www.codefactor.io/repository/github/mindee/doctr) [![Codacy Badge](https://api.codacy.com/project/badge/Grade/340a76749b634586a498e1c0ab998f08)](https://app.codacy.com/gh/mindee/doctr?utm_source=github.com&utm_medium=referral&utm_content=mindee/doctr&utm_campaign=Badge_Grade) [![Doc Status](https://github.com/mindee/doctr/workflows/doc-status/badge.svg)](https://mindee.github.io/doctr) [![Pypi](https://img.shields.io/badge/pypi-v0.11.0-blue.svg)](https://pypi.org/project/python-doctr/) [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/mindee/doctr) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mindee/notebooks/blob/main/doctr/quicktour.ipynb) [![Gurubase](https://img.shields.io/badge/Gurubase-Ask%20docTR%20Guru-006BFF)](https://gurubase.io/g/doctr)

**Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch**

What you can expect from this repository:

- efficient ways to parse textual information (localize and identify each word) from your documents
- guidance on how to integrate this into your current architecture

![OCR_example](https://github.com/mindee/doctr/raw/main/docs/images/ocr.png)

## Quick Tour

### Getting your pretrained model

End-to-end OCR is achieved in docTR using a two-stage approach: text detection (localizing words), then text recognition (identifying all characters in each word).
As such, you can select the architecture used for [text detection](https://mindee.github.io/doctr/latest/modules/models.html#doctr-models-detection) and the one for [text recognition](https://mindee.github.io/doctr/latest/modules/models.html#doctr-models-recognition) from the list of available implementations.

```python
from doctr.models import ocr_predictor

model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
```

### Reading files

Documents can be interpreted from PDFs or images:

```python
from doctr.io import DocumentFile
# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Webpage (requires `weasyprint` to be installed)
webpage_doc = DocumentFile.from_url("https://www.yoursite.com")
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
```

### Putting it together

Let's use the default pretrained model for an example:

```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)
```

### Dealing with rotated documents

Should you use docTR on documents that include rotated pages, or pages with multiple box orientations, you have several options to handle it (illustrated in the sketch below):

- If you only use straight document pages with straight words (horizontal, same reading direction), consider passing `assume_straight_pages=True` to the `ocr_predictor`. It will directly fit straight boxes on your page and return straight boxes, which makes it the fastest option.

- If you want the predictor to output straight boxes (no matter the orientation of your pages, the final localizations will be converted to straight boxes), you need to pass `export_as_straight_boxes=True` to the predictor. Otherwise, if `assume_straight_pages=False`, it will return rotated bounding boxes (potentially with an angle of 0°).

If both options are set to `False`, the predictor will always fit and return rotated boxes.
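As a concrete illustration, here is a minimal sketch of the two configurations described above; the keyword arguments are the ones named in this section, and the default pretrained architectures are assumed:

```python
from doctr.models import ocr_predictor

# Fastest option (first bullet above): pages and words are assumed straight,
# and straight boxes are fitted directly.
straight_model = ocr_predictor(pretrained=True, assume_straight_pages=True)

# Second bullet: handle rotated pages, but convert the final localizations
# back to straight (axis-aligned) boxes on export.
rotated_model = ocr_predictor(
    pretrained=True,
    assume_straight_pages=False,
    export_as_straight_boxes=True,
)
```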
To interpret your model's predictions, you can visualize them interactively as follows:

```python
# Display the result (requires matplotlib & mplcursors to be installed)
result.show()
```

![Visualization sample](https://github.com/mindee/doctr/raw/main/docs/images/doctr_example_script.gif)

Or even rebuild the original document from its predictions:

```python
import matplotlib.pyplot as plt

synthetic_pages = result.synthesize()
plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show()
```

![Synthesis sample](https://github.com/mindee/doctr/raw/main/docs/images/synthesized_sample.png)

The `ocr_predictor` returns a `Document` object with a nested structure (with `Page`, `Block`, `Line`, `Word`, `Artefact`).
To get a better understanding of our document model, check our [documentation](https://mindee.github.io/doctr/modules/io.html#document-structure).

You can also export the result as a nested dict, which is more appropriate for JSON serialization:

```python
json_output = result.export()
```
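To make the nested structure concrete, here is a small sketch that walks the hierarchy described above and prints each recognized word, assuming each `Word` exposes a `value` and a `confidence` as in the docTR document model:

```python
# Walk the Document hierarchy: Page -> Block -> Line -> Word
for page in result.pages:
    for block in page.blocks:
        for line in block.lines:
            for word in line.words:
                print(f"{word.value} (confidence: {word.confidence:.2f})")
```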
### Use the KIE predictor

The KIE predictor is a more flexible predictor than the OCR one, as your detection model can detect multiple classes in a document. For example, you can have a detection model that detects just dates and addresses in a document.

The KIE predictor makes it possible to combine a detector with multiple classes with a recognition model, and to have the whole pipeline already set up for you.

```python
from doctr.io import DocumentFile
from doctr.models import kie_predictor

# Model
model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)

predictions = result.pages[0].predictions
for class_name in predictions.keys():
    list_predictions = predictions[class_name]
    for prediction in list_predictions:
        print(f"Prediction for {class_name}: {prediction}")
```

The KIE predictor results per page are in a dictionary format, with each key representing a class name and its value being the predictions for that class.

### If you are looking for support from the Mindee team

[![Bad OCR test detection image asking the developer if they need help](https://github.com/mindee/doctr/raw/main/docs/images/doctr-need-help.png)](https://mindee.com/product/doctr)

## Installation

### Prerequisites

Python 3.10 (or higher) and [pip](https://pip.pypa.io/en/stable/) are required to install docTR.

### Latest release

You can then install the latest release of the package from [PyPI](https://pypi.org/project/python-doctr/) as follows:

```shell
pip install python-doctr
```

> :warning: Please note that the basic installation is not standalone, as it does not provide a deep learning framework, which is required for the package to run.

We try to keep framework-specific dependencies to a minimum. You can install framework-specific builds as follows:

```shell
# for TensorFlow
pip install "python-doctr[tf]"
# for PyTorch
pip install "python-doctr[torch]"
# optional dependencies for visualization, html, and contrib modules can be installed as follows:
pip install "python-doctr[torch,viz,html,contrib]"
```

For MacBooks with an M1 chip, you will need some additional packages or specific versions:

- TensorFlow 2: [Metal plugin](https://developer.apple.com/metal/tensorflow-plugin/)
- PyTorch: [version >= 2.0.0](https://pytorch.org/get-started/locally/#start-locally)

### Developer mode

Alternatively, you can install it from source, which will require you to install [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git).
First clone the project repository:

```shell
git clone https://github.com/mindee/doctr.git
pip install -e doctr/.
```

Again, if you prefer to avoid the risk of missing dependencies, you can install the TensorFlow or the PyTorch build:

```shell
# for TensorFlow
pip install -e doctr/.[tf]
# for PyTorch
pip install -e doctr/.[torch]
```
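As a quick sanity check after installing, here is a minimal sketch (assuming the PyTorch build) that selects the backend via the `USE_TORCH` environment variable, the same switch used for the demo app below, and instantiates a pretrained predictor:

```python
import os

# Select the PyTorch backend explicitly before importing doctr
# (use USE_TF=1 instead for the TensorFlow build).
os.environ["USE_TORCH"] = "1"

from doctr.models import ocr_predictor

# Downloads pretrained weights on first run; a successful call means the
# framework-specific install is working.
predictor = ocr_predictor(pretrained=True)
print(type(predictor).__name__)
```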
## Models architectures

Credit where it's due: this repository implements, among others, architectures from published research papers.

### Text Detection

- DBNet: [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- LinkNet: [LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation](https://arxiv.org/pdf/1707.03718.pdf)
- FAST: [FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation](https://arxiv.org/pdf/2111.02394.pdf)

### Text Recognition

- CRNN: [An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition](https://arxiv.org/pdf/1507.05717.pdf)
- SAR: [Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition](https://arxiv.org/pdf/1811.00751.pdf)
- MASTER: [MASTER: Multi-Aspect Non-local Network for Scene Text Recognition](https://arxiv.org/pdf/1910.02562.pdf)
- ViTSTR: [Vision Transformer for Fast and Efficient Scene Text Recognition](https://arxiv.org/pdf/2105.08582.pdf)
- PARSeq: [Scene Text Recognition with Permuted Autoregressive Sequence Models](https://arxiv.org/pdf/2207.06966)
- VIPTR: [A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition](https://arxiv.org/abs/2401.10110)

## More goodies

### Documentation

The full package documentation is available [here](https://mindee.github.io/doctr/) for detailed specifications.

### Demo app

A minimal demo app is provided for you to play with our end-to-end OCR models!

![Demo app](https://github.com/mindee/doctr/raw/main/docs/images/demo_update.png)

#### Live demo

Courtesy of :hugs: [Hugging Face](https://huggingface.co/) :hugs:, docTR now has a fully deployed version available on [Spaces](https://huggingface.co/spaces)!
Check it out [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/mindee/doctr)

#### Running it locally

If you prefer to use it locally, there is an extra dependency ([Streamlit](https://streamlit.io/)) that is required.

##### TensorFlow version

```shell
pip install -r demo/tf-requirements.txt
```

Then run your app in your default browser with:

```shell
USE_TF=1 streamlit run demo/app.py
```

##### PyTorch version

```shell
pip install -r demo/pt-requirements.txt
```

Then run your app in your default browser with:

```shell
USE_TORCH=1 streamlit run demo/app.py
```

#### TensorFlow.js

Would you prefer to run everything in your web browser instead of having your demo actually run Python? Check out our [TensorFlow.js demo](https://github.com/mindee/doctr-tfjs-demo) to get started!

![TFJS demo](https://github.com/mindee/doctr/raw/main/docs/images/demo_illustration_mini.png)

### Docker container

We offer Docker container support for easy testing and deployment. [Here are the available docker tags.](https://github.com/mindee/doctr/pkgs/container/doctr)

#### Using GPU with docTR Docker Images

The docTR Docker images are GPU-ready and based on CUDA `12.2`. Make sure your host driver supports **at least CUDA `12.2`**, otherwise Torch or TensorFlow won't be able to initialize the GPU.
Please ensure that Docker is configured to use your GPU.

To verify and configure GPU support for Docker, please follow the instructions provided in the [NVIDIA Container Toolkit Installation Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).

Once Docker is configured to use GPUs, you can run docTR Docker containers with GPU support:

```shell
docker run -it --gpus all ghcr.io/mindee/doctr:torch-py3.9.18-2024-10 bash
```
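Once inside the container, a quick sketch (assuming the `torch` flavour shown above) to confirm the GPU is actually visible to the framework; `torch.cuda.is_available()` is a standard PyTorch call:

```python
# Run inside the container started above (torch flavour) to confirm that
# the --gpus flag and the host driver are set up correctly.
import torch

print(torch.cuda.is_available())  # expect True if the container sees the GPU
```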
#### Available Tags

The Docker images for docTR follow a specific tag nomenclature: `<deps>-py<python_version>-<doctr_version|YYYY-MM>`. Here's a breakdown of the tag structure:

- `<deps>`: `tf`, `torch`, `tf-viz-html-contrib` or `torch-viz-html-contrib`.
- `<python_version>`: `3.9.18`, `3.10.13` or `3.11.8`.
- `<doctr_version>`: a tag >= `v0.11.0`
- `<YYYY-MM>`: e.g. `2024-10`

Here are examples of different image tags:

| Tag                                       | Description                                                                                        |
|-------------------------------------------|----------------------------------------------------------------------------------------------------|
| `tf-py3.10.13-v0.11.0`                    | TensorFlow on Python `3.10.13` with docTR `v0.11.0`.                                               |
| `torch-viz-html-contrib-py3.11.8-2024-10` | PyTorch with extra dependencies on Python `3.11.8`, from the latest commit on `main` in `2024-10`. |
| `torch-py3.11.8-2024-10`                  | PyTorch on Python `3.11.8`, from the latest commit on `main` in `2024-10`.                         |

#### Building Docker Images Locally

You can also build docTR Docker images locally on your computer.

```shell
docker build -t doctr .
```

You can specify custom Python versions and docTR versions using build arguments. For example, to build a docTR image with TensorFlow, Python version `3.9.10`, and docTR version `v0.7.0`, run the following command:

```shell
docker build -t doctr --build-arg FRAMEWORK=tf --build-arg PYTHON_VERSION=3.9.10 --build-arg DOCTR_VERSION=v0.7.0 .
```

### Example script

An example script is provided for a simple document analysis of a PDF or image file:

```shell
python scripts/analyze.py path/to/your/doc.pdf
```

All script arguments can be checked using `python scripts/analyze.py --help`

### Minimal API integration

Looking to integrate docTR into your API? Here is a template to get you started with a fully working API using the wonderful [FastAPI](https://github.com/tiangolo/fastapi) framework.

#### Deploy your API locally

Specific dependencies are required to run the API template, which you can install as follows:

```shell
cd api/
pip install poetry
make lock
pip install -r requirements.txt
```

You can now run your API locally:

```shell
uvicorn --reload --workers 1 --host 0.0.0.0 --port=8002 --app-dir api/ app.main:app
```

Alternatively, you can run the same server in a Docker container if you prefer, using:

```shell
PORT=8002 docker-compose up -d --build
```

#### What you have deployed

Your API should now be running locally on port 8002. Access your automatically-built documentation at [http://localhost:8002/redoc](http://localhost:8002/redoc) and enjoy your four functional routes ("/detection", "/recognition", "/ocr", "/kie"). Here is an example with Python to send a request to the OCR route:

```python
import requests

params = {"det_arch": "db_resnet50", "reco_arch": "crnn_vgg16_bn"}

with open('/path/to/your/doc.jpg', 'rb') as f:
    files = [  # application/pdf, image/jpeg, image/png supported
        ("files", ("doc.jpg", f.read(), "image/jpeg")),
    ]
print(requests.post("http://localhost:8002/ocr", params=params, files=files).json())
```

### Example notebooks

Looking for more illustrations of docTR features? You might want to check the [Jupyter notebooks](https://github.com/mindee/doctr/tree/main/notebooks) designed to give you a broader overview.

## Citation

If you wish to cite this project, feel free to use this [BibTeX](http://www.bibtex.org/) reference:

```bibtex
@misc{doctr2021,
    title={docTR: Document Text Recognition},
    author={Mindee},
    year={2021},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/mindee/doctr}}
}
```
## Contributing

If you scrolled down to this section, you most likely appreciate open source. Do you feel like extending the range of our supported characters? Or perhaps submitting a paper implementation? Or contributing in any other way?

You're in luck: we compiled a short guide (cf. [`CONTRIBUTING`](https://mindee.github.io/doctr/contributing/contributing.html)) for you to do so easily!

## License

Distributed under the Apache 2.0 License. See [`LICENSE`](https://github.com/mindee/doctr?tab=Apache-2.0-1-ov-file#readme) for more information.