{"id":14482773,"url":"https://github.com/philschmid/document-ai-transformers","last_synced_at":"2025-04-06T10:12:02.864Z","repository":{"id":63299589,"uuid":"526918260","full_name":"philschmid/document-ai-transformers","owner":"philschmid","description":null,"archived":false,"fork":false,"pushed_at":"2024-01-07T04:12:19.000Z","size":4448,"stargazers_count":355,"open_issues_count":9,"forks_count":52,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-30T09:06:44.275Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/philschmid.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-08-20T12:19:01.000Z","updated_at":"2025-03-27T04:02:18.000Z","dependencies_parsed_at":"2024-09-03T00:04:07.118Z","dependency_job_id":"7565c1b2-f462-407c-aaca-1922e68e6995","html_url":"https://github.com/philschmid/document-ai-transformers","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philschmid%2Fdocument-ai-transformers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philschmid%2Fdocument-ai-transformers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philschmid%2Fdocument-ai-transformers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philschmid%2Fdocument-ai-transformers/manifests","owner_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/owners/philschmid","download_url":"https://codeload.github.com/philschmid/document-ai-transformers/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247464223,"owners_count":20942970,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-09-03T00:01:17.010Z","updated_at":"2025-04-06T10:12:02.839Z","avatar_url":"https://github.com/philschmid.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"# Document AI with Hugging Face Transformers\n\n\n[Document AI](https://en.wikipedia.org/wiki/Document_AI) is a term that has become popular over the last 3 years. It refers to the machine learning models, tasks, and techniques used to classify, parse, and extract information from documents in digital and print form, such as invoices, receipts, licenses, contracts, and business reports.\n\n![logo](./assets/logo.png)\n\nThis repository contains examples and tutorials on how to get started with Document AI and Transformers. 
Below you can also find a compendium of available models, tasks, datasets and other resources.\n\n**Training**\n* [fine-tuning Donut with SROIE](./training/donut_sroie.ipynb)\n* [fine-tuning LayoutLM with FUNSD](./training/layoutlm_funsd.ipynb)\n* [fine-tuning LiLT with FUNSD](./training/lilt_funsd.ipynb)\n\n**Inference**\n* [Donut](./inference/donut_inference.ipynb)\n* [LayoutLM](./inference/layoutlm_inference.ipynb)\n* [LiLT](./inference/lilt_inference.ipynb)\n\n**Data-processing**\n\n* [convert FUNSD to Donut documents for VQA](./data_processing/FUNSD_for_Donut.ipynb)\n\n**Demos/Spaces**\n\nCommunity: \n* [fedihch/InvoiceReceiptClassifierDemo](https://huggingface.co/spaces/fedihch/InvoiceReceiptClassifierDemo)\n* [nielsr/LayoutLMv2-FUNSD](https://huggingface.co/spaces/nielsr/LayoutLMv2-FUNSD)\n* [katanaml/LayoutLMv2-CORD](https://huggingface.co/spaces/katanaml/LayoutLMv2-CORD)\n* [nielsr/TrOCR-handwritten](https://huggingface.co/spaces/nielsr/TrOCR-handwritten)\n* [keras-io/ocr-for-captcha](https://huggingface.co/spaces/keras-io/ocr-for-captcha)\n* [nielsr/dit-document-layout-analysis](https://huggingface.co/spaces/nielsr/dit-document-layout-analysis)\n* [PatrickTyBrown/LoanDocumentClassifier](https://huggingface.co/spaces/PatrickTyBrown/LoanDocumentClassifier)\n* [Theivaprakasham/layoutlmv2_invoice](https://huggingface.co/spaces/Theivaprakasham/layoutlmv2_invoice)\n* [Msp/invoice_processing_layoutlmv3_custom](https://huggingface.co/spaces/Msp/invoice_processing_layoutlmv3_custom)\n* [Epoching/DocumentQA](https://huggingface.co/spaces/Epoching/DocumentQA)\n* [impira/docquery](https://huggingface.co/spaces/impira/docquery)\n\nPopular models include LayoutLM and Donut, which we will use today to get a first impression of how you can build your own Document AI system using Hugging Face Transformers.\n\n### Machine Learning Models (Transformers)\n\nBelow you can find a table of the currently available Transformers models that achieve state-of-the-art performance on Document AI tasks. \n\n| model                                                                   | paper                                     | license                                                       | checkpoints                                                 |\n|-------------------------------------------------------------------------|-------------------------------------------|---------------------------------------------------------------|-------------------------------------------------------------|\n| [Donut](https://huggingface.co/docs/transformers/main/en/model_doc/donut#overview) | [arxiv](https://arxiv.org/abs/2111.15664) | [MIT](https://github.com/clovaai/donut#license) | [huggingface](https://huggingface.co/models?other=donut) |\n| [LiLT](https://huggingface.co/docs/transformers/main/en/model_doc/lilt#overview) | [arxiv](https://arxiv.org/abs/2202.13669) | [MIT](https://github.com/clovaai/donut#license) | [huggingface](https://huggingface.co/models?other=lilt) |\n| [LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm) | [arxiv](https://arxiv.org/abs/1912.13318) | [MIT](https://github.com/microsoft/unilm/blob/master/LICENSE) | [huggingface](https://huggingface.co/models?other=layoutlm) |\n| [LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm) | [arxiv](https://arxiv.org/abs/2104.08836) | [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) | [huggingface](https://huggingface.co/microsoft/layoutxlm-base) |\n| [LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2) | [arxiv](https://arxiv.org/abs/2012.14740) | [CC BY-NC-SA 
4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) | [huggingface](https://huggingface.co/models?other=layoutlmv2) |\n| [LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3) | [arxiv](https://arxiv.org/abs/2204.08387) | [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) | [huggingface](https://huggingface.co/models?other=layoutlmv3) |\n| [DiT](https://huggingface.co/docs/transformers/model_doc/dit) | [arxiv](https://arxiv.org/abs/2203.02378) | [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) | [huggingface](https://huggingface.co/models?other=dit) |\n| [TrOCR](https://huggingface.co/docs/transformers/main/en/model_doc/trocr) | [arxiv](https://arxiv.org/abs/2109.10282) | [MIT](https://github.com/microsoft/unilm/blob/master/LICENSE) | [huggingface](https://huggingface.co/models?filter=trocr) |\n\n### Tasks\n\nDocument AI includes the following use cases and tasks:\n\n* document classification (image-classification)\n* document parsing (form understanding \u0026 information extraction)\n* visual question answering\n* table detection/layout analysis\n* optical character recognition (OCR)\n\n### Datasets\n\n| Dataset                                                                   | Task                                      |                        Hugging Face Datasets                          |\n|-------------------------------------------------------------------------|-------------------------------------------|-------------------------------------------------------------|\n| [SROIE](https://github.com/zzzDavid/ICDAR-2019-SROIE) | document parsing | [darentang/sroie](https://huggingface.co/datasets/darentang/sroie/blob/main/sroie.py) |\n| [RVL-CDIP](https://huggingface.co/datasets/rvl_cdip) | document classification | [rvl_cdip](https://huggingface.co/datasets/rvl_cdip) |\n| [XFUND](https://github.com/doc-analysis/XFUND)   | document parsing 
|[ranpox/xfund](https://huggingface.co/datasets/ranpox/xfund) | \n| [FUNSD](https://guillaumejaume.github.io/FUNSD/)    | document parsing  | [nielsr/funsd](https://huggingface.co/datasets/nielsr/funsd) |\n| [CORD](https://github.com/clovaai/cord)    | information extraction/parsing  | [naver-clova-ix/cord-v2](https://huggingface.co/datasets/naver-clova-ix/cord-v2) |\n| [DocVQA](https://www.docvqa.org/)    | visual question answering  | [_load manually_](https://rrc.cvc.uab.es/?ch=17\u0026com=downloads) |\n| [WildReceipt](https://paperswithcode.com/dataset/wildreceipt)    | document parsing | [Theivaprakasham/wildreceipt](https://huggingface.co/datasets/Theivaprakasham/wildreceipt) |\n| [TableBank](https://doc-analysis.github.io/tablebank-page/index.html) |  table detection/layout analysis | [_load manually_](https://doc-analysis.github.io/tablebank-page/index.html) |\n| [DocBank](https://doc-analysis.github.io/docbank-page/index.html)    |  table detection/layout analysis | [_load manually_](https://doc-analysis.github.io/docbank-page/index.html) |\n| [ReadingBank](https://github.com/doc-analysis/ReadingBank)    | table detection/layout analysis  | [_load manually_](https://github.com/doc-analysis/ReadingBank) |\n| [EATEN](https://github.com/beacandler/EATEN)    | document parsing  | [_load manually_](https://github.com/beacandler/EATEN) |\n| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet)    | table detection/layout analysis  | [jordanparker6/publaynet](https://huggingface.co/datasets/jordanparker6/publaynet) |\n| [ICDAR2019_cTDaR](https://github.com/cndplab-founder/ICDAR2019_cTDaR)    | table detection/layout analysis  | [_load manually_](https://cndplab-founder.github.io/cTDaR2019/dataset-training.html) |\n\n\n### APIs and existing Solutions\n\n* [Amazon Textract](https://aws.amazon.com/de/textract/)\n* [Google Cloud Document AI](https://cloud.google.com/document-ai/)\n* [Azure Form 
Recognizer](https://azure.microsoft.com/en-us/services/form-recognizer/#features)\n\n### Other Tools\n\n* [SynthDoG 🐶: Synthetic Document Generator](https://github.com/clovaai/donut/tree/master/synthdog)\n\n### Resources\n\n[OCR-Free Document Understanding with Donut](https://towardsdatascience.com/ocr-free-document-understanding-with-donut-1acfbdf099be)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilschmid%2Fdocument-ai-transformers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphilschmid%2Fdocument-ai-transformers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilschmid%2Fdocument-ai-transformers/lists"}