{"id":13492427,"url":"https://github.com/naiveHobo/InvoiceNet","last_synced_at":"2025-03-28T10:32:12.012Z","repository":{"id":41160127,"uuid":"139323243","full_name":"naiveHobo/InvoiceNet","owner":"naiveHobo","description":"Deep neural network to extract intelligent information from invoice documents.","archived":false,"fork":false,"pushed_at":"2024-05-03T20:12:57.000Z","size":46027,"stargazers_count":2576,"open_issues_count":72,"forks_count":400,"subscribers_count":76,"default_branch":"main","last_synced_at":"2025-03-23T19:09:53.255Z","etag":null,"topics":["billing","classification","deep-learning","deep-neural-networks","deeplearning","information-extraction","information-retrieval","invoice","invoice-insight","invoice-management","invoice-parser","invoice-pdf","invoice-software","invoices","keras","keras-neural-networks","keras-tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/naiveHobo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-07-01T11:25:57.000Z","updated_at":"2025-03-22T18:48:01.000Z","dependencies_parsed_at":"2024-05-03T21:30:14.029Z","dependency_job_id":"a47b9631-3878-4df7-9526-8c9b25510962","html_url":"https://github.com/naiveHobo/InvoiceNet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naiveHobo%2FInvoiceNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naiveHobo%2FInvoiceNet/tags","releas
es_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naiveHobo%2FInvoiceNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naiveHobo%2FInvoiceNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/naiveHobo","download_url":"https://codeload.github.com/naiveHobo/InvoiceNet/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246012566,"owners_count":20709468,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["billing","classification","deep-learning","deep-neural-networks","deeplearning","information-extraction","information-retrieval","invoice","invoice-insight","invoice-management","invoice-parser","invoice-pdf","invoice-software","invoices","keras","keras-neural-networks","keras-tensorflow"],"created_at":"2024-07-31T19:01:05.925Z","updated_at":"2025-03-28T10:32:06.990Z","avatar_url":"https://github.com/naiveHobo.png","language":"Python","readme":"![InvoiceNet Logo](_images/logo.png)\n\n--------------------------------------------------------------------------------\n\nDeep neural network to extract intelligent information from invoice documents.\n\n**TL;DR**\n\n* An easy to use UI to view PDF/JPG/PNG invoices and extract information.\n* Train custom models using the Trainer UI on your own dataset.\n* Add or remove invoice fields as per your convenience.\n* Save the extracted information into your system with the click of a button.\n\n:star: We appreciate your star, it helps!\n\nThe InvoiceNet logo was designed by [Sidhant 
Tibrewal](https://www.linkedin.com/in/sidhant-tibrewal-864058148/).\n[Check out](https://www.behance.net/tiber_sid) his work for some more beautiful designs.\n\n---\n\n![InvoiceNet](_images/invoicenet.png)\n\n---\n\n**DISCLAIMER**: \n\nPre-trained models for some general invoice fields are not available right now but will soon be provided.\nThe training GUI and data preparation scripts have been made available.\n\nInvoice documents contain sensitive information because of which collecting a sizable dataset has proven to be difficult.\nThis makes it difficult for developers like us to train large-scale generalised models and make them available to the community.\n\nIf you have a dataset of invoice documents that you are comfortable sharing with us, please reach out (\u003csarthakmittal2608@gmail.com\u003e).\nWe have the tools to create the first publicly-available large-scale invoice dataset along with a software platform for structured information extraction.\n\n---\n\n## Installation\n\n#### Ubuntu 20.04\n\nInvoiceNet has been developed and tested on **Ubuntu 20.04** with **CUDA Version: 11.8**, **cuDNN version: 8.9.7**, and **Tensorflow v2.13.1**.\n\nTo install InvoiceNet on Ubuntu, run the following commands:\n\n```bash\ngit clone https://github.com/naiveHobo/InvoiceNet.git\ncd InvoiceNet/\n\n# Run installation script\n./install.sh\n```\n\nThe install.sh script will install all the dependencies, create a virtual environment, and install InvoiceNet in the virtual environment.\n\nTo be able to use InvoiceNet, you need to source the virtual environment that the package was installed in.\n\n```bash\n# Source virtual environment\nsource env/bin/activate\n```\n\n#### Windows 10\n\nThe recommended way is to install InvoiceNet along with its dependencies in an Anaconda environment:\n\n```bash\ngit clone https://github.com/naiveHobo/InvoiceNet.git\ncd InvoiceNet/\n\n# Create conda environment and activate\nconda create --name invoicenet python=3.7\nconda activate 
invoicenet\n\n# Install InvoiceNet\npip install .\n\n# Install poppler\nconda install -c conda-forge poppler\n```\n\nSome dependencies also need to be installed separately on Windows 10 before running InvoiceNet:\n\n- [Tesseract 5.0.0](https://github.com/UB-Mannheim/tesseract/wiki)\n- [ImageMagick 7.0.10](https://imagemagick.org/script/download.php#windows)\n- [Ghostscript 9.52](https://www.ghostscript.com/download/gsdnld.html)\n\n\n\n## Data Preparation\nThe training data must be arranged in a single directory. The invoice documents are expected to be PDF files, and each invoice is expected to have a corresponding JSON label file with the same name. Your training data should be in the following format:\n\n```\ntrain_data/\n    invoice1.pdf\n    invoice1.json\n    nike-invoice.pdf\n    nike-invoice.json\n    12345.pdf\n    12345.json\n    ...\n```\n\nThe JSON labels should have the following format:\n```\n{\n \"vendor_name\":\"Nike\",\n \"invoice_date\":\"12-01-2017\",\n \"invoice_number\":\"R0007546449\",\n \"total_amount\":\"137.51\",\n ... 
other fields\n}\n```\n\nTo begin the data preparation process, click on the \"Prepare Data\" button in the GUI or follow the instructions below if you're using the CLI.\n\n\n## Add Your Own Fields\nTo add your own fields to InvoiceNet, open **invoicenet/\\_\\_init\\_\\_.py**.\n\nThere are 4 pre-defined field types:\n- **FIELD_TYPES[\"general\"]** : General fields such as names, addresses, invoice numbers, etc.\n- **FIELD_TYPES[\"optional\"]** : Optional fields that might not be present in all invoices.\n- **FIELD_TYPES[\"amount\"]** : Fields that represent an amount.\n- **FIELD_TYPES[\"date\"]** : Fields that represent a date.\n\nChoose the appropriate field type for the field and add the corresponding line, as shown below.\n\n```python\n# Add the following line at the end of the file\n\n# For example, to add a field total_amount\nFIELDS[\"total_amount\"] = FIELD_TYPES[\"amount\"]\n\n# For example, to add a field invoice_date\nFIELDS[\"invoice_date\"] = FIELD_TYPES[\"date\"]\n\n# For example, to add a field tax_id (which might be optional)\nFIELDS[\"tax_id\"] = FIELD_TYPES[\"optional\"]\n\n# For example, to add a field vendor_name\nFIELDS[\"vendor_name\"] = FIELD_TYPES[\"general\"]\n```\n\n\n## Using the GUI\nInvoiceNet provides you with a GUI to train a model on your data and extract information from invoice documents using this trained model.\n\n![Trainer](_images/trainer.png)\n\n\nRun the following command to launch the trainer GUI:\n\n```bash\npython trainer.py\n```\n\nRun the following command to launch the extractor GUI:\n\n```bash\npython extractor.py\n```\n\nYou need to prepare the data for training first. 
\nYou can do so by setting the **Data Folder** field to the directory containing your training data and then clicking the **Prepare Data** button.\nOnce the data is prepared, you can start training by clicking the **Start** button.\n\n\n## Using the CLI\n\n### Training \n\nPrepare the data for training first by running the following command:\n```bash\npython prepare_data.py --data_dir train_data/\n```\n\nTrain InvoiceNet using the following command:\n```bash\npython train.py --field enter-field-here --batch_size 8\n\n# For example, for field 'total_amount'\npython train.py --field total_amount --batch_size 8\n```\n\n---\n\n### Prediction\nTo use a different OCR engine, change the ocr_engine in this function before running predict.py: [create_ngrams.py](https://github.com/naiveHobo/InvoiceNet/blob/e883158a690726afd1de5b76b5810287013577c6/invoicenet/common/util.py#L193)\n\n---\n\n#### Single invoice\nTo extract a field from a single invoice file, run the following command:\n```bash\npython predict.py --field enter-field-here --invoice path-to-invoice-file\n\n# For example, to extract field total_amount from an invoice file invoices/1.pdf\npython predict.py --field total_amount --invoice invoices/1.pdf\n```\n\n---\n\n#### Multiple invoices\nTo extract information from multiple invoices using the trained InvoiceNet model, place the PDF invoice documents in one directory in the following format:\n\n```\npredict_data/\n    invoice1.pdf\n    invoice2.pdf\n    ...\n```\n\nRun InvoiceNet using the following command:\n```bash\npython predict.py --field enter-field-here --data_dir predict_data/\n\n# For example, for field 'total_amount'\npython predict.py --field total_amount --data_dir predict_data/\n```\n---\n\n## Reference\nThis implementation is largely based on the work of R. Palm et al, who should be cited if this is used in a scientific publication (or the preceding conference papers):\n\n[1] Palm, Rasmus Berg, Florian Laws, and Ole Winther. 
**\"Attend, Copy, Parse End-to-end information extraction from documents.\"** 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2019.\n\n```bibtex\n@inproceedings{palm2019attend,\n  title={Attend, Copy, Parse End-to-end information extraction from documents},\n  author={Palm, Rasmus Berg and Laws, Florian and Winther, Ole},\n  booktitle={2019 International Conference on Document Analysis and Recognition (ICDAR)},\n  pages={329--336},\n  year={2019},\n  organization={IEEE}\n}\n```\n\n### Note\nAn implementation of an inferior (also slightly broken) invoice handling system based on the paper **\"Cloudscan - A configuration-free invoice analysis system using recurrent neural networks.\"** is available [here](https://github.com/naiveHobo/InvoiceNet/tree/cloudscan).\n\n[2] Palm, Rasmus Berg, Ole Winther, and Florian Laws. **\"Cloudscan - A configuration-free invoice analysis system using recurrent neural networks.\"** 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Vol. 1. IEEE, 2017.\n\n```bibtex\n@inproceedings{palm2017cloudscan,\n  title={Cloudscan-a configuration-free invoice analysis system using recurrent neural networks},\n  author={Palm, Rasmus Berg and Winther, Ole and Laws, Florian},\n  booktitle={2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)},\n  volume={1},\n  pages={406--413},\n  year={2017},\n  organization={IEEE}\n}\n```\n","funding_links":[],"categories":["Python","Invoice","Documentation and Presentation","📦 Legacy \u0026 Inactive Projects"],"sub_categories":["European VAT","Extractors"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FnaiveHobo%2FInvoiceNet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FnaiveHobo%2FInvoiceNet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FnaiveHobo%2FInvoiceNet/lists"}