{"id":18369598,"url":"https://github.com/unstructured-io/pipeline-receipts","last_synced_at":"2025-04-10T19:44:00.279Z","repository":{"id":134709576,"uuid":"612155580","full_name":"Unstructured-IO/pipeline-receipts","owner":"Unstructured-IO","description":"Preprocessing pipeline notebooks and API supporting text extraction from receipts images","archived":false,"fork":false,"pushed_at":"2023-06-20T22:07:10.000Z","size":1453,"stargazers_count":2,"open_issues_count":6,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-15T20:56:36.371Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Unstructured-IO.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-10T10:15:15.000Z","updated_at":"2024-09-28T03:23:49.000Z","dependencies_parsed_at":null,"dependency_job_id":"11456ac9-394d-4e1e-b66b-c4034d03517e","html_url":"https://github.com/Unstructured-IO/pipeline-receipts","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unstructured-IO%2Fpipeline-receipts","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unstructured-IO%2Fpipeline-receipts/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unstructured-IO%2Fpipeline-receipts/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unstructured-
IO%2Fpipeline-receipts/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Unstructured-IO","download_url":"https://codeload.github.com/Unstructured-IO/pipeline-receipts/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248281424,"owners_count":21077423,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-05T23:29:54.952Z","updated_at":"2025-04-10T19:44:00.252Z","avatar_url":"https://github.com/Unstructured-IO.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch3 align=\"center\"\u003e\n  \u003cimg src=\"img/unstructured_logo.png\" height=\"200\"\u003e\n\u003c/h3\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n  \u003ca href=\"\"\u003e![https://pypi.python.org/pypi/unstructured/](https://img.shields.io/pypi/l/unstructured.svg)\u003c/a\u003e\n  \u003ca href=\"\"\u003e![https://pypi.python.org/pypi/unstructured/](https://img.shields.io/pypi/pyversions/unstructured.svg)\u003c/a\u003e\n  \u003ca href=\"\"\u003e![https://github.com/Naereen/badges/](https://badgen.net/badge/Open%20Source%20%3F/Yes%21/blue?icon=github)\u003c/a\u003e\n  \n\u003c/div\u003e\n\n\u003ch3 align=\"center\"\u003e\n  \u003cp\u003ePre-Processing Pipeline for Receipts\u003c/p\u003e\n\u003c/h3\u003e\n\n\nThis repo implements a document pre-processing pipeline for receipts. Currently, the pipeline is under development. 
The pipeline assumes the receipts are in PDF or image formats (JPG, PNG).\n\nThe API is hosted at `https://api.unstructured.io`.\n\n## :coffee: Getting Started \n\n* Using `pyenv` to manage virtualenvs is recommended\n    * Mac install instructions:\n        * `brew install pyenv-virtualenv`\n        * `pyenv install 3.8.15`\n    \n    Create a virtualenv to work in and activate it, e.g. for one named `receipts`:\n    \n    `pyenv virtualenv 3.8.15 receipts` \u003cbr /\u003e\n    `pyenv activate receipts`\n\n* Run `make install` \n* Start a local Jupyter notebook server with `make run-jupyter` \u003cbr /\u003e\n    **OR** \u003cbr /\u003e\n    just start the FastAPI app locally with `make run-web-app`\n    \n#### Extracting Structured Text from a Receipt Image\nAfter the API starts, you can extract the elements of receipt files with the command:\n```\ncurl -X 'POST' \\\n  'http://localhost:8000/receipts/v0.1.0/receipts' \\\n  -F 'files=@\u003cyour_receipt_file\u003e' \\\n  | jq -C . | less -R\n```\n\n### Generating Python files from the pipeline notebooks\n\nYou can generate the FastAPI APIs from your pipeline notebooks by running `make generate-api`.\n\n## :guardsman: Security Policy\n\nSee our [security policy](https://github.com/Unstructured-IO/pipeline-receipts/security/policy) for\ninformation on how to report security vulnerabilities.\n\n## 🤗 Hugging Face\n\n[Hugging Face Spaces](https://huggingface.co/spaces) offer a simple way to host ML demo apps, models and datasets directly on our organization’s profile. This allows us to showcase our projects and work collaboratively with other people in the ML ecosystem. 
Visit our space [here](https://huggingface.co/unstructuredio)!\n\n## Learn more\n\n| Section | Description |\n|-|-|\n| [Company Website](https://unstructured.io) | Unstructured.io product and company info |\n| [Fine-tuned Models and Data](https://huggingface.co/naver-clova-ix) | CORD Consolidated Receipt dataset and Donut model |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funstructured-io%2Fpipeline-receipts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funstructured-io%2Fpipeline-receipts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funstructured-io%2Fpipeline-receipts/lists"}