{"id":18369635,"url":"https://github.com/unstructured-io/unstructured-inference","last_synced_at":"2026-04-03T05:03:41.622Z","repository":{"id":65023733,"uuid":"580565770","full_name":"Unstructured-IO/unstructured-inference","owner":"Unstructured-IO","description":null,"archived":false,"fork":false,"pushed_at":"2024-05-22T16:14:45.000Z","size":33238,"stargazers_count":117,"open_issues_count":30,"forks_count":32,"subscribers_count":18,"default_branch":"main","last_synced_at":"2024-05-22T16:15:06.446Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Unstructured-IO.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-20T21:54:01.000Z","updated_at":"2024-05-28T19:48:01.223Z","dependencies_parsed_at":"2024-01-15T11:10:53.182Z","dependency_job_id":"d6250271-c951-4228-a3cc-79b8a2b9a51a","html_url":"https://github.com/Unstructured-IO/unstructured-inference","commit_stats":{"total_commits":194,"total_committers":28,"mean_commits":6.928571428571429,"dds":0.711340206185567,"last_synced_commit":"ed5f2c22e9ac83141a99b00a8155e589d3644ccd"},"previous_names":[],"tags_count":103,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unstructured-IO%2Funstructured-inference","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unstructured-IO%2Funstructured-inference/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unstructured-IO%2Fu
nstructured-inference/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unstructured-IO%2Funstructured-inference/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Unstructured-IO","download_url":"https://codeload.github.com/Unstructured-IO/unstructured-inference/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247157283,"owners_count":20893220,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-05T23:29:59.989Z","updated_at":"2026-04-03T05:03:41.612Z","avatar_url":"https://github.com/Unstructured-IO.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch3 align=\"center\"\u003e\n  \u003cimg\n    src=\"https://raw.githubusercontent.com/Unstructured-IO/unstructured/main/img/unstructured_logo.png\"\n    height=\"200\"\n  \u003e\n\n\u003c/h3\u003e\n\n\u003ch3 align=\"center\"\u003e\n  \u003cp\u003eOpen-Source Pre-Processing Tools for Unstructured Data\u003c/p\u003e\n\u003c/h3\u003e\n\nThe `unstructured-inference` repo contains hosted model inference code for layout parsing models. 
\nThese models are invoked via API as part of the partitioning bricks in the `unstructured` package.\n\n**Requires Python 3.12+.**\n\n## Installation\n\n### Package\n\n```shell\npip install unstructured-inference\n```\n\n### Detectron2\n\n[Detectron2](https://github.com/facebookresearch/detectron2) is required for using models from the [layoutparser model zoo](#using-models-from-the-layoutparser-model-zoo) \nbut is not automatically installed with this package. \nFor MacOS and Linux, build from source with:\n```shell\npip install 'git+https://github.com/facebookresearch/detectron2.git@57bdb21249d5418c130d54e2ebdc94dda7a4c01a'\n```\nOther install options can be found in the \n[Detectron2 installation guide](https://detectron2.readthedocs.io/en/latest/tutorials/install.html).\n\nWindows is not officially supported by Detectron2, but some users are able to install it anyway. \nSee discussion [here](https://layout-parser.github.io/tutorials/installation#for-windows-users) for \ntips on installing Detectron2 on Windows.\n\n### Development Setup\n\nThis project uses [uv](https://docs.astral.sh/uv/) for dependency management.\n\n```shell\n# Clone and install all dependencies (including dev/test/lint groups)\ngit clone https://github.com/Unstructured-IO/unstructured-inference.git\ncd unstructured-inference\nmake install\n```\n\nRun `make help` for a full list of available targets.\n\n## Getting Started\n\nTo get started with the layout parsing model, use the following commands:\n\n```python\nfrom unstructured_inference.inference.layout import DocumentLayout\n\nlayout = DocumentLayout.from_file(\"sample-docs/loremipsum.pdf\")\n\nprint(layout.pages[0].elements)\n```\n\nOnce the model has detected the layout and OCR'd the document, the text extracted from the first \npage of the sample document will be displayed.\nYou can convert a given element to a `dict` by running the `.to_dict()` method.\n\n## Models\n\nThe inference pipeline operates by finding text elements in a 
document page using a detection model, then extracting the contents of the elements using direct extraction (if available), OCR, and optionally table inference models.\n\nWe offer several detection models including [Detectron2](https://github.com/facebookresearch/detectron2) and [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX).\n\n### Using a non-default model\n\nWhen doing inference, an alternate model can be used by passing the model object to the ingestion method via the `detection_model` parameter. The `get_model` function can be used to construct one of our out-of-the-box models from a keyword, e.g.:\n```python\nfrom unstructured_inference.models.base import get_model\nfrom unstructured_inference.inference.layout import DocumentLayout\n\nmodel = get_model(\"yolox\")\nlayout = DocumentLayout.from_file(\"sample-docs/layout-parser-paper.pdf\", detection_model=model)\n```\n\n### Using your own model\n\nAny detection model can be used in the `unstructured_inference` pipeline by wrapping the model in the `UnstructuredObjectDetectionModel` class. 
To integrate with the `DocumentLayout` class, a subclass of `UnstructuredObjectDetectionModel` must have a `predict` method that accepts a `PIL.Image.Image` and returns a list of `LayoutElement`s, and an `initialize` method, which loads the model and prepares it for inference.\n\n## Security Policy\n\nSee our [security policy](https://github.com/Unstructured-IO/unstructured-inference/security/policy) for\ninformation on how to report security vulnerabilities.\n\n## Learn more\n\n| Section | Description |\n|-|-|\n| [Unstructured Community Github](https://github.com/Unstructured-IO/community) | Information about Unstructured.io community projects  |\n| [Unstructured Github](https://github.com/Unstructured-IO) | Unstructured.io open source repositories |\n| [Company Website](https://unstructured.io) | Unstructured.io product and company info |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funstructured-io%2Funstructured-inference","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funstructured-io%2Funstructured-inference","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funstructured-io%2Funstructured-inference/lists"}