{"id":18369601,"url":"https://github.com/unstructured-io/pipeline-document-layout","last_synced_at":"2025-08-02T21:33:59.760Z","repository":{"id":152499082,"uuid":"569890652","full_name":"Unstructured-IO/pipeline-document-layout","owner":"Unstructured-IO","description":"Pipeline for layout extraction ","archived":false,"fork":false,"pushed_at":"2023-07-03T05:42:04.000Z","size":1673,"stargazers_count":1,"open_issues_count":2,"forks_count":2,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-02-15T20:56:32.407Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Unstructured-IO.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-23T21:16:53.000Z","updated_at":"2023-11-30T18:50:05.000Z","dependencies_parsed_at":null,"dependency_job_id":"10a68fca-a2ec-4088-a76a-2a9a22d664da","html_url":"https://github.com/Unstructured-IO/pipeline-document-layout","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unstructured-IO%2Fpipeline-document-layout","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unstructured-IO%2Fpipeline-document-layout/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unstructured-IO%2Fpipeline-document-layout/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Unstructured-IO%2Fpipeline-document-layo
ut/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Unstructured-IO","download_url":"https://codeload.github.com/Unstructured-IO/pipeline-document-layout/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248281424,"owners_count":21077423,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-05T23:29:55.097Z","updated_at":"2025-04-10T19:43:50.584Z","avatar_url":"https://github.com/Unstructured-IO.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch3 align=\"center\"\u003e\n  \u003cimg src=\"img/unstructured_logo.png\" height=\"200\"\u003e\n\u003c/h3\u003e\n\n\u003ch3 align=\"center\"\u003e\n  \u003cp\u003ePre-Processing Pipeline for Layout Detection\u003c/p\u003e\n\u003c/h3\u003e\n\n\nThe description for the pipeline repository goes here.\nThe API is hosted at `https://api.unstructured.io`.\n\n## Developer Quick Start\n\n* Using `pyenv` to manage virtualenvs is recommended\n\t* Mac install instructions. See [here](https://github.com/Unstructured-IO/community#mac--homebrew) for more detailed instructions.\n\t\t* `brew install pyenv-virtualenv`\n\t\t* `pyenv install 3.8.15`\n  * Linux instructions are available [here](https://github.com/Unstructured-IO/community#linux).\n\n  * Create a virtualenv to work in and activate it, e.g. 
for one named `document_layout`:\n\n\t`pyenv virtualenv 3.8.15 document_layout` \u003cbr /\u003e\n\t`pyenv activate document_layout`\n\n* Run `make install`\n* Run `pip install 'git+https://github.com/facebookresearch/detectron2.git@v0.4#egg=detectron2'`\n* Start a local Jupyter notebook server with `make run-jupyter` \u003cbr /\u003e\n\t**OR** \u003cbr /\u003e\n\tjust start the FastAPI app locally with `make run-web-app`\n\n#### Extracting layout information from a document\n\nFor example:\n```\ncurl -X 'POST' \\\n  'http://localhost:8000/document-layout/v1.0.0/layout' \\\n  -H 'accept: application/json' \\\n  -H 'Content-Type: multipart/form-data' \\\n  -F 'files=@sample-docs/example.png' -F 'model_type=yolox' | jq -C . | less -R\n```\n\nHere `files` is the file to process and `model_type` can be 'default' (or blank) or 'yolox'.\nYou can also set `force_ocr` to 'auto' to attempt text extraction from your file, or to\n'true', in which case OCR will be used.\n\n### Generating Python files from the pipeline notebooks\n\nYou can generate the FastAPI APIs from your pipeline notebooks by running `make generate-api`.\n\n## Security Policy\n\nSee our [security policy](https://github.com/Unstructured-IO/pipeline-document_layout/security/policy) for\ninformation on how to report security vulnerabilities.\n\n## Learn more\n\n| Section | Description |\n|-|-|\n| [Unstructured Community GitHub](https://github.com/Unstructured-IO/community) | Information about Unstructured.io community projects  |\n| [Unstructured GitHub](https://github.com/Unstructured-IO) | Unstructured.io open source repositories |\n| [Company Website](https://unstructured.io) | Unstructured.io product and company info 
|\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funstructured-io%2Fpipeline-document-layout","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funstructured-io%2Fpipeline-document-layout","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funstructured-io%2Fpipeline-document-layout/lists"}