{"id":17287612,"url":"https://github.com/dnlbauer/pdfact-service","last_synced_at":"2026-02-20T00:01:59.821Z","repository":{"id":63849594,"uuid":"571159669","full_name":"dnlbauer/pdfact-service","owner":"dnlbauer","description":"Analyze pdf files with pdfact using a simple web API","archived":false,"fork":false,"pushed_at":"2024-11-18T19:11:02.000Z","size":38,"stargazers_count":1,"open_issues_count":4,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-07T23:59:29.564Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc0-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dnlbauer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-11-27T11:15:23.000Z","updated_at":"2023-04-23T22:42:34.000Z","dependencies_parsed_at":"2025-10-19T08:30:04.393Z","dependency_job_id":null,"html_url":"https://github.com/dnlbauer/pdfact-service","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dnlbauer/pdfact-service","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dnlbauer%2Fpdfact-service","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dnlbauer%2Fpdfact-service/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dnlbauer%2Fpdfact-service/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dnlbauer%2Fpdfact
-service/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dnlbauer","download_url":"https://codeload.github.com/dnlbauer/pdfact-service/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dnlbauer%2Fpdfact-service/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29637400,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-19T22:32:43.237Z","status":"ssl_error","status_checked_at":"2026-02-19T22:32:38.330Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-15T10:03:48.941Z","updated_at":"2026-02-20T00:01:59.806Z","avatar_url":"https://github.com/dnlbauer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pdfact-service\n[![License](https://img.shields.io/github/license/dnlbauer/pdfact-service)](./LICENSE) \n![actions](https://github.com/dnlbauer/pdfact-service/actions/workflows/publish-docker.yml/badge.svg?branch=main) \n[![Docker](https://img.shields.io/docker/pulls/dnlbauer/pdfact-service)](https://hub.docker.com/r/dnlbauer/pdfact-service/tags) \n\nA web service for analyzing the content of PDF documents through an HTTP API. 
This is a simple HTTP wrapper around [ad-freiburg/pdfact](https://github.com/ad-freiburg/pdfact); the container image is built directly from the pdfact source.\n\n## Usage\nStart the service:\n```bash\n\u003e docker run -p 80:80 dnlbauer/pdfact-service\n\n[2022-11-27 12:36:38 +0000] [1] [INFO] Starting gunicorn 20.1.0\n[2022-11-27 12:36:38 +0000] [1] [INFO] Listening at: http://0.0.0.0:80 (1)\n[2022-11-27 12:36:38 +0000] [1] [INFO] Using worker: gthread\n[2022-11-27 12:36:38 +0000] [7] [INFO] Booting worker with pid: 7\n```\n\nPDFs can be `POST`ed to `/analyze` as a multipart file upload in the form field `file`. The response contains the output of `pdfact`. The response format is selected with the MIME `Accept` header; `pdfact` can produce JSON, XML, and plain text.\n\n```bash\n\u003e curl -H \"Accept: application/json\" -F file=@testfile.pdf localhost:80/analyze\n\n{\"paragraphs\": [\n  {\"paragraph\": {\n    \"role\": \"page-header\",\n    \"positions\": [{\n      \"minY\": 642.6,\n      \"minX\": 210.3,\n      \"maxY\": 652.6,\n      \"maxX\": 401.6,\n      \"page\": 1\n    }],\n    \"text\": \"This is a test PDF document.\"\n  }},\n  {\"paragraph\": {\n    \"role\": \"body\",\n    \"positions\": [{\n      \"minY\": 628.5,\n      \"minX\": 91.2,\n      \"maxY\": 636.6,\n      \"maxX\": 520.8,\n      \"page\": 1\n    }],\n    \"text\": \"If you can read this, you are lucky.\"\n  }}\n]}\n```\n\nThe supported pdfact CLI arguments (`--units`, `--roles`) can be passed as HTTP query parameters. Example:\n```bash\n\u003e curl ... \"localhost:80/analyze?roles=body\u0026units=words\"\n```\n\n## Thanks\nAll credit goes to the Algorithms and Data Structures Group at the\nUniversity of Freiburg for [ad-freiburg/pdfact](https://github.com/ad-freiburg/pdfact).\n\n## License\nPublished under CC0. 
Do whatever you want :-)\n\n*(but also check the license of pdfact if you use the image as-is).*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdnlbauer%2Fpdfact-service","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdnlbauer%2Fpdfact-service","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdnlbauer%2Fpdfact-service/lists"}