{"id":48503999,"url":"https://github.com/bootlin/pdf-link-checker","last_synced_at":"2026-04-07T15:37:22.083Z","repository":{"id":37481083,"uuid":"170637485","full_name":"bootlin/pdf-link-checker","owner":"bootlin","description":"Checks for broken hyperlinks in PDF documents","archived":false,"fork":false,"pushed_at":"2023-11-15T17:12:50.000Z","size":90,"stargazers_count":19,"open_issues_count":7,"forks_count":5,"subscribers_count":8,"default_branch":"master","last_synced_at":"2023-12-17T11:36:11.422Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://bootlin.com/blog/pdf-link-checker/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bootlin.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-02-14T06:12:30.000Z","updated_at":"2023-08-29T13:30:26.000Z","dependencies_parsed_at":"2023-11-15T11:46:01.667Z","dependency_job_id":null,"html_url":"https://github.com/bootlin/pdf-link-checker","commit_stats":null,"previous_names":[],"tags_count":1,"template":null,"template_full_name":null,"purl":"pkg:github/bootlin/pdf-link-checker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bootlin%2Fpdf-link-checker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bootlin%2Fpdf-link-checker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bootlin%2Fpdf-link-checker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bootlin%2Fpdf-link-checker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bootlin","download_url":"https://codeload.github.
com/bootlin/pdf-link-checker/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bootlin%2Fpdf-link-checker/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31518632,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T03:10:19.677Z","status":"ssl_error","status_checked_at":"2026-04-07T03:10:13.982Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-07T15:37:21.428Z","updated_at":"2026-04-07T15:37:22.071Z","avatar_url":"https://github.com/bootlin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"================\npdf-link-checker\n================\n**pdf-link-checker** is a simple tool that parses a PDF document and checks for\nbroken hyperlinks. 
This is done by sending a simple HTTP request to each link\nfound in a given document.\n\nGetting it running\n==================\n\n::\n\n    pip install git+https://github.com/bootlin/pdf-link-checker.git\n    export PATH=$HOME/.local/bin:$PATH\n    pdf-link-checker my-awesome-slides.pdf\n\nOptions\n=======\n\n* --max-threads\n\n  Specifies the maximum number of allowed threads (default: 100).\n\n  To speed up the run, pdf-link-checker launches several threads\n  in order to check several links in parallel.\n  This option sets a limit on the number of threads.\n\n* --max-requests-per-host\n\n  Specifies the maximum number of allowed requests per host.\n\n  Some URLs may belong to the same host, and since pdf-link-checker\n  can check many URLs at the same time, we may want to set a limit\n  on the number of requests per host.\n  Otherwise, some hosts may mistake the check for a DoS attack.\n\nGetting help\n============\n\nYou can get support by reporting your issue on this project\non GitHub: https://github.com/bootlin/pdf-link-checker/issues\n\nTODO\n====\n\n*(...because there's no active project without a TODO list!)*\n\n* Fix: some documents are failing on doc.initialize().\n\n* Fix: if the URL is a huge document, we should just check it and not\n  download it entirely.\n\n* Replace the thread array with a nice thread pool.\n  Each thread from the pool should take a URL from a (protected) queue.\n  We could also have one queue per host and thus handle the\n  max-requests-per-host constraint without a separate parameter.\n\nVersion History\n===============\n\n1.2.0\n  * Repair breakage against newer versions of pdfminer\n\n1.1.1\n  * Remove extra print, just a leftover\n\n1.1.0\n  * Only allow https and ftp URIs. 
This prevents failures on mailto:\n    and file:// URIs.\n  * Add better exception handling to avoid crashing\n  * Add better timeout and request exception handling\n  * Fix broken thread management\n  * Remove stupid double-requests\n  * Several small fixes\n\n1.0.2\n  * Updated repo location\n  * Moved from distutils to setuptools\n\n1.0.1\n  * Version bump\n\n1.0\n  * Initial release\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbootlin%2Fpdf-link-checker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbootlin%2Fpdf-link-checker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbootlin%2Fpdf-link-checker/lists"}