{"id":35777781,"url":"https://github.com/yfedoseev/pdf_oxide","last_synced_at":"2026-05-13T06:14:20.888Z","repository":{"id":322714579,"uuid":"1090617520","full_name":"yfedoseev/pdf_oxide","owner":"yfedoseev","description":"The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation \u0026 editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.","archived":false,"fork":false,"pushed_at":"2026-05-11T02:15:04.000Z","size":213085,"stargazers_count":731,"open_issues_count":41,"forks_count":78,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-05-11T02:17:53.917Z","etag":null,"topics":["data-extraction","document-processing","fast","image-extraction","llm","markdown","pdf","pdf-editor","pdf-generation","pdf-library","pdf-parser","pdf-to-markdown","pdf-to-text","pyo3","python","rag","rust","text-extraction"],"latest_commit_sha":null,"homepage":"https://oxide.fyi","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yfedoseev.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE-APACHE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null},"funding":{"github":["yfedoseev"]}},"created_at":"2025-11-05T22:56:26.000Z","updated_at":"2026-05-11T01:35:50.000Z","dependencies_parsed_at":null,"dependency_job_id":"5256f3ec-ab92-4f3a-99c2-d35b1a79f74b","html_url":"https://github.com/yfedoseev/pdf_oxide","co
mmit_stats":null,"previous_names":["yfedoseev/pdf_oxide"],"tags_count":74,"template":false,"template_full_name":null,"purl":"pkg:github/yfedoseev/pdf_oxide","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yfedoseev%2Fpdf_oxide","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yfedoseev%2Fpdf_oxide/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yfedoseev%2Fpdf_oxide/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yfedoseev%2Fpdf_oxide/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yfedoseev","download_url":"https://codeload.github.com/yfedoseev/pdf_oxide/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yfedoseev%2Fpdf_oxide/sbom","scorecard":{"id":1246914,"data":{"date":"2026-05-03T19:47:00Z","repo":{"name":"github.com/yfedoseev/pdf_oxide","commit":"3bbd730f83ee0a9e0115557627a08fadb9b482c6"},"scorecard":{"version":"v5.3.0","commit":"c22063e786c11f9dd714d777a687ff7c4599b600"},"score":5.6,"checks":[{"name":"Security-Policy","score":4,"reason":"security policy file detected","details":["Info: security policy file detected: SECURITY.md:1","Warn: no linked content found","Info: Found disclosure, vulnerability, and/or timelines in security policy: SECURITY.md:1","Info: Found text in security policy: SECURITY.md:1"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#security-policy"}},{"name":"Code-Review","score":0,"reason":"Found 0/2 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are 
merged.","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#code-review"}},{"name":"Dependency-Update-Tool","score":10,"reason":"update tool detected","details":["Info: detected update tool: Dependabot: .github/dependabot.yml:1"],"documentation":{"short":"Determines if the project uses a dependency update tool.","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#dependency-update-tool"}},{"name":"Maintained","score":10,"reason":"30 commit(s) and 29 issue activity found in the last 90 days -- score normalized to 10","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#maintained"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#binary-artifacts"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Info: jobLevel 'actions' permission set to 'read': .github/workflows/codeql.yml:22","Info: jobLevel 'contents' permission set to 'read': .github/workflows/codeql.yml:20","Info: jobLevel 'contents' permission set to 'read': .github/workflows/python.yml:348","Warn: jobLevel 'contents' permission set to 'write': .github/workflows/release.yml:1018","Info: jobLevel 
'contents' permission set to 'read': .github/workflows/release.yml:1073","Info: jobLevel 'contents' permission set to 'read': .github/workflows/release.yml:1105","Warn: jobLevel 'contents' permission set to 'write': .github/workflows/release.yml:513","Info: jobLevel 'contents' permission set to 'read': .github/workflows/release.yml:1133","Info: jobLevel 'contents' permission set to 'read': .github/workflows/release.yml:1259","Info: jobLevel 'contents' permission set to 'read': .github/workflows/scorecard.yml:20","Info: jobLevel 'actions' permission set to 'read': .github/workflows/scorecard.yml:21","Info: topLevel 'contents' permission set to 'read': .github/workflows/ci.yml:16","Info: topLevel 'contents' permission set to 'read': .github/workflows/codeql.yml:13","Info: topLevel 'contents' permission set to 'read': .github/workflows/outdated.yml:9","Warn: no topLevel permission defined: .github/workflows/python.yml:1","Info: topLevel 'contents' permission set to 'read': .github/workflows/release.yml:15","Info: topLevel 'contents' permission set to 'read': .github/workflows/scorecard.yml:11"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#token-permissions"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#cii-best-practices"}},{"name":"Pinned-Dependencies","score":6,"reason":"dependency not pinned by hash detected -- score normalized to 6","details":["Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:93: update your workflow using 
https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/ci.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:560: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/ci.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/ci.yml:762: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/ci.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/ci.yml:886: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/ci.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:924: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/ci.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:942: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/ci.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:966: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/ci.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:996: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/ci.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:1005: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/ci.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:1077: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/ci.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:1110: update your workflow using 
https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/ci.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:247: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/ci.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:326: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/ci.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:361: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/ci.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:1036: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/ci.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/ci.yml:1053: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/ci.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/python.yml:227: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/python.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/python.yml:253: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/python.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/python.yml:270: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/python.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/python.yml:363: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/python.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/python.yml:35: update your workflow using 
https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/python.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/python.yml:201: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/python.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:1318: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/release.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/release.yml:993: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/release.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/release.yml:1120: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:740: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/release.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/release.yml:1044: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/release.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/release.yml:918: update your workflow using https://app.stepsecurity.io/secureworkflow/yfedoseev/pdf_oxide/release.yml/main?enable=pin","Warn: pipCommand not pinned by hash: scripts/setup_benchmark_env.sh:19","Warn: pipCommand not pinned by hash: scripts/setup_benchmark_env.sh:23","Warn: goCommand not pinned by hash: .github/workflows/ci.yml:571","Warn: npmCommand not pinned by hash: .github/workflows/ci.yml:644","Warn: npmCommand not pinned by hash: .github/workflows/ci.yml:715","Warn: pipCommand not pinned by hash: .github/workflows/python.yml:386","Warn: pipCommand not pinned by hash: 
.github/workflows/python.yml:393","Warn: pipCommand not pinned by hash: .github/workflows/python.yml:333","Warn: npmCommand not pinned by hash: .github/workflows/release.yml:698","Warn: npmCommand not pinned by hash: .github/workflows/release.yml:1303","Info: 114 out of 118 GitHub-owned GitHubAction dependencies pinned","Info:  51 out of  75 third-party GitHubAction dependencies pinned","Info:   1 out of   6 pipCommand dependencies pinned","Info:   0 out of   1 goCommand dependencies pinned","Info:   0 out of   4 npmCommand dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#pinned-dependencies"}},{"name":"SAST","score":10,"reason":"SAST tool is run on all commits","details":["Info: SAST configuration detected: CodeQL","Info: all commits (30) are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#sast"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE-APACHE:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE-APACHE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#license"}},{"name":"Signed-Releases","score":0,"reason":"Project has not signed or included provenance with any releases.","details":["Warn: release artifact v0.3.42 not signed: https://api.github.com/repos/yfedoseev/pdf_oxide/releases/316849109","Warn: release artifact v0.3.41 not signed: https://api.github.com/repos/yfedoseev/pdf_oxide/releases/316452675","Warn: release artifact v0.3.40 not signed: 
https://api.github.com/repos/yfedoseev/pdf_oxide/releases/315062764","Warn: release artifact v0.3.39 not signed: https://api.github.com/repos/yfedoseev/pdf_oxide/releases/313840378","Warn: release artifact v0.3.38 not signed: https://api.github.com/repos/yfedoseev/pdf_oxide/releases/312847451","Warn: release artifact v0.3.42 does not have provenance: https://api.github.com/repos/yfedoseev/pdf_oxide/releases/316849109","Warn: release artifact v0.3.41 does not have provenance: https://api.github.com/repos/yfedoseev/pdf_oxide/releases/316452675","Warn: release artifact v0.3.40 does not have provenance: https://api.github.com/repos/yfedoseev/pdf_oxide/releases/315062764","Warn: release artifact v0.3.39 does not have provenance: https://api.github.com/repos/yfedoseev/pdf_oxide/releases/313840378","Warn: release artifact v0.3.38 does not have provenance: https://api.github.com/repos/yfedoseev/pdf_oxide/releases/312847451"],"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#signed-releases"}},{"name":"Packaging","score":10,"reason":"packaging workflow detected","details":["Info: Project packages its releases by way of GitHub Actions.: .github/workflows/python.yml:342"],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#packaging"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#fuzzing"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal 
error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":0,"reason":"57 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: RUSTSEC-2024-0436","Warn: Project is vulnerable to: RUSTSEC-2023-0071","Warn: Project is vulnerable to: GHSA-f83h-ghpp-7wcc","Warn: Project is vulnerable to: GHSA-wf5f-4jwr-ppcp","Warn: Project is vulnerable to: GHSA-2q4j-m29v-hq73","Warn: Project is vulnerable to: GHSA-2rw7-x74f-jg35","Warn: Project is vulnerable to: GHSA-3crg-w4f6-42mx","Warn: Project is vulnerable to: GHSA-4f6g-68pf-7vhv","Warn: Project is vulnerable to: GHSA-4pxv-j86v-mhcw","Warn: Project is vulnerable to: GHSA-4xc4-762w-m6cg","Warn: Project is vulnerable to: GHSA-7gw9-cf7v-778f","Warn: Project is vulnerable to: GHSA-7hfw-26vp-jp8m","Warn: Project is vulnerable to: GHSA-87mj-5ggw-8qc3","Warn: Project is vulnerable to: GHSA-996q-pr4m-cvgq","Warn: Project is vulnerable to: GHSA-9m86-7pmv-2852","Warn: Project is vulnerable to: GHSA-9mvc-8737-8j8h","Warn: Project is vulnerable to: GHSA-f2v5-7jq9-h8cg","Warn: Project is vulnerable to: GHSA-hqmh-ppp3-xvm7","Warn: Project is vulnerable to: GHSA-jfx9-29x2-rv3j","Warn: Project is vulnerable to: GHSA-jj6c-8h6c-hppx","Warn: Project is vulnerable to: GHSA-m449-cwjh-6pw7","Warn: Project is vulnerable to: GHSA-qpxp-75px-xjcp","Warn: Project is vulnerable to: GHSA-vr63-x8vc-m265","Warn: Project is vulnerable to: GHSA-wgvp-vg3v-2xq3","Warn: Project is vulnerable to: GHSA-x284-j5p8-9c5p","Warn: Project is vulnerable to: GHSA-x7hp-r3qg-r3cj","Warn: Project is vulnerable to: GHSA-3r9x-f23j-gc73","Warn: 
Project is vulnerable to: GHSA-538c-55jv-c5g9","Warn: Project is vulnerable to: GHSA-cmw6-hcpp-c6jp","Warn: Project is vulnerable to: GHSA-hqmj-h5c6-369m","Warn: Project is vulnerable to: GHSA-p433-9wv8-28xj","Warn: Project is vulnerable to: GHSA-q56x-g2fj-4rj6","Warn: Project is vulnerable to: GHSA-3749-ghw9-m3mg","Warn: Project is vulnerable to: PYSEC-2025-41 / GHSA-53q9-r3pm-6pq6","Warn: Project is vulnerable to: PYSEC-2024-252 / GHSA-5pcm-hx3q-hm94","Warn: Project is vulnerable to: GHSA-887c-mr87-cxwp","Warn: Project is vulnerable to: PYSEC-2024-251 / GHSA-pg7h-5qx3-wjr3","Warn: Project is vulnerable to: PYSEC-2024-250","Warn: Project is vulnerable to: PYSEC-2024-259","Warn: Project is vulnerable to: GHSA-37mw-44qp-f5jm","Warn: Project is vulnerable to: GHSA-37q5-v5qm-c9v8","Warn: Project is vulnerable to: PYSEC-2023-300 / GHSA-3863-2447-669p","Warn: Project is vulnerable to: GHSA-4w7r-h757-3r74","Warn: Project is vulnerable to: GHSA-59p9-h35m-wg4g","Warn: Project is vulnerable to: GHSA-69w3-r845-3855","Warn: Project is vulnerable to: GHSA-6rvg-6v2m-4j46","Warn: Project is vulnerable to: GHSA-9356-575x-2w9m","Warn: Project is vulnerable to: GHSA-fpwr-67px-3qhx","Warn: Project is vulnerable to: PYSEC-2024-229 / GHSA-hxxf-235m-72v3","Warn: Project is vulnerable to: GHSA-jjph-296x-mrcr","Warn: Project is vulnerable to: GHSA-phhr-52qp-3mj4","Warn: Project is vulnerable to: GHSA-q2wp-rjmx-x6x9","Warn: Project is vulnerable to: PYSEC-2025-40 / GHSA-qq3j-4f4f-9583","Warn: Project is vulnerable to: PYSEC-2024-227 / GHSA-qxrp-vhvm-j765","Warn: Project is vulnerable to: GHSA-rcv9-qm8p-9p6j","Warn: Project is vulnerable to: PYSEC-2023-301 / GHSA-v68g-wm8c-6x7j","Warn: Project is vulnerable to: PYSEC-2024-228 / GHSA-wrfc-pvp9-mr9g"],"documentation":{"short":"Determines if the project has open, known unfixed 
vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#vulnerabilities"}},{"name":"CI-Tests","score":10,"reason":"2 out of 2 merged PRs checked by a CI test -- score normalized to 10","details":null,"documentation":{"short":"Determines if the project runs tests before pull requests are merged.","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#ci-tests"}},{"name":"Contributors","score":10,"reason":"project has 3 contributing companies or organizations -- score normalized to 10","details":["Info: found contributions from: BD2KGenomics, asoem, inovex"],"documentation":{"short":"Determines if the project has a set of contributors from multiple organizations (e.g., companies).","url":"https://github.com/ossf/scorecard/blob/c22063e786c11f9dd714d777a687ff7c4599b600/docs/checks.md#contributors"}}]},"last_synced_at":"2026-05-04T01:29:58.769Z","repository_id":322714579,"created_at":"2026-05-04T01:29:58.769Z","updated_at":"2026-05-04T01:29:58.769Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32952804,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-12T09:19:52.626Z","status":"ssl_error","status_checked_at":"2026-05-12T09:17:33.438Z","response_time":102,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-extraction","document-processing","fast","image-extraction","llm","markdown","pdf","pdf-editor","pdf-generation","pdf-library","pdf-parser","pdf-to-markdown","pdf-to-text","pyo3","python","rag","rust","text-extraction"],"created_at":"2026-01-07T05:27:24.779Z","updated_at":"2026-05-13T06:14:20.856Z","avatar_url":"https://github.com/yfedoseev.png","language":"Rust","funding_links":["https://github.com/sponsors/yfedoseev"],"categories":["Specific Formats Processing","File Format Processing","Libraries","llm","Rust","\u003ca name=\"Rust\"\u003e\u003c/a\u003eRust"],"sub_categories":["Graphics"],"readme":"# PDF Oxide - The Fastest PDF Toolkit for Python, Rust, Go, JS/TS, C#, WASM, CLI \u0026 AI\n\n\u003e **More language bindings coming in May 2026.** Java, Ruby, PHP, Swift, and Kotlin are on the roadmap. Want another language? [Open an issue](https://github.com/yfedoseev/pdf_oxide/issues/new) and tell us.\n\nThe fastest PDF library for text extraction, image extraction, and markdown conversion. Rust core with bindings for Python, Go, JavaScript / TypeScript, C# / .NET, and WASM, plus a CLI tool and MCP server for AI assistants. 0.8ms mean per document, 5× faster than PyMuPDF, 15× faster than pypdf. 100% pass rate on 3,830 real-world PDFs. 
Dual-licensed under MIT or Apache-2.0.\n\n[![Crates.io](https://img.shields.io/crates/v/pdf_oxide.svg)](https://crates.io/crates/pdf_oxide)\n[![PyPI](https://img.shields.io/pypi/v/pdf_oxide.svg)](https://pypi.org/project/pdf_oxide/)\n[![PyPI Downloads](https://img.shields.io/pypi/dm/pdf-oxide)](https://pypi.org/project/pdf-oxide/)\n[![npm](https://img.shields.io/npm/v/pdf-oxide-wasm)](https://www.npmjs.com/package/pdf-oxide-wasm)\n[![Documentation](https://docs.rs/pdf_oxide/badge.svg)](https://docs.rs/pdf_oxide)\n[![Build Status](https://github.com/yfedoseev/pdf_oxide/workflows/CI/badge.svg)](https://github.com/yfedoseev/pdf_oxide/actions)\n[![License: MIT OR Apache-2.0](https://img.shields.io/badge/License-MIT%20OR%20Apache--2.0-blue.svg)](https://opensource.org/licenses)\n\u003c!-- [![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/yfedoseev/pdf_oxide/badge)](https://scorecard.dev/viewer/?uri=github.com/yfedoseev/pdf_oxide) --\u003e\n\u003c!-- [![OpenSSF Best Practices](https://www.bestpractices.dev/projects/NNNN/badge)](https://www.bestpractices.dev/projects/NNNN) --\u003e\n\n\u003e **New in v0.3.24 — now available in Go, JavaScript / TypeScript, and C# / .NET**, alongside the existing Python, Rust, and WASM bindings.\n\u003e Same Rust core, same 0.8 ms extraction speed, same 100% pass rate.\n\u003e See the language guides: [Python](python/README.md) · [Go](go/README.md) · [JavaScript / TypeScript](js/README.md) · [C# / .NET](csharp/README.md) · [WASM](wasm-pkg/README.md)\n\n## Quick Start\n\n### Python\n```python\nfrom pdf_oxide import PdfDocument\n\n# path can be a str or pathlib.Path; use a `with` block for scoped access\ndoc = PdfDocument(\"paper.pdf\")\n# or: with PdfDocument(\"paper.pdf\") as doc: ...\ntext = doc.extract_text(0)\nchars = doc.extract_chars(0)\nmarkdown = doc.to_markdown(0, detect_headings=True)\n```\n\n```bash\npip install pdf_oxide\n```\n\n### Rust\n```rust\nuse pdf_oxide::PdfDocument;\n\nlet mut doc = PdfDocument::open(\"paper.pdf\")?;\nlet text = 
doc.extract_text(0)?;\nlet images = doc.extract_images(0)?;\nlet markdown = doc.to_markdown(0, Default::default())?;\n```\n\n```toml\n[dependencies]\npdf_oxide = \"0.3\"\n```\n\n### CLI\n```bash\npdf-oxide text document.pdf\npdf-oxide markdown document.pdf -o output.md\npdf-oxide search document.pdf \"pattern\"\npdf-oxide merge a.pdf b.pdf -o combined.pdf\n```\n\n```bash\nbrew install yfedoseev/tap/pdf-oxide\n```\n\n### MCP Server (for AI assistants)\n```bash\n# Install\nbrew install yfedoseev/tap/pdf-oxide   # includes pdf-oxide-mcp\n```\n\nConfigure the server in Claude Desktop, Claude Code, or Cursor:\n\n```json\n{\n  \"mcpServers\": {\n    \"pdf-oxide\": { \"command\": \"crgx\", \"args\": [\"pdf_oxide_mcp@latest\"] }\n  }\n}\n```\n\n## Why pdf_oxide?\n\n- **Fast** — 0.8ms mean per document, 5× faster than PyMuPDF, 15× faster than pypdf, 29× faster than pdfplumber\n- **Reliable** — 100% pass rate on 3,830 test PDFs, zero panics, zero timeouts\n- **Complete** — Text extraction, image extraction, PDF creation, and editing in one library\n- **Multi-platform** — Rust, Python, Go, JavaScript/TypeScript, C#/.NET, WASM, CLI, and an MCP server for AI assistants\n- **Permissive license** — MIT / Apache-2.0 — use freely in commercial and open-source projects\n\n## Performance\n\nBenchmarked on 3,830 PDFs from three independent public test suites (veraPDF, Mozilla pdf.js, DARPA SafeDocs). Text extraction libraries only (no OCR). 
Single-thread, 60s timeout, no warm-up.\n\n### Python Libraries\n\n| Library | Mean | p99 | Pass Rate | License |\n|---------|------|-----|-----------|---------|\n| **PDF Oxide** | **0.8ms** | **9ms** | **100%** | **MIT** |\n| PyMuPDF | 4.6ms | 28ms | 99.3% | AGPL-3.0 |\n| pypdfium2 | 4.1ms | 42ms | 99.2% | Apache-2.0 |\n| pymupdf4llm | 55.5ms | 280ms | 99.1% | AGPL-3.0 |\n| pdftext | 7.3ms | 82ms | 99.0% | GPL-3.0 |\n| pdfminer | 16.8ms | 124ms | 98.8% | MIT |\n| pdfplumber | 23.2ms | 189ms | 98.8% | MIT |\n| markitdown | 108.8ms | 378ms | 98.6% | MIT |\n| pypdf | 12.1ms | 97ms | 98.4% | BSD-3 |\n\n### Rust Libraries\n\n| Library | Mean | p99 | Pass Rate | Text Extraction |\n|---------|------|-----|-----------|-----------------|\n| **PDF Oxide** | **0.8ms** | **9ms** | **100%** | **Built-in** |\n| oxidize_pdf | 13.5ms | 11ms | 99.1% | Basic |\n| unpdf | 2.8ms | 10ms | 95.1% | Basic |\n| pdf_extract | 4.08ms | 37ms | 91.5% | Basic |\n| lopdf | 0.3ms | 2ms | 80.2% | No built-in extraction |\n\n### Text Quality\n\n99.5% text parity vs PyMuPDF and pypdfium2 across the full corpus. 
PDF Oxide extracts text from 7–10× more \"hard\" files than it misses vs any competitor.\n\n### Corpus\n\n| Suite | PDFs | Pass Rate |\n|-------|-----:|----------:|\n| [veraPDF](https://github.com/veraPDF/veraPDF-corpus) (PDF/A compliance) | 2,907 | 100% |\n| [Mozilla pdf.js](https://github.com/mozilla/pdf.js/tree/master/test/pdfs) | 897 | 99.2% |\n| [SafeDocs](https://github.com/pdf-association/safedocs) (targeted edge cases) | 26 | 100% |\n| **Total** | **3,830** | **100%** |\n\n100% pass rate on all valid PDFs — the 7 non-passing files across the corpus are intentionally broken test fixtures (missing PDF header, fuzz-corrupted catalogs, invalid xref streams).\n\n## Features\n\n| Extract | Create | Edit |\n|---------|--------|------|\n| Text \u0026 Layout | Documents | Annotations |\n| Images | Tables | Form Fields |\n| Forms | Graphics | Bookmarks |\n| Annotations | Templates | Links |\n| Bookmarks | Images | Content |\n\n## Python API\n\n```python\nfrom pdf_oxide import PdfDocument\n\n# Path can be str or pathlib.Path; use \"with PdfDocument(...) as doc\" for context manager\ndoc = PdfDocument(\"report.pdf\")\nprint(f\"Pages: {doc.page_count()}\")\nprint(f\"Version: {doc.version()}\")\n\n# 1. Scoped extraction (v0.3.14)\n# Extract only from a specific area: (x, y, width, height)\nheader = doc.within(0, (0, 700, 612, 92)).extract_text()\n\n# 2. Word-level extraction (v0.3.14)\nwords = doc.extract_words(0)\nfor w in words:\n    print(f\"{w.text} at {w.bbox}\")\n    # Access individual characters in the word\n    # print(w.chars[0].font_name)\n\n# Optional: override the adaptive word gap threshold (in PDF points)\nwords = doc.extract_words(0, word_gap_threshold=2.5)\n\n# 3. 
Line-level extraction (v0.3.14)\nlines = doc.extract_text_lines(0)\nfor line in lines:\n    print(f\"Line: {line.text}\")\n\n# Optional: override word and/or line gap thresholds (in PDF points)\nlines = doc.extract_text_lines(0, word_gap_threshold=2.5, line_gap_threshold=4.0)\n\n# Inspect the adaptive thresholds before overriding\nparams = doc.page_layout_params(0)\nprint(f\"word gap: {params.word_gap_threshold:.1f}, line gap: {params.line_gap_threshold:.1f}\")\n\n# Use a pre-tuned extraction profile for specific document types\nfrom pdf_oxide import ExtractionProfile\nwords = doc.extract_words(0, profile=ExtractionProfile.form())\nlines = doc.extract_text_lines(0, profile=ExtractionProfile.academic())\n\n# 4. Table extraction (v0.3.14)\ntables = doc.extract_tables(0)\nfor table in tables:\n    print(f\"Table with {table.row_count} rows\")\n\n# 5. Traditional extraction\ntext = doc.extract_text(0)\nchars = doc.extract_chars(0)\n```\n\n### Form Fields\n\n```python\n# Extract form fields\nfields = doc.get_form_fields()\nfor f in fields:\n    print(f\"{f.name} ({f.field_type}) = {f.value}\")\n\n# Fill and save\ndoc.set_form_field_value(\"employee_name\", \"Jane Doe\")\ndoc.set_form_field_value(\"wages\", \"85000.00\")\ndoc.save(\"filled.pdf\")\n```\n\n## Rust API\n\n```rust\nuse pdf_oxide::PdfDocument;\n\nfn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    let mut doc = PdfDocument::open(\"paper.pdf\")?;\n\n    // Extract text\n    let text = doc.extract_text(0)?;\n\n    // Character-level extraction\n    let chars = doc.extract_chars(0)?;\n\n    // Extract images\n    let images = doc.extract_images(0)?;\n\n    // Vector graphics\n    let paths = doc.extract_paths(0)?;\n\n    Ok(())\n}\n```\n\n### Form Fields (Rust)\n\n```rust\nuse pdf_oxide::editor::{DocumentEditor, EditableDocument, SaveOptions};\nuse pdf_oxide::editor::form_fields::FormFieldValue;\n\nfn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    // Open the form, fill a text field by name, then save as an incremental update\n    let mut editor = 
DocumentEditor::open(\"w2.pdf\")?;\n    editor.set_form_field_value(\"employee_name\", FormFieldValue::Text(\"Jane Doe\".into()))?;\n    editor.save_with_options(\"filled.pdf\", SaveOptions::incremental())?;\n    Ok(())\n}\n```\n\n## Installation\n\n### Python\n\n```bash\npip install pdf_oxide\n```\n\nWheels are available for Linux, macOS, and Windows (Python 3.8–3.14).\n\n### Rust\n\n```toml\n[dependencies]\npdf_oxide = \"0.3\"\n```\n\n### JavaScript/WASM\n\n```bash\nnpm install pdf-oxide-wasm\n```\n\n```javascript\nconst { WasmPdfDocument } = require(\"pdf-oxide-wasm\");\n```\n\n### CLI\n\n```bash\nbrew install yfedoseev/tap/pdf-oxide    # Homebrew (macOS/Linux)\ncargo install pdf_oxide_cli             # Cargo\ncargo binstall pdf_oxide_cli            # Pre-built binary via cargo-binstall\n```\n\n### MCP Server\n\n```bash\nbrew install yfedoseev/tap/pdf-oxide    # Included with CLI in Homebrew\ncargo install pdf_oxide_mcp             # Cargo\n```\n\n### Other languages\n\n- **Go** — `go get github.com/yfedoseev/pdf_oxide/go` — see [go/README.md](go/README.md)\n- **JavaScript / TypeScript (Node.js)** — `npm install pdf-oxide` — see [js/README.md](js/README.md)\n- **C# / .NET** — `dotnet add package PdfOxide` — see [csharp/README.md](csharp/README.md)\n\nAll three share the same Rust core as the Python and WASM bindings, so everything you read in this README applies to them as well — just with each language's native naming conventions.\n\n## CLI\n\n22 commands for PDF processing directly from your terminal:\n\n```bash\npdf-oxide text report.pdf                      # Extract text\npdf-oxide markdown report.pdf -o report.md     # Convert to Markdown\npdf-oxide html report.pdf -o report.html       # Convert to HTML\npdf-oxide info report.pdf                      # Show metadata\npdf-oxide search report.pdf \"neural.?network\"  # Search (regex)\npdf-oxide images report.pdf -o ./images/       # Extract images\npdf-oxide merge a.pdf b.pdf -o combined.pdf    # Merge PDFs\npdf-oxide split report.pdf 
-o ./pages/         # Split into pages\npdf-oxide watermark doc.pdf \"DRAFT\"            # Add watermark\npdf-oxide forms w2.pdf --fill \"name=Jane\"      # Fill form fields\n```\n\nRun `pdf-oxide` with no arguments for interactive REPL mode. Use `--pages 1-5` to process specific pages, `--json` for machine-readable output.\n\n## MCP Server\n\n`pdf-oxide-mcp` lets AI assistants (Claude, Cursor, etc.) extract content from PDFs locally via the [Model Context Protocol](https://modelcontextprotocol.io/).\n\nAdd to your MCP client configuration:\n\n```json\n{\n  \"mcpServers\": {\n    \"pdf-oxide\": { \"command\": \"crgx\", \"args\": [\"pdf_oxide_mcp@latest\"] }\n  }\n}\n```\n\nThe server exposes an `extract` tool that supports text, markdown, and HTML output formats with optional page ranges and image extraction. All processing runs locally — no files leave your machine.\n\n## Building from Source\n\n```bash\n# Clone and build\ngit clone https://github.com/yfedoseev/pdf_oxide\ncd pdf_oxide\ncargo build --release\n\n# Run tests\ncargo test\n\n# Build Python bindings\nmaturin develop\n\n# Build the shared library for Go, JS/TS, and C# bindings\ncargo build --release --lib\n# Output: target/release/libpdf_oxide.{so,dylib} or pdf_oxide.dll\n```\n\n## Documentation\n\n- **[Full Documentation](https://pdf.oxide.fyi)** — Complete documentation site\n- **[Getting Started (Rust)](docs/getting-started-rust.md)** — Rust guide\n- **[Getting Started (Python)](docs/getting-started-python.md)** — Python guide\n- **[Getting Started (Go)](go/README.md)** — Go guide\n- **[Getting Started (JavaScript / TypeScript)](js/README.md)** — Node.js guide\n- **[Getting Started (C# / .NET)](csharp/README.md)** — .NET guide\n- **[Getting Started (WASM)](docs/getting-started-wasm.md)** — Browser and Node.js WASM guide\n- **[API Docs](https://docs.rs/pdf_oxide)** — Full Rust API reference\n- **[Performance Benchmarks](https://pdf.oxide.fyi/docs/performance)** — Full benchmark methodology and 
results\n\n## Use Cases\n\n- **RAG / LLM pipelines** — Convert PDFs to clean Markdown for retrieval-augmented generation with LangChain, LlamaIndex, or any framework\n- **AI assistants** — Give Claude, Cursor, or any MCP-compatible tool direct PDF access via the MCP server\n- **Document processing at scale** — Extract text, images, and metadata from thousands of PDFs in seconds\n- **Data extraction** — Pull structured data from forms, tables, and layouts\n- **Academic research** — Parse papers, extract citations, and process large corpora\n- **PDF generation** — Create invoices, reports, certificates, and templated documents programmatically\n- **PyMuPDF alternative** — MIT/Apache-2.0 licensed, 5× faster, no AGPL restrictions\n\n## Why I built this\n\nI needed PyMuPDF's speed without its AGPL license, and I needed it in more than one language. Nothing existed that ticked all three boxes — fast, MIT, multi-language — so I wrote it. The Rust core is what does the real work; the bindings for Python, Go, JS/TS, C#, and WASM are thin shells around the same code, so a bug fix in the core lands in all of them. It now passes every valid PDF in the veraPDF + Mozilla pdf.js + DARPA SafeDocs test corpora (3,830 PDFs in total) on every platform I've tested.\n\nIf it's useful to you, a star on GitHub genuinely helps. If something's broken or missing, [open an issue](https://github.com/yfedoseev/pdf_oxide/issues) — I read all of them.\n\n— Yury\n\n## License\n\nDual-licensed under [MIT](LICENSE-MIT) or [Apache-2.0](LICENSE-APACHE) at your option. Unlike AGPL-licensed alternatives, pdf_oxide can be used freely in any project — commercial or open-source — with no copyleft restrictions.\n\n## Contributing\n\nWe welcome contributions! 
See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.\n\n```bash\ncargo build \u0026\u0026 cargo test \u0026\u0026 cargo fmt \u0026\u0026 cargo clippy -- -D warnings\n```\n\n## Citation\n\n```bibtex\n@software{pdf_oxide,\n  title = {PDF Oxide: Fast PDF Toolkit for Rust, Python, Go, JavaScript, and C#},\n  author = {Yury Fedoseev},\n  year = {2025},\n  url = {https://github.com/yfedoseev/pdf_oxide}\n}\n```\n\n---\n\n**Rust** + **Python** + **Go** + **JS/TS** + **C#** + **WASM** + **CLI** + **MCP** | MIT/Apache-2.0 | 100% pass rate on valid PDFs (3,830-file corpus) | 0.8ms mean | 5× faster than the industry leaders\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyfedoseev%2Fpdf_oxide","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyfedoseev%2Fpdf_oxide","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyfedoseev%2Fpdf_oxide/lists"}