{"id":13581552,"url":"https://github.com/sajari/docconv","last_synced_at":"2026-03-17T14:13:54.745Z","repository":{"id":11549956,"uuid":"14035585","full_name":"sajari/docconv","owner":"sajari","description":"Converts PDF, DOC, DOCX, XML, HTML, RTF, etc to plain text","archived":false,"fork":false,"pushed_at":"2024-07-01T12:41:17.000Z","size":1702,"stargazers_count":1771,"open_issues_count":34,"forks_count":245,"subscribers_count":39,"default_branch":"master","last_synced_at":"2026-02-27T06:04:19.030Z","etag":null,"topics":["conversion","docs","docx","go","html","pdf","pdf-converter","rtf","rtf-files","word","xml"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sajari.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-11-01T04:27:38.000Z","updated_at":"2026-02-11T08:06:40.000Z","dependencies_parsed_at":"2023-01-11T20:17:00.714Z","dependency_job_id":"29cae880-5f99-40d4-9190-5911408dd578","html_url":"https://github.com/sajari/docconv","commit_stats":{"total_commits":193,"total_committers":24,"mean_commits":8.041666666666666,"dds":0.6632124352331606,"last_synced_commit":"785a29a00de4b976c379fd38299c220307220684"},"previous_names":["sajari/sajari-convert"],"tags_count":17,"template":false,"template_full_name":null,"purl":"pkg:github/sajari/docconv","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sajari%2Fdocconv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sajari%2Fdocconv/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/sajari%2Fdocconv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sajari%2Fdocconv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sajari","download_url":"https://codeload.github.com/sajari/docconv/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sajari%2Fdocconv/sbom","scorecard":{"id":795939,"data":{"date":"2025-08-11","repo":{"name":"github.com/sajari/docconv","commit":"785a29a00de4b976c379fd38299c220307220684"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.6,"checks":[{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Code-Review","score":4,"reason":"Found 11/27 approved changesets -- score normalized to 4","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/docd.yml:1","Warn: no topLevel permission defined: .github/workflows/go.yml:1","Info: no jobLevel 
write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and 
uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/docd.yml:14: update your workflow using https://app.stepsecurity.io/secureworkflow/sajari/docconv/docd.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/docd.yml:16: update your workflow using https://app.stepsecurity.io/secureworkflow/sajari/docconv/docd.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/docd.yml:18: update your workflow using https://app.stepsecurity.io/secureworkflow/sajari/docconv/docd.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by 
hash: .github/workflows/docd.yml:20: update your workflow using https://app.stepsecurity.io/secureworkflow/sajari/docconv/docd.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/docd.yml:25: update your workflow using https://app.stepsecurity.io/secureworkflow/sajari/docconv/docd.yml/master?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/docd.yml:37: update your workflow using https://app.stepsecurity.io/secureworkflow/sajari/docconv/docd.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/go.yml:16: update your workflow using https://app.stepsecurity.io/secureworkflow/sajari/docconv/go.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/go.yml:22: update your workflow using https://app.stepsecurity.io/secureworkflow/sajari/docconv/go.yml/master?enable=pin","Warn: containerImage not pinned by hash: docd/Dockerfile:1","Warn: containerImage not pinned by hash: docd/Dockerfile:9","Warn: containerImage not pinned by hash: docd/appengine/Dockerfile:4: pin your Docker image by updating sajari/docd:1.3.8 to sajari/docd:1.3.8@sha256:5a0a75f75d465d3a9ebb9556db35e24452595039957af19708715101ad76212f","Info:   0 out of   3 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   5 third-party GitHubAction dependencies pinned","Info:   0 out of   3 containerImage dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 29 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Vulnerabilities","score":4,"reason":"6 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GO-2024-2687 / GHSA-4v7x-pqxf-cx7m","Warn: Project is vulnerable to: GO-2024-3333","Warn: Project is vulnerable to: GO-2025-3503 / GHSA-qxp5-gwg8-xv66","Warn: Project is vulnerable to: GO-2025-3595 / GHSA-vvgc-356p-c3xw","Warn: Project is vulnerable to: GO-2025-3488 / GHSA-6v2p-p543-phr9","Warn: Project is vulnerable to: GO-2024-2611 / GHSA-8r3f-844c-mc37"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-23T08:57:24.164Z","repository_id":11549956,"created_at":"2025-08-23T08:57:24.165Z","updated_at":"2025-08-23T08:57:24.165Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30625802,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-17T11:26:08.186Z","status":"ssl_error","status_checked_at":"2026-03-17T11:24:37.311Z","response_time":56,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conversion","docs","docx","go","html","pdf","pdf-converter","rtf","rtf-files","word","xml"],"created_at":"2024-08-01T15:02:05.442Z","updated_at":"2026-03-17T14:13:54.701Z","avatar_url":"https://github.com/sajari.png","language":"Go","readme":"# docconv\n\n[![Go reference](https://pkg.go.dev/badge/code.sajari.com/docconv/v2.svg)](https://pkg.go.dev/code.sajari.com/docconv/v2)\n[![Build status](https://github.com/sajari/docconv/workflows/Go/badge.svg?branch=master)](https://github.com/sajari/docconv/actions)\n[![Report card](https://goreportcard.com/badge/code.sajari.com/docconv/v2)](https://goreportcard.com/report/code.sajari.com/docconv/v2)\n[![Sourcegraph](https://sourcegraph.com/github.com/sajari/docconv/v2/-/badge.svg)](https://sourcegraph.com/github.com/sajari/docconv/v2)\n\nA Go wrapper library to convert PDF, DOC, DOCX, XML, HTML, RTF, ODT, Pages documents and images (see optional dependencies below) to plain text.\n\n## Installation\n\nIf you haven't set up Go before, you first need to [install Go](https://golang.org/doc/install).\n\nTo fetch and build the code:\n\n```console\n$ go install code.sajari.com/docconv/v2/docd@latest\n```\n\nSee `go help install` for details on where the `docd` executable is installed. 
Make sure that the full path to the executable is in your `PATH` environment variable.\n\n## Dependencies\n\n- tidy\n- wv\n- poppler-utils\n- unrtf\n- https://github.com/JalfResi/justext\n\n### Debian-based Linux\n\n```console\n$ sudo apt-get install poppler-utils wv unrtf tidy\n$ go get github.com/JalfResi/justext\n```\n\n### macOS\n\n```console\n$ brew install poppler-qt5 wv unrtf tidy-html5\n$ go get github.com/JalfResi/justext\n```\n\n### Optional dependencies\n\nTo add image support to the `docconv` library you first need to [install and build gosseract](https://github.com/otiai10/gosseract/tree/v2.2.4).\n\nNow you can add `-tags ocr` to any `go` command when building/fetching/testing `docconv` to include support for processing images:\n\n```console\n$ go get -tags ocr code.sajari.com/docconv/v2/...\n```\n\nIf the build fails on macOS, install [tesseract](https://tesseract-ocr.github.io) via brew:\n\n```console\n$ brew install tesseract\n```\n\n## docd tool\n\nThe `docd` tool can run in one of three ways:\n\n1.  a service on port 8888 (by default)\n\n    Documents can be sent as a multipart POST request and the plain text (body) and meta information are then returned as a JSON object.\n\n2.  a service exposed from within a Docker container\n\n    This also runs as a service, but from within a Docker container.\n    Official images are published at https://hub.docker.com/r/sajari/docd.\n\n    Optionally you can build it yourself:\n\n    ```console\n    $ cd docd\n    $ docker build -t docd .\n    ```\n\n3.  
via the command line.\n\n    Documents can be sent as an argument, e.g.\n\n    ```console\n    $ docd -input document.pdf\n    ```\n\n### Optional flags\n\n- `addr` - the bind address for the HTTP server, default is \":8888\"\n- `readability-length-low` - sets the readability length low if the ?readability=1 parameter is set\n- `readability-length-high` - sets the readability length high if the ?readability=1 parameter is set\n- `readability-stopwords-low` - sets the readability stopwords low if the ?readability=1 parameter is set\n- `readability-stopwords-high` - sets the readability stopwords high if the ?readability=1 parameter is set\n- `readability-max-link-density` - sets the readability max link density if the ?readability=1 parameter is set\n- `readability-max-heading-distance` - sets the readability max heading distance if the ?readability=1 parameter is set\n- `readability-use-classes` - comma separated list of readability classes to use if the ?readability=1 parameter is set\n\n### How to start the service\n\n```console\n$ # This runs on port 8000\n$ docd -addr :8000\n```\n\n## Example usage (code)\n\nSome basic code is shown below, but normally you would accept the file by HTTP or open it from the file system.\n\nThis should be enough to get you started though.\n\n### Use case 1: run locally\n\n\u003e Note: this assumes you have the [dependencies](#dependencies) installed.\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\n\t\"code.sajari.com/docconv/v2\"\n)\n\nfunc main() {\n\tres, err := docconv.ConvertPath(\"your-file.pdf\")\n\tif err != nil {\n\t\t// TODO: handle\n\t}\n\tfmt.Println(res)\n}\n```\n\n### Use case 2: request over the network\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\n\t\"code.sajari.com/docconv/v2/client\"\n)\n\nfunc main() {\n\t// Create a new client, using the default endpoint (localhost:8888)\n\tc := client.New()\n\n\tres, err := client.ConvertPath(c, \"your-file.pdf\")\n\tif err != nil {\n\t\t// TODO: 
handle\n\t}\n\tfmt.Println(res)\n}\n```\n\nAlternatively, send the file with `curl`:\n\n```console\n$ curl -s -F input=@your-file.pdf http://localhost:8888/convert\n```\n","funding_links":[],"categories":["Go","GO","Repositories"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsajari%2Fdocconv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsajari%2Fdocconv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsajari%2Fdocconv/lists"}