{"id":21832773,"url":"https://github.com/papercutsoftware/pdfsearch","last_synced_at":"2025-06-30T16:39:06.387Z","repository":{"id":57544075,"uuid":"189910899","full_name":"PaperCutSoftware/pdfsearch","owner":"PaperCutSoftware","description":"A full text search library for PDFs.","archived":false,"fork":false,"pushed_at":"2020-09-29T05:29:48.000Z","size":258,"stargazers_count":67,"open_issues_count":0,"forks_count":4,"subscribers_count":33,"default_branch":"main","last_synced_at":"2025-04-14T07:45:28.948Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PaperCutSoftware.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-06-03T00:49:20.000Z","updated_at":"2025-03-03T14:46:33.000Z","dependencies_parsed_at":"2022-09-16T23:01:17.346Z","dependency_job_id":null,"html_url":"https://github.com/PaperCutSoftware/pdfsearch","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/PaperCutSoftware/pdfsearch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaperCutSoftware%2Fpdfsearch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaperCutSoftware%2Fpdfsearch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaperCutSoftware%2Fpdfsearch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaperCutSoftware%2Fpdfsearch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PaperCutSoftware","download_url":"https://codeload.github.com/PaperCutSoftware/pdfsearch
/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaperCutSoftware%2Fpdfsearch/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262811453,"owners_count":23368112,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-27T19:25:39.602Z","updated_at":"2025-06-30T16:39:06.362Z","avatar_url":"https://github.com/PaperCutSoftware.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pure Go Full Text Search of PDF Files\n\nThis library implements full text search for PDFs.\n* The public APIs are in [index_search.go](index_search.go).\n\nThere are some command-line programs that demonstrate the library's functionality.\n* [examples/pdf_search_demo.go](examples/pdf_search_demo.go) demonstrates the main APIs.\n* [examples/index.go](examples/index.go) builds an index over a set of PDFs.\n* [examples/search.go](examples/search.go) searches the index built by [examples/index.go](examples/index.go).\n\nBinary versions (executables) of these three programs are available in\n[releases](https://github.com/PaperCutSoftware/pdfsearch/releases/tag/v0.0.1).\nThere are 64-bit binaries for Windows, Mac and Linux. 
The binaries do not require a UniDoc license.\n\n## Installation\n\n    git clone https://github.com/PaperCutSoftware/pdfsearch\n\nReplace `uniDocLicenseKey` and `companyName` in [unidoc_glue.go](internal/doclib/unidoc_glue.go)\nwith valid [UniDoc](https://unidoc.io/) license fields.\n\n    cd pdfsearch/examples\n    go build pdf_search_demo.go\n    go build index.go\n    go build search.go\n\n### [examples/pdf_search_demo.go](examples/pdf_search_demo.go)\n\n__Usage__: `./pdf_search_demo  -f \u003cPDF path\u003e \u003csearch term\u003e`\n\n__Example__: `./pdf_search_demo  -f PDF32000_2008.pdf cubic Bézier curve`\n\nThe example will search `PDF32000_2008.pdf` for _cubic Bézier curve_.\n\n`pdf_search_demo.go` shows how to use the APIs in [index_search.go](index_search.go) to\n* create indexes over PDFs,\n* search those indexes using full-text search, and\n* mark up PDFs with the locations of the search matches on pages.\n\n### [examples/index.go](examples/index.go)\n\n__Usage__: `./index \u003cfile pattern\u003e`\n\n__Example__: `./index ~/climate/**/*.pdf`\n\nThe example creates an on-disk index over the PDFs in `~/climate/` and its subdirectories.\n\n### [examples/search.go](examples/search.go)\n\n__Usage__: `./search \u003csearch term\u003e`\n\n__Example__: `./search integrated assessment model`\n\nThe example searches the on-disk index created by [examples/index.go](examples/index.go)\nfor _integrated assessment model_.\n\n## Libraries\n\n[index_search.go](index_search.go) uses [UniDoc](https://unidoc.io/) for PDF parsing and [bleve](http://github.com/blevesearch/bleve) for search.\n\n\n## Talks about this library\n[GopherCon AU 
2019](https://docs.google.com/presentation/d/14FDuKAPgWM2z4V1xag0HFEzL3IJfaS4a7Wt0ChxDG6s/edit?usp=sharing)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpapercutsoftware%2Fpdfsearch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpapercutsoftware%2Fpdfsearch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpapercutsoftware%2Fpdfsearch/lists"}