{"id":15584259,"url":"https://github.com/jhund/pdfbox_text_extraction","last_synced_at":"2025-10-03T22:10:48.890Z","repository":{"id":56887681,"uuid":"54211271","full_name":"jhund/pdfbox_text_extraction","owner":"jhund","description":"Provides a Jruby wrapper for Apache PDFBox library to extract plain text from PDF documents.","archived":false,"fork":false,"pushed_at":"2019-07-12T21:18:06.000Z","size":4459,"stargazers_count":4,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-06-29T09:02:52.646Z","etag":null,"topics":["extract","jruby","pdfbox","plain-text"],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jhund.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-03-18T15:20:03.000Z","updated_at":"2024-12-06T09:16:08.000Z","dependencies_parsed_at":"2022-08-21T00:50:43.716Z","dependency_job_id":null,"html_url":"https://github.com/jhund/pdfbox_text_extraction","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/jhund/pdfbox_text_extraction","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhund%2Fpdfbox_text_extraction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhund%2Fpdfbox_text_extraction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhund%2Fpdfbox_text_extraction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhund%2Fpdfbox_text_extraction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/owners/jhund","download_url":"https://codeload.github.com/jhund/pdfbox_text_extraction/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhund%2Fpdfbox_text_extraction/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262566830,"owners_count":23329680,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["extract","jruby","pdfbox","plain-text"],"created_at":"2024-10-02T20:40:31.396Z","updated_at":"2025-10-03T22:10:43.829Z","avatar_url":"https://github.com/jhund.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PDFBox text extraction\n\nThis gem lets you extract plain text from PDF documents. It is a Jruby wrapper for the [Apache PDFBox](https://pdfbox.apache.org/) library.\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n    gem 'pdfbox_text_extraction'\n\nAnd then execute:\n\n    $ bundle\n\nOr install it yourself as:\n\n    $ gem install pdfbox_text_extraction\n\n## Usage\n\nTo extract all text on every page:\n\n    extracted_text = PdfboxTextExtraction.run(path_to_pdf)\n\nTo extract text inside a crop area:\n\n    extracted_text = PdfboxTextExtraction.run(\n      path_to_pdf,\n      {\n        crop_x: 0, # crop area top left corner x-coordinate\n        crop_y: 1.0, # crop area top left corner y-coordinate\n        crop_width: 8.5, # crop area width\n        crop_height: 9.4, # crop area height\n      }\n    )\n\n## Contributing\n\n1. Fork it ( https://github.com/jhund/pdfbox_text_extraction/fork )\n2. 
Create your feature branch (`git checkout -b my-new-feature`)\n3. Commit your changes (`git commit -am 'Add some feature'`)\n4. Push to the branch (`git push origin my-new-feature`)\n5. Create a new Pull Request\n\n### Resources\n\n* [Source code (GitHub)](https://github.com/jhund/pdfbox_text_extraction)\n* [Issues](https://github.com/jhund/pdfbox_text_extraction/issues)\n* [RubyGems.org](http://rubygems.org/gems/pdfbox_text_extraction)\n\n### License\n\n[MIT licensed](https://github.com/jhund/pdfbox_text_extraction/blob/master/LICENSE.txt).\n\n### Copyright\n\nCopyright (c) 2016 Jo Hund. See [(MIT) LICENSE](https://github.com/jhund/pdfbox_text_extraction/blob/master/LICENSE.txt) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjhund%2Fpdfbox_text_extraction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjhund%2Fpdfbox_text_extraction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjhund%2Fpdfbox_text_extraction/lists"}