{"id":13625826,"url":"https://github.com/jalan/pdftotext","last_synced_at":"2025-10-21T19:30:06.192Z","repository":{"id":20152937,"uuid":"89026317","full_name":"jalan/pdftotext","owner":"jalan","description":"Simple PDF text extraction","archived":false,"fork":false,"pushed_at":"2024-05-06T01:32:20.000Z","size":221,"stargazers_count":828,"open_issues_count":12,"forks_count":100,"subscribers_count":18,"default_branch":"master","last_synced_at":"2024-05-22T05:03:00.771Z","etag":null,"topics":["pdf","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jalan.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-04-21T21:50:25.000Z","updated_at":"2024-05-21T16:10:47.000Z","dependencies_parsed_at":"2023-02-10T13:00:32.853Z","dependency_job_id":"976d33e1-e930-4e72-8bb7-75fd6a3fb1b4","html_url":"https://github.com/jalan/pdftotext","commit_stats":{"total_commits":140,"total_committers":4,"mean_commits":35.0,"dds":0.02857142857142858,"last_synced_commit":"8cffb5ebd4bf7861bbd01ac41ff46d106a62177b"},"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jalan%2Fpdftotext","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jalan%2Fpdftotext/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jalan%2Fpdftotext/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jalan%2Fpdftotext/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jalan","download_url":"https://codeload.github.com/jalan/pdftotext/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249228273,"owners_count":21233852,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pdf","python"],"created_at":"2024-08-01T21:02:02.983Z","updated_at":"2025-10-21T19:30:01.128Z","avatar_url":"https://github.com/jalan.png","language":"Python","readme":"# pdftotext\n\n[![PyPI](https://img.shields.io/pypi/v/pdftotext.svg)](https://pypi.python.org/pypi/pdftotext)\n[![Tests](https://github.com/jalan/pdftotext/actions/workflows/tests.yml/badge.svg?branch=master)](https://github.com/jalan/pdftotext/actions)\n[![Downloads](https://pepy.tech/badge/pdftotext)](https://pepy.tech/project/pdftotext)\n\nSimple PDF text extraction\n\n```python\nimport pdftotext\n\n# Load your PDF\nwith open(\"lorem_ipsum.pdf\", \"rb\") as f:\n    pdf = pdftotext.PDF(f)\n\n# If it's password-protected\nwith open(\"secure.pdf\", \"rb\") as f:\n    pdf = pdftotext.PDF(f, \"secret\")\n\n# How many pages?\nprint(len(pdf))\n\n# Iterate over all the pages\nfor page in pdf:\n    print(page)\n\n# Read some individual pages\nprint(pdf[0])\nprint(pdf[1])\n\n# Read all the text into one string\nprint(\"\\n\\n\".join(pdf))\n```\n\n\n## OS Dependencies\n\nThese instructions assume you're on a recent OS. Package names may differ for an\nolder OS.\n\n### Debian, Ubuntu, and friends\n\n```\nsudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev\n```\n\n### Fedora, Red Hat, and friends\n\n```\nsudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel\n```\n\n### macOS\n\n```\nbrew install pkg-config poppler python\n```\n\n### Windows\n\nCurrently tested only when using conda:\n\n - Install the Microsoft Visual C++ Build Tools\n - Install poppler through conda:\n   ```\n   conda install -c conda-forge poppler\n   ```\n\n\n## Install\n\n```\npip install pdftotext\n```\n","funding_links":[],"categories":["Python","PDF"],"sub_categories":["Open USP Tsukubai"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjalan%2Fpdftotext","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjalan%2Fpdftotext","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjalan%2Fpdftotext/lists"}