{"id":20650879,"url":"https://github.com/nlpatvcu/pdf2txt","last_synced_at":"2026-04-20T00:37:36.184Z","repository":{"id":96598204,"uuid":"329980883","full_name":"NLPatVCU/PDF2TXT","owner":"NLPatVCU","description":"Converts a pdf document to text. ","archived":false,"fork":false,"pushed_at":"2022-04-15T19:06:03.000Z","size":67,"stargazers_count":1,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-01-17T10:24:46.345Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NLPatVCU.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-15T17:38:07.000Z","updated_at":"2025-01-01T15:00:58.000Z","dependencies_parsed_at":"2023-04-25T21:26:04.094Z","dependency_job_id":null,"html_url":"https://github.com/NLPatVCU/PDF2TXT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NLPatVCU%2FPDF2TXT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NLPatVCU%2FPDF2TXT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NLPatVCU%2FPDF2TXT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NLPatVCU%2FPDF2TXT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NLPatVCU","download_url":"https://codeload.github.com/NLPatVCU/PDF2TXT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242750783,"owners_count":20179256,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-16T17:24:02.347Z","updated_at":"2026-04-20T00:37:36.140Z","avatar_url":"https://github.com/NLPatVCU.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PDF2TXT\n\nPDF2TXT can be used to either convert a single .pdf file to a .txt file or all .pdf files in a given directory to .txt files.\n\n![alt text](https://nlp.cs.vcu.edu/images/Edit_NanomedicineDatabase.png \"Nanoinformatics\")\n\nInstallation\n============\nwhen in the python 3 virtual environment:\n\nTo install PDF2TXT:\n```python\ngit clone https://github.com/NLPatVCU/PDF2TXT.git\n```\nYou would also need to install the Haystack framework and milvus.\n```python\npip3 install pymilvus==1.0.0\npip3 install farm-haystack==1.0.0\n```\nIf you experience any difficulties, try visiting their site: https://github.com/deepset-ai/haystack\n\nUse\n===\n\nTo convert a single file, run:\n```python\npython3 pdf2txt.py -f \u003cinput_file_path\u003e\n```\n\nTo convert an entire directory, run:\n```python\npython3 pdf2txt.py -d \u003cinput_directory_path\u003e\n```\nTo write output files into a specific directory, append with:\n```python\n-o \u003coutput_directory_path\u003e\n```\nLicense\n=======\nThis package is licensed under the GNU General Public License\n\nAcknowledgments\n===============\n- [VCU Natural Language Processing Lab](https://nlp.cs.vcu.edu/)     ![alt text](https://nlp.cs.vcu.edu/images/vcu_head_logo \"VCU\")\n- [Nanoinformatics Vertically Integrated Projects](https://rampages.us/nanoinformatics/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnlpatvcu%2Fpdf2txt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnlpatvcu%2Fpdf2txt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnlpatvcu%2Fpdf2txt/lists"}