{"id":23241186,"url":"https://github.com/nikhil-swamix/pdf2text","last_synced_at":"2025-04-05T22:33:08.663Z","repository":{"id":135299325,"uuid":"272991284","full_name":"nikhil-swamix/PDF2Text","owner":"nikhil-swamix","description":null,"archived":false,"fork":false,"pushed_at":"2020-08-08T16:25:01.000Z","size":11309,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-19T05:08:46.076Z","etag":null,"topics":["ocr-python","pdf","pdf-converter","pdf-reader","python","text-mining"],"latest_commit_sha":null,"homepage":null,"language":"Roff","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nikhil-swamix.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-06-17T14:09:48.000Z","updated_at":"2021-09-09T07:19:23.000Z","dependencies_parsed_at":"2023-05-26T09:15:37.384Z","dependency_job_id":null,"html_url":"https://github.com/nikhil-swamix/PDF2Text","commit_stats":null,"previous_names":["nikhil-software-cartel/pdf2text","nikhil-swamix/pdf2text"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nikhil-swamix%2FPDF2Text","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nikhil-swamix%2FPDF2Text/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nikhil-swamix%2FPDF2Text/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nikhil-swamix%2FPDF2Text/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nikhil-swamix","download_url":"https://codeload.
github.com/nikhil-swamix/PDF2Text/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247411265,"owners_count":20934650,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ocr-python","pdf","pdf-converter","pdf-reader","python","text-mining"],"created_at":"2024-12-19T05:15:46.083Z","updated_at":"2025-04-05T22:33:08.635Z","avatar_url":"https://github.com/nikhil-swamix.png","language":"Roff","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PDF2Text\n\n# Setup:\n\n  If on Linux:\n  \n    sudo apt install tesseract-ocr\n    \n  If on Windows, install this:\n  \n    https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v5.0.0-alpha.20200328.exe\n    \n    and now add \"C:\\Program Files\\Tesseract-OCR\" to PATH; without this nothing will work!\n    \n    \n  Next, install the required Python libraries.\n  Paste these commands in a terminal:\n  \n    pip3 install Pillow\n    \n    pip3 install pytesseract\n    \n    pip3 install pdf2image\n  \n  When everything is set up, just run pdfreader.py\n  \n    What you will see is the text of the PDF printed to the console/output;\n    that is done by the -\u003e image_to_string(Image.open('pdfimg.jpg')) 
command\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnikhil-swamix%2Fpdf2text","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnikhil-swamix%2Fpdf2text","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnikhil-swamix%2Fpdf2text/lists"}