{"id":13415658,"url":"https://github.com/LeoFCardoso/pdf2pdfocr","last_synced_at":"2025-03-14T23:30:56.972Z","repository":{"id":13269616,"uuid":"48244523","full_name":"LeoFCardoso/pdf2pdfocr","owner":"LeoFCardoso","description":"A free tool to OCR a PDF and add a text \"layer\" in the original file, making a searchable PDF. Use only open source tools. Please tip!","archived":false,"fork":false,"pushed_at":"2024-02-04T14:23:30.000Z","size":640,"stargazers_count":261,"open_issues_count":2,"forks_count":33,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-07-31T21:54:19.956Z","etag":null,"topics":["docker","ocr","pdf","pdftk","python","tesseract"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LeoFCardoso.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-12-18T16:17:04.000Z","updated_at":"2024-07-30T05:10:12.000Z","dependencies_parsed_at":"2024-10-26T12:05:57.707Z","dependency_job_id":"e5724478-6667-42f5-a1bc-a61d0e30cee6","html_url":"https://github.com/LeoFCardoso/pdf2pdfocr","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LeoFCardoso%2Fpdf2pdfocr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LeoFCardoso%2Fpdf2pdfocr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LeoFCardoso%2Fpdf2pdfocr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/LeoFCardoso%2Fpdf2pdfocr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LeoFCardoso","download_url":"https://codeload.github.com/LeoFCardoso/pdf2pdfocr/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243663341,"owners_count":20327299,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","ocr","pdf","pdftk","python","tesseract"],"created_at":"2024-07-30T21:00:51.127Z","updated_at":"2025-03-14T23:30:56.588Z","avatar_url":"https://github.com/LeoFCardoso.png","language":"Python","funding_links":["https://www.paypal.com/cgi-bin/webscr?cmd=_donations\u0026business=PZZU5APJGSWVA\u0026lc=GB\u0026item_name=pdf2pdfocr%20development\u0026currency_code=USD"],"categories":["1. \u003ca name='Software'\u003e\u003c/a\u003eSoftware","Software"],"sub_categories":["1.4. 
\u003ca name='OCRCLI'\u003e\u003c/a\u003eOCR CLI","OCR CLI"],"readme":"# pdf2pdfocr\nA tool to OCR a PDF (or supported images) and add a text \"layer\" (a \"pdf sandwich\") in the original file making it a searchable PDF.\nThe script uses only open source tools.\n\n# donations\nThis software is free, but if you like it, please donate to support new features.\n\n[![paypal](https://www.paypalobjects.com/en_US/GB/i/btn/btn_donateCC_LG.gif)](https://www.paypal.com/cgi-bin/webscr?cmd=_donations\u0026business=PZZU5APJGSWVA\u0026lc=GB\u0026item_name=pdf2pdfocr%20development\u0026currency_code=USD)\n\nBitcoin (BTC) address: [173D1zQQyzvCCCek9b1SpDvh7JikBEdtRJ](https://blockchair.com/bitcoin/address/173D1zQQyzvCCCek9b1SpDvh7JikBEdtRJ)\n\n# tips\nTips are also welcome!\n\n[![tippin.me](https://badgen.net/badge/%E2%9A%A1%EF%B8%8Ftippin.me/@LeoFCardoso/F0918E)](https://tippin.me/@LeoFCardoso)\n\nDogecoin (DOGE) address: [D94hD2qPnkxmZk8qa1b6F1d7NfUrPkmcrG](https://blockchair.com/dogecoin/address/D94hD2qPnkxmZk8qa1b6F1d7NfUrPkmcrG)\n\nPIX (Brazilian Instant Payments): 0726e8f2-7e59-488a-8abb-bda8f0d7d9ce\n\n[![chave PIX](https://raw.githubusercontent.com/LeoFCardoso/pdf2pdfocr/master/pix_qrcode.png)](https://nubank.com.br/pagar/414xb/ndt4lfy9GT)\n\nPlease contact for donations and tips in other cryptocurrencies.\n\n# installation\nIn Linux, installation is straightforward. 
Just install required packages and be happy.\nYou can use \"install_command\" script to copy required files to \"/usr/local/bin\".\n\nIn macOS, you will need macports.\n    \n    # First install Xcode from Mac App Store, then:\n    xcode-select --install\n    sudo xcodebuild -license\n    # Install Macports from https://www.macports.org/install.php\n    sudo port selfupdate\n    # Install tesseract as main ocr engine (Portuguese included below - please add your preferred languages)\n    sudo port install git libtool automake autoconf tesseract tesseract-por tesseract-osd tesseract-eng\n    # Install cuneiform (the optional ocr engine - see flag \"-c\")\n    sudo port install cuneiform\n    # Install qpdf (optional for better performance)\n    sudo port install qpdf\n    # Install python 3 and other dependencies\n    sudo port install python39 py39-pip poppler poppler-data ImageMagick ghostscript\n    # Configure default python3 installer\n    sudo port select --set python python39\n    sudo port select --set python3 python39\n    sudo port select --set pip pip39\n    sudo port select --set pip3 pip39\n    # Configure venv and python deps in fixed home directory\n    python3 -m venv ~/pdf2pdfocr-venv\n    ~/pdf2pdfocr-venv/bin/python3 -m pip install --upgrade pip\n    ~/pdf2pdfocr-venv/bin/pip3 install --upgrade setuptools\n    ~/pdf2pdfocr-venv/bin/pip3 install -r requirements.txt\n    ~/pdf2pdfocr-venv/bin/pip3 install -r requirements_gui.txt\n    # Copy main scripts to venv\n    cp pdf2pdfocr.py pdf2pdfocr_gui.py pdf2pdfocr_multibackground.py ~/pdf2pdfocr-venv/bin\n    sudo ./install_command\n\nCuneiform and qpdf are optional.\n\nIn Windows, you will need to manually install required software. Please read \"install_windows.txt\" file and try the tutorial with scoop tool. It's easy! :-)\n\n# docker (without GUI)\nThe Dockerfile can be used to build a docker image to run pdf2pdfocr inside a container. 
To build the image, download all sources and run:\n\n    docker build -t leofcardoso/pdf2pdfocr:latest .\n\nIt's also possible to pull the Docker image from Docker Hub:\n\n    docker pull leofcardoso/pdf2pdfocr\n\nYou can run the application with docker run:\n\n    docker run --rm -v \"$(pwd):/home/docker\" leofcardoso/pdf2pdfocr -v -i ./sample_file.pdf\n\n# basic usage\nThis creates a searchable (OCR) PDF file in the same directory as \"input_file\":\n\n    pdf2pdfocr.py -i \u003cinput_file\u003e\n\nIn some cases, you will want to use option flags. To view all the options, run:\n\n    pdf2pdfocr.py --help\n\nIt's also possible to use the GUI:\n\n    pdf2pdfocr_gui.py \u003c\u003coptional input file\u003e\u003e\n\n# fun\nHomemade with pride! ;-)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLeoFCardoso%2Fpdf2pdfocr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FLeoFCardoso%2Fpdf2pdfocr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLeoFCardoso%2Fpdf2pdfocr/lists"}