Projects in Awesome Lists tagged with pdftotext
A curated list of projects in awesome lists tagged with pdftotext.
https://github.com/lu4p/cat
Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.
cat cross-platform docx2txt extract-text go golang odt2txt pdf2txt pdftotext rtf-to-text text-extraction textextracting
Last synced: 19 Dec 2024
https://github.com/iron-software/iron-ocr-image-to-text-in-csharp
Image to Text Tutorial in C# - See https://ironsoftware.com/csharp/ocr/tutorials/how-to-read-text-from-an-image-in-csharp-net/
csharp csharp-code imagetotext ocr pdftotext
Last synced: 09 Apr 2025
https://github.com/ashutoshvarma/pyxpdf
Fast and memory-efficient Python PDF Parser based on xpdf sources
cython pdf pdf-converter pdf-parser pdfparser pdftohtml pdftopng pdftotext python xpdf xpdf-reader
Last synced: 12 Apr 2025
https://github.com/icaropires/pdf2dataset
Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features
data-science distributed-computing distributed-systems ocr pandas-dataframe parallel parquet pdf pdf2image pdftotext pyarrow pytesseract pytesseract-ocr python python3 ray tesseract tesseract-ocr
Last synced: 13 Apr 2025
https://github.com/amenezes/aiopytesseract
A Python asyncio wrapper for Tesseract-OCR.
asyncio ocr optical-character-recognition pdftotext pytesseract pytesseract-ocr tesseract tesseract-ocr text-extraction
Last synced: 18 Dec 2024
https://github.com/yedhink/covid19-kerala-api-deprecated
Deprecated - A fast API service for retrieving day-to-day stats about the Coronavirus (COVID-19, SARS-CoV-2) outbreak in Kerala (India).
api coronavirus coronavirus-real-time coronavirus-tracking covid-19-india covid-data covid19 covid19-data covid19dataindia covid19datakerala covid19india covid19kerala gin pdftotext
Last synced: 06 Dec 2024
https://github.com/tecosaur/pdftotext.el
A mirror of https://git.tecosaur.net/tec/pdftotext.el
emacs emacs-pacakge mirror pdftotext
Last synced: 14 Feb 2025
https://github.com/andrealenzi11/py-poppleract
Python library and web service based on the Poppler pdftotext utility and Tesseract OCR for extracting text from PDF documents
ocr optical-character-recognition pdf-reader pdf-splitting pdf-to-text pdf2text pdftotext poppler poppleract py-poppleract tesseract tesseract-ocr text-extraction
Last synced: 26 Mar 2025
https://github.com/tmsincomb/imagetocsv
Converts an image to a CSV. This exists because Chorus 3.0 is bat-shit and only shows images for vital metadata.
csv image2csv imagetocsv opencv pdftotext pytesseract python tesseract
Last synced: 17 Jan 2025
https://github.com/chanmo/docker-poppler
A simple RESTful API service for poppler
pdftocairo pdftohtml pdftoppm pdftotext poppler
Last synced: 12 May 2025
https://github.com/euyogi/projeto-anceu-cs50
My project for the CS50 course: a PDF analyzer that processes the scores of applicants admitted through Acesso Enem and organizes everything. Now in C++
acesso-enem-unb cpp cs50 cs50course cs50x customtkinter enem exe extract-text-from-pdf imgui pdftotext portuguese-brazilian project python unb yogi zlib
Last synced: 14 Apr 2025
https://github.com/zeeshanahmad4/nlp-pdf-minning-extracting-text-from-pdf
NLP PDF mining: extracting text from PDF
extract-text pdf pdf-converter pdf-document-processor pdf-files pdf-format pdf-text-extraction pdfcon pdfkit pdftohtml pdftoimage pdftools pdftotext python text-extraction
Last synced: 01 Apr 2025
https://github.com/amitsuthar69/pdf2text
A PDF-to-text extractor web service written in Go.
Last synced: 20 Feb 2025
https://github.com/deardurham/ciprs-reader
Python library for reading CIPRS PDFs
codeforamerica coverage docker pdf pdftotext pytest python
Last synced: 28 Dec 2024
https://github.com/drmccoy/pdftextorizer
Interactively extract text from multi-column PDFs
gui pdf pdf-extractor pdf-files pdf2text pdftotext pyqt5 qt5
Last synced: 27 Mar 2025
https://github.com/farhan0167/bankaiagent
A tool to convert bank statements into Excel files
bank-statement-parser detr object-detection ocr pdftotext tabledetection transformers
Last synced: 12 Jan 2025
https://github.com/bradsec/pdftotext
Client browser tool to extract text from a PDF file using PDF.js
Last synced: 25 Feb 2025
https://github.com/zbioe/grapnel
Repository with tools for converting a response body to plain text
Last synced: 15 Mar 2025
https://github.com/raul23/clustering-text
Experimenting with clustering text documents (ebooks and HTML pages)
beautifulsoup clustering diskcache djvu ebooks kmeans kmeans-clustering machine-learning matplotlib nlp numpy ocr pandas pdf pdftotext python scikit-learn tesseract unsupervised-learning wikipedia
Last synced: 03 Mar 2025
https://github.com/joeychilson/pdftotext
A Go library for converting PDF files to text using the pdftotext utility.
Last synced: 07 Apr 2025
https://github.com/ananthakrishnan12/resume-analyzer-using-bert
Resume Analyzer Using BERT
bert-embeddings bert-model cosine-similarity nlp-parsing pdf pdftotext pymupdf python3 spacy-nlp streamlit transformers
Last synced: 15 Mar 2025