https://github.com/aweirddev/crapdf

🦀 Extract text from PDF files.

Host: GitHub
URL: https://github.com/aweirddev/crapdf
Owner: AWeirdDev
Created: 2024-10-30T15:37:12.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-11-03T02:25:59.000Z (over 1 year ago)
Last Synced: 2026-01-07T03:22:09.367Z (5 months ago)
Topics: extract-pdf-text, pdf, pypdf, python, rust
Language: Python
Homepage: https://pypi.org/project/crapdf
Size: 11.7 KB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# 🦀 crapdf

Extract text from a PDF file. Uses the `lopdf` crate. Kind of crappy.

```python
from crapdf import extract, extract_bytes

# Extract from file path
texts: list[str] = extract("file.pdf")

# Extract from bytes
with open("file.pdf", "rb") as f:
    content = f.read()

texts: list[str] = extract_bytes(content)
```

## Performance

Run the benchmarks using `bench.py`. Make sure to install dev dependencies from `requirements-dev.txt`.

The overall performance is similar to [`pypdf`](https://pypi.org/project/pypdf).
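
For a quick comparison without the repository's script, here is a minimal timing sketch. It is not the contents of `bench.py`, which are not shown here. The helper names `best_time`, `pypdf_extract`, and `compare` are hypothetical; only `crapdf.extract` (from the example above) and pypdf's `PdfReader` / `extract_text()` are real APIs.

```python
import time
from typing import Callable


def best_time(fn: Callable[[str], object], path: str, repeats: int = 5) -> float:
    """Return the fastest wall-clock time (in seconds) over `repeats` calls to fn(path)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(path)
        timings.append(time.perf_counter() - start)
    return min(timings)


def pypdf_extract(path: str) -> list[str]:
    """Extract text page by page with pypdf, for comparison."""
    # Imported lazily so best_time() stays usable without pypdf installed.
    from pypdf import PdfReader

    return [page.extract_text() for page in PdfReader(path).pages]


def compare(path: str, repeats: int = 5) -> dict[str, float]:
    """Time crapdf and pypdf on the same file (hypothetical helper, not bench.py)."""
    # Imported lazily for the same reason as above.
    from crapdf import extract

    return {
        "crapdf": best_time(extract, path, repeats),
        "pypdf": best_time(pypdf_extract, path, repeats),
    }
```

With both packages installed, `compare("file.pdf")` returns the best observed time for each library. Taking the minimum over several runs reduces noise from disk caching and other processes, though results will still vary with the PDF used.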

***

AWeirdDev. [GitHub Repo](https://github.com/AWeirdDev/crapdf)
