
# 🦀 crapdf
Extract text from a PDF file, using the Rust [`lopdf`](https://crates.io/crates/lopdf) crate under the hood. Kind of crappy.

```python
from crapdf import extract, extract_bytes

# Extract from a file path
texts: list[str] = extract("file.pdf")

# Extract from bytes already in memory (e.g. an upload or HTTP response)
with open("file.pdf", "rb") as f:
    content = f.read()

texts = extract_bytes(content)
```

## Performance

To run the benchmarks, install the dev dependencies from `requirements-dev.txt`, then run `bench.py`.

Overall extraction speed is similar to [`pypdf`](https://pypi.org/project/pypdf).
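For a quick local comparison outside `bench.py`, here is a minimal timing sketch that works with any extractor taking a path and returning `list[str]`. It is not the project's benchmark script: `time_extractor` and `pypdf_extract` are hypothetical helper names, and the pypdf side assumes `pypdf` is installed separately.

```python
import statistics
import time
from typing import Callable


def time_extractor(
    extract_fn: Callable[[str], list[str]], path: str, repeat: int = 5
) -> float:
    """Return the median wall-clock time (seconds) of extract_fn(path) over `repeat` runs.

    Hypothetical helper for illustration; not part of crapdf.
    """
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        extract_fn(path)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def pypdf_extract(path: str) -> list[str]:
    """Per-page text via pypdf, for comparison (assumes `pip install pypdf`)."""
    # Imported lazily so the timing helper above works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return [page.extract_text() for page in reader.pages]
```

With crapdf installed, calling `time_extractor(crapdf.extract, "file.pdf")` and `time_extractor(pypdf_extract, "file.pdf")` gives median timings for the same file. The median is used instead of the mean so a single slow run (cold disk cache, GC pause) skews the result less.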

***

AWeirdDev. [GitHub Repo](https://github.com/AWeirdDev/crapdf)