Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/aweirddev/crapdf
🦀 Extract text from PDF files.
- Host: GitHub
- URL: https://github.com/aweirddev/crapdf
- Owner: AWeirdDev
- Created: 2024-10-30T15:37:12.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-10-31T07:53:16.000Z (3 months ago)
- Last Synced: 2024-10-31T08:03:37.908Z (3 months ago)
- Topics: extract-pdf-text, pdf, pypdf, python, rust
- Language: Python
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 🦀 crapdf
Extract text from a PDF file. Uses the `lopdf` crate. Kind of crappy.

```python
from crapdf import extract, extract_bytes

# Extract from file path
texts: list[str] = extract("file.pdf")

# Extract from bytes
with open("file.pdf", "rb") as f:
    content = f.read()

texts: list[str] = extract_bytes(content)
```

## Performance
Run the benchmarks using `bench.py`. Make sure to install the dev dependencies from `requirements-dev.txt` first.
The overall performance is similar to [`pypdf`](https://pypi.org/project/pypdf).
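`bench.py` itself is not reproduced here. To give a rough idea of what such a comparison involves, here is a minimal, hypothetical timing harness. `time_call` and `compare` are illustrative names and are not part of crapdf. The harness runs each extractor once untimed as a warm-up, then times several runs and reports the best and median wall-clock time. It is a sketch under those assumptions, not the project's benchmark.

```python
import statistics
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def time_call(fn: Callable[[], T], repeats: int = 5) -> tuple[float, float, T]:
    """Time ``fn`` over ``repeats`` runs; return (best_s, median_s, last_result)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    # One untimed warm-up call so first-run costs (lazy imports, cold OS file
    # cache) do not skew the measurements.
    result = fn()
    timings: list[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        timings.append(time.perf_counter() - start)
    # Best-of-N approximates the cost with the least background noise;
    # the median shows what a typical run looks like.
    return min(timings), statistics.median(timings), result


def compare(
    extractors: dict[str, Callable[[], list[str]]], repeats: int = 5
) -> dict[str, tuple[float, float, int]]:
    """Time each extractor; return name -> (best_s, median_s, number of strings)."""
    report: dict[str, tuple[float, float, int]] = {}
    for name, extract_fn in extractors.items():
        best, median, texts = time_call(extract_fn, repeats)
        report[name] = (best, median, len(texts))
        print(f"{name:>8}: best {best * 1e3:.1f} ms, "
              f"median {median * 1e3:.1f} ms, {len(texts)} strings")
    return report
```

To compare the two libraries on your own file, you could pass `{"crapdf": lambda: extract("file.pdf"), "pypdf": lambda: [page.extract_text() for page in PdfReader("file.pdf").pages]}` to `compare`, with `extract` imported from `crapdf` and `PdfReader` imported from `pypdf`. The README does not say whether `extract` returns one string per page, so the string counts from the two libraries may not line up exactly.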
***
AWeirdDev. [GitHub Repo](https://github.com/AWeirdDev/crapdf)