https://github.com/leoneversberg/pdf2md_llm
Use a local LLM to convert PDF to Markdown
- Host: GitHub
- URL: https://github.com/leoneversberg/pdf2md_llm
- Owner: leoneversberg
- License: mit
- Created: 2025-03-05T09:22:28.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-03-10T16:00:24.000Z (11 months ago)
- Last Synced: 2025-10-29T22:54:07.581Z (3 months ago)
- Topics: conversion, llm, markdown, parser, pdf
- Language: Python
- Homepage: https://ai.gopubby.com/pdf-to-markdown-document-conversion-with-local-llms-e8ad26c13c8d
- Size: 170 KB
- Stars: 21
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# pdf2md_llm
`pdf2md_llm` is a Python package that converts PDF files to Markdown using a local Large Language Model (LLM).
The package leverages the `pdf2image` library to convert PDF pages to images and a vision language model to generate Markdown text from these images.
## Features
- Convert PDF files to images.
- Generate Markdown text from images using a local LLM.
- Keep your data private: files are processed locally and never uploaded to a third party.
## Installation
You need a CUDA-compatible GPU to run local LLMs with vLLM. The `pdf2image` dependency also requires Poppler to be installed on your system.
You can use `pip` to install the package:
```bash
pip install pdf2md-llm
```
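Before running a conversion, it can help to confirm that the external tools are available. The sketch below is our own and not part of `pdf2md_llm`. It assumes an NVIDIA setup and uses `nvidia-smi` on `PATH` only as a rough proxy for a CUDA-capable GPU; it also checks for Poppler's `pdftoppm` and `pdfinfo`, the command-line tools `pdf2image` calls to render pages and read page counts. `check_prerequisites` and `REQUIRED_TOOLS` are hypothetical names.

```python
import shutil

# External tools assumed to be needed. `nvidia-smi` is only a rough proxy
# for a working NVIDIA driver; finding it does not prove vLLM can use the GPU.
REQUIRED_TOOLS = {
    "nvidia-smi": "NVIDIA driver utility (proxy for a CUDA-capable GPU)",
    "pdftoppm": "Poppler tool that pdf2image uses to render pages",
    "pdfinfo": "Poppler tool that pdf2image uses to read page counts",
}


def check_prerequisites() -> dict[str, bool]:
    """Return {tool_name: found_on_PATH} for each required tool."""
    return {name: shutil.which(name) is not None for name in REQUIRED_TOOLS}


if __name__ == "__main__":
    # Print one status line per tool.
    for name, found in check_prerequisites().items():
        status = "ok" if found else "MISSING"
        print(f"{status:8} {name}: {REQUIRED_TOOLS[name]}")
```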
## Usage
### CLI
You can use the `pdf2md_llm` package via the **command line interface (CLI)**.
To convert a PDF file to Markdown, run the following command:
```bash
pdf2md_llm <pdf_file> [options]
```
#### Options
* `pdf_file`: Path to the PDF file to convert (positional argument).
* `--model`: Name of the Hugging Face model to use (default: `Qwen/Qwen2.5-VL-3B-Instruct-AWQ`).
* `--dtype`: Data type for the model weights and activations (default: `None`).
* `--max_model_len`: Maximum model context length in tokens (default: `7000`).
* `--prompt`: Custom prompt for the LLM (default: `None`).
* `--size`: Image size as a `(width, height)` tuple in pixels; `None` for one dimension keeps the aspect ratio (default: `(700, None)`).
* `--dpi`: Resolution (DPI) used to render the PDF pages (default: `200`).
* `--fmt`: Image format (default: `jpeg`).
* `--output_folder`: Folder to save the output Markdown file (default: `./out`).
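The `--size`, `--dpi`, and `--max_model_len` options interact, because the rendered page size determines how many visual tokens the model consumes. The back-of-the-envelope sketch below is not part of the package and rests on two assumptions: (1) the `--size` tuple is passed through to `pdf2image`, where a `None` entry is derived from the other side to keep the aspect ratio and an explicit size takes precedence over `--dpi`; (2) Qwen2.5-VL spends roughly one visual token per 28×28 pixel block after rounding each side to a multiple of 28, ignoring its minimum and maximum pixel limits. `rendered_size` and `approx_visual_tokens` are hypothetical helper names.

```python
POINTS_PER_INCH = 72  # PDF page sizes are given in points (1/72 inch)
PIXELS_PER_TOKEN_SIDE = 28  # assumption: Qwen2.5-VL, 14 px patches merged 2x2


def rendered_size(page_w_pt, page_h_pt, dpi=200, size=(700, None)):
    """Approximate pixel size of a rendered page.

    Assumption: if `size` is given it wins over `dpi`; a None entry is
    derived from the other side so the aspect ratio is preserved.
    """
    # Pixel size at the requested DPI (used as-is when `size` is None).
    w = page_w_pt / POINTS_PER_INCH * dpi
    h = page_h_pt / POINTS_PER_INCH * dpi
    if size is not None:
        target_w, target_h = size
        if target_w is not None and target_h is not None:
            w, h = target_w, target_h
        elif target_w is not None:
            # Right-hand side is evaluated first, so it uses the old `w`.
            w, h = target_w, h * target_w / w
        elif target_h is not None:
            w, h = w * target_h / h, target_h
    return round(w), round(h)


def approx_visual_tokens(width, height, side=PIXELS_PER_TOKEN_SIDE):
    """Rough visual token count: one token per side x side block, after
    rounding each dimension to the nearest multiple of `side`."""
    cols = max(1, round(width / side))
    rows = max(1, round(height / side))
    return cols * rows
```

For an A4 page (595 × 842 points), `rendered_size(595, 842)` gives `(700, 991)`, which `approx_visual_tokens` estimates at about 875 visual tokens, leaving most of the default 7000-token context for the prompt and the generated Markdown. Rendering the same page at 200 DPI without a size limit (`size=None`) would give roughly 1653 × 2339 pixels and about 4956 visual tokens.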
#### Example
```bash
pdf2md_llm example.pdf --model "Qwen/Qwen2.5-VL-3B-Instruct-AWQ" --output_folder "./output"
```
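The CLI takes a single `pdf_file`, so converting a whole folder means one invocation per file. The helper below is a hypothetical sketch (the names `find_pdfs`, `build_commands`, and `run_all` are ours) that uses only the options documented above and assumes the `pdf2md_llm` entry point is on `PATH`.

```python
import subprocess
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by name.

    The `.pdf` suffix check is case-insensitive; subfolders are not searched.
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def build_commands(pdf_files, model="Qwen/Qwen2.5-VL-3B-Instruct-AWQ",
                   output_folder="./out"):
    """Build one pdf2md_llm CLI invocation per PDF (the CLI takes one file)."""
    return [
        ["pdf2md_llm", str(pdf), "--model", model,
         "--output_folder", output_folder]
        for pdf in pdf_files
    ]


def run_all(commands):
    """Run the commands one after another; raise CalledProcessError on failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_all(build_commands(find_pdfs("./pdfs"), output_folder="./output"))` would convert every PDF in a hypothetical `./pdfs` folder. Each call starts a new process and therefore loads the model again; for many files, the Python API below, which loads the model once, is likely the faster option.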
##### Model Support
Currently, the following Qwen2.5-VL models are supported:
* `Qwen/Qwen2.5-VL-3B-Instruct`
* `Qwen/Qwen2.5-VL-3B-Instruct-AWQ`
* `Qwen/Qwen2.5-VL-7B-Instruct`
* `Qwen/Qwen2.5-VL-7B-Instruct-AWQ`
* `Qwen/Qwen2.5-VL-72B-Instruct`
* `Qwen/Qwen2.5-VL-72B-Instruct-AWQ`
If you want to use a different model, feel free to add a vLLM-compatible model to the factory function `llm_model()` in `llm.py`.
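The contents of `llm_model()` in `llm.py` are not reproduced here. Purely as a hypothetical illustration of the name-keyed factory pattern, the sketch below keeps a registry of supported model names and fails loudly for anything else. `ModelConfig`, `SUPPORTED_MODELS`, and `make_model_config` are invented names, not the package's API, and the fields are only examples of settings one might forward to vLLM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical settings a factory might forward to vLLM."""
    model: str
    quantization: Optional[str] = None  # e.g. "awq" for AWQ checkpoints
    max_model_len: int = 7000


# Hypothetical registry: one entry per supported Hugging Face model name.
SUPPORTED_MODELS = {
    name: ModelConfig(model=name,
                      quantization="awq" if name.endswith("-AWQ") else None)
    for size in ("3B", "7B", "72B")
    for name in (f"Qwen/Qwen2.5-VL-{size}-Instruct",
                 f"Qwen/Qwen2.5-VL-{size}-Instruct-AWQ")
}


def make_model_config(name: str) -> ModelConfig:
    """Look up a model by name, failing loudly for unsupported models."""
    try:
        return SUPPORTED_MODELS[name]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_MODELS))
        raise ValueError(
            f"Unsupported model {name!r}; choose one of: {supported}"
        ) from None
```

In this pattern, supporting another vLLM-compatible model amounts to adding one registry entry, and an unsupported name produces an error that lists the valid choices.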
### Python API
You can use the `pdf2md_llm` package via the **Python API**.
Basic usage:
```python
from vllm import SamplingParams
from pdf2md_llm.llm import llm_model
from pdf2md_llm.pdf2img import PdfToImg

# Render each PDF page to an image: width 700 px, None keeps the aspect ratio
pdf2img = PdfToImg(size=(700, None), output_folder="./out")
img_files = pdf2img.convert("example.pdf")

# Load the vision language model once and reuse it for every page
llm = llm_model(
    model="Qwen/Qwen2.5-VL-3B-Instruct-AWQ",  # Name of the Hugging Face model
    dtype="half",  # Model data type
)
sampling_params = SamplingParams(
    temperature=0.1,
    min_p=0.1,
    max_tokens=8192,
    stop_token_ids=[],
)

# Write all pages to one output Markdown file
# ("w" overwrites the output of earlier runs instead of appending to it)
with open("example.md", "w", encoding="utf-8") as myfile:
    for img_file in img_files:
        # Convert the page image to Markdown with the LLM
        markdown_text = llm.generate(img_file, sampling_params=sampling_params)
        # Blank lines keep consecutive pages from running together
        myfile.write(markdown_text + "\n\n")
```
For a full example, see [example_api.py](./pdf2md_llm/example_api.py).
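Vision language models sometimes wrap their whole answer in a code fence tagged `markdown`, and per-page outputs written back to back can run together. The following is a hedged post-processing sketch, not part of `pdf2md_llm`: `strip_outer_fence` removes one fence tagged `markdown` or `md` only when it wraps the entire page, and `join_pages` joins the cleaned pages with a horizontal rule. Both names, and the `---` separator, are our own choices.

```python
import re

FENCE = "`" * 3  # built indirectly so this snippet can sit inside a Markdown fence

# Matches a page that is entirely one fence tagged "markdown" or "md".
_WRAPPED = re.compile(
    r"^\s*" + FENCE + r"(?:markdown|md)[ \t]*\n(.*?)\n" + FENCE + r"\s*$",
    re.DOTALL | re.IGNORECASE,
)


def strip_outer_fence(text: str) -> str:
    """Remove a single markdown-tagged fence wrapping the whole page, if present."""
    match = _WRAPPED.match(text)
    return match.group(1) if match else text


def join_pages(pages, separator="\n\n---\n\n"):
    """Clean each page and join them into one Markdown document."""
    return separator.join(strip_outer_fence(p).strip() for p in pages) + "\n"
```

In the Python API example, one could collect each `markdown_text` in a list and write `join_pages(pages)` to the output file once. This is a heuristic: a page whose content genuinely begins and ends with two separate `markdown`-tagged code blocks would be unwrapped incorrectly.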
## License
This project is licensed under the MIT License. See the LICENSE file for details.
## Acknowledgements
* [pdf2image](https://github.com/Belval/pdf2image) for converting PDF files to images.
* [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL) as the vision language model.
* [vLLM](https://github.com/vllm-project/vllm) for efficient LLM inference.