{"id":50390943,"url":"https://github.com/mathieubuisson/unpdf","last_synced_at":"2026-05-30T18:01:29.306Z","repository":{"id":354039437,"uuid":"1221732729","full_name":"MathieuBuisson/unpdf","owner":"MathieuBuisson","description":null,"archived":false,"fork":false,"pushed_at":"2026-04-26T22:11:24.000Z","size":5,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-26T22:15:47.678Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MathieuBuisson.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-04-26T16:00:12.000Z","updated_at":"2026-04-26T22:11:28.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/MathieuBuisson/unpdf","commit_stats":null,"previous_names":["mathieubuisson/unpdf"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/MathieuBuisson/unpdf","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MathieuBuisson%2Funpdf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MathieuBuisson%2Funpdf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MathieuBuisson%2Funpdf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MathieuBuisson%2Funpdf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MathieuBuisson","download_url":"https://codeload.github.com/MathieuBuisson/unpdf/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MathieuBuisson%2Funpdf/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33703065,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-30T02:00:06.278Z","response_time":92,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-30T18:01:28.034Z","updated_at":"2026-05-30T18:01:29.286Z","avatar_url":"https://github.com/MathieuBuisson.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# unpdf\n\n`unpdf` is a minimalist command-line utility designed to convert PDF documents into Markdown format, optimized for use with LLMs (Large Language Models). It leverages the power of `pymupdf4llm` to extract text, tables, and images while preserving the document structure.\n\n## Features\n\n- **Batch Processing**: Convert single files or entire directories.\n- **Directory Preservation**: Recursively scans folders and mirrors the input structure in the output directory.\n- **Smart Skip**: Automatically skips existing output files unless forced to overwrite.\n- **Error Handling**: Gracefully handles encrypted or corrupt PDFs by logging warnings and continuing the batch.\n\n## Technology Stack\n\n- **Language**: Python 3.13+\n- **PDF Engine**: [PyMuPDF4LLM](https://github.com/pymupdf/pymupdf4llm)\n- **CLI**: Standard `argparse`\n\n## Installation\n\n### Prerequisites\n\n- Python 3.13 or higher.\n\n### Setup\n\n1. Clone the repository:\n   ```bash\n   git clone \u003crepository-url\u003e\n   cd unpdf\n   ```\n\n2. Install dependencies:\n   ```bash\n   pip install .\n   ```\n\n## Usage\n\n```text\nusage: unpdf [-h] INPUT -o DIR [--recurse] [--force] [--version]\n\npositional arguments:\n  INPUT                  PDF file or folder containing PDFs\n\noptions:\n  -o DIR, --output DIR    Output folder (required)\n  --recurse               Recursively scan subfolders\n  --force                 Overwrite existing output files\n  --version               Show version information and exit\n```\n\n### Examples\n\n**Convert a single file:**\n```bash\nunpdf document.pdf -o ./output\n```\n\n**Convert an entire folder recursively:**\n```bash\nunpdf ./docs -o ./markdown_docs --recurse\n```\n\n## Project Structure\n\n```text\nunpdf/\n├── src/unpdf/                  # Source package\n│   ├── __init__.py\n│   ├── __main__.py              # Entry point: python -m unpdf\n│   ├── cli.py                   # CLI argument parsing\n│   ├── scanner.py               # PDF discovery and path mapping\n│   ├── converter.py             # Single-file PDF→Markdown conversion\n│   └── runner.py                # Batch orchestration and statistics\n├── tests/                       # Test suite\n│   ├── test_cli.py\n│   ├── test_scanner.py\n│   ├── test_converter.py\n│   └── test_runner.py\n├── pyproject.toml               # Centralized configuration\n├── SPEC.md                      # Technical specifications\n└── README.md                    # User documentation\n```\n\n## Testing\n\nRun the test suite using `pytest`:\n\n```bash\npytest tests/\n```\n\n## Code Quality\n\nThis project uses several tools to maintain code quality:\n\n```bash\nblack src/ tests/          # Format code\nmypy src/ tests/          # Type checking\nbandit -r src/ tests/      # Security analysis\n```\n\n## Usage\n\n```bash\n# After pip install .\nunpdf document.pdf -o ./output\n\n# OR without installation\npython -m unpdf document.pdf -o ./output\n```\n\n# Options","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmathieubuisson%2Funpdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmathieubuisson%2Funpdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmathieubuisson%2Funpdf/lists"}