{"id":34610297,"url":"https://github.com/clecherbauer/pdf-summarizer","last_synced_at":"2026-05-24T13:32:39.927Z","repository":{"id":296592634,"uuid":"993887834","full_name":"clecherbauer/pdf-summarizer","owner":"clecherbauer","description":null,"archived":false,"fork":false,"pushed_at":"2025-05-31T20:43:25.000Z","size":20,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-12-26T02:51:11.542Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/clecherbauer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-31T18:29:56.000Z","updated_at":"2025-05-31T20:43:28.000Z","dependencies_parsed_at":"2025-06-01T07:02:11.277Z","dependency_job_id":"3d5c7278-7b03-47c9-b879-864fdfd0c0f0","html_url":"https://github.com/clecherbauer/pdf-summarizer","commit_stats":null,"previous_names":["clecherbauer/pdf-summarizer"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/clecherbauer/pdf-summarizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clecherbauer%2Fpdf-summarizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clecherbauer%2Fpdf-summarizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clecherbauer%2Fpdf-summarizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clecherbauer%2Fpdf-summarizer/manifests","owner_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/owners/clecherbauer","download_url":"https://codeload.github.com/clecherbauer/pdf-summarizer/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clecherbauer%2Fpdf-summarizer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33436554,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-24T13:13:05.286Z","status":"ssl_error","status_checked_at":"2026-05-24T13:13:03.728Z","response_time":57,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-24T14:07:59.219Z","updated_at":"2026-05-24T13:32:39.911Z","avatar_url":"https://github.com/clecherbauer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# 🧠 PDF Summarizer with LLaMA + llama-cpp-python\n\nThis project summarizes large PDF documents using a local LLaMA model (e.g., Mistral or LLaMA2) with recursive chunking and GPU-accelerated inference via `llama-cpp-python`.\n\n---\n\n## 📦 Features\n\n- Summarizes large PDF documents using chunking and recursive summarization.\n- Runs local LLaMA models in GGUF format using `llama-cpp-python`.\n- GPU acceleration using CUDA/cuBLAS for performance.\n- Avoids redundant computation with checkpointing.\n- Fully configurable via command-line 
arguments.\n\n---\n\n## 🔧 Setup Instructions\n### 1. Create and Activate Virtual Environment\n\n```bash\npython3 -m venv .venv\nsource .venv/bin/activate\n```\n\n---\n\n### 2. Install CUDA 12 (Ubuntu)\n\n```bash\n# Prioritize NVIDIA packages\nwget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin\nsudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600\n\n# Fetch NVIDIA keys\nsudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub\n\n\n# Add NVIDIA repos\nsudo add-apt-repository \"deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /\"\nsudo apt-get update\nsudo apt-get install cuda-drivers cuda-12-6 cudnn9-cuda-12-6 libcudnn8-dev libnccl2 libnccl-dev\nsudo reboot\n```\n\nAfter reboot:\n\n```bash\nnvidia-smi\nnvcc --version\n```\n\n---\n\n### 3. Install Python Dependencies\n\n#### llama-cpp-python (GPU/cuBLAS):\n\n```bash\nCMAKE_ARGS=\"-DLLAMA_CUBLAS=on\" FORCE_CMAKE=1 pip install --no-cache-dir --force-reinstall llama-cpp-python\n```\n\n#### Other dependencies:\n\n```bash\npip install -r requirements.txt\n```\n\n---\n\n## 🚀 Usage\n\n### Download the Mistral-7B-Instruct GGUF model from Hugging Face\n\nGo to [TheBloke/Mistral-7B-Instruct-v0.2-GGUF](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF) and download your preferred GGUF file, for example:\n\n```bash\nmkdir -p models/mistral\ncd models/mistral\nwget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf\ncd ../../\n```\n\n### Run summarizer with your PDF and model\n\n```bash\npython summarize_pdf.py --pdf-path path/to/document.pdf --model-path models/mistral/mistral-7b-instruct-v0.2.Q4_K_M.gguf\n```\n\n### Optional flags:\n\n- `--max-ctx-tokens`: Override context length (default: 6144).\n- `--gpu-layers`: Number of layers offloaded to GPU (default: 20).\n\nExample:\n\n```bash\npython 
summarize_pdf.py \\\n  --pdf-path document.pdf \\\n  --model-path models/mistral/mistral-7b-instruct-v0.2.Q4_K_M.gguf \\\n  --max-ctx-tokens 4096 \\\n  --gpu-layers 10\n```\n\n---\n\n## 🧠 Notes\n\n- Intermediate summaries are saved in `partial_summaries/`.\n- Debug output is saved in `debug_outputs/`.\n- The final summary is saved to `final_summary.txt`.\n\n---\n\n## 📁 Directory Structure\n\n```\n.\n├── summarize_pdf.py\n├── final_summary.txt\n├── partial_summaries/\n├── debug_outputs/\n└── README.md\n```\n\n---\n\n## ✅ Tested On\n\n- Ubuntu 22.04\n- Python 3.10\n- CUDA 12.9\n- NVIDIA RTX 3050 Ti\n\n---\n\n## 🛡 License\n\nThis project is licensed under the **GNU General Public License v3.0**.  \nSee the [LICENSE](LICENSE) file for more details.\n\n\u003e You are free to use, modify, and distribute this software under the terms of the GPLv3. Any derivative work must also be distributed under the same license.\n\n---\n\n## 🙏 Acknowledgements\n\n- [llama.cpp](https://github.com/ggerganov/llama.cpp)\n- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)\n- [Mistral models](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclecherbauer%2Fpdf-summarizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclecherbauer%2Fpdf-summarizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclecherbauer%2Fpdf-summarizer/lists"}