{"id":25065678,"url":"https://github.com/programming-sai/pdf-summarizer","last_synced_at":"2026-02-11T11:34:09.215Z","repository":{"id":267576793,"uuid":"901681653","full_name":"Programming-Sai/PDF-Summarizer","owner":"Programming-Sai","description":"Summarise a given pdf to possibly extract only highlighted text and images","archived":false,"fork":false,"pushed_at":"2025-01-12T11:38:05.000Z","size":8922,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-24T15:55:24.556Z","etag":null,"topics":["argparse","cli","pdf","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Programming-Sai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-11T05:47:29.000Z","updated_at":"2025-01-17T12:56:25.000Z","dependencies_parsed_at":"2024-12-11T06:32:02.978Z","dependency_job_id":"6f55e106-26d1-4f38-a73c-cf9ef1f38abc","html_url":"https://github.com/Programming-Sai/PDF-Summarizer","commit_stats":null,"previous_names":["programming-sai/pdf-summarizer"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Programming-Sai/PDF-Summarizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Programming-Sai%2FPDF-Summarizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Programming-Sai%2FPDF-Summarizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Programming-Sai%2FPDF-Summarizer/releases","manif
ests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Programming-Sai%2FPDF-Summarizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Programming-Sai","download_url":"https://codeload.github.com/Programming-Sai/PDF-Summarizer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Programming-Sai%2FPDF-Summarizer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29332601,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-11T06:13:03.264Z","status":"ssl_error","status_checked_at":"2026-02-11T06:12:55.843Z","response_time":97,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["argparse","cli","pdf","python"],"created_at":"2025-02-06T19:44:44.206Z","updated_at":"2026-02-11T11:34:09.201Z","avatar_url":"https://github.com/Programming-Sai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PDF Summarizer\n\n![Python](https://img.shields.io/badge/Python-3.10%2B-blue) ![Platform](https://img.shields.io/badge/Platform-Mac%20%7C%20Windows%20%7C%20Linux-lightgrey) ![Repo Size](https://img.shields.io/github/repo-size/Programming-Sai/PDF-Summarizer) ![Last Commit](https://img.shields.io/github/last-commit/Programming-Sai/PDF-Summarizer) ![Tech 
Stack](https://img.shields.io/badge/Built%20with-Python%20%7C%20Argparse%20%7C%20OpenCV%7C%20PymuPdf%7C%20Numpy%7C%20Pillow-brightgreen) ![Coverage](https://img.shields.io/badge/Coverage-80%25-yellowgreen)\n\nThe **PDF Summarizer** is a command-line tool designed to help users manage and perform various operations on PDF files. This README provides a clear overview of how to use the tool, highlighting its key functionalities and their implementations.\n\n---\n\n## Installation\n\n1. Clone the repository:\n\n   ```bash\n   git clone https://github.com/Programming-Sai/PDF-Summarizer.git\n   ```\n\n2. Navigate to the project directory:\n\n   ```bash\n   cd PDF-Summarizer\n   ```\n\n3. Create a virtual environment and activate it:\n\n   ```bash\n   python -m venv .ospdf-venv\n\n   .ospdf-venv\\Scripts\\activate  # Windows\n\n   # OR\n\n   source .ospdf-venv/bin/activate  # macOS/Linux\n\n   ```\n\n   \u003cbr\u003e\n\n\u003e [!IMPORTANT]\n\u003e Make sure to select the new virtual environment `.ospdf-venv` as your interpreter in VS Code. Use the shortcut **`Ctrl + Shift + P`** (Windows/Linux) or **`Cmd + Shift + P`** (Mac), then type and select **\"Python: Select Interpreter\"**. Choose the interpreter option marked **`Recommended`** or **`Python 3.x.x ('.ospdf-venv':venv)`**.\n\n   \u003cbr\u003e\n\n4. Install the required dependencies:\n\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n5. 
Run the application:\n\n   ```bash\n   python -u main.py\n   ```\n\n   On running the application, you should see an output similar to this:\n\n   ```plaintext\n\n    _____                             _____\n   ( ___ )---------------------------( ___ )\n    |   |                             |   |\n    |   |                      _  __  |   |\n    |   |   ___  ___ _ __   __| |/ _| |   |\n    |   |  / _ \\/ __| '_ \\ / _` | |_  |   |\n    |   | | (_) \\__ \\ |_) | (_| |  _| |   |\n    |   |  \\___/|___/ .__/ \\__,_|_|   |   |\n    |   |           |_|               |   |\n    |___|                             |___|\n   (_____)---------------------------(_____)\n\n   Welcome to PDF Summarizer!\n\n   Version: 0.0.1\n\n   PDF Summarizer helps you manage and work with PDFs. Here are some of the things you can do:\n   - Summarize PDF content based on highlighted text.\n   - Split a PDF into individual pages or ranges.\n   - Merge multiple PDFs into one.\n   - Convert a PDF page into an image.\n\n   Tips\n   ------\n   - Use `init` to set the input file once and avoid specifying it repeatedly.\n   - Reset your session with `init -r` to start fresh.\n   - Use `-h` or `--help` when in doubt.\n   ```\n\n---\n\n## Functionalities\n\n### 1. Summarize Highlighted Text\n\n**Description:** Extract and summarize highlighted text from a PDF file.\n\n- **Implementation:**\n  - Parses the PDF for annotations.\n  - Extracts the highlighted content.\n  - Optionally includes images from the PDF in the output.\n- **What it does:**\n  - Produces a summary as plain text, a PDF, or a Word document.\n\n**Usage:**\n\n```bash\npython main.py summarize --input-path \u003cpath_to_pdf\u003e --output-path \u003coutput_path\u003e\n```\n\n### 2. 
Split PDF\n\n**Description:** Extract specific pages or ranges of pages from a PDF.\n\n- **Implementation:**\n  - Uses a PDF parser to split the document based on page indices.\n  - Saves the extracted pages as a new PDF.\n- **What it does:**\n  - Enables breaking large PDFs into smaller, more manageable files.\n\n**Usage:**\n\n```bash\npython main.py split \u003cpath_to_pdf\u003e \u003coutput_pdf\u003e --start-page \u003cstart\u003e --end-page \u003cend\u003e\n```\n\n### 3. Merge PDFs\n\n**Description:** Combine multiple PDF files into one.\n\n- **Implementation:**\n  - Reads the input PDFs.\n  - Concatenates their pages in the specified order.\n  - Outputs a single, merged PDF.\n- **What it does:**\n  - Consolidates multiple related documents into a single file.\n\n**Usage:**\n\n```bash\npython main.py merge \u003coutput_pdf\u003e \u003cinput_pdf_1\u003e \u003cinput_pdf_2\u003e ...\n```\n\n### 4. Convert PDF to Image\n\n**Description:** Convert a single page of a PDF into an image.\n\n- **Implementation:**\n  - Extracts the specified page from the PDF.\n  - Renders the page as an image.\n  - Saves the image in the desired format (e.g., PNG, JPEG).\n- **What it does:**\n  - Enables visual representation of PDF content for use in presentations or web pages.\n\n**Usage:**\n\n```bash\npython main.py pdf2img  \u003cpath_to_pdf\u003e \u003coutput_image\u003e \u003cpage_number\u003e\n```\n\n---\n\n## Tips\n\n- **Initialization:** Use the `init` command to set a default PDF file for your session, eliminating the need to specify the file repeatedly for each operation.\n- **Help:** Add `-h` or `--help` to any command for detailed usage instructions.\n- **Reset:** Start fresh by resetting the session with the `init -r` command.\n\n---\n\n## Troubleshooting\n\n- Ensure you have Python 3.10+ installed.\n- Verify dependencies are correctly installed using:\n\n  ```bash\n  pip list\n  ```\n\n- If a command fails, check the help menu for correct 
syntax.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprogramming-sai%2Fpdf-summarizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprogramming-sai%2Fpdf-summarizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprogramming-sai%2Fpdf-summarizer/lists"}