https://github.com/programming-sai/pdf-summarizer
Summarise a given pdf to possibly extract only highlighted text and images
argparse cli pdf python
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/programming-sai/pdf-summarizer
- Owner: Programming-Sai
- Created: 2024-12-11T05:47:29.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-12T11:38:05.000Z (over 1 year ago)
- Last Synced: 2025-06-24T15:55:24.556Z (12 months ago)
- Topics: argparse, cli, pdf, python
- Language: Python
- Homepage:
- Size: 8.51 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# PDF Summarizer
     
The **PDF Summarizer** is a command-line tool for working with PDF files: it summarizes highlighted text, splits and merges PDFs, and converts a PDF page to an image. This README explains how to install and run the tool, then describes each command and how it is implemented.
---
## Installation
1. Clone the repository:
```bash
git clone https://github.com/Programming-Sai/PDF-Summarizer.git
```
2. Navigate to the project directory:
```bash
cd PDF-Summarizer
```
3. Create a virtual environment and activate it:
```bash
python -m venv .ospdf-venv
.ospdf-venv\Scripts\activate      # Windows
# OR
source .ospdf-venv/bin/activate   # macOS/Linux
```
> [!IMPORTANT]
> Make sure to select the new virtual environment `.ospdf-venv` as your interpreter in VS Code. Use the shortcut **`Ctrl + Shift + P`** (Windows/Linux) or **`Cmd + Shift + P`** (Mac), then type and select **"Python: Select Interpreter"**. Choose the interpreter option marked **`Recommended`** or **`Python 3.x.x ('.ospdf-venv':venv)`**.
4. Install the required dependencies:
```bash
pip install -r requirements.txt
```
5. Run the application:
```bash
python -u main.py
```
On running the application, you should see an output similar to this:
```plaintext
 _____                             _____
( ___ )---------------------------( ___ )
 |   |                             |   |
 |   |                      _  __   |   |
 |   |   ___  ___ _ __   __| |/ _| |   |
 |   |  / _ \/ __| '_ \ / _` | |_  |   |
 |   | | (_) \__ \ |_) | (_| |  _| |   |
 |   |  \___/|___/ .__/ \__,_|_|   |   |
 |   |           |_|               |   |
 |___|                             |___|
(_____)---------------------------(_____)
Welcome to PDF Summarizer!
Version: 0.0.1
PDF Summarizer helps you manage and work with PDFs. Here are some of the things you can do:
- Summarize PDF content based on highlighted text.
- Split a PDF into individual pages or ranges.
- Merge multiple PDFs into one.
- Convert a PDF page into an image.
Tips
------
- Use `init` to set the input file once and avoid specifying it repeatedly.
- Reset your session with `init -r` to start fresh.
- Use `-h` or `--help` when in doubt.
```
---
## Functionalities
### 1. Summarize Highlighted Text
**Description:** Extract and summarize highlighted text from a PDF file.
- **Implementation:**
  - Parses the PDF for annotations.
  - Extracts the highlighted content.
  - Optionally includes images from the PDF in the output.
- **What it does:**
  - Produces a summary as plain text, a PDF, or a Word document.
**Usage** (pass a value after each flag; `python main.py summarize --help` shows the exact arguments):
```bash
python main.py summarize --input-path --output-path
```
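The extraction step listed under Implementation can be pictured as a geometric filter: keep the words that fall under a highlight annotation. The sketch below is a minimal illustration, not the tool's actual code. It assumes a PDF library has already supplied word bounding boxes and highlight rectangles in the same page coordinates; `Box`, `highlighted_text`, and the sample data are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: (x0, y0) is one corner, (x1, y1) the opposite one."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains_point(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def highlighted_text(words: list[tuple[Box, str]], highlights: list[Box]) -> str:
    """Join, in the given order, the words whose centre lies inside any highlight."""
    selected = []
    for box, text in words:
        cx = (box.x0 + box.x1) / 2
        cy = (box.y0 + box.y1) / 2
        if any(h.contains_point(cx, cy) for h in highlights):
            selected.append(text)
    return " ".join(selected)


# Hypothetical data: three words on one line, with a highlight over the last two.
words = [
    (Box(10, 10, 40, 20), "PDF"),
    (Box(45, 10, 95, 20), "highlight"),
    (Box(100, 10, 150, 20), "example"),
]
highlights = [Box(44, 8, 151, 22)]
print(highlighted_text(words, highlights))  # highlight example
```

Testing the word's centre rather than its full box tolerates highlights that are drawn slightly narrower than the text they cover.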
### 2. Split PDF
**Description:** Extract specific pages or ranges of pages from a PDF.
- **Implementation:**
  - Uses a PDF parser to split the document based on page indices.
  - Saves the extracted pages as a new PDF.
- **What it does:**
  - Enables breaking large PDFs into smaller, more manageable files.
**Usage** (pass a value after each flag; `python main.py split --help` shows the exact arguments):
```bash
python main.py split --start-page --end-page
```
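Splitting "based on page indices" usually means translating the pages a user asks for into the zero-based indices a PDF library expects. The sketch below shows one way to do that with validation. It assumes `--start-page` and `--end-page` are 1-based and inclusive, which this README does not state; `page_indices` is a hypothetical helper, not a function from the project.

```python
def page_indices(start_page: int, end_page: int, page_count: int) -> list[int]:
    """Return zero-based indices for the 1-based, inclusive range start_page..end_page."""
    if page_count < 1:
        raise ValueError("the document has no pages")
    if not 1 <= start_page <= end_page <= page_count:
        raise ValueError(
            f"expected 1 <= start <= end <= {page_count}, "
            f"got start={start_page}, end={end_page}"
        )
    return list(range(start_page - 1, end_page))


print(page_indices(2, 4, page_count=10))  # [1, 2, 3]
```

Rejecting an out-of-range request up front gives a clearer error than letting the PDF library fail on a missing page.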
### 3. Merge PDFs
**Description:** Combine multiple PDF files into one.
- **Implementation:**
  - Reads the input PDFs.
  - Concatenates their pages in the specified order.
  - Outputs a single, merged PDF.
- **What it does:**
  - Consolidates multiple related documents into a single file.
**Usage:**
```bash
python main.py merge ...
```
### 4. Convert PDF to Image
**Description:** Convert a single page of a PDF into an image.
- **Implementation:**
  - Extracts the specified page from the PDF.
  - Renders the page as an image.
  - Saves the image in the desired format (e.g., PNG, JPEG).
- **What it does:**
  - Enables visual representation of PDF content for use in presentations or web pages.
**Usage:**
```bash
python main.py pdf2img
```
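When a page is rendered, the image size depends on the chosen resolution: PDF page dimensions are measured in points, and the default unit is 1/72 inch. The sketch below works out the pixel dimensions for a given DPI. Whether `pdf2img` exposes a resolution setting is not stated in this README, so `pixel_size` and its `dpi` parameter are hypothetical.

```python
def pixel_size(width_pt: float, height_pt: float, dpi: int = 144) -> tuple[int, int]:
    """Convert a page size in PDF points (1/72 inch) to pixels at the given DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    scale = dpi / 72
    return round(width_pt * scale), round(height_pt * scale)


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(pixel_size(612, 792, dpi=72))   # (612, 792)
print(pixel_size(612, 792, dpi=144))  # (1224, 1584)
print(pixel_size(612, 792, dpi=300))  # (2550, 3300)
```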
---
## Tips
- **Initialization:** Use the `init` command to set a default PDF file for your session, eliminating the need to specify the file repeatedly for each operation.
- **Help:** Add `-h` or `--help` to any command for detailed usage instructions.
- **Reset:** Start fresh by resetting the session with the `init -r` command.
---
## Troubleshooting
- Ensure you have Python 3.10+ installed.
- Verify dependencies are correctly installed using:
```bash
pip list
```
- If a command fails, check the help menu for correct syntax.
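The help menus come from `argparse`, which generates `-h`/`--help` for the main parser and for every subcommand automatically. The sketch below shows how two of the commands could be declared. The command and flag names are taken from the usage lines in this README; the help strings, the `int` types, and `build_parser` itself are assumptions, and the project's `main.py` may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py", description="PDF Summarizer (sketch)")
    # Each subcommand gets its own parser and therefore its own --help output.
    subparsers = parser.add_subparsers(dest="command", required=True)

    summarize = subparsers.add_parser("summarize", help="summarize highlighted text")
    summarize.add_argument("--input-path", help="PDF to read")
    summarize.add_argument("--output-path", help="where to write the summary")

    split = subparsers.add_parser("split", help="extract a page range")
    split.add_argument("--start-page", type=int, help="first page to keep")
    split.add_argument("--end-page", type=int, help="last page to keep")
    return parser


parser = build_parser()
args = parser.parse_args(["split", "--start-page", "2", "--end-page", "4"])
print(args.command, args.start_page, args.end_page)  # split 2 4
```

With this layout, an unknown flag makes `argparse` print a usage message and exit with status 2, which is the point at which the subcommand's `--help` output is most useful.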