https://github.com/bjornmelin/pdfusion

A lightweight Python utility for effortlessly merging multiple PDF files into a single document.

automation batch-processing cli command-line-tool document-management document-processing file-management pdf pdf-manipulation pdf-merger pdf-tools pypdf2 python python-library utilities

# 📄 PDFusion

[![MIT License](https://img.shields.io/badge/License-MIT-green.svg)](https://choosealicense.com/licenses/mit/)
[![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org/downloads/release/python-3110/)
[![Contributions Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](CONTRIBUTING.md)
[![GitHub](https://img.shields.io/badge/GitHub-BjornMelin-181717?logo=github)](https://github.com/BjornMelin)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-Bjorn%20Melin-0077B5?logo=linkedin)](https://www.linkedin.com/in/bjorn-melin/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

## 📋 Table of Contents

- [📝 Description](#-description)
  - [🚀 Key Features](#-key-features)
- [📂 Repository Structure](#-repository-structure)
- [💻 Installation](#-installation)
  - [For Users 🌟](#for-users-)
  - [For Developers 🔧](#for-developers-)
- [🎮 Usage](#-usage)
  - [Command Line Interface](#command-line-interface)
  - [Python API](#python-api)
- [🛠️ Development](#️-development)
  - [Running Tests](#running-tests)
- [🤝 Contributing](#-contributing)
- [👨‍💻 Author](#-author)
- [📜 License](#-license)
- [🌟 Star History](#-star-history)
- [🙏 Acknowledgments](#-acknowledgments)

## 📝 Description

PDFusion is a lightweight command-line tool and Python library that combines multiple PDF files into a single document while preserving the original quality. It is well suited to merging reports, consolidating documentation, or organizing digital paperwork.

### 🚀 Key Features

- 📁 Merge all PDFs in a directory with a single command
- 🔄 Automatic alphabetical ordering of files
- ⏱️ Timestamp-based output naming option
- 🛠️ Both CLI and Python API support
- 💡 Clear progress feedback and error handling
- 🔒 Maintains original PDF quality
- 📝 Detailed logging of the merge process
- 🔍 Type hints with full mypy support
- 🧪 Comprehensive test coverage (>90%)
- 📊 Performance benchmarks included
- 🐛 Custom exception handling
- 🎯 Supports Python 3.11+

## 📂 Repository Structure

```mermaid
graph TD
A[pdfusion/] --> B[pdfusion/]
A --> C[tests/]
A --> D[examples/]
A --> E[Documentation]

B --> B1[__init__.py]
B --> B2[exceptions.py]
B --> B3[logging.py]
B --> B4[pdfusion.py]
B --> B5[py.typed]

C --> C1[__init__.py]
C --> C2[conftest.py]
C --> C3[test files]

D --> D1[basic_usage.py]

E --> E1[README.md]
E --> E2[LICENSE]
E --> E3[CONTRIBUTING.md]
E --> E4[Configuration Files]
```

## 💻 Installation

### For Users 🌟

```bash
pip install pdfusion
```

### For Developers 🔧

```mermaid
graph LR
A[Clone Repository] --> B[Create Virtual Environment]
B --> C[Activate Environment]
C --> D[Install Dependencies]
D --> E[Ready to Develop!]
```

1. Clone the repository:

   ```bash
   git clone https://github.com/BjornMelin/pdfusion.git
   cd pdfusion
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: .\venv\Scripts\activate
   ```

   > **Note:** You can also use `virtualenv` instead of `venv`. See the [Virtual Environment Setup Guide](docs/virtualenv-setup.md) for more details.

3. Install development dependencies:

   ```bash
   pip install -r requirements-dev.txt
   ```

## 🎮 Usage

### Quick Start Guide

1. **Install PDFusion**

   ```bash
   pip install pdfusion
   ```

2. **Prepare Your PDFs**
   - Put the PDF files you want to merge in one directory.
   - Example structure:

     ```plaintext
     my_pdfs/
     ├── document1.pdf
     ├── document2.pdf
     └── document3.pdf
     ```

3. **Run PDFusion** from the command line or from Python, as described in the following sections.

### Command Line Interface

```mermaid
graph LR
A[Input Directory] --> B[PDFusion CLI]
B --> C[Processing]
C --> D[Merged PDF]
style B fill:#f9f,stroke:#333,stroke-width:4px
```

```bash
# Basic usage
pdfusion /path/to/pdfs -o merged.pdf

# With verbose output
pdfusion /path/to/pdfs -v

# Auto timestamp filename
pdfusion /path/to/pdfs
```

#### CLI Options

- `-o, --output`: Output filename (optional; a timestamped filename is used if omitted)
- `-v, --verbose`: Enable verbose output
- `--version`: Show version number
- `-h, --help`: Show help message
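These options map naturally onto Python's `argparse`. The parser below is a hypothetical reconstruction of the documented flags, not PDFusion's actual CLI code, and the version string is a placeholder.

```python
import argparse

PLACEHOLDER_VERSION = "0.0.0"  # placeholder, not PDFusion's real version


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse reconstruction of the documented pdfusion flags."""
    parser = argparse.ArgumentParser(
        prog="pdfusion",
        description="Merge all PDF files in a directory into a single PDF.",
    )
    parser.add_argument("input_dir", help="directory containing the PDFs to merge")
    parser.add_argument(
        "-o",
        "--output",
        default=None,
        help="output filename (a timestamped name is used if omitted)",
    )
    parser.add_argument(
        "-v", "--verbose", action="store_true", help="enable verbose output"
    )
    parser.add_argument(
        "--version",
        action="version",
        version=f"%(prog)s {PLACEHOLDER_VERSION}",
    )
    # -h/--help is added by argparse automatically.
    return parser
```

`action="version"` prints the version string and exits, and `argparse` generates the `-h, --help` message from the `help=` texts.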

### Python API

```python
from pdfusion import merge_pdfs

# Example 1: Basic usage
result = merge_pdfs(
    input_dir="/path/to/pdfs",
    output_file="merged.pdf",
)
print(f"Merged {result.files_merged} files into {result.output_path}")

# Example 2: With verbose output and auto timestamp
result = merge_pdfs(
    input_dir="/path/to/pdfs",
    verbose=True,
)
print(f"Total pages in merged PDF: {result.total_pages}")

# Example 3: Full options
result = merge_pdfs(
    input_dir="/path/to/pdfs",
    output_file="merged.pdf",
    verbose=True,
    sort_files=True,  # Sort files alphabetically
    add_bookmarks=True,  # Add bookmarks for each merged PDF
)
```
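With `add_bookmarks=True`, each source file gets a bookmark in the merged PDF. However that is implemented, a file's bookmark has to point at the file's first page in the merged output, which is the running total of the page counts of the files before it. The helper `bookmark_start_pages` below is a hypothetical illustration of that arithmetic using 0-based page indices; it is not taken from PDFusion.

```python
from itertools import accumulate


def bookmark_start_pages(page_counts: list[int]) -> list[int]:
    """Hypothetical helper: 0-based page index where each source file starts.

    File i begins after every page of files 0..i-1, so the start indices are
    running totals of the page counts, beginning at 0.
    """
    if any(count < 0 for count in page_counts):
        raise ValueError("page counts must be non-negative")
    if not page_counts:
        return []
    # initial=0 supplies the first file's start; the last count is dropped
    # because no file follows the final one.
    return list(accumulate(page_counts[:-1], initial=0))
```

For page counts `[3, 1, 4]`, the three bookmarks land on pages 0, 3, and 4, and the merged document has 8 pages in total (the sum of the counts).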

### Example Project Structure

Create a simple script `merge_my_pdfs.py`:

```python
import logging

from pdfusion import merge_pdfs

# Set up logging (optional)
logging.basicConfig(level=logging.INFO)

# Merge PDFs
try:
    result = merge_pdfs(
        input_dir="./my_pdfs",
        output_file="merged_document.pdf",
        verbose=True,
    )
    print(f"Successfully merged {result.files_merged} files!")
    print(f"Output saved to: {result.output_path}")
    print(f"Total pages: {result.total_pages}")
# Catch-all for this example; real code can catch PDFusion's own exceptions instead
except Exception as e:
    print(f"Error merging PDFs: {e}")
```

Run your script:

```bash
python merge_my_pdfs.py
```

### Output Format

The `merge_pdfs` function returns a result object with the following attributes:

- `files_merged`: Number of files merged
- `output_path`: Path to the merged PDF
- `total_pages`: Total number of pages in the merged PDF
- `processing_time`: Time taken to merge the PDFs
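To type-check code that consumes the result, a stand-in with the same attributes can be useful. The class name `MergeResult`, the field types, and the assumption that `processing_time` is measured in seconds are all illustrative; the README documents only the attribute names.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class MergeResult:
    """Hypothetical stand-in mirroring the documented result attributes."""

    files_merged: int
    output_path: Path
    total_pages: int
    processing_time: float  # assumed unit: seconds

    def summary(self) -> str:
        """One-line, human-readable description of the merge."""
        return (
            f"Merged {self.files_merged} files ({self.total_pages} pages) "
            f"into {self.output_path} in {self.processing_time:.2f}s"
        )
```

Making the dataclass frozen mirrors the fact that a merge result describes something that has already happened and should not be edited afterwards.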

## 🛠️ Development

### Running Tests

```bash
# Run all tests
pytest

# Run with coverage report
pytest --cov=pdfusion

# Run performance benchmarks
pytest tests/test_pdfusion.py -v -m benchmark

# Run specific test file
pytest tests/test_pdfusion.py -v
```
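Tests need PDF inputs. Rather than committing binary fixtures, a helper in `tests/conftest.py` could generate small PDFs on the fly. The sketch below is hypothetical (it is not taken from the PDFusion test suite): it writes a structurally valid PDF 1.4 file with blank US Letter pages, computing the cross-reference byte offsets that PDF readers use to locate each object.

```python
from pathlib import Path


def minimal_pdf_bytes(num_pages: int = 1) -> bytes:
    """Hypothetical test helper: a valid PDF 1.4 file with blank pages."""
    if num_pages < 1:
        raise ValueError("num_pages must be at least 1")
    page_ids = range(3, 3 + num_pages)  # objects 1 and 2 are Catalog and Pages
    kids = " ".join(f"{i} 0 R" for i in page_ids)
    bodies = [
        "<< /Type /Catalog /Pages 2 0 R >>",
        f"<< /Type /Pages /Kids [{kids}] /Count {num_pages} >>",
    ] + [
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>"
        for _ in page_ids
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []  # byte offset of each "N 0 obj" header, for the xref table
    for number, body in enumerate(bodies, start=1):
        offsets.append(len(out))
        out += f"{number} 0 obj\n{body}\nendobj\n".encode("ascii")

    xref_start = len(out)
    out += f"xref\n0 {len(bodies) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")
    out += (
        f"trailer\n<< /Size {len(bodies) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_start}\n%%EOF\n"
    ).encode("ascii")
    return bytes(out)


def write_sample_pdfs(directory: Path, names: list[str], pages: int = 1) -> list[Path]:
    """Hypothetical test helper: write one blank PDF per name into directory."""
    directory.mkdir(parents=True, exist_ok=True)
    paths = [directory / name for name in names]
    for path in paths:
        path.write_bytes(minimal_pdf_bytes(pages))
    return paths
```

In pytest, `write_sample_pdfs` pairs well with the built-in `tmp_path` fixture, for example `write_sample_pdfs(tmp_path / "my_pdfs", ["document1.pdf", "document2.pdf"])`.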

## 🤝 Contributing

```mermaid
graph LR
A[Fork Repository] --> B[Create Feature Branch]
B --> C[Make Changes]
C --> D[Commit Changes]
D --> E[Push to Branch]
E --> F[Open Pull Request]
style F fill:#f96,stroke:#333,stroke-width:4px
```

1. Fork the repository
2. Create your feature branch (`git checkout -b feat/version/AmazingFeature`)
3. Commit your changes (`git commit -m 'type(scope): Add some AmazingFeature'`)
4. Push to the branch (`git push origin feat/version/AmazingFeature`)
5. Open a Pull Request with a title in the same format (e.g. `feat(scope): Add some AmazingFeature`)
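The commit messages and pull request titles in these steps follow a `type(scope): Description` shape. A small check like the one below can catch malformed titles before pushing. It is a hypothetical helper, and the set of allowed types is an assumption borrowed from commonly used Conventional Commits types, since this README only shows `feat` and a generic `type`.

```python
import re

# Assumed set of types (commonly used Conventional Commits types); the README
# itself only shows "feat" and a generic "type".
ALLOWED_TYPES = {"feat", "fix", "docs", "style", "refactor", "perf", "test", "chore"}
TITLE_PATTERN = re.compile(r"(?P<type>[a-z]+)\((?P<scope>[\w.-]+)\): (?P<description>\S.*)")


def is_valid_title(title: str) -> bool:
    """Hypothetical check: True if title has the 'type(scope): Description' shape."""
    match = TITLE_PATTERN.fullmatch(title.strip())
    return match is not None and match.group("type") in ALLOWED_TYPES
```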

## 👨‍💻 Author

### Bjorn Melin

[![AWS Certified Solutions Architect](https://images.credly.com/size/110x110/images/0e284c3f-5164-4b21-8660-0d84737941bc/image.png)](https://www.credly.com/org/amazon-web-services/badge/aws-certified-solutions-architect-associate)
[![AWS Certified Developer](https://images.credly.com/size/110x110/images/b9feab85-1a43-4f6c-99a5-631b88d5461b/image.png)](https://www.credly.com/org/amazon-web-services/badge/aws-certified-developer-associate)
[![AWS Certified AI Practitioner](https://images.credly.com/size/110x110/images/4d4693bb-530e-4bca-9327-de07f3aa2348/image.png)](https://www.credly.com/org/amazon-web-services/badge/aws-certified-ai-practitioner)
[![AWS Certified Cloud Practitioner](https://images.credly.com/size/110x110/images/00634f82-b07f-4bbd-a6bb-53de397fc3a6/image.png)](https://www.credly.com/org/amazon-web-services/badge/aws-certified-cloud-practitioner)

AWS-certified Solutions Architect and Developer with expertise in cloud architecture and modern development practices. Connect with me on:

- 🌐 [GitHub](https://github.com/BjornMelin)
- 💼 [LinkedIn](https://www.linkedin.com/in/bjorn-melin/)

Project Link: [https://github.com/BjornMelin/pdfusion](https://github.com/BjornMelin/pdfusion)

## 📜 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🌟 Star History

[![Star History Chart](https://api.star-history.com/svg?repos=bjornmelin/pdfusion&type=Date)](https://star-history.com/#bjornmelin/pdfusion&Date)

## 🙏 Acknowledgments

- 🐍 [Python](https://www.python.org/)
- 📄 [PyPDF2](https://pypdf.readthedocs.io/en/stable/)
- 🏷️ [GitHub Badges](https://shields.io/)

⚡ Built with Python 3.11 + PyPDF2 by Bjorn Melin