https://github.com/kartikmehta8/codetopdf

This Python script efficiently compiles text from multiple files in a directory into a single PDF, while excluding specified files and folders.
https://github.com/kartikmehta8/codetopdf

automation pdf-converter python

Last synced: 3 months ago
JSON representation

This Python script efficiently compiles text from multiple files in a directory into a single PDF, while excluding specified files and folders.

Host: GitHub
URL: https://github.com/kartikmehta8/codetopdf
Owner: kartikmehta8
License: mit
Created: 2024-01-06T14:21:00.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-01-06T14:21:11.000Z (over 1 year ago)
Last Synced: 2025-01-16T05:55:23.741Z (4 months ago)
Topics: automation, pdf-converter, python
Language: Python
Homepage:
Size: 3.91 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # Folder to PDF Converter Script

## Introduction

This Python script is designed to recursively traverse through a specified folder, compile the contents of text files into a single PDF, and skip over specified unallowed files and folders. It's particularly useful for aggregating text data from multiple files into a single document.

## Dependencies

To run this script, you need Python installed on your system along with the following libraries:

- `PyPDF2`: Used for creating and manipulating PDF files.

- `reportlab`: Used for generating PDF documents with text content.

You can install these dependencies using pip:

```bash

pip install PyPDF2 reportlab

```

## Script Walkthrough

### Importing Libraries

```python

import os

import sys

from PyPDF2 import PdfFileWriter, PdfFileReader

import io

from reportlab.pdfgen import canvas

from reportlab.lib.pagesizes import letter

```

-   `os` and `sys` are standard Python libraries used for file and system operations.

-   `PyPDF2` is used for PDF manipulation.

-   `io` provides core tools for working with streams.

-   `reportlab` is used for creating PDF documents.

### Function: `add_text_to_pdf`

```python

def add_text_to_pdf(text, pdf_writer):

    packet = io.BytesIO()

    can = canvas.Canvas(packet, pagesize=letter)

    for i, line in enumerate(text.split('\n')):

        can.drawString(72, 800 - 15 * i, line)

    can.save()

    packet.seek(0)

    new_pdf = PdfFileReader(packet)

    page = new_pdf.getPage(0)

    pdf_writer.addPage(page)

```

This function creates a PDF page with the provided text. It uses `reportlab` to draw text onto a canvas, which is then converted into a PDF page and added to the `PdfFileWriter` object.

### Function: `process_folder`

```python

def process_folder(folder_path, output_pdf, unallowed_files, unallowed_folders):

    pdf_writer = PdfFileWriter()

    for root, dirs, files in os.walk(folder_path):

        dirs[:] = [d for d in dirs if d not in unallowed_folders]

        for file in files:

            if file in unallowed_files:

                continue

            file_path = os.path.join(root, file)

            try:

                with open(file_path, 'r') as f:

                    add_text_to_pdf(f.read(), pdf_writer)

            except Exception as e:

                print(f"Error processing {file_path}: {e}")

    with open(output_pdf, 'wb') as out:

        pdf_writer.write(out)

```

This function traverses the directory tree starting from `folder_path`. It skips over unallowed folders and files, reads the content of allowed files, and uses `add_text_to_pdf` to add this content to the PDF.

### Main Script

```python

if __name__ == "__main__":

    if len(sys.argv) != 3:

        print("Usage: python script.py  ")

        sys.exit(1)

    folder_path = sys.argv[1]

    output_pdf = sys.argv[2]

    unallowed_files = ['file1.txt', 'file2.txt']

    unallowed_folders = ['folder1', 'folder2']

    process_folder(folder_path, output_pdf, unallowed_files, unallowed_folders) 

```

This part of the script checks for the correct usage, takes command-line arguments for the folder path and output PDF file name, and calls `process_folder` with these parameters along with the lists of unallowed files and folders.

## Usage

Run the script from the command line, providing the path to the folder and the output PDF file name:

```bash

python script.py /path/to/folder output.pdf 

```

## Use Cases

-   **Document Aggregation**: Combining multiple text documents into a single PDF for reporting or archival purposes.

-   **Data Compilation**: Gathering text data from various files for data analysis or research.

## Conclusion

This script is a versatile tool for anyone looking to consolidate text information from multiple files into a single PDF document. Its ability to exclude specific files and folders adds to its flexibility, making it suitable for a variety of use cases in both professional and personal settings.

Remember to adjust the script for different file types and formatting needs, as it currently handles basic text files.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/kartikmehta8/codetopdf

Awesome Lists containing this project

README