https://github.com/vthuong99/web-scan-to-pdf

automation pdf python selenium

Last synced: about 2 months ago
JSON representation

Host: GitHub
URL: https://github.com/vthuong99/web-scan-to-pdf
Owner: VThuong99
Created: 2025-04-07T03:14:29.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-04-07T03:52:12.000Z (about 1 year ago)
Last Synced: 2025-04-07T04:30:20.589Z (about 1 year ago)
Topics: automation, pdf, python, selenium
Language: Python
Homepage:
Size: 0 Bytes
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Web Screenshot to PDF

This project automates capturing screenshots from a webpage by scrolling dynamically and combining them into a PDF file. It supports large documents (e.g., thousands of pages) with error handling and configurable options.

## Features
- Access a specified URL and scroll dynamically based on page height.
- Capture screenshots for a specified number of pages.
- Optional login support for authenticated pages.
- Rename screenshot files to a 3-digit format (e.g., `page_001.png`).
- Combine screenshots into a PDF file.

## Installation

Install required libraries:
```bash
pip install -r requirements.txt
```

## Usage
1. Run the main script with command-line arguments:
```bash
python src/main.py --url --pages [--email ] [--password ] [--output ]
```
- `--url`: URL of the webpage (required).
- `--pages`: Number of pages to capture (required).
- `--email`: Email for login (optional).
- `--password`: Password for login (optional).
- `--output`: Output PDF file name (default: `output_screenshots.pdf`).

2. Run the rename script to standardize filenames:
```bash
python scr/rename_files.py
```
## Example
Capture 446 pages from a Box URL without login:
```bash
python src/main.py --url https://app.box.com/s/7mnv4zzizj3yizwnhuiaa4q7jycn9orr --pages 446
```
### Sample Results
![PDF Preview](images/output_pdf_preview.png)

# Contributing
Contributions are welcome! Please:
- Open an issue to report bugs or suggest features.
- Submit a pull request with your improvements.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/vthuong99/web-scan-to-pdf

Awesome Lists containing this project

README