https://github.com/marioszocs/pdf-splitter
Split PDF files by size, by page, and extract email addresses
- Host: GitHub
- URL: https://github.com/marioszocs/pdf-splitter
- Owner: marioszocs
- Created: 2021-05-08T15:25:52.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2025-01-28T19:38:42.000Z (over 1 year ago)
- Last Synced: 2025-03-30T10:29:45.654Z (about 1 year ago)
- Topics: itextpdf, java, pdf, pdfbox, pdfextraction, pdfsplitter
- Language: Java
- Homepage:
- Size: 12.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# PDF Splitter
**PDF Splitter** is a Java desktop application that splits PDF files by size or by page count and extracts email addresses from PDF documents. It uses the **PDFBox** and **iTextPDF** libraries for these operations.
---
## Features
- **Split PDFs by size**: Break large PDF files into smaller chunks of a specified size.
- **Split PDFs by pages**: Divide a PDF into multiple parts after a given number of pages.
- **Extract email addresses**: Retrieve and save all email addresses found in a PDF document to a `.txt` file.
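
The email-extraction feature can be illustrated with a minimal sketch. It assumes the PDF text has already been pulled out as a plain `String` (for instance with PDFBox's `PDFTextStripper`). The class name `EmailExtractionSketch`, the regular expression, and the de-duplication step are assumptions made for illustration, not code from this repository.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the repository. */
final class EmailExtractionSketch {

    // Deliberately simple pattern (an assumption): local part, "@",
    // one or more domain labels, and an alphabetic TLD of 2+ letters.
    private static final Pattern EMAIL = Pattern.compile(
            "[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*\\.[A-Za-z]{2,}");

    /** Returns the distinct addresses found in the text, in order of first appearance. */
    static List<String> extractEmails(String text) {
        Set<String> unique = new LinkedHashSet<>();
        Matcher matcher = EMAIL.matcher(text);
        while (matcher.find()) {
            unique.add(matcher.group());
        }
        return new ArrayList<>(unique);
    }

    /** Writes one address per line to the target file (UTF-8), replacing any existing file. */
    static void saveEmails(List<String> emails, Path target) throws IOException {
        Files.write(target, emails, StandardCharsets.UTF_8);
    }
}
```

A caller would pass the extracted document text to `extractEmails` and hand the result to `saveEmails` together with the path of the `.txt` file. The pattern is intentionally permissive and does not attempt full RFC 5322 validation.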
---
## How It Works
### Main Operations:
1. **Split PDF After Specific Pages**:
   - Select the number of pages after which the PDF should be split.
   - The resulting PDFs are saved in the output folder.
2. **Split PDF by Specific Size**:
   - Specify the maximum size of each split PDF in kilobytes.
   - The application creates multiple PDFs, each within the size limit.
3. **Extract Email Addresses**:
   - The application scans the text of the PDF files for email addresses.
   - Extracted addresses are saved to a `.txt` file.
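
The two split modes above can be sketched as a planning step that decides which page ranges go into each output file. This is only an illustration under stated assumptions: the class `SplitPlanSketch` and its methods are hypothetical, the page-count mode is assumed to start a new part after every N pages, and the size mode assumes a per-page size estimate in bytes is available. Real PDFs share resources such as fonts between pages, so the application may measure sizes differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planning helper; not taken from the repository. */
final class SplitPlanSketch {

    /** Cuts pages 1..totalPages into 1-based inclusive ranges of at most pagesPerPart pages. */
    static List<int[]> byPageCount(int totalPages, int pagesPerPart) {
        if (totalPages < 0 || pagesPerPart < 1) {
            throw new IllegalArgumentException("totalPages must be >= 0 and pagesPerPart >= 1");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 1; start <= totalPages; start += pagesPerPart) {
            int end = Math.min(start + pagesPerPart - 1, totalPages);
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    /**
     * Greedily groups consecutive pages so that the summed size estimate of each
     * group stays within maxKilobytes. A single page larger than the limit is
     * placed in a part of its own, since it cannot be divided further.
     */
    static List<int[]> bySize(long[] pageSizesBytes, long maxKilobytes) {
        if (maxKilobytes < 1) {
            throw new IllegalArgumentException("maxKilobytes must be >= 1");
        }
        long limitBytes = maxKilobytes * 1024L;
        List<int[]> ranges = new ArrayList<>();
        int start = 1;          // first page of the part being built
        long currentBytes = 0;  // estimated size of the part being built
        for (int i = 0; i < pageSizesBytes.length; i++) {
            int page = i + 1;
            // Close the current part if adding this page would exceed the limit.
            if (page > start && currentBytes + pageSizesBytes[i] > limitBytes) {
                ranges.add(new int[] {start, page - 1});
                start = page;
                currentBytes = 0;
            }
            currentBytes += pageSizesBytes[i];
        }
        if (pageSizesBytes.length > 0) {
            ranges.add(new int[] {start, pageSizesBytes.length});
        }
        return ranges;
    }
}
```

For example, `byPageCount(10, 4)` yields the ranges 1-4, 5-8, and 9-10. Each range would then be copied into its own output document by the PDF library.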
---
## Requirements
- **Java 8** or higher.
- **Maven** for dependency management.
---
## Installation
1. Clone the repository:
   ```bash
   git clone https://github.com/marioszocs/pdf-splitter.git
   cd pdf-splitter
   ```
2. Build the project:
   ```bash
   mvn clean install
   ```
3. Run the application:
   ```bash
   java -cp target/pdfsplitting-0.0.1-SNAPSHOT.jar com.pdfsplitting.Main
   ```
---
## Example Screenshots
Screenshots (images not reproduced here):
- Split PDF by size
- Input selection
- Output example
---
## Libraries Used
- **[Apache PDFBox](https://pdfbox.apache.org/)**: For handling PDF documents.
- **[iTextPDF](https://itextpdf.com/)**: For advanced PDF processing.
---
## Project Structure
```
marioszocs-pdf-splitter/
├── pom.xml                          # Maven build configuration
├── README.md                        # Project documentation
├── src/main/java/com/pdfsplitting/
│   ├── Main.java                    # Entry point of the application
│   ├── PDFFileOperations.java       # Interface for PDF operations
│   ├── PDFFileOperationsImp.java    # Implementation of PDF operations
│   └── PdfUtilities.java            # Utility methods for PDF handling
└── .gitignore                       # Ignored files
```