https://github.com/arpitsingh134/pdf-metadata-scanner
Here's an updated and complete version of your README.md for the PDF Metadata Scanner API, now with support for MySQL, Docker Compose, asynchronous processing, and enhanced instructions.
apache-pdfbox docker docker-compose java-17 lombok-maven maven microservices-architecture mysql slf4j-loggers spring-data-jpa spring-web springboot3
- Host: GitHub
- URL: https://github.com/arpitsingh134/pdf-metadata-scanner
- Owner: arpitsingh134
- License: mit
- Created: 2025-06-14T19:30:58.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-06-15T15:38:58.000Z (12 months ago)
- Last Synced: 2025-06-28T06:07:51.907Z (11 months ago)
- Topics: apache-pdfbox, docker, docker-compose, java-17, lombok-maven, maven, microservices-architecture, mysql, slf4j-loggers, spring-data-jpa, spring-web, springboot3
- Language: Java
- Homepage:
- Size: 43.7 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# PDF Metadata Scanner API
This Spring Boot project allows you to:
* Upload PDF files
* Generate SHA256 hashes
* Asynchronously extract and store metadata in a MySQL database
* Retrieve metadata by hash
## Features
* RESTful endpoints for uploading and retrieving PDFs
* SHA256-based hash identification
* Asynchronous metadata extraction
* MySQL persistence
* Docker & Docker Compose support
* Clean logs for traceability
---
## Prerequisites
* Docker + Docker Compose installed
* Java 17+ and Maven for the `mvn` build steps and for running locally
---
## Docker Setup (Recommended)
### Clone Repo and Build
```bash
git clone https://github.com/arpitsingh134/pdf-metadata-scanner.git
cd pdf-metadata-scanner
```
```bash
mvn clean package    # build the application JAR
mvn clean install    # optional: same build, plus install into the local Maven repository
```
### Run with Docker Compose
#### Start Docker Containers
```bash
docker-compose up --build
```
#### Stop Docker Containers
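```bash
# -v also removes the volumes declared in docker-compose.yml, so any MySQL data kept in them is deleted
docker-compose down -v
```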
---
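### Check Database Data After Starting the Docker Containers
```bash
docker exec -it mysqldb mysql -u testuser -p
```
#### Enter the password when prompted
```
testpass
```
#### Run these queries
Select the database first; `show tables;` fails with "No database selected" until `use` has run.
```sql
use pdfscanner;
show tables;
select * from pdf_metadata;
```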
---
## Running Locally (Without Docker)
Make sure MySQL is running and `application.properties` is updated with correct DB credentials.
```bash
mvn clean spring-boot:run
```
---
## API Usage
### Upload PDF
Uploads the file and triggers metadata extraction asynchronously.
```bash
curl -F "file=@sample.pdf" http://localhost:8080/scan
```
#### Response:
```json
{
  "sha256": "hqR2EoK/q7NQ0/DGXAJfI/Da8mqwYZcD3TxA/pdKX1Y="
}
```
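The sample `sha256` value above is 44 characters long and ends in `=`, which is what a Base64-encoded 32-byte SHA-256 digest looks like. Below is a minimal sketch of producing such a value, assuming Base64 over the raw digest. The class and method names (`Sha256Sketch`, `sha256Base64`) are hypothetical and not taken from the project's `util` package.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper for illustration; the project's own hash utility may differ.
class Sha256Sketch {

    // Returns the SHA-256 digest of the given bytes as a Base64 string (44 characters).
    static String sha256Base64(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(digest.digest(content));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // In the API the input would be the uploaded PDF's bytes; a short string stands in here.
        byte[] content = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(sha256Base64(content));
    }
}
```

Identical file contents always produce the same string, which is what makes the hash usable as a lookup key.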
---
### Lookup Metadata
```bash
curl http://localhost:8080/lookup/hqR2EoK%2Fq7NQ0%2FDGXAJfI%2FDa8mqwYZcD3TxA%2FpdKX1Y%3D
```
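Because a Base64 hash can contain `/`, `+`, and `=`, it has to be percent-encoded before it goes into the `/lookup/{hash}` path, as the `curl` command above does. Here is a small sketch of building that URL in Java. `LookupUrlSketch` and `lookupUrl` are hypothetical names used only for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper; not part of the project.
class LookupUrlSketch {

    // Percent-encodes the Base64 hash so it can be used as a single URL path segment.
    static String lookupUrl(String baseUrl, String sha256Base64) {
        String encoded = URLEncoder.encode(sha256Base64, StandardCharsets.UTF_8);
        return baseUrl + "/lookup/" + encoded;
    }

    public static void main(String[] args) {
        String hash = "hqR2EoK/q7NQ0/DGXAJfI/Da8mqwYZcD3TxA/pdKX1Y=";
        // Prints the same URL used in the curl example.
        System.out.println(lookupUrl("http://localhost:8080", hash));
    }
}
```

`URLEncoder` targets form encoding and would turn a space into `+`, which is harmless here because Base64 output never contains spaces.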
#### Example Response:
```json
{
  "sha256": "hqR2EoK/q7NQ0/DGXAJfI/Da8mqwYZcD3TxA/pdKX1Y=",
  "version": "1.7",
  "producer": "Apache PDFBox",
  "author": "Arpit Singh",
  "created": "D:20250614010000Z",
  "modified": "D:20250614020000Z",
  "scanned": "2025-06-14T05:45:00Z",
  "filename": "sample_20250614_114500.pdf"
}
```
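The `created` and `modified` values use the PDF date string layout (`D:YYYYMMDDHHmmSS` plus a timezone designator), while `scanned` is ISO 8601. If a client needs both in the same form, the conversion could look like the sketch below. It assumes the UTC `Z` form shown in the example and does not handle offsets such as `+05'30'`. `PdfDateSketch` is a hypothetical name.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical client-side helper; only the "D:...Z" (UTC) form is supported.
class PdfDateSketch {

    // Literal "D:" prefix, 14 digits of date and time, literal "Z" suffix.
    private static final DateTimeFormatter PDF_UTC =
            DateTimeFormatter.ofPattern("'D:'yyyyMMddHHmmss'Z'");

    // Parses a UTC PDF date string into an Instant.
    static Instant parseUtc(String pdfDate) {
        return LocalDateTime.parse(pdfDate, PDF_UTC).toInstant(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        // Prints the ISO 8601 form of the "created" value from the example response.
        System.out.println(parseUtc("D:20250614010000Z"));
    }
}
```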
---
## Project Structure
| Layer | Description |
| ------------ | ------------------------------------ |
| `controller` | Handles `/scan` and `/lookup/{hash}` |
| `service` | Extracts metadata using PDFBox |
| `model` | PdfMetadata entity |
| `repository` | Spring Data JPA for MySQL |
| `util` | SHA256 Hash Utility |
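The upload flow described under API Usage (return the hash immediately, extract and store metadata in the background) can be modelled in plain Java as follows. This is an illustrative sketch, not the project's code: it assumes nothing about how the `service` layer is actually wired, uses an in-memory map in place of MySQL, and substitutes a placeholder string for the PDFBox extraction. All names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Plain-Java model of "respond now, extract later"; no Spring, JPA, or PDFBox involved.
class AsyncScanSketch {

    // Stands in for the pdf_metadata table, keyed by hash.
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // Starts background "extraction" and returns without waiting for it to finish.
    CompletableFuture<Void> scan(String hash, byte[] pdfBytes) {
        return CompletableFuture.runAsync(() ->
                // Placeholder for real metadata extraction and persistence.
                store.put(hash, "size=" + pdfBytes.length));
    }

    // Returns null until the background task has stored a result for this hash.
    String lookup(String hash) {
        return store.get(hash);
    }

    public static void main(String[] args) {
        AsyncScanSketch sketch = new AsyncScanSketch();
        CompletableFuture<Void> done = sketch.scan("demo-hash", new byte[] {1, 2, 3});
        done.join(); // a real client would poll the lookup endpoint instead of joining
        System.out.println(sketch.lookup("demo-hash"));
    }
}
```

One consequence of this design is that a lookup issued immediately after an upload may find nothing yet, so clients should be prepared to retry.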
---
## Tech Stack
* Java 17
* Spring Boot
* Spring Web + Spring Data JPA
* MySQL
* Apache PDFBox
* Docker / Docker Compose
* Lombok + SLF4J
---
## Files to Note
* `Dockerfile`: For containerizing the Spring Boot app
* `docker-compose.yml`: Spins up Spring app + MySQL DB
* `application.properties`: DB configs and JPA tuning
* `PdfMetadata`: JPA entity model for metadata
---
## Author
**Arpit Singh**
* Email: [arpitsingh134@gmail.com](mailto:arpitsingh134@gmail.com)
* [LinkedIn](https://linkedin.com/in/arpitsingh134)
---