https://github.com/arpitsingh134/pdf-metadata-scanner

Here's an updated and complete version of your README.md for the PDF Metadata Scanner API, now with support for MySQL, Docker Compose, asynchronous processing, and enhanced instructions.
apache-pdfbox docker docker-compose java-17 lombok-maven maven microservices-architecture mysql slf4j-loggers spring-data-jpa spring-web springboot3


# 📄 PDF Metadata Scanner API

This Spring Boot project allows you to:

* Upload PDF files
* Generate SHA256 hashes
* Asynchronously extract and store metadata in a MySQL database
* Retrieve metadata by hash

## 🚀 Features

* 📦 RESTful endpoints for uploading PDFs and retrieving their metadata
* 🔐 SHA256-based hash identification
* ⚙️ Asynchronous metadata extraction
* 🗃️ MySQL persistence
* 🐳 Docker & Docker Compose support
* 📝 Clean logs for traceability

---

## 🔧 Prerequisites

* Docker + Docker Compose installed
* Optional: Java 17+ and Maven if running locally

---

## 🐳 Docker Setup (Recommended)

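### Clone Repo

```bash
git clone https://github.com/arpitsingh134/pdf-metadata-scanner.git
cd pdf-metadata-scanner
```

### Build the JAR

Build the application JAR with Maven. `mvn clean install` also works; it runs the same build and additionally copies the artifact into the local Maven repository.

```bash
mvn clean package
```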

### Run with Docker Compose

#### Start Docker Containers

```bash
docker-compose up --build
```

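#### Stop Docker Containers

```bash
docker-compose down -v
```

The `-v` flag also removes the volumes declared in `docker-compose.yml`, so any MySQL data stored in them is deleted. Omit it to keep the data between runs.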

---
### Check Database Data After Starting the Containers

Open a MySQL shell inside the `mysqldb` container:

```bash
docker exec -it mysqldb mysql -u testuser -p
```

#### Enter the password when prompted

```
testpass
```

#### Run the queries

```sql
use pdfscanner;
show tables;
select * from pdf_metadata;
```

---

## 🛠️ Running Locally (Without Docker)

Make sure MySQL is running and that `application.properties` contains the correct DB credentials.

```bash
mvn clean spring-boot:run
```

---

## 📮 API Usage

### 🔼 Upload PDF

`POST /scan` accepts the PDF as a multipart form field named `file`, returns its hash, and triggers metadata extraction asynchronously.

```bash
curl -F "file=@sample.pdf" http://localhost:8080/scan
```

#### ✅ Response:

```json
{
  "sha256": "hqR2EoK/q7NQ0/DGXAJfI/Da8mqwYZcD3TxA/pdKX1Y="
}
```
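The upload returns the hash right away while extraction continues in the background. The following is a minimal plain-JDK sketch of that pattern, not the project's actual code: `AsyncScanSketch`, its in-memory map (standing in for MySQL), and the `extractor` function (standing in for PDFBox) are all hypothetical, and the real service may well use Spring's own asynchronous support instead.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for the service layer; none of these names come from the repository.
final class AsyncScanSketch {
    // In-memory map standing in for the MySQL table.
    private final Map<String, String> store = new ConcurrentHashMap<>();
    // Stand-in for the PDFBox-based extraction step.
    private final Function<byte[], String> extractor;

    AsyncScanSketch(Function<byte[], String> extractor) {
        this.extractor = extractor;
    }

    /** Starts extraction on a background thread and returns without waiting for it. */
    CompletableFuture<Void> scan(String hash, byte[] pdf) {
        return CompletableFuture.runAsync(() -> store.put(hash, extractor.apply(pdf)));
    }

    /** Returns the stored metadata, or empty while extraction has not finished. */
    Optional<String> lookup(String hash) {
        return Optional.ofNullable(store.get(hash));
    }
}
```

In this sketch a caller would respond with the hash immediately after calling `scan`, and a `lookup` issued before the future completes returns `Optional.empty()`. How the real API responds in that window is not documented here.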

---

### 🔍 Lookup Metadata

Pass the `sha256` value returned by `/scan`, URL-encoded, as the path segment.

```bash
curl http://localhost:8080/lookup/hqR2EoK%2Fq7NQ0%2FDGXAJfI%2FDa8mqwYZcD3TxA%2FpdKX1Y%3D
```
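The hash in the path above is percent-encoded: `/` becomes `%2F` and `=` becomes `%3D`. A small Java sketch of building that URL is shown below; `LookupUrl` and `forHash` are hypothetical names used only for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the repository.
final class LookupUrl {
    private LookupUrl() {}

    /** Builds the lookup URL for a Base64 hash, percent-encoding '/', '+' and '='. */
    static String forHash(String baseUrl, String sha256Base64) {
        // URLEncoder targets form encoding (a space would become '+'),
        // but Base64 output contains no spaces, so it is safe for this value.
        String encoded = URLEncoder.encode(sha256Base64, StandardCharsets.UTF_8);
        return baseUrl + "/lookup/" + encoded;
    }
}
```

Calling `LookupUrl.forHash("http://localhost:8080", hash)` with the sample hash from the upload response yields the same URL as the `curl` command above.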

#### ✅ Example Response:

```json
{
  "sha256": "hqR2EoK/q7NQ0/DGXAJfI/Da8mqwYZcD3TxA/pdKX1Y=",
  "version": "1.7",
  "producer": "Apache PDFBox",
  "author": "Arpit Singh",
  "created": "D:20250614010000Z",
  "modified": "D:20250614020000Z",
  "scanned": "2025-06-14T05:45:00Z",
  "filename": "sample_20250614_114500.pdf"
}
```
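In the sample, `created` and `modified` use the PDF date string format (`D:` followed by `YYYYMMDDHHmmSS` and a timezone designator), while `scanned` is an ISO 8601 timestamp. If a client wants both in the same form, a conversion could look like the sketch below. `PdfDates` is a hypothetical helper, and it deliberately handles only the UTC form ending in `Z` that appears in the sample, not offsets such as `+05'30'`.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical client-side helper; not part of the repository.
final class PdfDates {
    private static final DateTimeFormatter FIELDS =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    private PdfDates() {}

    /** Parses a PDF date of the exact form D:YYYYMMDDHHmmSSZ into an Instant. */
    static Instant parseUtc(String pdfDate) {
        if (pdfDate.length() != 17 || !pdfDate.startsWith("D:") || !pdfDate.endsWith("Z")) {
            throw new IllegalArgumentException("Unsupported PDF date: " + pdfDate);
        }
        // Characters 2..15 hold the 14 date and time digits.
        LocalDateTime utc = LocalDateTime.parse(pdfDate.substring(2, 16), FIELDS);
        return utc.toInstant(ZoneOffset.UTC);
    }
}
```

For example, `PdfDates.parseUtc("D:20250614010000Z")` gives the instant `2025-06-14T01:00:00Z`.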

---

## 🧾 Project Structure

| Layer | Description |
| ------------ | ------------------------------------ |
| `controller` | Handles `/scan` and `/lookup/{hash}` |
| `service` | Extracts metadata using PDFBox |
| `model` | PdfMetadata entity |
| `repository` | Spring Data JPA for MySQL |
| `util` | SHA256 Hash Utility |
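
The sample `sha256` values in this README are 44 characters long and end in `=`, which is what Base64 encoding of a raw 32-byte SHA-256 digest looks like. Assuming that is what the `util` layer produces, a hash utility could be sketched with the JDK alone as follows. The name `Sha256Util` is hypothetical and not taken from the repository.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper; the class and method names are illustrative only.
final class Sha256Util {
    private Sha256Util() {}

    /** Returns the Base64-encoded SHA-256 digest of the given bytes. */
    static String sha256Base64(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(digest.digest(content));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Running `Sha256Util.sha256Base64(Files.readAllBytes(Path.of("sample.pdf")))` locally should then give the same value the API returns for that file, provided the assumption about the encoding holds.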

---

## ⚙️ Tech Stack

* Java 17
* Spring Boot
* Spring Web + Spring Data JPA
* MySQL
* Apache PDFBox
* Docker / Docker Compose
* Lombok + SLF4J

---

## 📁 Files to Note

* `Dockerfile`: For containerizing the Spring Boot app
* `docker-compose.yml`: Spins up Spring app + MySQL DB
* `application.properties`: DB configs and JPA tuning
* `PdfMetadata`: JPA entity model for metadata

---

## 👨‍💻 Author

**Arpit Singh**
📧 [arpitsingh134@gmail.com](mailto:arpitsingh134@gmail.com)
🔗 [LinkedIn](https://linkedin.com/in/arpitsingh134)

---