{"id":29113896,"url":"https://github.com/arpitsingh134/pdf-metadata-scanner","last_synced_at":"2026-04-12T10:38:27.864Z","repository":{"id":299128385,"uuid":"1002131635","full_name":"arpitsingh134/PDF-Metadata-Scanner","owner":"arpitsingh134","description":"Here’s an updated and complete version of your README.md for the PDF Metadata Scanner API, now with support for MySQL, Docker Compose, asynchronous processing, and enhanced instructions.","archived":false,"fork":false,"pushed_at":"2025-06-15T15:38:58.000Z","size":45846,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-28T06:07:51.907Z","etag":null,"topics":["apache-pdfbox","docker","docker-compose","java-17","lombok-maven","maven","microservices-architecture","mysql","slf4j-loggers","spring-data-jpa","spring-web","springboot3"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arpitsingh134.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-14T19:30:58.000Z","updated_at":"2025-06-15T15:39:01.000Z","dependencies_parsed_at":"2025-06-14T21:19:53.215Z","dependency_job_id":"6bf83e40-f081-4ea5-b86f-b11ec0e0cba3","html_url":"https://github.com/arpitsingh134/PDF-Metadata-Scanner","commit_stats":null,"previous_names":["arpitsingh134/pdf-metadata-scanner"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/arpitsingh134/PDF-Metadata-Scanner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/arpitsingh134%2FPDF-Metadata-Scanner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arpitsingh134%2FPDF-Metadata-Scanner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arpitsingh134%2FPDF-Metadata-Scanner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arpitsingh134%2FPDF-Metadata-Scanner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/arpitsingh134","download_url":"https://codeload.github.com/arpitsingh134/PDF-Metadata-Scanner/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arpitsingh134%2FPDF-Metadata-Scanner/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262581389,"owners_count":23331913,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-pdfbox","docker","docker-compose","java-17","lombok-maven","maven","microservices-architecture","mysql","slf4j-loggers","spring-data-jpa","spring-web","springboot3"],"created_at":"2025-06-29T11:05:55.662Z","updated_at":"2025-12-30T22:21:29.751Z","avatar_url":"https://github.com/arpitsingh134.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 📄 PDF Metadata Scanner API\n\nThis Spring Boot project allows you to:\n\n* Upload PDF files\n* Generate SHA256 hashes\n* Asynchronously extract and store metadata in a MySQL database\n* Retrieve metadata by hash\n\n## 🚀 Features\n\n* 📦 RESTful endpoints for uploading and retrieving PDFs\n* 🔐 SHA256-based hash identification\n* ⚙️ 
Asynchronous metadata extraction\n* 🗃️ MySQL persistence\n* 🐳 Docker \u0026 Docker Compose support\n* 📝 Clean logs for traceability\n\n---\n\n## 🔧 Prerequisites\n\n* Docker + Docker Compose installed\n* Optional: Java 17+ and Maven if running locally\n\n---\n\n## 🐳 Docker Setup (Recommended)\n\n### Clone Repo\n\n```bash\ngit clone https://github.com/arpitsingh134/PDF-Metadata-Scanner.git\n```\n\n```bash\ncd PDF-Metadata-Scanner\n```\n\n```bash\nmvn clean package\n```\n\n```bash\nmvn clean install\n```\n\n### Run with Docker Compose\n\n#### Start Docker Containers\n\n```bash\ndocker-compose up --build\n```\n\n#### Stop Docker Containers\n\n```bash\ndocker-compose down -v\n```\n\n---\n\n### Check Database Data After Starting the Containers\n\n```bash\ndocker exec -it mysqldb mysql -u testuser -p\n```\n\n#### Enter the Password When Prompted\n\n```\ntestpass\n```\n\n#### Run Commands\n\n```sql\nuse pdfscanner;\nshow tables;\nselect * from pdf_metadata;\n```\n\n---\n\n## 🛠️ Running Locally (Without Docker)\n\nMake sure MySQL is running and `application.properties` is updated with the correct DB credentials.\n\n```bash\nmvn clean spring-boot:run\n```\n\n---\n\n## 📮 API Usage\n\n### 🔼 Upload PDF\n\nUploads the file and triggers metadata extraction asynchronously.\n\n```bash\ncurl -F \"file=@sample.pdf\" http://localhost:8080/scan\n```\n\n#### ✅ Response:\n\n```json\n{\n  \"sha256\": \"hqR2EoK/q7NQ0/DGXAJfI/Da8mqwYZcD3TxA/pdKX1Y=\"\n}\n```\n\n---\n\n### 🔍 Lookup Metadata\n\n```bash\ncurl http://localhost:8080/lookup/hqR2EoK%2Fq7NQ0%2FDGXAJfI%2FDa8mqwYZcD3TxA%2FpdKX1Y%3D\n```\n\n#### ✅ Example Response:\n\n```json\n{\n  \"sha256\": \"hqR2EoK/q7NQ0/DGXAJfI/Da8mqwYZcD3TxA/pdKX1Y=\",\n  \"version\": \"1.7\",\n  \"producer\": \"Apache PDFBox\",\n  \"author\": \"Arpit Singh\",\n  \"created\": \"D:20250614010000Z\",\n  \"modified\": \"D:20250614020000Z\",\n  \"scanned\": \"2025-06-14T05:45:00Z\",\n  \"filename\": \"sample_20250614_114500.pdf\"\n}\n```\n\n---\n\n## 🧾 Project Structure\n\n| Layer 
       | Description                          |\n| ------------ | ------------------------------------ |\n| `controller` | Handles `/scan` and `/lookup/{hash}` |\n| `service`    | Extracts metadata using PDFBox       |\n| `model`      | PdfMetadata entity                   |\n| `repository` | Spring Data JPA for MySQL            |\n| `util`       | SHA256 Hash Utility                  |\n\n---\n\n## ⚙️ Tech Stack\n\n* Java 17\n* Spring Boot\n* Spring Web + Spring Data JPA\n* MySQL\n* Apache PDFBox\n* Docker / Docker Compose\n* Lombok + SLF4J\n\n---\n\n## 📁 Files to Note\n\n* `Dockerfile`: For containerizing the Spring Boot app\n* `docker-compose.yml`: Spins up Spring app + MySQL DB\n* `application.properties`: DB configs and JPA tuning\n* `PdfMetadata`: JPA entity model for metadata\n\n---\n\n## 👨‍💻 Author\n\n**Arpit Singh**\n📧 [arpitsingh134@gmail.com](mailto:arpitsingh134@gmail.com)\n🔗 [LinkedIn](https://linkedin.com/in/arpitsingh134)\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farpitsingh134%2Fpdf-metadata-scanner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farpitsingh134%2Fpdf-metadata-scanner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farpitsingh134%2Fpdf-metadata-scanner/lists"}