{"id":27157730,"url":"https://github.com/mittalsoni00/filereader","last_synced_at":"2026-04-06T08:01:39.025Z","repository":{"id":284216584,"uuid":"954213250","full_name":"mittalsoni00/FileReader","owner":"mittalsoni00","description":"Java PdfReader API is a Spring Boot-based application that extracts text from PDF files using the Apache PDFBox library. It provides a REST API to upload PDFs and retrieve their extracted text. This project simplifies text extraction for various applications like document processing and data analysis. ","archived":false,"fork":false,"pushed_at":"2025-04-08T10:45:18.000Z","size":45,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-08T11:34:10.801Z","etag":null,"topics":["github-config","java","maven","pdfbox","postman","spring","spring-boot","spring-web"],"latest_commit_sha":null,"homepage":"http://localhost:8080/api/parse-pdf","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mittalsoni00.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-24T18:24:36.000Z","updated_at":"2025-03-27T04:09:44.000Z","dependencies_parsed_at":"2025-03-24T19:46:21.442Z","dependency_job_id":null,"html_url":"https://github.com/mittalsoni00/FileReader","commit_stats":null,"previous_names":["mittalsoni00/filereader"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mittalsoni00/FileReader","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mittalsoni00%2FFileReader","tags_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mittalsoni00%2FFileReader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mittalsoni00%2FFileReader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mittalsoni00%2FFileReader/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mittalsoni00","download_url":"https://codeload.github.com/mittalsoni00/FileReader/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mittalsoni00%2FFileReader/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31464102,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T21:22:52.476Z","status":"online","status_checked_at":"2026-04-06T02:00:07.287Z","response_time":112,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["github-config","java","maven","pdfbox","postman","spring","spring-boot","spring-web"],"created_at":"2025-04-08T21:35:57.623Z","updated_at":"2026-04-06T08:01:38.910Z","avatar_url":"https://github.com/mittalsoni00.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# JAVA PdfReader API using LLM interaction\nA Spring Boot application that allows users to upload a **CASA bank statement PDF** and receive extracted details such as **Name**, **Email**, **Opening Balance**, and **Closing Balance** using **OpenAI GPT-4o** 
model.\n🔗 **Live Hosted API**: [https://pdfreader-fped.onrender.com/api/parse-pdf](https://pdfreader-fped.onrender.com/api/parse-pdf)\n\n---\n## 📌 Switch to Master Branch  \n**Note:** Please switch to the `master` branch to access all the documentation and source files.  \n\n## 📂 Project Structure\n\n### 🔹 Key Files\n\n| File | Description |\n|------|-------------|\n| `PdfController.java` | Main controller that handles the file upload and calls the service to process the PDF |\n| `PdfService.java` | Extracts text from the PDF and uses the OpenAI API for intelligent data parsing |\n| `OpenAiService.java` | Integrates with OpenAI GPT-4o via a REST call using Spring's `RestTemplate` |\n| `application.properties` | Stores server config and the API key (linked via environment variable for security) |\n| `Dockerfile` | Container build file for public deployment on Render |\n| `ChatController.java` | [Debug Endpoint] Allows sending a prompt manually to OpenAI via JSON |\n| `ChatPromptDTO.java` | DTO for accepting the request body in JSON format |\n| `PdfParserUtil.java` | Optional utility class to aid in raw PDF parsing |\n\nThe API can be accessed via Postman or curl. The instructions below use Postman.\n\n\n### ❌ Files You Can Exclude  \nThe following files are not crucial to the core functionality:  \n- `ApiReaderNewApplicationTests.java`  \n- `Main.java`  \n\n## 🧪 PdfReader.java (Testing Purpose)  \nThe `PdfReader.java` file is a **testing utility** that demonstrates how text is extracted from a PDF. It directly uses the **Apache PDFBox API** to extract text from any PDF file.  
\n\n---\n\n## 🛠️ Tech Stack\n\n- Java 17\n- Spring Boot\n- Apache PDFBox\n- OpenAI GPT-4o (via REST API)\n- Maven\n- Docker (for deployment)\n- Render (hosting platform)\n- Postman / curl for testing\n\n---\n## 💡 Features\n\n- 🔍 Intelligent extraction using **OpenAI GPT-4o** API\n- 📄 Accepts **PDF file** as multipart input\n- 🧠 Analyzes content with **LLM prompt** to extract:\n  - Name\n  - Email\n  - Opening Balance\n  - Closing Balance\n- 📬 JSON formatted output\n- 🧪 Separate test/debug endpoint to interact with OpenAI\n- 🌐 **Deployed publicly** using Docker and Render\n\n---\n\n## 🚀 Getting Started (Local Setup)\n\n### 1️⃣ Clone the Repository\n\n```bash\ngit clone https://github.com/mittalsoni00/FileReader.git\ncd FileReader\ngit checkout master\n```\n\n### 2️⃣ Add Your OpenAI API Key\n\nUse an environment variable for security:\n```bash\nexport OPENAI_API_KEY=your_secret_key_here\n```\n\nOr add it in `application.properties` (only for testing, not recommended for prod):\n```properties\nOPENAI_API_KEY=${OPENAI_API_KEY}\n```\n\n### 3️⃣ Build and Run\n\n```bash\nmvn clean install\njava -jar target/Api_Reader_New-0.0.1-SNAPSHOT.jar\n```\n\n---\n\n## 🧪 Debug Endpoint (Optional)\n\nTo test prompt-only flow (without PDF), hit:\n\n```\nPOST /api/chat\nBody (JSON):\n{\n  \"prompt\": \"Tell me a joke about Java developers\"\n}\n```\n\nThis will return a direct OpenAI response.\n\n---\n\n\n\n## 📬 API Documentation\n\n### ✅ Endpoint for PDF Parsing\n\n```\nPOST /api/parse-pdf\n```\n\n### Request Type\n`multipart/form-data`\n\n### Form Key:\n| Key | Value |\n|-----|-------|\n| `file` | [Select your PDF file] |\n\n### 🔁 Response (Success)\n\n```json\n{\n  \"response\": \"Here are the extracted details from the bank statement:\\n\\n- Name: John Doe\\n- Email: johndoe@example.com\\n- Opening Balance: $5,000\\n- Closing Balance: $6,500\"\n}\n```\n\n---\n\n## 📬 Testing via Postman\n\n### 📌 Steps\n\n1. Open Postman → **New Request**\n2. 
Choose **POST** → Enter URL:\n   ```\n   https://pdfreader-fped.onrender.com/api/parse-pdf\n   ```\n3. Go to **Body** tab → Select `form-data`\n4. Add a key named `file` → Upload PDF file\n5. Click **Send**\n\n### ✅ Response\nYou’ll receive a JSON containing Name, Email, and balances extracted using OpenAI.\n\n🟢 Make sure `Content-Type` is set to `multipart/form-data`. Postman sets this automatically if `form-data` is chosen.\n\n---\n\n## 🐳 Deployment Notes (on Render)\n\n- Dockerized Spring Boot app using:\n  ```dockerfile\n  FROM openjdk:17\n  WORKDIR /app\n  COPY target/Api_Reader_New-0.0.1-SNAPSHOT.jar app.jar\n  ENTRYPOINT [\"java\", \"-jar\", \"app.jar\"]\n  ```\n- Pushed to GitHub repo: [https://github.com/mittalsoni00/FileReader](https://github.com/mittalsoni00/FileReader)\n- Environment variable `OPENAI_API_KEY` added via Render dashboard\n- Health Check path: `/api/parse-pdf`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmittalsoni00%2Ffilereader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmittalsoni00%2Ffilereader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmittalsoni00%2Ffilereader/lists"}