{"id":15129865,"url":"https://github.com/king04aman/pdf-extractor-api","last_synced_at":"2026-01-18T20:32:56.143Z","repository":{"id":256663105,"uuid":"856042911","full_name":"king04aman/PDF-Extractor-API","owner":"king04aman","description":"PDF Extractor API is a FastAPI project for extracting information from PDFs. It includes user authentication, PDF uploading, and text extraction. The API supports secure PDF uploads, keyword-based extraction, and rate limiting.","archived":false,"fork":false,"pushed_at":"2024-09-19T20:59:50.000Z","size":23,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-11T17:14:04.924Z","etag":null,"topics":["api-security","docker-compose","doker","fastapi","invoice-management","invoice-pdf","jwt-auth","jwt-authentication","jwt-token","pdf-processing","pdf-processor","python","python3","rate-limiting","sap"],"latest_commit_sha":null,"homepage":"https://github.com/king04aman/PDF-Extractor-API","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/king04aman.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-11T22:17:44.000Z","updated_at":"2024-12-19T13:08:33.000Z","dependencies_parsed_at":"2024-09-12T09:45:35.680Z","dependency_job_id":"147db54c-2394-48fa-a626-e32ba4564592","html_url":"https://github.com/king04aman/PDF-Extractor-API","commit_stats":null,"previous_names":["king04aman/pdf-extractor-api"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/king04aman%2FPDF-Extractor-API","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/king04aman%2FPDF-Extractor-API/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/king04aman%2FPDF-Extractor-API/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/king04aman%2FPDF-Extractor-API/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/king04aman","download_url":"https://codeload.github.com/king04aman/PDF-Extractor-API/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247386946,"owners_count":20930741,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api-security","docker-compose","doker","fastapi","invoice-management","invoice-pdf","jwt-auth","jwt-authentication","jwt-token","pdf-processing","pdf-processor","python","python3","rate-limiting","sap"],"created_at":"2024-09-26T02:21:58.784Z","updated_at":"2026-01-18T20:32:56.136Z","avatar_url":"https://github.com/king04aman.png","language":"Python","readme":"\u003ch1 align=\"center\"\u003e PDF Extractor API \u003c/h1\u003e\n\n![PDF Extractor API](https://socialify.git.ci/king04aman/PDF-Extractor-API/image?description=1\u0026font=Jost\u0026language=1\u0026logo=https%3A%2F%2Fimages.weserv.nl%2F%3Furl%3Dhttps%3A%2F%2Favatars.githubusercontent.com%2Fu%2F62813940%3Fv%3D4%26h%3D250%26w%3D250%26fit%3Dcover%26mask%3Dcircle%26maxage%3D7d\u0026name=1\u0026owner=1\u0026pattern=Floating%20Cogs\u0026theme=Dark)\n\n## 
Overview\n\nThe PDF Extractor API is a FastAPI-based application designed to extract text and metadata from PDF files. It supports authentication using JWT tokens and rate limiting to manage API usage. The API allows users to upload PDF files, extract headers and items based on provided keywords, and handle responses in a user-friendly format.\n\n## Features\n\n- **Authentication**: Secure API access with JWT tokens.\n- **File Upload**: Upload PDF files in base64 format.\n- **PDF Extraction**: Extract headers and items from PDF files.\n- **Rate Limiting**: Protect the API from excessive usage.\n\n## Getting Started\n\nTo get started with the PDF Extractor API, follow these instructions to set up your development environment and run the application.\n\n### Prerequisites\n\n- Python 3.11+\n- Docker (optional, for containerized deployment)\n\n### Installation\n\n1. **Clone the Repository**\n\n   ```bash\n   git clone https://github.com/yourusername/pdf-extractor-api.git\n   cd pdf-extractor-api\n   ```\n2. **Set Up a Virtual Environment**\n\n   ```bash\n   python -m venv venv\n   source venv/bin/activate  # On Windows use `venv\\Scripts\\activate`\n   ```\n\n3. **Install Dependencies**\n\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n4. **Configure Environment**\n\n   Create a config.json file in the root directory with the following content:\n\n      ```json\n         {\n            \"client_id\": \"your_client_id\",\n            \"client_secret\": \"your_client_secret\",\n            \"url_auth\": \"your_auth_url\",\n            \"api_url\": \"your_api_url\",\n            \"access_token\": \"\",\n            \"expires_at\": \"\"\n         }\n      ```\n      Replace the placeholders with your actual configuration values.\n\n### Running the Application\n\n   1. **Start the Server**\n\n      ```bash\n      uvicorn main:app --host 0.0.0.0 --port 8000\n      ```\n\n   2. 
**Access the API**\n\n      Open your browser or API client and navigate to http://localhost:8000/docs to access the interactive API documentation provided by FastAPI.\n\n   3. **API Endpoints**\n\n      - POST /token: Obtain an access token.\n      - GET /users/me: Get information about the current user.\n      - POST /upload: Upload a PDF file in base64 format.\n      - POST /extract-header: Extract header information from a PDF.\n      - POST /extract-items: Extract item information from a PDF.\n\n### Example Usage\n\n   1. **Authenticate and Get a Token**\n\n      ```bash\n      curl -X POST \"http://localhost:8000/token\" -H \"Content-Type: application/x-www-form-urlencoded\" -d \"username=TSPABAP\u0026password=Welcome@321\"\n      ```\n   2. **Upload a PDF File**\n\n      ```bash\n      curl -X POST \"http://localhost:8000/upload\" -H \"Content-Type: application/json\" -d '{\"base64_string\": \"your_base64_encoded_pdf\"}'\n      ```\n\n   3. **Extract Header**\n\n      ```bash\n      curl -X POST \"http://localhost:8000/extract-header\" -H \"Authorization: Bearer your_access_token\" -H \"Content-Type: application/json\" -d '{\"file_id\": \"your_file_id\", \"keywords\": [\"keyword1\", \"keyword2\"], \"prompt\": \"Extract the header from the PDF.\"}'\n      ```\n\n   4. **Extract Items**\n\n      ```bash\n      curl -X POST \"http://localhost:8000/extract-items\" -H \"Authorization: Bearer your_access_token\" -H \"Content-Type: application/json\" -d '{\"file_id\": \"your_file_id\", \"keywords\": [\"keyword1\", \"keyword2\"], \"prompt\": \"Extract the items from the PDF.\"}'\n      ```\n\n### License\n\nThis project is licensed under the GNU General Public License v3.0 (GPL-3.0). See the [LICENSE](LICENSE) file for more details.\n\n### Contribution\n\n   We welcome contributions to improve the PDF Extractor API. 
Please follow these steps to contribute:\n\n   - Fork the repository.\n   - Create a new branch for your changes.\n   - Make your changes and test them.\n   - Submit a pull request with a detailed description of your changes.\n\n### Contact\nFor any questions or support, please open an issue in the repository.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fking04aman%2Fpdf-extractor-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fking04aman%2Fpdf-extractor-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fking04aman%2Fpdf-extractor-api/lists"}