{"id":21354358,"url":"https://github.com/samestrin/pdf-extract-api-digitalocean","last_synced_at":"2026-05-21T04:06:59.710Z","repository":{"id":237346833,"uuid":"794345442","full_name":"samestrin/pdf-extract-api-digitalocean","owner":"samestrin","description":"A Node.js based REST PDF Text Extraction API using pdf-parse.","archived":false,"fork":false,"pushed_at":"2024-05-08T01:59:35.000Z","size":2927,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-22T17:46:52.499Z","etag":null,"topics":["api","node","nodejs","ocr","pdf","pdf-parse","rest"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/samestrin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-01T00:12:19.000Z","updated_at":"2024-07-02T11:16:52.000Z","dependencies_parsed_at":"2024-05-01T01:28:40.992Z","dependency_job_id":"457630ec-7d05-4841-bf90-24e07c94f9b2","html_url":"https://github.com/samestrin/pdf-extract-api-digitalocean","commit_stats":null,"previous_names":["samestrin/llm-pdf-ocr-api","samestrin/pdf-extract-api-digitalocean"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samestrin%2Fpdf-extract-api-digitalocean","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samestrin%2Fpdf-extract-api-digitalocean/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samestrin%2Fpdf-extract-api-digi
talocean/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samestrin%2Fpdf-extract-api-digitalocean/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/samestrin","download_url":"https://codeload.github.com/samestrin/pdf-extract-api-digitalocean/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243828117,"owners_count":20354435,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","node","nodejs","ocr","pdf","pdf-parse","rest"],"created_at":"2024-11-22T04:13:12.907Z","updated_at":"2026-05-21T04:06:54.673Z","avatar_url":"https://github.com/samestrin.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pdf-extract-api-digitalocean\n\n[![Star on GitHub](https://img.shields.io/github/stars/samestrin/pdf-extract-api-digitalocean?style=social)](https://github.com/samestrin/pdf-extract-api-digitalocean/stargazers)[![Fork on GitHub](https://img.shields.io/github/forks/samestrin/pdf-extract-api-digitalocean?style=social) ](https://github.com/samestrin/pdf-extract-api-digitalocean/network/members)[![Watch on GitHub](https://img.shields.io/github/watchers/samestrin/pdf-extract-api-digitalocean?style=social)](https://github.com/samestrin/pdf-extract-api-digitalocean/watchers)\n\n![Version 0.0.1](https://img.shields.io/badge/Version-0.0.1-blue) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg) ](https://opensource.org/licenses/MIT)[![Built with 
Node.js](https://img.shields.io/badge/Built%20with-Node.js-green)](https://nodejs.org/)\n\nThis project implements a simulated Optical Character Recognition (OCR) service: a REST API that extracts text from PDF files uploaded by users. Because it relies on pdf-parse, it reads the text already embedded in a PDF rather than performing true OCR on scanned images. Built with Node.js, Express, Multer, and pdf-parse, the application is designed to be easy to set up and to integrate into other systems that need PDF text extraction.\n\n## Features\n\n- **PDF Text Extraction**: Accepts uploaded PDF files and extracts their readable text.\n- **File Upload Management**: Uses Multer to handle file uploads, with configurable storage options.\n- **Error Handling**: Catches failures so the server stays stable and returns meaningful error messages to the client.\n\n## Dependencies\n\n- **Node.js**: Runtime environment for the server.\n- **express**: Web framework for Node.js.\n- **multer**: Middleware for handling `multipart/form-data`, used for file uploads.\n- **pdf-parse**: Library to parse and extract text from PDF files.\n- **fs.promises**: Promise-based API of the built-in Node.js File System (`fs`) module, used for file operations.\n- **path**: Built-in Node.js module for handling and transforming file paths.\n\n## Installing Node.js\n\nBefore installing, ensure that Node.js and npm (Node Package Manager) are installed on your system. You can download Node.js from the [official Node.js website](https://nodejs.org/).\n\n## Installing pdf-extract-api-digitalocean\n\nTo install and use pdf-extract-api-digitalocean, follow these steps:\n\nClone the Repository: Clone the pdf-extract-api-digitalocean repository to your local machine.\n\n```bash\ngit clone https://github.com/samestrin/pdf-extract-api-digitalocean/\n```\n\nInstall Dependencies: Change into the cloned `pdf-extract-api-digitalocean` directory and run `npm install`.\n\nConfigure the Port: Optionally, set the `PORT` environment variable to choose the port on which the server will listen. 
The default is `3000`.\n\nFrom the project's root directory, start the server:\n\n```bash\nnpm start\n```\n\n## **Endpoints**\n\n### **Extract**\n\n- **Endpoint:** `/extract`\n- **Method:** `POST`\n\nExtracts text from an uploaded PDF file.\n\n#### **Parameters**\n\n- `file`: The PDF file to extract text from, sent as a `multipart/form-data` field named `file`.\n\n## **Example Usage**\n\nUse a tool such as Postman or curl to send a request, replacing `[PORT]` with the port the server is listening on (`3000` by default):\n\n```bash\ncurl -F \"file=@path_to_pdf_file.pdf\" http://localhost:[PORT]/extract\n```\n\nThe server processes the uploaded file and returns the extracted text as JSON.\n\n## **Error Handling**\n\nIf a request fails, the API returns an error response with one of the following HTTP status codes:\n\n- **400 Bad Request**: Invalid request parameters.\n- **500 Internal Server Error**: Unexpected server error.\n\n## Contribute\n\nContributions to this project are welcome. Please fork the repository and submit a pull request with your changes or improvements.\n\n## License\n\nThis project is licensed under the MIT License; see the LICENSE file for details.\n\n## Share\n\n[![Twitter](https://img.shields.io/badge/X-Tweet-blue)](https://twitter.com/intent/tweet?text=Check%20out%20this%20awesome%20project!\u0026url=https://github.com/samestrin/pdf-extract-api-digitalocean) [![Facebook](https://img.shields.io/badge/Facebook-Share-blue)](https://www.facebook.com/sharer/sharer.php?u=https://github.com/samestrin/pdf-extract-api-digitalocean) [![LinkedIn](https://img.shields.io/badge/LinkedIn-Share-blue)](https://www.linkedin.com/sharing/share-offsite/?url=https://github.com/samestrin/pdf-extract-api-digitalocean)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsamestrin%2Fpdf-extract-api-digitalocean","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsamestrin%2Fpdf-extract-api-digitalocean","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsamestrin%2Fpdf-extract-api-digitalocean/lists"}