Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/gunnarmorling/quarkus-pdf-extract
Quarkus-based microservice to extract text from PDF files
- Host: GitHub
- URL: https://github.com/gunnarmorling/quarkus-pdf-extract
- Owner: gunnarmorling
- License: apache-2.0
- Created: 2019-04-14T12:41:02.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2021-06-15T16:00:24.000Z (over 3 years ago)
- Last Synced: 2024-08-03T01:11:52.893Z (5 months ago)
- Language: Java
- Size: 77.1 KB
- Stars: 24
- Watchers: 3
- Forks: 6
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-cloud-run - Quarkus with GraalVM
README
# Quarkus PDF Extract
An example microservice which extracts the text contents of uploaded PDF files.
It is built using [Quarkus](http://quarkus.io/) and uses [Apache PDFBox](https://pdfbox.apache.org/)
as well as Jonathan Link's [PDFLayoutTextStripper](https://github.com/JonathanLink/PDFLayoutTextStripper).

## Building and Running the Service

This service is intended to be run as a native Linux binary via GraalVM.
Build the binary like so:

    ./mvnw package -Pnative -Dnative-image.container-runtime=docker

Then create a Docker container with the binary:

    docker build -f src/main/docker/Dockerfile.native -t quarkus-examples/quarkus-pdf-export .

Run the container:

    docker run -i --rm -p 8080:8080 -e PORT=8080 quarkus-examples/quarkus-pdf-export

## Running in Dev Mode
While working on the service, the Quarkus Dev Mode comes in handy:

    ./mvnw compile quarkus:dev

Modify the source code and invoke the service again (see below); the changes are re-compiled automatically.
## Invoking the Service
To invoke the service, use a tool such as httpie (adjust the file name to a PDF on your hard disk):

    http -f POST localhost:8080/rest/extract pdfFile@"/path/to/some/file.pdf" -d

This will create a file _extracted.txt_ with the extracted text in the current directory.
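
The same upload can also be done programmatically. The following is a minimal sketch using only the JDK's `java.net.http` client (Java 11+), not code from this repository. The `/rest/extract` path and the `pdfFile` form field are taken from the httpie call above. The class name `ExtractClient`, its helper methods, and the choice to write the response to `extracted.txt` are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical client for illustration; not part of the quarkus-pdf-extract sources.
class ExtractClient {

    // Builds a multipart/form-data body containing a single file part.
    // Simplification: the file name is not escaped, so it must not contain quotes.
    static byte[] multipartBody(String boundary, String fieldName, String fileName, byte[] content) {
        String header = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/pdf\r\n\r\n";
        String footer = "\r\n--" + boundary + "--\r\n";

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(header.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(content);
        out.writeBytes(footer.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Creates the POST request against <baseUrl>/rest/extract with the PDF in the "pdfFile" field.
    static HttpRequest extractRequest(String baseUrl, Path pdf) throws IOException {
        String boundary = "----ExtractClient" + System.nanoTime();
        byte[] body = multipartBody(boundary, "pdfFile",
                pdf.getFileName().toString(), Files.readAllBytes(pdf));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/rest/extract"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("Usage: java ExtractClient.java <file.pdf> [baseUrl]");
            return;
        }
        // The base URL defaults to the locally running service.
        String baseUrl = args.length > 1 ? args[1] : "http://localhost:8080";
        HttpRequest request = extractRequest(baseUrl, Path.of(args[0]));

        HttpResponse<byte[]> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofByteArray());

        // Assumption: the response body is the extracted text; store it as-is.
        Files.write(Path.of("extracted.txt"), response.body());
        System.out.println("HTTP " + response.statusCode() + ", wrote extracted.txt");
    }
}
```

With the service running, this could be started as `java ExtractClient.java /path/to/some/file.pdf`. The optional second argument replaces the default base URL, for example with the endpoint of a deployed instance.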
Alternatively, open http://localhost:8080 in your web browser, select a PDF file to upload and click "Submit".
This returns a file with the extracted text which you can store to your hard disk.

## Deploying to Google Cloud Run

The Quarkus-built native binary starts quickly and uses little memory, which makes it a good fit for serverless environments such as Google Cloud Run.
Follow the steps in the Google Cloud Run [documentation](https://cloud.google.com/run/docs/quickstarts/build-and-deploy)
for setting up Cloud Run and the Google Cloud SDK.

Submit a build of the Docker container to the Google Container Registry:

    rm -rf target/reports
    gcloud builds submit --tag gcr.io//quarkus-pdf-extract

Then deploy an instance of the service to Cloud Run:

    gcloud beta run deploy --image gcr.io//quarkus-pdf-extract

You can then invoke the service as shown before, replacing localhost:8080 with the endpoint shown in the output of the `deploy` command (similar to https://quarkus-pdf-extract--uc.a.run.app).