https://github.com/diging/giles-eco-cepheus
- Host: GitHub
- URL: https://github.com/diging/giles-eco-cepheus
- Owner: diging
- License: mpl-2.0
- Created: 2016-11-14T22:45:18.000Z (about 8 years ago)
- Default Branch: develop
- Last Pushed: 2023-08-15T20:19:37.000Z (over 1 year ago)
- Last Synced: 2024-04-16T02:10:27.881Z (7 months ago)
- Topics: extract-images, giles-ecosystem, java, spring
- Language: SCSS
- Size: 1000 KB
- Stars: 0
- Watchers: 12
- Forks: 0
- Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Cepheus
## giles-eco-cepheus

This repository contains Cepheus, which is part of the Giles Ecosystem. Cepheus is an app to extract images and embedded text from PDFs.
The Giles Ecosystem is a distributed system that runs OCR on images and extracts images and text from PDF files. This repository contains the system's text and image extraction component, called "Cepheus". The system requires the following software:
* Apache Tomcat 8
* Apache Kafka
* Apache Zookeeper (required by Apache Kafka)
* Tesseract (https://github.com/tesseract-ocr/)

The components of the Giles Ecosystem are located in the following repositories:
* Giles: https://github.com/diging/giles-eco-giles-web (user-facing component for uploading files)
* Nepomuk: https://github.com/diging/giles-eco-nepomuk (file storage)
* Cepheus: https://github.com/diging/giles-eco-cepheus (this repository)
* Cassiopeia: https://github.com/diging/giles-eco-cassiopeia (OCR using Tesseract)

The above applications depend on libraries located in the following repositories:
* https://github.com/diging/giles-eco-requests
* https://github.com/diging/giles-eco-util

There is a Docker Compose file for testing and evaluation purposes that sets up the Giles Ecosystem in Docker. You can find that file here: https://github.com/diging/giles-eco-docker