https://github.com/diging/giles-eco-giles-web
Distributed system based on Apache Kafka to run OCR on images and extract images and texts from PDF files.
giles-ecosystem java spring
- Host: GitHub
- URL: https://github.com/diging/giles-eco-giles-web
- Owner: diging
- License: mpl-2.0
- Created: 2016-11-03T23:13:41.000Z (over 9 years ago)
- Default Branch: develop
- Last Pushed: 2025-01-02T23:38:47.000Z (over 1 year ago)
- Last Synced: 2025-04-06T11:51:47.037Z (about 1 year ago)
- Topics: giles-ecosystem, java, spring
- Language: Java
- Homepage: http://gilesecosystem.io
- Size: 4.68 MB
- Stars: 9
- Watchers: 12
- Forks: 2
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Giles Ecosystem
The Giles Ecosystem is a distributed system to run OCR on images and extract images and texts from PDF files. This repository contains the user-facing component of this system called "Giles". The system requires the following software:
* Apache Tomcat 8
* Apache Kafka
* Apache Zookeeper
* MySQL (or PostgreSQL)
* Tesseract OCR (https://github.com/tesseract-ocr/)
The core components of the Giles Ecosystem are located in the following repositories:
* Giles: https://github.com/diging/giles-eco-giles-web (this repository)
* Nepomuk: https://github.com/diging/giles-eco-nepomuk (file storage)
* Cepheus: https://github.com/diging/giles-eco-cepheus (image extraction from PDF files)
* Andromeda: https://github.com/diging/giles-eco-andromeda (text extraction from PDF files)
* Cassiopeia: https://github.com/diging/giles-eco-cassiopeia (OCR using Tesseract)
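
To make the division of labor between these components concrete, here is a minimal Java sketch that maps a file name to the components whose documented role would apply to it. It is an illustration only, not the routing logic Giles actually uses (the real components communicate through Apache Kafka). The class `GilesComponentSketch`, the method `componentsFor`, the extension-based type detection, and the assumption that every upload is stored by Nepomuk are all hypothetical.

```java
import java.util.List;
import java.util.Locale;

// Illustrative only: pairs a file name with the Giles Ecosystem components
// whose README-documented role applies to it. Not the actual Giles code.
class GilesComponentSketch {

    // Component names and roles as listed in this README.
    enum Component {
        NEPOMUK("file storage"),
        CEPHEUS("image extraction from PDF files"),
        ANDROMEDA("text extraction from PDF files"),
        CASSIOPEIA("OCR using Tesseract");

        final String role;

        Component(String role) {
            this.role = role;
        }
    }

    // Assumption: the file type is guessed from the extension, and every
    // file is handed to Nepomuk for storage. The extension list is hypothetical.
    static List<Component> componentsFor(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".pdf")) {
            return List.of(Component.NEPOMUK, Component.CEPHEUS, Component.ANDROMEDA);
        }
        if (name.endsWith(".png") || name.endsWith(".jpg")
                || name.endsWith(".jpeg") || name.endsWith(".tiff")) {
            return List.of(Component.NEPOMUK, Component.CASSIOPEIA);
        }
        // Unknown types are only stored in this sketch.
        return List.of(Component.NEPOMUK);
    }

    public static void main(String[] args) {
        for (String file : List.of("paper.pdf", "scan.png", "notes.txt")) {
            System.out.println(file + " -> " + componentsFor(file));
        }
    }
}
```

In this sketch a PDF is associated with both extraction components, while an image is associated with the OCR component; how the real system sequences these steps is described in the linked documentation, not here.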
The above applications depend on libraries located in the following repositories:
* https://github.com/diging/giles-eco-requests
* https://github.com/diging/giles-eco-util
* https://github.com/diging/giles-eco-september-util
Additionally, Giles depends on:
* https://github.com/jdamerow/spring-social-github
* https://github.com/jdamerow/spring-social-mitreid-connect
There are some additional components of the Giles Ecosystem that can be added if required:
* September (monitoring app for the Giles Ecosystem): https://github.com/diging/giles-eco-september
* Freddie (Solr connector): https://github.com/diging/giles-eco-freddie
There is a Docker Compose file for testing and evaluation purposes that sets up the Giles Ecosystem in Docker. You can find that file here: https://github.com/diging/giles-eco-docker
You can find detailed installation information and the documentation of Giles' API [here](https://diging.atlassian.net/wiki/display/GECO/Giles+Ecosystem+Home).