https://github.com/scyv/scarchive
Tool for archiving documents and pictures
- Host: GitHub
- URL: https://github.com/scyv/scarchive
- Owner: scyv
- License: apache-2.0
- Created: 2017-07-22T21:46:25.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-09-13T21:01:37.000Z (over 8 years ago)
- Last Synced: 2025-01-08T04:41:26.279Z (about 1 year ago)
- Topics: document-database, pictures-archive, spring-boot, vaadin
- Language: Java
- Size: 170 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# SCArchive
Tool for archiving documents and pictures
* SCArchive scans the configured local folders for different file types (PDF and HTML for now, more are coming) and extracts metadata from each file.
* PDF files are OCR'd and their text is extracted with the help of PDFBox (https://pdfbox.apache.org/), tesseract (https://github.com/tesseract-ocr/tesseract) and GraphicsMagick (http://www.graphicsmagick.org/).
* The application uses Vaadin to provide a web UI where the user can search for and edit the gathered metadata.
* Because all files and the gathered metadata are stored as local files, they can be synchronized to other machines with tools such as rsync or Resilio Sync.
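To illustrate the scanning step, here is a minimal Java sketch that walks a set of folders and picks out PDF and HTML files by their extension. It is not SCArchive's actual scanner: the class `ScanSketch`, its methods, and the extension-based matching are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: collect the files under each folder that an archiver would process. */
class ScanSketch {

    // Assumed matching by extension; the README names PDF and HTML as the supported types.
    static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList("pdf", "html"));

    static boolean isSupported(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        return dot >= 0 && SUPPORTED.contains(name.substring(dot + 1));
    }

    /** Recursively walks every folder and returns the supported files, sorted for stable output. */
    static List<Path> scan(List<Path> folders) throws IOException {
        List<Path> found = new ArrayList<>();
        for (Path folder : folders) {
            if (!Files.isDirectory(folder)) {
                continue; // skip missing or misconfigured folders instead of aborting the scan
            }
            try (Stream<Path> walk = Files.walk(folder)) {
                found.addAll(walk.filter(Files::isRegularFile)
                                 .filter(ScanSketch::isSupported)
                                 .collect(Collectors.toList()));
            }
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Each argument may itself be a ';'-separated folder list, mirroring scarchive.documentpaths.
        List<Path> folders = new ArrayList<>();
        for (String arg : args) {
            for (String part : arg.split(";")) {
                if (!part.trim().isEmpty()) {
                    folders.add(Paths.get(part.trim()));
                }
            }
        }
        for (Path p : scan(folders)) {
            System.out.println(p);
        }
    }
}
```

Skipping folders that do not exist, rather than failing, seems a reasonable choice when several folders are configured and one of them is temporarily unavailable (for example, an unmounted drive).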
## Technology Stack
* Java 8
* Spring Boot
* Vaadin
* PDFBox
* tesseract
* GraphicsMagick
## Getting Started
1. Install the prerequisites
   * Java 8 or greater (https://www.java.com/de/download/)
   * tesseract (https://github.com/tesseract-ocr/tesseract#installing-tesseract)
   * GraphicsMagick (http://www.graphicsmagick.org/download.html)
   * Plenty of RAM and CPU capacity (for OCR)
2. Build from source (currently the only option)
   * Clone this repository and change into it: `git clone git@github.com:scyv/SCArchive.git` and `cd SCArchive`
   * Run `./mvnw package` (on Windows: `mvnw package`)
   * Change into the build output folder: `cd target`
   * Copy application.properties from `src/main/resources`: `cp ../src/main/resources/application.properties .`
   * Edit application.properties to suit your needs (see below)
   * Run `java -jar server-0.0.1-SNAPSHOT.jar`
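Before starting the server, you may want to confirm that both external tools can actually be launched. The following minimal Java sketch does that by running `tesseract --version` and `gm version` through `ProcessBuilder`. The class `PrerequisiteCheck` and the default paths `/usr/bin/tesseract` and `/usr/bin/gm` (taken from the examples in the configuration table) are assumptions; older tesseract releases may report their version differently.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: checks whether an external tool starts and exits with status 0. */
class PrerequisiteCheck {

    static boolean runs(String... command) {
        try {
            Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
            // Drain the (short) version banner so the child cannot block on a full pipe.
            // Note: this read blocks until the tool closes its output.
            try (InputStream in = p.getInputStream()) {
                byte[] buf = new byte[4096];
                while (in.read(buf) != -1) {
                    // output is discarded
                }
            }
            if (!p.waitFor(10, TimeUnit.SECONDS)) {
                p.destroyForcibly(); // still running after its output closed: give up
                return false;
            }
            return p.exitValue() == 0;
        } catch (IOException e) {
            return false; // binary missing or not executable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; pass the same paths you use for
        // scarchive.tesseract.bin and scarchive.graphicsmagick.bin.
        String tesseract = args.length > 0 ? args[0] : "/usr/bin/tesseract";
        String gm = args.length > 1 ? args[1] : "/usr/bin/gm";
        System.out.println("tesseract:      " + (runs(tesseract, "--version") ? "ok" : "NOT FOUND / FAILED"));
        System.out.println("GraphicsMagick: " + (runs(gm, "version") ? "ok" : "NOT FOUND / FAILED"));
    }
}
```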
## application.properties

| Property key | Possible values | Description |
|---|---|---|
| `scarchive.documentpaths` | e.g. `/home/user/myFiles;/home/user/myOtherFiles` | Semicolon-separated list of folders the application scans |
| `scarchive.scheduler.pollingInterval` | Integer, e.g. `10` | Time between two scans, in seconds |
| `scarchive.tesseract.bin` | e.g. `/usr/bin/tesseract` | Absolute path to the tesseract binary |
| `scarchive.graphicsmagick.bin` | e.g. `/usr/bin/gm` | Absolute path to the GraphicsMagick binary |
| `scarchive.openlocal` | `true` or `false` | When `true`, files are opened locally; when `false`, they are downloaded |
| `scarchive.enablescan` | `true` or `false` | When `true`, files are scanned; when `false`, no scanning takes place. This is useful if you want to serve the web UI without letting the host do the scanning |
| `scarchive.maxfindings` | e.g. `100` | Maximum number of findings shown when searching for metadata |
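The application reads these properties through Spring Boot. As a rough illustration of what each value means, this plain-Java sketch parses them with `java.util.Properties`. It is not how SCArchive binds its configuration: the class `ScarchiveSettings` and all default values are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical plain-Java view of the scarchive.* settings (the real app uses Spring Boot). */
class ScarchiveSettings {
    final List<String> documentPaths = new ArrayList<>();
    long pollingIntervalSeconds;
    String tesseractBin;
    String graphicsmagickBin;
    boolean openLocal;
    boolean enableScan;
    int maxFindings;

    static ScarchiveSettings load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        ScarchiveSettings s = new ScarchiveSettings();
        // ';'-separated folder list; blank entries are ignored.
        for (String path : props.getProperty("scarchive.documentpaths", "").split(";")) {
            if (!path.trim().isEmpty()) {
                s.documentPaths.add(path.trim());
            }
        }
        // Defaults below are assumptions for this sketch, not SCArchive's actual defaults.
        s.pollingIntervalSeconds = Long.parseLong(
                props.getProperty("scarchive.scheduler.pollingInterval", "10").trim());
        s.tesseractBin = props.getProperty("scarchive.tesseract.bin");
        s.graphicsmagickBin = props.getProperty("scarchive.graphicsmagick.bin");
        s.openLocal = Boolean.parseBoolean(props.getProperty("scarchive.openlocal", "false").trim());
        s.enableScan = Boolean.parseBoolean(props.getProperty("scarchive.enablescan", "true").trim());
        s.maxFindings = Integer.parseInt(props.getProperty("scarchive.maxfindings", "100").trim());
        return s;
    }

    public static void main(String[] args) throws IOException {
        String sample = "scarchive.documentpaths=/home/user/myFiles;/home/user/myOtherFiles\n"
                + "scarchive.scheduler.pollingInterval=10\n"
                + "scarchive.tesseract.bin=/usr/bin/tesseract\n"
                + "scarchive.graphicsmagick.bin=/usr/bin/gm\n"
                + "scarchive.openlocal=false\n"
                + "scarchive.enablescan=true\n"
                + "scarchive.maxfindings=100\n";
        ScarchiveSettings s = load(new StringReader(sample));
        System.out.println(s.documentPaths + " scanned every " + s.pollingIntervalSeconds + " s");
    }
}
```

Trimming each value matters because `Properties.load` strips leading but not trailing whitespace, so a stray space after `true` or `10` would otherwise change the parsed result or break number parsing.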