https://github.com/scyv/scarchive

# SCArchive
Tool for archiving documents and pictures

* SCArchive scans the configured local folders for supported file types (PDF and HTML so far, more are coming) and extracts metadata from each file.
* PDF files are OCR'd and their content is extracted with the help of PDFBox (https://pdfbox.apache.org/), tesseract (https://github.com/tesseract-ocr/tesseract), and GraphicsMagick (http://www.graphicsmagick.org/).
* The application uses Vaadin to provide a web UI where the user can search and edit the gathered metadata.
* Because all files and the gathered metadata are stored as local files, they can be synchronized to other machines with tools such as rsync or Resilio Sync.
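
The scanning step described in the first bullet can be pictured with a small, self-contained Java sketch. It is not SCArchive's actual scanner: the class name `FolderScanSketch`, both method names, and the choice to recognize files by their `.pdf` and `.html` extensions are assumptions made for illustration only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch only; SCArchive's real implementation may differ.
public class FolderScanSketch {

    /** Assumption: supported files are recognized by extension (PDF and HTML). */
    public static boolean isSupported(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".pdf") || name.endsWith(".html");
    }

    /** Walks each root folder recursively and collects the supported regular files. */
    public static List<Path> scan(List<Path> roots) throws IOException {
        List<Path> found = new ArrayList<>();
        for (Path root : roots) {
            // Files.walk returns a lazy stream that must be closed after use.
            try (Stream<Path> paths = Files.walk(root)) {
                found.addAll(paths
                        .filter(Files::isRegularFile)
                        .filter(FolderScanSketch::isSupported)
                        .sorted()
                        .collect(Collectors.toList()));
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Each command-line argument is treated as one folder to scan.
        List<Path> roots = new ArrayList<>();
        for (String arg : args) {
            roots.add(Paths.get(arg));
        }
        scan(roots).forEach(System.out::println);
    }
}
```

Run with one or more folders as arguments, this prints every PDF and HTML file found below them. A real scanner would additionally hand each file to the metadata extraction and OCR steps.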

## Technology Stack

* Java 8
* Spring-Boot
* Vaadin
* PDFBox
* tesseract
* GraphicsMagick

## Getting Started

1. Install the prerequisites:
   * Java 8 or later (https://www.java.com/de/download/)
   * tesseract (https://github.com/tesseract-ocr/tesseract#installing-tesseract)
   * GraphicsMagick (http://www.graphicsmagick.org/download.html)
   * Plenty of RAM and CPU capacity (OCR is resource-intensive)
1. Build and run from source (currently the only installation option):
   * Clone this repository: `git clone git@github.com:scyv/SCArchive.git`
   * Change into the checkout: `cd SCArchive`
   * Run `./mvnw package` (on Windows: `mvnw package`)
   * Navigate to the build output: `cd target`
   * Copy application.properties from `src/main/resources`: `cp ../src/main/resources/application.properties .`
   * Edit application.properties for your needs (see below)
   * Run `java -jar server-0.0.1-SNAPSHOT.jar`

## Application.properties


| Property key | Possible values | Description |
| --- | --- | --- |
| `scarchive.documentpaths` | e.g. `/home/user/myFiles;/home/user/myOtherFiles` | Semicolon-separated list of folders that the application scans |
| `scarchive.scheduler.pollingInterval` | Integer, e.g. `10` | Time between two scans, in seconds |
| `scarchive.tesseract.bin` | e.g. `/usr/bin/tesseract` | Absolute path to the tesseract binary |
| `scarchive.graphicsmagick.bin` | e.g. `/usr/bin/gm` | Absolute path to the GraphicsMagick binary |
| `scarchive.openlocal` | `true` or `false` | When `true`, files are opened locally; when `false`, they are downloaded |
| `scarchive.enablescan` | `true` or `false` | When `true`, file scanning is enabled; when `false`, no scanning takes place. This is especially useful if you want to serve the web UI without having the host do the scanning |
| `scarchive.maxfindings` | e.g. `100` | Maximum number of results shown when searching the metadata |
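
To make the file format concrete, the following Java sketch embeds a sample application.properties and reads it back. It only illustrates the format: Spring Boot loads the real file itself, and the class `ConfigSketch` with its methods is hypothetical. The property values are the "e.g." examples from the properties listed above, while the `true`/`false` choices are arbitrary.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustrative sketch only; SCArchive itself relies on Spring Boot to load the file.
public class ConfigSketch {

    // Sample file content built from the example values in this README.
    public static final String EXAMPLE =
            "scarchive.documentpaths=/home/user/myFiles;/home/user/myOtherFiles\n"
          + "scarchive.scheduler.pollingInterval=10\n"
          + "scarchive.tesseract.bin=/usr/bin/tesseract\n"
          + "scarchive.graphicsmagick.bin=/usr/bin/gm\n"
          + "scarchive.openlocal=false\n"   // arbitrary choice for the example
          + "scarchive.enablescan=true\n"   // arbitrary choice for the example
          + "scarchive.maxfindings=100\n";

    /** Parses properties-file text with the standard java.util.Properties reader. */
    public static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    /** Splits the semicolon-separated folder list, skipping empty entries. */
    public static List<String> documentPaths(Properties props) {
        List<String> result = new ArrayList<>();
        for (String part : props.getProperty("scarchive.documentpaths", "").split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Properties props = load(EXAMPLE);
        // Print each configured folder, then the scan interval.
        documentPaths(props).forEach(System.out::println);
        System.out.println("Polling interval (s): "
                + Integer.parseInt(props.getProperty("scarchive.scheduler.pollingInterval")));
    }
}
```

The text held in `EXAMPLE` can also be used as a starting point for a hand-written application.properties. Adjust the folder list and binary paths to match your machine.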