{"id":19468490,"url":"https://github.com/scyv/scarchive","last_synced_at":"2025-02-25T14:25:33.027Z","repository":{"id":72851979,"uuid":"98059570","full_name":"scyv/SCArchive","owner":"scyv","description":"Tool for archiving documents and pictures","archived":false,"fork":false,"pushed_at":"2017-09-13T21:01:37.000Z","size":174,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-08T04:41:26.279Z","etag":null,"topics":["document-database","pictures-archive","spring-boot","vaadin"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/scyv.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-07-22T21:46:25.000Z","updated_at":"2018-05-26T06:48:54.000Z","dependencies_parsed_at":null,"dependency_job_id":"5388bfa0-0c03-4385-a8c0-d458ca8ff4f6","html_url":"https://github.com/scyv/SCArchive","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scyv%2FSCArchive","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scyv%2FSCArchive/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scyv%2FSCArchive/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scyv%2FSCArchive/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/scyv","download_url":"https://codeload.github.com/scyv/SCArchive
/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240682391,"owners_count":19840498,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["document-database","pictures-archive","spring-boot","vaadin"],"created_at":"2024-11-10T18:41:29.550Z","updated_at":"2025-02-25T14:25:32.994Z","avatar_url":"https://github.com/scyv.png","language":"Java","readme":"# SCArchive\nTool for archiving documents and pictures\n\n* SCArchive scans given local folders for different file types (PDF and HTML so far; more are planned) and extracts metadata from each file.\n* PDF files are OCR'd and their text is extracted with the help of PDFBox (https://pdfbox.apache.org/), tesseract (https://github.com/tesseract-ocr/tesseract) and GraphicsMagick (http://www.graphicsmagick.org/).\n* The application uses Vaadin to provide a web UI where the user can search for and edit the gathered metadata.\n* Because all files and the gathered metadata are stored as local files, they can be synchronized to other machines via e.g. rsync or Resilio Sync.\n\n## Technology Stack\n\n* Java 8\n* Spring-Boot\n* Vaadin\n* PDFBox\n* tesseract\n* GraphicsMagick\n\n## Getting Started\n\n1. Install the prerequisites\n    * Java 8 or greater (https://www.java.com/de/download/)\n    * tesseract (https://github.com/tesseract-ocr/tesseract#installing-tesseract)\n    * GraphicsMagick (http://www.graphicsmagick.org/download.html)\n    * Plenty of RAM and CPU capacity (for OCR)\n1. 
Build and run from source (currently the only installation option)\n    * Clone this repository: `git clone git@github.com:scyv/SCArchive.git`\n    * Run `mvnw package`\n    * Navigate to ./target: `cd target`\n    * Copy application.properties from `src/main/resources`: `cp ../src/main/resources/application.properties .`\n    * Edit application.properties to suit your needs (see below)\n    * Run `java -jar server-0.0.1-SNAPSHOT.jar`\n\n## Application.properties\n\n\u003ctable\u003e\n\u003cthead\u003e\n    \u003ctr\u003e\n        \u003cth\u003eProperty key\u003c/th\u003e\n        \u003cth\u003ePossible Values\u003c/th\u003e\n        \u003cth\u003eDescription\u003c/th\u003e\n    \u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n    \u003ctr\u003e\n        \u003ctd\u003escarchive.documentpaths\u003c/td\u003e\n        \u003ctd\u003ee.g. /home/user/myFiles;/home/user/myOtherFiles\u003c/td\u003e\n        \u003ctd\u003eSemicolon-separated list of folders the application should scan\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003escarchive.scheduler.pollingInterval\u003c/td\u003e\n        \u003ctd\u003eInteger e.g. 10\u003c/td\u003e\n        \u003ctd\u003eTime between two scans, in seconds\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003escarchive.tesseract.bin\u003c/td\u003e\n        \u003ctd\u003ee.g. /usr/bin/tesseract\u003c/td\u003e\n        \u003ctd\u003eAbsolute path to the tesseract binary\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003escarchive.graphicsmagick.bin\u003c/td\u003e\n        \u003ctd\u003ee.g. 
/usr/bin/gm\u003c/td\u003e\n        \u003ctd\u003eAbsolute path to the GraphicsMagick binary\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003escarchive.openlocal\u003c/td\u003e\n        \u003ctd\u003etrue or false\u003c/td\u003e\n        \u003ctd\u003eWhen true, files are opened locally; when false, they are downloaded\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003escarchive.enablescan\u003c/td\u003e\n        \u003ctd\u003etrue or false\u003c/td\u003e\n        \u003ctd\u003eWhen true, scanning of files is enabled; when false, no scanning takes place. This is especially useful if you want to provide the web UI without letting the host do the scanning\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003escarchive.maxfindings\u003c/td\u003e\n        \u003ctd\u003ee.g. 100\u003c/td\u003e\n        \u003ctd\u003eMaximum number of results shown when searching the metadata\u003c/td\u003e\n    \u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscyv%2Fscarchive","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscyv%2Fscarchive","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscyv%2Fscarchive/lists"}