# Nepomuk PDF Archive

This program implements an archive for scanned PDF documents. Each PDF file has
an auto-detected correspondent. For each correspondent, a subdirectory is
created, and the PDF files for that correspondent are saved in it.

Additional files and directories are stored in the subdirectory `.nepomuk`:

 * `incoming/`: place new files here manually
 * `processed/`: holds files that have been optimized and OCRed, before they are sorted
 * `db.json`: contains metadata about the individual files

File names within `archive/Foo` (for a correspondent called `Foo`) consist of
the date (`YYYY-MM-DD`) followed by the title and the extension `.pdf`, for
example `2020-11-23 Title of the Document.pdf`. Internally, the archive
identifies each file by the first four bytes of the SHA256 hash of its
contents. This ID is used to look up the file in `db.json`, which holds
additional metadata.

# FTP Server

To test the FTP server, use the `upload-*.lftp` scripts. The scripts and some
sample PDF files are in the `testdata/` directory; for example, run
`lftp -f testdata/upload-duplex.lftp`.

If a file whose name starts with `duplex-odd` is uploaded, followed by a file
whose name starts with `duplex-even`, the archive joins the two.
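The Go sketch below roughly illustrates the pairing rule. It assumes that "followed by" means the `duplex-even` upload arrives directly after the `duplex-odd` one, so any other upload in between cancels the pending odd file. The `DuplexPair` type and the `pairDuplex` function are hypothetical names and do not come from the nepomuk source.

```go
package main

import (
	"fmt"
	"strings"
)

// DuplexPair holds an odd-page upload and the even-page upload joined with it.
// Hypothetical type for illustration only.
type DuplexPair struct {
	Odd, Even string
}

// pairDuplex walks uploads in arrival order and pairs a "duplex-odd*" file
// with a "duplex-even*" file that directly follows it.
// Assumption: any other upload in between breaks the sequence.
func pairDuplex(uploads []string) []DuplexPair {
	var pairs []DuplexPair
	pendingOdd := "" // most recent odd-page upload still waiting for its even half
	for _, name := range uploads {
		if strings.HasPrefix(name, "duplex-even") && pendingOdd != "" {
			pairs = append(pairs, DuplexPair{Odd: pendingOdd, Even: name})
			pendingOdd = ""
			continue
		}
		if strings.HasPrefix(name, "duplex-odd") {
			pendingOdd = name // a newer odd upload replaces an unmatched one
		} else {
			pendingOdd = "" // any other upload cancels the pending odd file
		}
	}
	return pairs
}

func main() {
	uploads := []string{
		"duplex-odd-a.pdf", "duplex-even-a.pdf", // joined
		"duplex-odd-b.pdf", "invoice.pdf", "duplex-even-b.pdf", // not joined
	}
	fmt.Println(pairDuplex(uploads)) // [{duplex-odd-a.pdf duplex-even-a.pdf}]
}
```

Resetting on unrelated uploads prevents a stray odd scan from being joined with an unrelated even scan much later. If the real behavior is more lenient, you could drop the `else` branch so that other uploads may arrive in between.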
Joining the two uploads makes it easy to scan double-sided documents with a
simplex-only scanner (e.g. one with a document feeder). First scan the odd
pages, then turn the whole paper stack around and scan the even pages. Because
the stack was turned around, the even pages are scanned backwards, i.e. in
reverse order. The script `upload-duplex.lftp` tests this.

PDF files whose names start with `Receipt` are split into separate documents of
exactly one page each. This allows a stack of single-page documents to be
scanned in one run.
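The Go sketch below shows the page order produced by the duplex workflow and the one-page-per-document split for `Receipt` files. Pages are modeled as plain labels and page numbers rather than PDF objects. `mergeDuplex` and `splitReceipts` are hypothetical helpers. Rejecting scans with different page counts is an assumption, not documented behavior.

```go
package main

import "fmt"

// mergeDuplex interleaves two simplex scans into one duplex page order.
// odd holds the front sides in reading order (1, 3, 5, ...); even holds the
// back sides as scanned after turning the stack around (..., 6, 4, 2).
// Assumption: both scans must contain the same number of pages.
func mergeDuplex(odd, even []string) ([]string, error) {
	if len(odd) != len(even) {
		return nil, fmt.Errorf("page count mismatch: %d odd, %d even", len(odd), len(even))
	}
	merged := make([]string, 0, len(odd)+len(even))
	n := len(even)
	for i := range odd {
		// Take the next front side, then read the back sides from the end.
		merged = append(merged, odd[i], even[n-1-i])
	}
	return merged, nil
}

// splitReceipts turns one scan of pageCount pages into pageCount documents,
// each containing a single 1-based page number.
func splitReceipts(pageCount int) [][]int {
	docs := make([][]int, 0, pageCount)
	for p := 1; p <= pageCount; p++ {
		docs = append(docs, []int{p})
	}
	return docs
}

func main() {
	odd := []string{"p1", "p3", "p5"}
	even := []string{"p6", "p4", "p2"} // scanned backwards
	merged, err := mergeDuplex(odd, even)
	if err != nil {
		panic(err)
	}
	fmt.Println(merged)           // [p1 p2 p3 p4 p5 p6]
	fmt.Println(splitReceipts(3)) // [[1] [2] [3]]
}
```

Reading `even` from the end is what undoes the reversal caused by turning the stack around. A real implementation would apply the same index mapping to PDF pages.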