{"id":22051587,"url":"https://github.com/parlaynu/s3backup","last_synced_at":"2026-05-04T02:40:44.164Z","repository":{"id":171229077,"uuid":"647444886","full_name":"parlaynu/s3backup","owner":"parlaynu","description":"Run client side encrypted backups using AWS S3. Tools to backup, restore and manage job configurations.","archived":false,"fork":false,"pushed_at":"2025-11-28T23:51:51.000Z","size":52,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-15T07:28:39.329Z","etag":null,"topics":["age-encryption","aws-iam","aws-s3","content-hash","deduplication"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/parlaynu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-05-30T19:54:13.000Z","updated_at":"2025-11-28T23:51:55.000Z","dependencies_parsed_at":"2023-12-31T04:28:50.819Z","dependency_job_id":"678bbdb7-c233-4c3a-8154-f4af6d5ad556","html_url":"https://github.com/parlaynu/s3backup","commit_stats":null,"previous_names":["studio1767/s3backup","parlaynu/s3backup"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/parlaynu/s3backup","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/parlaynu%2Fs3backup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/parlaynu%2Fs3backup/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/parlaynu%2Fs3backup/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/parlaynu%2Fs3backup/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/parlaynu","download_url":"https://codeload.github.com/parlaynu/s3backup/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/parlaynu%2Fs3backup/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32592720,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T22:12:39.696Z","status":"online","status_checked_at":"2026-05-04T02:00:06.625Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["age-encryption","aws-iam","aws-s3","content-hash","deduplication"],"created_at":"2024-11-30T15:09:43.145Z","updated_at":"2026-05-04T02:40:44.146Z","avatar_url":"https://github.com/parlaynu.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Backup and Restore to AWS S3\n\nThis project creates tools to backup local data to an AWS S3 bucket, restore the backups (fully or partially based on a regex),\nand some helper tools to upload job configurations and download files with transparent decryption.\n\n## Key Features\n\n* client side ecryption of data and metadata\n* client managed keys\n* support for multiple backup sources per job\n* support for multiple backup jobs per repository\n* 
deduplication of all files across all jobs in the repository\n\n## Setup\n\nSetting up the environment with all the keys, and users and permissions is reasonably involved. Instead\nof documenting all the steps, and no doubt getting things wrong, there is a sibling repository to this \none at [s3backup-infrastructure](https://github.com/studio1767/s3backup-infrastructure) that uses terraform to \ngenerate a working system that's ready to use.\n\nIt creates the s3 bucket, generates IAM users with correct permissions, creates the encryption keys and finally, generates\ntemplate job configurations for each backup job. Once this is done, you only need to update the job configuration\nfiles to match your backup needs, upload them, and you have a working system.\n\nThe following sections describe how different parts of the system work.\n\n## Bucket Structure\n\nThere are four key prefixes used in the bucket as described in the table below.\n\n|   Prefix   | Description                                              |\n|------------|----------------------------------------------------------|\n| repo/      | used for the age recipients key for encrypting data      |\n| jobs/      | all job configurations                                   |\n| manifests/ | uploaded manifests for each backup                       |\n| data/      | the backed up data stored under a content-hash hierarchy |\n\nThe `repo/` prefix currently has a single object with the key `repo/recipients.txt`. This holds\nthe recipients key for the age encryption algorithm and is required to be present. In the default\npermissions setup, backup and restore users only have read access to this key.\n\nThe `jobs/` prefix is where all the job configurations are stored. 
\n\nThe key format is:\n\n    jobs/\u003cjobname\u003e/\u003cjobname\u003e-\u003cversion\u003e.yml\n\nAn example of what this looks like:\n\n    2023-01-29 17:03:53        756 jobs/nimbus/nimbus-001.yml\n    2023-01-30 09:38:15        777 jobs/nimbus/nimbus-002.yml\n    2023-03-27 13:34:11        951 jobs/nimbus/nimbus-003.yml\n    2023-03-28 14:53:29        771 jobs/nimbus/nimbus-004.yml\n\nThe backup application downloads the most recent job configuration and uses the contents to drive\nthe backup process.\n\nIf jobs are uploaded using the `s3jobupload` tool, they will be encrypted before uploading to the bucket.\n\nThe `manifests/` prefix is where the backup tool uploads the backup manifests. Each manifest is a simple CSV file\nthat lists all the files processed by the backup and their metadata. It is used on the next backup to generate the\ndiff between what's on the disk and what's already uploaded. \n\nThe format of the keys is:\n\n    manifests/\u003cjobname\u003e/\u003clabelname\u003e/\u003cjobname\u003e-\u003clabelname\u003e-\u003ctimestamp\u003e\n\nAn example of what this looks like is:\n\n    2023-02-28 13:10:58    1794480 manifests/nimbus/nimbus/nimbus-nimbus-2023-02-28-47431.csv.gz\n    2023-03-16 11:21:28    1794600 manifests/nimbus/nimbus/nimbus-nimbus-2023-03-16-40837.csv.gz\n    2023-03-28 15:06:35    1800139 manifests/nimbus/nimbus/nimbus-nimbus-2023-03-28-54390.csv.gz\n    2023-04-21 20:14:37    1800140 manifests/nimbus/nimbus/nimbus-nimbus-2023-04-21-72872.csv.gz\n    2023-05-03 13:48:40    1799336 manifests/nimbus/nimbus/nimbus-nimbus-2023-05-03-49716.csv.gz\n\nThe `data/` prefix holds the actual backed up file data. It is stored in keys named after the content\nhash of the file. The manifest file maps the backed up file name to its content hash for\nrestoring. 
\n\nThe format of the keys is:\n\n    data/\u003cfirst-4-characters-of-hash\u003e/\u003cfull-hash\u003e\n\nAn example of this is:\n\n    2023-04-30 11:17:44     869607 data/3c34/3c346f7689103d92df482fda90300dd8c66da0a883a9be02ab0eba41d04c39d5\n    2023-05-31 11:37:09    3976570 data/3c34/3c349ec0413b44aad9b95bc3d7155f559896fbf6b80857ca7bfa25ad05fb3a9e\n    2023-02-28 09:39:13       5015 data/3c36/3c36d6326384e558d9e74143b12208d882431bb475a583c276468d3b85563db0\n    2023-05-10 10:03:05     243932 data/3c37/3c373028c9e73fb01278a94fd4f7095ed471272d115baaf43733bdef4c1031ea\n    2023-01-31 17:29:32       5492 data/3c37/3c37ec4d217101090d4eea8d9fdf3b94cf674ca52676c39522aed250f5bbb3d3\n    2023-05-10 09:50:07        399 data/3c38/3c38f7a454656cba1f35bffbb2bcc99796fdc3cacb1ed3d9e80941edc1ef9188\n    2023-01-04 14:08:24        823 data/3c39/3c39093bdfcf762ac5df336a8a04defc03dfc84d10b94b2826d740ac9b506026\n    2023-03-10 05:26:18     737528 data/3c39/3c390a4f7dd4733633da106e1282bf03ba5e23197b6826f72d58371ec5d2d786\n    2023-04-28 14:58:55     393046 data/3c39/3c390b0a4d6ccdea0329f5d0a0b565c67b6d0d26e889fa58ed2affe7aa0e2fe0\n\n## Encryption\n\nA very important thing to keep in mind here is that all encryption and decryption happens on the client side\nusing client managed keys. If the client loses the keys, then the backups will no longer be accessible.\nSo, keep them somewhere safe!\n\nThe metadata (job configurations and manifest files) and data are encrypted automatically during the backup.\nAs mentioned previously, this happens on the client side, using client managed keys. Both use [age](https://github.com/FiloSottile/age)\nfor encryption.\n\nThe backup data is encrypted using the recipients file in the bucket under the key `repo/recipients.txt`. This \nis a hard requirement - the backup won't run if the file isn't there.\n\nTo restore, you need the matching identities file, accessible on your local disk. 
Do not upload the \nidentities file to the bucket.\n\nUsing asymmetric encryption for the data means that the public key can be made accessible in the bucket for all\nbackup users to access, but only people you provide the identities key to can restore.\n\nThe metadata is encrypted using passphrases. These are unique to each user. This way users can only download\nand decrypt their own job and manifest files.\n\nThe keys for the metadata encryption/decryption are stored by default in:\n\n    ~/.s3bu/secrets.yml\n\nIt is also a hard requirement that this file be present and have valid contents.\n\nThe file generated by the infrastructure project creates passphrases of the same form that age itself will generate.\nThe contents of the secrets file is like this:\n\n    - id: Ol4uX0S47CRmk2ZR\n      passphrase: often-shove-area-age-frog-brick-lift-snake-city-joy\n    - id: jhU3wC1wQ5sFakiN\n      passphrase: call-mention-enjoy-roast-woman-clinic-excite-buyer-toad-meadow\n    - id: 0S5F4YM84UPJEhBG\n      passphrase: gift-range-soon-sand-wolf-salmon-uphold-captain-material-champion\n    - id: PtV7YohWQhtz1KTZ\n      passphrase: absurd-fresh-jeans-weapon-chaos-door-dutch-tell-trick-vanish\n\nWhen encrypting, the last entry in the file is used. To retire keys, simply add a new passphrase \nto the file. Leave the older entries as they will be needed to decrypt older jobs and manifests.\n\nThe `id` is added to the uploaded file's metadata and is used to locate the correct passphrase for\ntransparent decryption.\n\n## Job Configuration File\n\nThe job configuration file is a simple yaml file. 
A very simple configuration looks like this:\n\n    ---\n    # the directory to scan for inputs\n    sources:\n    - path: /home/me\n      label: home\n    - path: /home/me/Projects\n      label: projects\n      \n    # list of file extensions to exclude\n    exclude_extensions:\n    - .DS_Store\n    - .tfstate\n    - .tfstate.backup\n    \n    # list of file extensions to include\n    # include_extensions:\n    # - .go\n    # - .txt\n    \n    # top level directories to exclude\n    exclude_top_dirs:\n    - Applications\n    - Downloads\n    - .Trash\n    \n    # top level directories to include\n    # include_top_dirs:\n    # - Projects\n    \n    # directories we skip\n    skip_dirs:\n    - 3rdparty\n    - pyenv\n    \n    # directories we skip ... if they have one of these (file or directory)\n    skip_dir_items:\n    - .nobackup\n    - .git\n\nThe `sources` key lists the local source directories to backup. The `path` is the physical path and the `label` a logical \nname. If the physical mount point changes, you can update the path and keep the label the same and the backups will continue \nas normal. Manifest files are keyed using both the jobname and label as defined in here.\n\nFor the exclude/include options, include is evaluated first, then exclude: if an extension or directory is in both lists, it\nwill be excluded. I rarely use the 'include_' variant - if it isn't present, it includes everything.\n\nThe `*_extensions` options list file extensions to consider. \n\nThe `*_top_dirs` are only evaluated for directories at the top level of the backup source.\n\nThe key `skip_dirs` lists directories to skip at all levels of the backup.\n\nThe `skip_dir_items` will skip directories if there is a file or directory with the name of one of the items in the\ndirectory.\n\n## Usage\n\n### Update Job Configuration\n\nTo update a job configuration, first download the latest version of the job file to edit. 
\n\n    s3jobdownload -p \u003cmy-aws-profile\u003e \u003cbucket-name\u003e \u003cjob-name\u003e\n\nThis will automatically find the most recent configuration for the named job and download it. The\ninfrastructure project automatically creates initial job configurations from a default so there \nwill always be a job there to edit.\n\nEdit the configuration using a text editor and then upload:\n\n    s3jobupload -p \u003cmy-aws-profile\u003e \u003cbucket-name\u003e \u003cjob-name\u003e \u003cpath-to-job-file\u003e\n\nThis will rename the file to be numbered as the next number in sequence for the job configurations\nand it will become the active configuration.\n\nIf this is the first time you have updated the configuration, the job will be stored at a key matching this:\n\n    jobs/\u003cjobname\u003e/\u003cjobname\u003e-001.yml\n\n### Running Backups\n\nTo run the backup, run this command:\n\n    s3backup -p \u003cmy-aws-profile\u003e \u003cbackup-bucket-name\u003e \u003cmyjobname\u003e\n\nThe backup procedure is something like this:\n\n* download the latest job configuration (decrypting as necessary)\n* download the latest manifest file (decrypting as necessary)\n* iterate over the contents of the manifest file and the filesystem together\n* compare the file names and metadata of each to determine if a file is new or modified\n* if the file appears to be new or modified, generate the hash of its content\n* check the bucket for this hash; if it isn't there, upload it\n* keep looping until both the manifest and the filesystem iterators are drained\n* as a final step, upload the new manifest file\n\nThe manifest that is generated is a full manifest of what is on the disk. In this way,\nwe only ever do incremental uploads/backups, but we always have a full manifest.\n\n### Restoring Content\n\nAs mentioned in the encryption section, restoring uses the identities for decrypting the data. 
The default location \nfor the identities file is:\n\n    ~/.s3bu/identities.txt\n\nIf it's there, the tool will find it and use it. If it's somewhere else, the location can be overridden on the command line.\n\nTo restore you will also need the passphrases used to encrypt the manifest file. The infrastructure project generates \nindividual secrets files with passphrases for each user, and one containing all passphrases intended for use by an\nadministrator.\n\nOnce all this is in place, run the restore with a command like this:\n\n    s3restore -p \u003cmy-aws-profile\u003e \u003cbackup-bucket-name\u003e \u003cmanifest-key\u003e \u003crestore-root\u003e [\u003cpattern\u003e]\n\nThe pattern is optional and is a regular expression used to match the file name. It defaults to '.*' to \nrestore everything in the manifest.\n\nAs an example of a selective restore, to restore everything in a manifest under a directory called \n'Projects/s3backup', you would run a command like this:\n\n    s3restore -p myprofilename backups.example.com manifests/test/local/test-local-2023-05-29-51748.csv.gz local '^Projects/s3backup/'\n\nThis would restore the files into a directory called 'local' and preserve the full path to each file under this\nnew location.\n\nBy default, restore will not restore into a directory that isn't empty. To force it to do this, use the `-f` flag.\n\nEven in force mode, it won't overwrite any files that already exist in the file system. To change this behaviour, \nuse the `-o` flag.\n\nThe restore operation is like this:\n\n* download the specified manifest file (decrypting as necessary)\n* loop through each entry in the manifest\n* compare the filename to the pattern, and if it matches, download it (decrypting as necessary)\n* set the permissions on the file to match those recorded in the manifest\n\n### Manual Downloading\n\nThere is a utility that will manually download any file you specify with a valid key and decrypt as\nit goes. 
To use it:\n\n    s3download -p myprofile mybucket full-key-to-file restore-directory\n\nBy default, it won't overwrite a file that already exists; use the '-o' flag to change that.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fparlaynu%2Fs3backup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fparlaynu%2Fs3backup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fparlaynu%2Fs3backup/lists"}