{"id":13658211,"url":"https://github.com/karlicoss/arctee","last_synced_at":"2025-07-20T10:34:28.496Z","repository":{"id":53001884,"uuid":"231008026","full_name":"karlicoss/arctee","owner":"karlicoss","description":"Atomic tee","archived":false,"fork":false,"pushed_at":"2024-08-10T15:04:24.000Z","size":33,"stargazers_count":36,"open_issues_count":1,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-10T00:52:45.053Z","etag":null,"topics":["backup","data-liberation","export"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/karlicoss.png","metadata":{"files":{"readme":"README.org","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-12-31T02:00:16.000Z","updated_at":"2025-05-25T11:53:43.000Z","dependencies_parsed_at":"2022-08-23T23:10:34.858Z","dependency_job_id":null,"html_url":"https://github.com/karlicoss/arctee","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/karlicoss/arctee","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/karlicoss%2Farctee","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/karlicoss%2Farctee/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/karlicoss%2Farctee/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/karlicoss%2Farctee/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/karlicoss","download_url":"https://codeload.github.com/karlicoss/arctee/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/karlico
ss%2Farctee/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266111439,"owners_count":23877980,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["backup","data-liberation","export"],"created_at":"2024-08-02T05:00:57.612Z","updated_at":"2025-07-20T10:34:28.457Z","avatar_url":"https://github.com/karlicoss.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"#+EXPORT_EXCLUDE_TAGS: noexport\n\n#+begin_src python :exports output :results replace raw\nimport arctee \nreturn arctee.__doc__\n#+end_src\n\n#+RESULTS:\n\nHelper script to run your data exports.\nIt works kind of like [[https://en.wikipedia.org/wiki/Tee_(command)][*tee* command]], but:\n\n- *a*: writes output atomically\n- *r*: supports retrying command\n- *c*: supports compressing output\n\nYou can read more on how it's used [[https://beepb00p.xyz/exports.html#arctee][here]].\n\n* Motivation\nMany things are very common to all data exports, regardless of the source.\nIn the vast majority of cases, you want to fetch some data, save it in a file (e.g. JSON) along with a timestamp and potentially compress.\n\nThis script aims to minimize the common boilerplate:\n\n- =path= argument allows easy ISO8601 timestamping and guarantees atomic writing, so you'd never end up with corrupted exports.\n- =--compression= allows to compress simply by passing the extension. 
No more =tar -zcvf=!\n- =--retries= allows easy exponential backoff in case the service you're querying is flaky.\n\nExample:\n\n: arctee '/exports/rtm/{utcnow}.ical.zstd' --compression zstd --retries 3 -- /soft/export/rememberthemilk.py\n\n1. runs =/soft/export/rememberthemilk.py=, making up to three attempts in total if it fails\n\n   The script is expected to dump its result to stdout; stderr is simply passed through.\n2. once the data is fetched it's compressed as =zstd=\n3. timestamp is computed and compressed data is written to =/exports/rtm/20200102T170015Z.ical.zstd=\n\n* Do you really need a special script for that?\n\n- why not use the =date= command for timestamps?\n\n  passing =$(date -Iseconds --utc).json= as =path= works, however I need it for *most* of my exports, so it ends up polluting my crontabs.\n\nNext, I want to do several things one after another here.\nThat sounds like a perfect candidate for *pipes*, right?\nSadly, there are serious caveats:\n\n- *pipe errors don't propagate*. If one part of your pipe fails, the pipeline as a whole doesn't fail\n\n  That's a major problem that often leads to unexpected behaviours.\n\n  In bash you can fix this by running =set -o pipefail=. However:\n\n  - default cron shell is =/bin/sh=. Ok, you can change it to ~SHELL=/bin/bash~, but\n  - you can't set it to =/bin/bash -o pipefail=\n\n    You'd have to prepend all of your pipes with =set -o pipefail=, which is quite boilerplaty\n\n- you can't use pipes for *retrying*; you need some wrapper script anyway\n\n  E.g. similar to how you need a wrapper script when you want to stop your program on timeout.\n\n- it's possible to use pipes for atomically writing output to a file, however I haven't found any existing tools to do that\n\n  E.g. 
I want something like =curl https://some.api/get-data | tee --atomic /path/to/data.json=.\n\n  If you know of an existing tool please let me know!\n\n- it's possible to pipe compression\n\n  However due to the above concerns (timestamping/retrying/atomic writing), it has to be part of the script as well.\n\nIt feels that cron isn't a suitable tool for my needs due to pipe handling and the need for retries, however I haven't found a better alternative.\nIf you think any of these things can be simplified, I'd be happy to know and remove them in favor of more standard solutions!\n\n* Installation\n\nThis can be installed with pip by running: =pip3 install --user git+https://github.com/karlicoss/arctee=\n\nYou can also manually install this by installing =atomicwrites= (=pip3 install atomicwrites=) and downloading and running =arctee.py= directly\n\n** Optional Dependencies\n- =pip3 install --user backoff=\n\n  [[https://github.com/litl/backoff][backoff]] is a library to simplify backoff and retrying. Only necessary if you want to use =--retries=.\n- =apt install atool=\n\n  [[https://www.nongnu.org/atool][atool]] is a tool to create archives in any format. Only necessary if you want to use compression.\n\n# end of autogenerated stuff\n\n* Usage\n\n#+begin_src sh :results output :exports output\narctee --help\n#+end_src\n\n# TODO ugh. seems that github chokes over #+RESULT: here\n#+begin_example\nusage: arctee [-h] [-r RETRIES] [-c COMPRESSION] path\n\nWrapper for automating boilerplate for reliable and regular data exports.\n\nExample: arctee '/exports/rtm/{utcnow}.ical.zstd' --compression zstd --retries 3 -- /soft/export/rememberthemilk.py --user \"user@email.com\"\n\nArguments past '--' are the actuall command to run.\n\npositional arguments:\n  path                  Path with borg-style placeholders. 
Supported: {utcnow}, {hostname}, {platform}.\n\n                        Example: '/exports/pocket/pocket_{utcnow}.json'\n\n                        (see https://manpages.debian.org/testing/borgbackup/borg-placeholders.1.en.html)\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -r RETRIES, --retries RETRIES\n                        Total number of tries, 1 (default) means only try once. Uses exponential backoff.\n  -c COMPRESSION, --compression COMPRESSION\n                        Set compression format.\n\n                        See 'man apack' for list of supported formats. In addition, 'zstd' is also supported.\n#+end_example\n\n* TODOs                                                            :noexport:\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkarlicoss%2Farctee","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkarlicoss%2Farctee","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkarlicoss%2Farctee/lists"}