https://github.com/karlicoss/arctee
Atomic tee
- Host: GitHub
- URL: https://github.com/karlicoss/arctee
- Owner: karlicoss
- Created: 2019-12-31T02:00:16.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2024-08-10T15:04:24.000Z (4 months ago)
- Last Synced: 2024-08-10T16:23:22.653Z (4 months ago)
- Topics: backup, data-liberation, export
- Language: Python
- Homepage:
- Size: 32.2 KB
- Stars: 28
- Watchers: 3
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.org
#+EXPORT_EXCLUDE_TAGS: noexport
#+begin_src python :exports output :results replace raw
import arctee
return arctee.__doc__
#+end_src

#+RESULTS:
Helper script to run your data exports.
It works kind of like [[https://en.wikipedia.org/wiki/Tee_(command)][*tee* command]], but:

- *a*: writes output atomically
- *r*: supports retrying command
- *c*: supports compressing output

You can read more on how it's used [[https://beepb00p.xyz/exports.html#arctee][here]].
* Motivation
Many things are very common to all data exports, regardless of the source.
In the vast majority of cases, you want to fetch some data, save it in a file (e.g. JSON) along with a timestamp and potentially compress it.

This script aims to minimize the common boilerplate:

- =path= argument allows easy ISO8601 timestamping and guarantees atomic writing, so you'd never end up with corrupted exports.
- =--compression= lets you compress simply by passing the extension. No more =tar -zcvf=!
- =--retries= allows easy exponential backoff in case the service you're querying is flaky.

Example:
: arctee '/exports/rtm/{utcnow}.ical.zstd' --compression zstd --retries 3 -- /soft/export/rememberthemilk.py
1. runs =/soft/export/rememberthemilk.py=, retrying it up to three times if it fails

   The script is expected to dump its result in stdout; stderr is simply passed through.
2. once the data is fetched it's compressed as =zstd=
3. timestamp is computed and compressed data is written to =/exports/rtm/20200102T170015Z.ical.zstd=

* Do you really need a special script for that?
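For a sense of how much there is to get right, here's a rough sketch of steps 2 and 3 from the example above (compress, then write atomically) in plain Python. This is *not* arctee's actual code: =atomic_write_bytes= and =compress_and_store= are made-up names, it uses only the standard library instead of =atomicwrites=, and =gzip= stands in for =zstd= purely because it ships with Python.

```python
import gzip
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(path: Path, data: bytes) -> None:
    """Write data so readers see either the old file or the complete new one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory: os.replace is only atomic
    # when source and destination are on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes hit the disk before renaming
        os.replace(tmp_name, path)  # the atomic step
    except BaseException:
        # Remove the partial temp file; the destination is left untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def compress_and_store(path: Path, raw: bytes) -> None:
    """Hypothetical helper: compress (gzip as a stand-in), then write atomically."""
    atomic_write_bytes(path, gzip.compress(raw))
```

If the process dies halfway through, the worst case is a stray =.tmp= file next to the export, never a truncated export.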
- why not use =date= command for timestamps?
passing =$(date -Iseconds --utc).json= as =path= works, however I need it for *most* of my exports; so it ends up polluting my crontabs.
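For comparison, this is roughly what expanding the placeholders takes in Python. It's a sketch under assumptions, not arctee's implementation: =expand_path= is a hypothetical name, the timestamp format is copied from the example filename above (=20200102T170015Z=), and the values used for ={hostname}= and ={platform}= are my guesses at something sensible.

```python
import platform
import socket
from datetime import datetime, timezone
from typing import Optional


def expand_path(template: str, now: Optional[datetime] = None) -> str:
    """Fill borg-style placeholders in a path template (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    return template.format(
        # Compact UTC timestamp, matching the example filename in this README.
        utcnow=now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ"),
        hostname=socket.gethostname(),
        platform=platform.system(),  # assumption: the real value may differ
    )


print(expand_path("/exports/pocket/pocket_{utcnow}.json"))
```

Since the template is an ordinary string, the crontab line stays the same for every export and no shell substitution is needed.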
Next, I want to do several things one after another here.
That sounds like a perfect candidate for *pipes*, right?
Sadly, there are serious caveats:

- *pipe errors don't propagate*. If one part of your pipe fails, it doesn't fail everything.

  That's a major problem that often leads to unexpected behaviours.

  In bash you can fix this by setting =set -o pipefail=. However:

  - default cron shell is =/bin/sh=. Ok, you can change it to ~SHELL=/bin/bash~, but
  - you can't set it to =/bin/bash -o pipefail=

  You'd have to prepend all of your pipes with =set -o pipefail=, which is quite boilerplaty
- you can't use pipes for *retrying*; you need some wrapper script anyway

  E.g. similar to how you need a wrapper script when you want to stop your program on timeout.
- it's possible to use pipes for atomically writing output to a file, however I haven't found any existing tools to do that

  E.g. I want something like =curl https://some.api/get-data | tee --atomic /path/to/data.json=.

  If you know any existing tool please let me know!
- it's possible to pipe compression

  However due to the above concerns (timestamping/retrying/atomic writing), it has to be part of the script as well.
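To illustrate the retrying point: a wrapper along these lines is about the minimum you'd need. It's a stdlib-only sketch, not what arctee does internally (per the Installation section it relies on the =backoff= library for this); =run_with_retries= and the 1s, 2s, 4s... delay schedule are assumptions made for the example.

```python
import subprocess
import sys
import time
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    tries: int = 1,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Run cmd up to `tries` times in total and return its stdout.

    stdout is captured; stderr is not redirected, so it passes straight through.
    """
    for attempt in range(tries):
        proc = subprocess.run(cmd, stdout=subprocess.PIPE)
        if proc.returncode == 0:
            return proc.stdout
        if attempt < tries - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    # Every attempt failed: surface the last exit code to the caller.
    raise subprocess.CalledProcessError(proc.returncode, cmd)


if __name__ == "__main__":
    data = run_with_retries([sys.executable, "-c", "print('exported data')"], tries=3)
    sys.stdout.buffer.write(data)
```

Unlike a pipe, a non-zero exit code here can't get lost: either you get the complete stdout, or an exception.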
It feels that cron isn't a suitable tool for my needs due to pipe handling and the need for retries, however I haven't found a better alternative.
If you think any of these things can be simplified, I'd be happy to know and remove them in favor of more standard solutions!

* Installation
This can be installed with pip by running: =pip3 install --user git+https://github.com/karlicoss/arctee=
You can also manually install this by installing =atomicwrites= (=pip3 install atomicwrites=) and downloading and running =arctee.py= directly
** Optional Dependencies
- =pip3 install --user backoff=

  [[https://github.com/litl/backoff][backoff]] is a library to simplify backoff and retrying. Only necessary if you want to use =--retries=.
- =apt install atool=

  [[https://www.nongnu.org/atool][atool]] is a tool to create archives in any format. Only necessary if you want to use compression.
# end of autogenerated stuff
* Usage
#+begin_src sh :results output :exports output
arctee --help
#+end_src

# TODO ugh. seems that github chokes over #+RESULT: here
#+begin_example
usage: arctee [-h] [-r RETRIES] [-c COMPRESSION] path

Wrapper for automating boilerplate for reliable and regular data exports.

Example: arctee '/exports/rtm/{utcnow}.ical.zstd' --compression zstd --retries 3 -- /soft/export/rememberthemilk.py --user "[email protected]"

Arguments past '--' are the actuall command to run.

positional arguments:
  path                  Path with borg-style placeholders. Supported: {utcnow}, {hostname}, {platform}.

                        Example: '/exports/pocket/pocket_{utcnow}.json'

                        (see https://manpages.debian.org/testing/borgbackup/borg-placeholders.1.en.html)

optional arguments:
  -h, --help            show this help message and exit
  -r RETRIES, --retries RETRIES
                        Total number of tries, 1 (default) means only try once. Uses exponential backoff.
  -c COMPRESSION, --compression COMPRESSION
                        Set compression format.

                        See 'man apack' for list of supported formats. In addition, 'zstd' is also supported.
#+end_example

* TODOs :noexport: