https://github.com/bjoerrrn/storj-system-health.sh
storagenode (storj) tool to inform operators almost immediately about fatal, audit or general errors via discord push message and email alerts.
discord-bot docker error-reporting healthcheck linux linux-shell mail-alerts storj storj-node
- Host: GitHub
- URL: https://github.com/bjoerrrn/storj-system-health.sh
- Owner: bjoerrrn
- License: gpl-3.0
- Created: 2021-12-15T22:21:00.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2025-02-02T10:22:42.000Z (8 months ago)
- Last Synced: 2025-03-31T23:41:44.646Z (6 months ago)
- Topics: discord-bot, docker, error-reporting, healthcheck, linux, linux-shell, mail-alerts, storj, storj-node
- Language: Shell
- Homepage: https://discordapp.com/users/371404709262786561
- Size: 1.22 MB
- Stars: 11
- Watchers: 1
- Forks: 1
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# storj-system-health.sh
 
this linux/macos shell script checks whether a [storj node][storagenode] (from the [storj][storj] project) runs into errors and alerts the operator via discord push messages and emails. it requires at least one [storj node][storagenode] running with [docker][docker] on linux.
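as a rough illustration of the idea (not the script's actual implementation), the following sketch sorts log lines into fatal, audit-related and other errors. the `classify_log` name, the log line layout and the sample lines are assumptions made up for this example.

```shell
#!/bin/sh
# sketch only: classify log lines read from stdin.
# classify_log is a hypothetical helper, not part of storj-system-health.sh.

classify_log() {
    awk '
        /FATAL/                { fatal++; next }   # fatal errors
        /ERROR/ && /GET_AUDIT/ { audit++; next }   # errors during audit downloads
        /ERROR/                { other++ }         # any other error
        END { printf "fatal=%d audit=%d other=%d\n", fatal, audit, other }
    '
}

# hypothetical sample log lines (layout assumed for illustration)
sample_log='2022-01-10T12:00:01Z INFO piecestore downloaded {"Action": "GET"}
2022-01-10T12:00:02Z ERROR piecestore download failed {"Action": "GET_AUDIT"}
2022-01-10T12:00:03Z ERROR piecestore upload failed {"Action": "PUT"}
2022-01-10T12:00:04Z FATAL Unrecoverable error {"error": "disk missing"}'

printf '%s\n' "$sample_log" | classify_log
```

with a real node, the input could instead come from something like `docker logs --since 60m storagenode 2>&1`, assuming the container is named `storagenode`.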
## features
* multinode support
* optional discord (quick notifications) and/or mail (with error details) alerts
* alerts, in case: ⚠️
  * audit, suspension and/or online scores are below a threshold (storj node disqualification risk)
  * audit timeouts are recognized (pending audits; disqualification risk)
  * audit time lags: the time between download started and download finished is larger than 3 mins (storj node disqualification risk)
  * a threshold of repair gets/puts and downloads/uploads is reached (storj node disqualification risk)
  * there was no get/put at all in the last hour (storj node disqualification risk)
  * any other fatal error occurs, incl. issues with docker stability
  * storj node version is outdated
  * the node is offline (docker container not started)
* reports:
  * disk usage
  * success rates of audits, downloads, uploads, repair up-/downloads
  * estimated payouts for today and the current month
  * today's upload and download statistics
* optimized for crontab and command line usage
* supports redirected logs [to a file][log_redirect]
* only requires [curl][curl], [jq][jq], [bc][bc] and (optionally) [swaks][swaks] to run

## optimized / tested for
- debian bullseye
- macos monterey ([jq][jq] + [swaks][swaks] installed with [brew][brew])

## dependencies
- [storj node][storagenode] up and running, within a [docker][docker] container
- [curl][curl] (http requests)
- [jq][jq] 1.6 ⚠️ (JSON parsing)
- [bc][bc] (arbitrary precision calculator)
- [swaks][swaks] (mail sending, smtp)
- [discord.sh][discord.sh] (discord pushes)

## setting up storj system health
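before working through the steps, it can help to confirm that the command line tools from the dependencies list are installed. a minimal sketch, assuming the tools are expected on `PATH`; `missing_tools` is a hypothetical helper name, and `discord.sh` is left out because it is a downloaded script rather than an installed command.

```shell
#!/bin/sh
# sketch only: report which of the given command line tools are missing.
# missing_tools is a hypothetical helper, not part of storj-system-health.sh.

missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            # append with a single space as separator
            missing="${missing:+$missing }$tool"
        fi
    done
    printf '%s\n' "$missing"
}

# swaks is only needed if mail alerts are enabled
result=$(missing_tools docker curl jq bc swaks)
if [ -n "$result" ]; then
    echo "missing: $result"
else
    echo "all tools found"
fi
```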
1. optional: [set up a webhook][webhook] in the desired discord text channel
2. optional: grab your smtp email authentication data
3. download (or clone) a copy of `discord.sh` \*
4. download (or clone) a copy of `storj-system-health.sh` and `storj-system-health.credo` \*\*
5. optional: set up discord and mail variables in `storj-system-health.credo`
6. Go nuts

\* `wget https://raw.githubusercontent.com/ChaoticWeg/discord.sh/master/discord.sh`

\*\* `wget https://raw.githubusercontent.com/bjoerrrn/storj-system-health.sh/main/storj-system-health.sh && wget https://raw.githubusercontent.com/bjoerrrn/storj-system-health.sh/main/storj-system-health.credo`

## setting up variables in *.credo
you will need to modify these variables in `*.credo` for your specific node and smtp mail server configuration. the `*.credo` file itself must not contain comments or blank lines; the comments in the following listing are for explanation only:
```
## discord settings
DISCORDON=true # enables (true) or disables (false) discord pushes
DISCORDURL=https://discord.com/api/webhooks/...
# your discord webhook url

## mail settings
MAILON=true # enables (true) or disables (false) email messages
MAILFROM="" # your "from:" mail address
MAILTO="" # your "to:" mail address
MAILSERVER="" # your smtp server address
MAILUSER="" # your user name from smtp server
MAILPASS="" # your password from smtp server
MAILPORT="" # OPTIONAL smtp server port to be used
MAILENCRYPT="" # OPTIONAL encryption method to be used. Valid values are TLS, TLS-optional, TLS-optional-strict, TLS-on-connect.

## alerting settings
SATPINGFREQ=3600 # in case satellite scores are below threshold,
# value in seconds, when next alert will be sent earliest
## storj node docker names and urls
NODES=storagenode # storage node names, multiple: separated with comma,
# e.g. storagenode,storagenode-a,storagenode-b
NODEURLS=localhost:14002
# storage node dashboard urls, multiple: separated with comma,
# e.g. localhost:14002,192.168.171.5:14002

## node data mount points
MOUNTPOINTS=/mnt/node # your storage node mount point, multiple: separated with comma
# e.g. /mnt/node,/mnt/node-a,/mnt/node-b
# enter 'source' from the docker run command here

## specify redirected logs per node
NODELOGPATHS=/ # put your relative path + log file name here,
# in case you've redirected your docker logs with
# e.g. config.yaml: 'log.output: "/app/config/node.log"'
# / -> for non-redirected logs
# /node.log -> for single node redirect
# /,/ -> for 2 nodes with non-redirected logs
# /node1.log,/node2.log -> for 2 nodes with redirects
# /node.log,/ -> only 1st is redirected
# /mnt/hdd1/node.log -> full path possible, too

## log selection specifics - in alignment with cronjob settings
LOGMIN=60 # latest log horizon to have a detailed view on, in minutes
# -> change this, if your cronjob runs more often than 60m
LOGMAX=720 # larger log horizon for overall statistics, in minutes
```

make sure your script is executable by running the following commands. add `sudo` at the beginning if admin privileges are required.
```
chmod u+x storj-system-health.sh # or:
sudo chmod u+x storj-system-health.sh

chmod u+x discord.sh # or:
sudo chmod u+x discord.sh
```

## usage
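before the first run, a quick sanity check of the `*.credo` file can save some debugging, since it must not contain comments or blank lines. the sketch below is a heuristic only: `check_credo` is a hypothetical helper, a whitespace followed by `#` is treated as a trailing comment (which could misfire on values containing that sequence), and it assumes that `NODES`, `NODEURLS`, `MOUNTPOINTS` and `NODELOGPATHS` should all list the same number of comma-separated entries.

```shell
#!/bin/sh
# sketch only: heuristic sanity check for a *.credo file.
# check_credo is a hypothetical helper, not part of storj-system-health.sh.

check_credo() {
    awk -F= '
        /^[ \t]*$/ { printf "line %d: blank line\n", NR; bad = 1; next }
        /^[ \t]*#/ || /[ \t]#/ { printf "line %d: comment\n", NR; bad = 1; next }
        $1 == "NODES" || $1 == "NODEURLS" || $1 == "MOUNTPOINTS" || $1 == "NODELOGPATHS" {
            # count the comma separated entries of the value
            n = split($2, parts, ",")
            if (!seen) { want = n; seen = 1 }
            else if (n != want) {
                printf "line %d: %s has %d entries, expected %d\n", NR, $1, n, want
                bad = 1
            }
        }
        END { exit bad + 0 }   # exit status 1 if any problem was found
    ' "$1"
}

# demo with a temporary file: two node names but only one dashboard url
tmp=$(mktemp)
printf '%s\n' 'NODES=storagenode,storagenode-a' 'NODEURLS=localhost:14002' > "$tmp"
check_credo "$tmp" || echo "credo file needs attention"
rm -f "$tmp"
```

against a real file this would be called as `check_credo storj-system-health.credo`.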
you can run the script in debug mode to force a push message to your discord channel (if enabled) even if no error was found, or without the debug flag to run it in silent mode via crontab (see the automation chapter).
```
./storj-system-health.sh -d # for a regular discord push message or:
./storj-system-health.sh # for silent mode
```

optionally, you can pass another path to `*.credo`, in case it has another name or location:
```
./storj-system-health.sh -c /home/pi/anothername.credo
```

in order to use the estimated payout information, which looks like this:
```
message: [sn1] : hdd 38.62% > OK 0.25$ / 11.77$
```
... you should schedule your crontab entry to run around 23:55 UTC. You need to adjust the timing if you have several nodes and/or huge log files to be analysed: the script needs to finish before the next full hour, ideally by 23:59:59 UTC at the latest.

it also supports a help command for further details:
```
./storj-system-health.sh -h
```

## automation with crontab
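to let the health check run automatically, here's a crontab example for linux, which runs the script three times per hour (at minutes 15, 35 and 55). the `pi` field is the user name, as used in `/etc/crontab`-style entries.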
```
15,35,55 * * * * pi /home/pi/storj-system-health.sh -d > /dev/null
```

for macos, please be aware of the following specifics:
* use `crontab -e` and `crontab -l`, although they are deprecated (for now they work)
* you do not need the user name field; the job is executed as the current user
* use full paths to your script and credo file
* find out your standard path with `echo $PATH` and set it in crontab
```
SHELL=/bin/sh
PATH="/opt/homebrew/opt/sqlite/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"
# UNIX:
30 * * * * pi cd /home/pi/scripts/ && ./storj-system-health.sh
59 1 * * * pi cd /home/pi/scripts/ && ./storj-system-health.sh -Ed
# MACOS
# 30 * * * * /Users/me/storj-system-health.sh >> /Users/me/Desktop/checks.txt 2>&1
# 59 1 * * * /Users/me/storj-system-health.sh -Ed -c /Users/me/my.credo >> /Users/me/Desktop/checks.txt 2>&1
```

## example screenshots
an "ok" message

a message saying, that there are fatal errors

another message saying, that there are general errors

satellite score issues

success rates per node

explanation:
```
(repair) downloads / (repair) uploads:
c = cancelled rate
f = failed rate
s = success rate

audits:
r = recoverable audit rate
c = critical audit fail rate
s = audit success rate
```

## contributing
[issues](https://github.com/bjoerrrn/storj-system-health.sh/issues) and [pull requests](https://github.com/bjoerrrn/storj-system-health.sh/pulls) are welcome. for major changes, please open an [issue](https://github.com/bjoerrrn/storj-system-health.sh/issues) first to discuss what you would like to change.
if you want to contact me directly, feel free to do so via discord: https://discordapp.com/users/371404709262786561
## license
[GPL-3.0](https://www.gnu.org/licenses/gpl-3.0.en.html)

[discord.sh]: https://github.com/ChaoticWeg/discord.sh
[successrates.sh]: https://github.com/ReneSmeekes/storj_success_rate
[brew]: https://github.com/Homebrew/brew
[curl]: https://curl.haxx.se/
[bc]: https://www.gnu.org/software/bc/manual/html_mono/bc.html
[jq]: https://stedolan.github.io/jq/
[storj]: https://www.storj.io
[docker]: https://github.com/docker
[swaks]: https://github.com/jetmore/swaks
[storagenode]: https://www.storj.io/node
[log_redirect]: https://docs.storj.io/node/resources/faq/redirect-logs
[webhook]: https://support.discordapp.com/hc/en-us/articles/228383668-Intro-to-Webhooks