https://github.com/susji/lilmon
li'l monitor
lilmon monitoring monitoring-application time-series
- Host: GitHub
- URL: https://github.com/susji/lilmon
- Owner: susji
- License: gpl-3.0
- Created: 2022-07-21T18:14:45.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2024-06-26T19:34:38.000Z (almost 2 years ago)
- Last Synced: 2025-02-07T19:13:47.388Z (over 1 year ago)
- Topics: lilmon, monitoring, monitoring-application, time-series
- Language: Go
- Homepage:
- Size: 2.11 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# lilmon
## What is it?
lilmon is a small program for collecting numeric values on your UNIX-like system
and displaying them as time series in a browser view. If you stretch the
definition a bit, it is a minimalistic monitoring tool.
lilmon is currently very experimental software.
## Why does it exist?
I needed a small monitoring tool for my own use. I use it to monitor individual
UNIX-like hosts. I'm looking at parameters like available diskspace, CPU load,
network performance, host temperatures, connection quality and so on. There are
very good and professional tools for this purpose, but I wanted something small
and simple that requires very little setup.
## What does it do?
lilmon has two modes of operation.
In the first mode, `measure`, it periodically executes commands for its metrics
and gathers their numeric values into a SQLite database.
In the second mode, `serve`, it displays the recorded values with a dynamic HTML
page.
## What does lilmon measure?
lilmon measures numeric values. To do this, lilmon is given a set of *metrics*.
Each metric has a
- name
- description
- graphing options
- command
lilmon uses raw shell-commands to obtain these numeric values for each specified
metric. Commands are shell-expanded like
    $ /bin/sh -c '<command>'
and as a result, lilmon expects to receive a single value back via `stdout`. The
value is interpreted as a `float64`. Integers are also fine. Whitespace is
trimmed before any interpretation is attempted.
A minimalistic example of a metric command would then be `echo 123` which would
result in a static value of `123` on each measurement.
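
The measurement step described above can be sketched in Go roughly as follows. This is a minimal illustration, not lilmon's actual code: the function names `parseValue` and `measureOnce` are hypothetical, and only the documented behaviour (run through `/bin/sh -c`, trim whitespace, interpret as `float64`) is taken from this README.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// parseValue trims surrounding whitespace and interprets the rest as a
// float64. Integers such as "123" parse fine as well.
func parseValue(stdout string) (float64, error) {
	return strconv.ParseFloat(strings.TrimSpace(stdout), 64)
}

// measureOnce runs the metric command through /bin/sh -c and parses the
// single value it printed to stdout. (Hypothetical helper name.)
func measureOnce(command string) (float64, error) {
	out, err := exec.Command("/bin/sh", "-c", command).Output()
	if err != nil {
		return 0, fmt.Errorf("running %q: %w", command, err)
	}
	return parseValue(string(out))
}

func main() {
	v, err := measureOnce("echo 123")
	fmt.Println(v, err)
}
```

Keeping the parsing separate from the command execution makes it easy to check by hand what lilmon would make of a given command's output.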
## What does it look like?
The graphs are drawn using `gonum.org/v1/plot`. Currently lilmon produces graphs
like this:

## How do you configure lilmon?
See [the example file](lilmon.ini.example) for inspiration. Each definition of a
metric consists of the following four fields:
`<name>|<description>|<graphing options>|<command>`
The shell command may contain `|` characters -- it will not affect configuration
parsing.
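
One way to see why `|` characters in the command are harmless: if the definition is split on only the first three separators, everything after them stays in the command. The sketch below assumes exactly that approach. The `metric` type and `parseMetric` function are hypothetical and are not taken from lilmon's source.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// metric holds the four fields of one definition. (Hypothetical type.)
type metric struct {
	Name        string
	Description string
	Options     string
	Command     string
}

// parseMetric splits on the first three '|' only, so any further '|'
// characters remain part of the shell command.
func parseMetric(def string) (metric, error) {
	parts := strings.SplitN(def, "|", 4)
	if len(parts) != 4 {
		return metric{}, errors.New("expected four |-separated fields")
	}
	m := metric{
		Name:        parts[0],
		Description: parts[1],
		Options:     parts[2],
		Command:     parts[3],
	}
	if m.Name == "" || m.Command == "" {
		return metric{}, errors.New("name and command must not be empty")
	}
	return m, nil
}

func main() {
	m, err := parseMetric("n_lines|Line count||cat /etc/passwd|wc -l")
	fmt.Printf("%+v %v\n", m, err)
}
```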
The graphing options field may contain the following `,`-separated parameters:
- `deriv`: The time series is numerically differentiated with respect to time
- `no_ds`: The time series is not downsampled at all
- `y_min=<number>`: Graph's minimum Y value
- `y_max=<number>`: Graph's maximum Y value
- `kibi` and `kilo`: Y values are rendered with unit prefixes in base-2 or base-10, respectively
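
As an illustration of how such an options field could be interpreted, here is a small Go sketch. The `graphOptions` struct and `parseGraphOptions` function are hypothetical, and rejecting unknown options is an assumption of this example rather than documented lilmon behaviour.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// graphOptions mirrors the documented parameters. (Hypothetical type.)
type graphOptions struct {
	Deriv        bool
	NoDownsample bool
	Kibi         bool
	Kilo         bool
	YMin         *float64 // nil when y_min was not given
	YMax         *float64 // nil when y_max was not given
}

// parseGraphOptions reads a comma-separated options field.
func parseGraphOptions(field string) (graphOptions, error) {
	var opts graphOptions
	for _, raw := range strings.Split(field, ",") {
		opt := strings.TrimSpace(raw)
		if opt == "" {
			continue // an empty options field is allowed
		}
		key, val, hasVal := strings.Cut(opt, "=")
		switch {
		case key == "deriv" && !hasVal:
			opts.Deriv = true
		case key == "no_ds" && !hasVal:
			opts.NoDownsample = true
		case key == "kibi" && !hasVal:
			opts.Kibi = true
		case key == "kilo" && !hasVal:
			opts.Kilo = true
		case (key == "y_min" || key == "y_max") && hasVal:
			f, err := strconv.ParseFloat(val, 64)
			if err != nil {
				return opts, fmt.Errorf("bad value in %q: %w", opt, err)
			}
			if key == "y_min" {
				opts.YMin = &f
			} else {
				opts.YMax = &f
			}
		default:
			// Assumption of this sketch: unknown options are errors.
			return opts, fmt.Errorf("unknown graphing option %q", opt)
		}
	}
	return opts, nil
}

func main() {
	opts, err := parseGraphOptions("y_min=0,deriv,kilo")
	fmt.Println(opts.Deriv, opts.Kilo, *opts.YMin, err)
}
```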
`deriv` is useful if your metric is, for example, measuring transmitted or
received bytes for a network interface. By using `deriv`, the UI will then
display transfer rates (bytes/second) instead of bytes.
`no_ds` may be useful if you want to produce an exact averaging of some metric's
data. For example, if your data may contain abrupt changes in individual
measurements and you want to be sure they are included when the time series is
being binned, then you should enable `no_ds`. Do note that this makes the
generation of the specific graph considerably slower.
`kibi` and `kilo` will make larger values much easier to read.
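
To make the difference between the two concrete, the following Go sketch scales a value by powers of 1000 or 1024 and appends the matching prefix. The function name `formatWithPrefix`, the prefix spellings and the one-decimal output are assumptions for illustration. lilmon's actual label formatting may differ.

```go
package main

import (
	"fmt"
	"math"
)

// formatWithPrefix renders v with a base-10 (kilo) or base-2 (kibi)
// unit prefix. (Hypothetical helper, not lilmon's renderer.)
func formatWithPrefix(v float64, binary bool) string {
	base := 1000.0
	prefixes := []string{"", "k", "M", "G", "T"}
	if binary {
		base = 1024.0
		prefixes = []string{"", "Ki", "Mi", "Gi", "Ti"}
	}
	i := 0
	// Divide until the value fits under the base or prefixes run out.
	for math.Abs(v) >= base && i < len(prefixes)-1 {
		v /= base
		i++
	}
	return fmt.Sprintf("%.1f%s", v, prefixes[i])
}

func main() {
	fmt.Println(formatWithPrefix(1500, false))       // 1.5k
	fmt.Println(formatWithPrefix(3*1024*1024, true)) // 3.0Mi
}
```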
### What if my metric command contains `;`?
This will be a problem for the configuration parser because it assumes that a
lonesome `;` begins a comment. To avoid this, quote your metric definition like
this:
```
metric="n_subshell_constant|Plain silly||{ echo -n \"one\"; echo -n two; }|wc -c"
```
## Show me some example metrics!
These are some metrics I use. They may fail in cases I have not thought about.
There are many ways to obtain similar results. The primary reason for including
these here is to give you inspiration on how to use lilmon.
As is stressed elsewhere in this README, to be safer, avoid running any metrics
as a privileged user. If greater privileges are required, use something which
selectively gives you just enough capabilities to measure your value.
### TX & RX speed of some interface
Use the correct interface name in place of `if-name`. If you need to measure
more than one interface, define more similar metrics with a different metric
name and `if-name`. Silly but simple!
Also, note the `deriv` in the graphing options, which means that the raw byte
counts are numerically differentiated when the graph is drawn. The result is
then a decent approximation of TX & RX speed.
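
The idea behind `deriv` can be sketched with a simple finite difference between consecutive samples. The `sample` type and `differentiate` function below are hypothetical, and the exact scheme lilmon uses may differ. The sketch assumes timestamps in seconds.

```go
package main

import "fmt"

// sample is one (timestamp, value) pair; T is in seconds.
// (Hypothetical type for this illustration.)
type sample struct {
	T int64
	V float64
}

// differentiate turns a cumulative counter into a rate by dividing the
// change in value by the change in time between neighbouring samples.
func differentiate(samples []sample) []sample {
	var rates []sample
	for i := 1; i < len(samples); i++ {
		dt := float64(samples[i].T - samples[i-1].T)
		if dt <= 0 {
			continue // skip duplicate or out-of-order timestamps
		}
		dv := samples[i].V - samples[i-1].V
		rates = append(rates, sample{T: samples[i].T, V: dv / dt})
	}
	return rates
}

func main() {
	// Byte counter read every 10 seconds.
	counter := []sample{{0, 1000}, {10, 6000}, {20, 6000}}
	fmt.Println(differentiate(counter)) // [{10 500} {20 0}]
}
```

A counter that grows by 5000 bytes in 10 seconds therefore shows up as 500 bytes/second.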
#### Linux
```
metric=bytes_wifi_rx|Wifi RX|y_min=0,deriv,kilo|cat /proc/net/dev|fgrep if-name|awk '{print $2}'
metric=bytes_wifi_tx|Wifi TX|y_min=0,deriv,kilo|cat /proc/net/dev|fgrep if-name|awk '{print $10}'
```
#### OpenBSD
```
metric=bytes_wired_rx|Wired RX|deriv,y_min=0,kilo|netstat -n -i -b|fgrep if-name|fgrep Link|awk '{print $5}'
metric=bytes_wired_tx|Wired TX|deriv,y_min=0,kilo|netstat -n -i -b|fgrep if-name|fgrep Link|awk '{print $6}'
```
### Temperature sensor
#### Linux
The example here makes use of `jq` to search the JSON dump produced by
`sensors -j`. See what `sensors -j` displays for you and adapt the `jq` filter
accordingly. You may of course produce the same sensor value by just parsing and
filtering the regular text dump.
```
metric=temp_cpu|CPU temperature|y_min=30,y_max=90|sensors -j|jq '.["dev::temp1::temp1_input"]'
```
#### OpenBSD
Look at the output of `sysctl hw.sensors` and figure out the exact path for
your device. If it has something other than a raw float value, filter the rest
out.
```
metric=cpu_temp|CPU temperature|y_min=40,y_max=90|sysctl hw.sensors.km0.temp0|cut -d '=' -f2|cut -d ' ' -f 1
```
### Ping round-trip time for a well-known target
#### Linux
Note that in this example we use the `-w 10` option to define a hard deadline of
10 seconds. This is not fully portable, so see your `man 8 ping` for more
details. Something like the `timeout` command is available on many platforms,
and it works well for making sure programs time out.
```
metric=ping_google|PING Google|y_min=0,kilo|ping -q -w 10 -c 2 8.8.8.8|tail -1|cut -d'=' -f2|cut -d '/' -f2
```
### System load (1 min)
#### OpenBSD
```
metric=load_1|1 minute CPU LOAD|y_min=0|uptime|grep -E -o 'averages: [\.0-9]+'|cut -d ' ' -f2
```
### Free memory
#### OpenBSD
For an example's sake, we go through some trouble to dig out some bytes. Perhaps
we are lucky and `top` always prints megabytes?
```
metric=free_mem|Free memory|y_min=0,kilo|echo $((1024 * 1024 * $(top -b|egrep -o 'Free: [0-9]+'|cut -d ' ' -f2)))
```
### Wi-Fi clients (stations) connected to hostapd
Note that this invocation probably requires privileged execution. See the
discussion regarding `sudo` and `doas`. This command also assumes that there is
only one station or that the station of interest has index `0`.
```
metric=n_sta|hostapd clients|y_min=0|sudo /usr/sbin/hostapd_cli status|fgrep 'num_sta[0]'|cut -d '=' -f 2
```
## Does lilmon do alerting?
No. Its intended purpose is to record numeric values and display them with a
bare bones UI. However, as everything is recorded into a SQLite database, a
different program can easily follow the metrics and do alerting based on that.
For details, see the next question.
## How do I access the values lilmon has gathered?
First make sure you have `sqlite3` installed. Then you can do something like
the following to get the 10 latest measurements for metric `NAME`.
$ sqlite3 'file:/var/lilmon/db/lilmon.sqlite?mode=ro' \
'SELECT * FROM lilmon_metric_NAME ORDER BY timestamp DESC LIMIT 10'
Note the `mode=ro` part for read-only.
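
If you want to do the same from a program, for example as a starting point for your own alerting, a minimal Go sketch could shell out to the same `sqlite3` command. The helper `latestArgs` is hypothetical, and restricting metric names to letters, digits and underscores is an assumption made here so that the name can be placed safely into the table name.

```go
package main

import (
	"fmt"
	"os/exec"
	"regexp"
)

// Assumption of this sketch: metric names use only these characters.
var validName = regexp.MustCompile(`^[A-Za-z0-9_]+$`)

// latestArgs builds the sqlite3 arguments for reading the n latest
// measurements of a metric in read-only mode. (Hypothetical helper.)
func latestArgs(dbPath, metric string, n int) ([]string, error) {
	if !validName.MatchString(metric) {
		return nil, fmt.Errorf("suspicious metric name %q", metric)
	}
	if n <= 0 {
		return nil, fmt.Errorf("n must be positive, got %d", n)
	}
	dsn := "file:" + dbPath + "?mode=ro"
	query := fmt.Sprintf(
		"SELECT * FROM lilmon_metric_%s ORDER BY timestamp DESC LIMIT %d",
		metric, n)
	return []string{dsn, query}, nil
}

func main() {
	args, err := latestArgs("/var/lilmon/db/lilmon.sqlite", "NAME", 10)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Requires the sqlite3 command-line tool to be installed.
	out, err := exec.Command("sqlite3", args...).CombinedOutput()
	fmt.Printf("%s", out)
	if err != nil {
		fmt.Println("sqlite3 failed:", err)
	}
}
```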
## Will lilmon have a configuration UI?
No.
## The graphs look terrible!
~~Yes. I'll probably make them less terrible in future.~~
Much better now, right?
## The graphs look too terrible!
This is enough for me. However, `measure` and `serve` are fairly well decoupled
so an alternative UI is quite easy to build based on the data gathered by
`measure`.
## `lilmon serve` refuses to start and says it cannot open the database!
Are you running `serve` without a pre-existing database? In that case, you will
have to start `lilmon measure` first because it will create the missing
database.
## How to proceed after changing the metrics in the configuration file?
Restart both processes but do restart `lilmon measure` first. It is responsible
for creating new database tables and their indexes for new or renamed metrics.
## Will lilmon support monitoring more than one machine?
As each lilmon metric is just a table in a SQLite database, the data can be
transferred outside its host of origin with relative ease. It's just not
something I'm especially interested in.
## How to run measurement commands which require privileged execution?
It is **not** required or recommended to run lilmon as a privileged user.
For the measure mode, it is wiser to make use of `sudo`, `doas`, or something
similar with limited capabilities to obtain privileged metrics. You would then
run these exactly as the non-privileged ones, except through `doas` like here:
```
[metrics]
metric=n_id_chars|Characters output by privileged id|y_min=0|doas /usr/bin/id|wc -c
```
### doas
With `doas` you may permit the `lilmon` user to run `/usr/bin/id` without any
arguments as `root` like this:
```doas
permit nopass lilmon as root cmd /usr/bin/id args
```
Do note that the lone keyword `args` suffixed to the command means only
execution without any arguments.
### sudo
`sudo` also permits specifying that a command can only be run without any
arguments. However, `sudo` supports very rich logic for specifying how and what
commands can be run, so please exercise caution, stick to simple definitions,
and give the relevant manuals a careful read. In this example, the user
`lilmon` is permitted to run `/usr/bin/id` without any arguments as `root` and
without a password on all hosts where the configuration file is active:
```sudo
lilmon ALL=(root) NOPASSWD: /usr/bin/id ""
```
Do note that if you do not specify the `""` suffix, then the above invocation
would permit `lilmon` to execute `/usr/bin/id` with arbitrary arguments.
## Can you edit the browser UI?
Yes, just use [the example as basis](lilmon.template.example) and have at it.
## What is required to run lilmon?
**NOTE**: lilmon is currently **very experimental software** and it is not yet
packaged in any reasonable manner. Your usage experience will be mildly tedious.
Before trying to perform an install with the attached `Makefile`, convince
yourself that it is doing the right thing. At this stage, performing a manual
install may be a better idea.
The installation is for the most part condensed into `make install`, but the
creation of the non-privileged user is platform-dependent. We also must give
that user a chance to write its database in the directory. For GNU/Linux it
looks like this:
```
# make install
# adduser --disabled-login --system --no-create-home --group lilmon
# chown lilmon:lilmon /var/lilmon/db
# sudo -u lilmon /usr/local/bin/lilmon measure
# sudo -u lilmon /usr/local/bin/lilmon serve
```
When you are starting lilmon fresh without a pre-existing database, the first
run of `lilmon measure` will create it. As `lilmon serve` opens the database in
a read-only mode, it cannot initialize the database. Thus make sure you have
successfully run `measure` at least once before running `serve`.
Also note that by default `lilmon serve` listens only on localhost. You may want
to set the listening address to something else such as a suitable interface's IP.
If you want it to listen on all interfaces, use `0.0.0.0:15515` but please do
not expose the lilmon browser view to any untrusted networks. As suggested
below, you may in any case wish to provide the actual access via a suitable
reverse proxy.
## Do I need timeouts for my commands?
It does not hurt, but lilmon tries to cancel measurement commands which take
`$TOO_LONG` to complete. See `metrics.go` for the details.
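
For illustration, cancelling a slow command in Go is commonly done with a context deadline. The sketch below shows that general technique and is not a copy of `metrics.go`. The function name `runWithTimeout` is hypothetical, and `WaitDelay` requires Go 1.20 or newer.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// runWithTimeout runs command via /bin/sh -c and gives up once timeout
// has passed. (Hypothetical helper for this illustration.)
func runWithTimeout(command string, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	cmd := exec.CommandContext(ctx, "/bin/sh", "-c", command)
	// If a grandchild keeps stdout open after the shell is killed,
	// stop waiting for output after one more second.
	cmd.WaitDelay = time.Second

	out, err := cmd.Output()
	if errors.Is(ctx.Err(), context.DeadlineExceeded) {
		return "", fmt.Errorf("%q timed out after %s", command, timeout)
	}
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	fmt.Println(runWithTimeout("echo 7", 2*time.Second))
	fmt.Println(runWithTimeout("sleep 5", 100*time.Millisecond))
}
```

Killing the shell does not necessarily kill everything it started, which is one reason an explicit `timeout` wrapper inside the metric command still does not hurt.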
## What about TLS, rate limiting, authentication...?
I strongly recommend a reverse proxy for handling these things.
## How does lilmon treat the data when time series are produced?
### Short answer
It produces an averaged view which may contain some quantitative accuracy.
### Long answer
There are two basic steps: obtaining the `(timestamp, value)` pairs from the
database and producing a binned view on them. I'm guessing there is a smarter
way to achieve the same result with some SQL wizardry.
#### Random sampling of measurements
lilmon automatically downsamples when it thinks that the query may result
in a large number of samples. Here we make two assumptions:
1. Measurements are evenly distributed
2. Neglecting individual samples at random is OK
In practice we use SQLite's `RANDOM()` to produce a coinflip when samples are
`SELECT`ed from the metric tables.
The behavior can be turned off for individual metrics with the `no_ds` graphing
option and the global behavior may be adjusted with the `downsampling_scale`
option. The greater the value is, the less effect downsampling has. For details,
see `db.go`.
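
The coinflip idea can be illustrated in memory with a few lines of Go. Note that lilmon does this inside the SQL query with `RANDOM()`, whereas this sketch filters an already loaded slice. The names `keepProbability` and `downsample`, and the way the probability is derived from a target sample count, are assumptions made for this example only.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sample is one (timestamp, value) pair. (Hypothetical type.)
type sample struct {
	T int64
	V float64
}

// keepProbability picks the chance of keeping one sample so that about
// target samples survive. (Assumed formula, not taken from db.go.)
func keepProbability(total, target int) float64 {
	if total == 0 || total <= target {
		return 1
	}
	return float64(target) / float64(total)
}

// downsample keeps each sample independently with probability p.
// coin must return values in [0, 1), like rand.Float64 does.
func downsample(samples []sample, p float64, coin func() float64) []sample {
	kept := make([]sample, 0, len(samples))
	for _, s := range samples {
		if coin() < p {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	samples := make([]sample, 1000)
	for i := range samples {
		samples[i] = sample{T: int64(i), V: float64(i)}
	}
	p := keepProbability(len(samples), 100)
	kept := downsample(samples, p, rand.Float64)
	fmt.Println("kept", len(kept), "of", len(samples))
}
```

Because every sample is dropped independently, a short spike can disappear entirely, which is the situation `no_ds` is meant for.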
#### Averaging of samples to individual bins
Each lilmon graph contains some number of bins. The exact number is defined by
three variables:
1. graph time range (from user)
2. bin width (from configuration)
3. maximum amount of bins (from configuration)
In the first step, we collected a bunch of samples, and here in the second
step we distribute them among the bins. The resulting bin value is then the
average of all the values placed in the bin. For details, see `graph.go`.
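
A minimal Go sketch of this binning step might look as follows. It is not the code from `graph.go`: the names `binCount` and `binAverages` are hypothetical, and the way the three variables are combined (time range divided by bin width, capped at the maximum) is an assumption of this example.

```go
package main

import (
	"fmt"
	"math"
)

// sample is one (timestamp, value) pair; T is in seconds.
type sample struct {
	T int64
	V float64
}

// binCount derives the number of bins from the time range, the bin
// width and the maximum number of bins. (Assumed combination.)
func binCount(rangeSec, binWidthSec int64, maxBins int) int {
	if binWidthSec <= 0 || maxBins < 1 {
		return 1
	}
	n := int((rangeSec + binWidthSec - 1) / binWidthSec) // round up
	if n > maxBins {
		n = maxBins
	}
	if n < 1 {
		n = 1
	}
	return n
}

// binAverages places samples from [start, end) into nBins equally wide
// bins and averages each bin. Empty bins are reported as NaN.
func binAverages(samples []sample, start, end int64, nBins int) []float64 {
	if nBins <= 0 || end <= start {
		return nil
	}
	sums := make([]float64, nBins)
	counts := make([]int, nBins)
	width := float64(end-start) / float64(nBins)
	for _, s := range samples {
		if s.T < start || s.T >= end {
			continue
		}
		i := int(float64(s.T-start) / width)
		if i >= nBins {
			i = nBins - 1 // guard against floating-point rounding
		}
		sums[i] += s.V
		counts[i]++
	}
	avgs := make([]float64, nBins)
	for i := range avgs {
		if counts[i] == 0 {
			avgs[i] = math.NaN()
			continue
		}
		avgs[i] = sums[i] / float64(counts[i])
	}
	return avgs
}

func main() {
	samples := []sample{{0, 1}, {5, 3}, {10, 10}, {19, 20}}
	fmt.Println(binAverages(samples, 0, 20, binCount(20, 10, 100))) // [2 15]
}
```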
## Known limitations
- If a metric is disabled by removing it from the configuration file, its
historical data will not be automatically pruned after the retention period
## TODO
- [ ] support units for smart Y labels (eg. "bytes")
- [ ] some end-to-end testing for `serve`