https://github.com/rabbitmq/sysmon-handler

Simple OTP app for managing Erlang VM system_monitor event messages

sysmon_handler
==============

[![Build Status](https://travis-ci.com/rabbitmq/sysmon-handler.svg?branch=master)](https://travis-ci.com/rabbitmq/sysmon-handler)
[![Hex version](https://img.shields.io/hexpm/v/sysmon_handler.svg "Hex version")](https://hex.pm/packages/sysmon_handler)

`sysmon_handler` is an Erlang/OTP application that manages the event messages
that can be generated by the Erlang virtual machine's `system_monitor` BIF
(Built-In Function). These messages can notify a central data-gathering
process about the following events:

* Processes whose private heaps grow beyond a certain size.
* Processes whose private heap garbage collections take too long.
* Ports that are busy, e.g., blocking file and socket I/O.
* Network distribution ports that are busy, e.g., because of heavy
  communication with a slow peer Erlang node.
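For reference, this is roughly how a single process would subscribe to those four kinds of events directly through `erlang:system_monitor/2`, without `sysmon_handler`. It is a minimal sketch: the module name `sysmon_bif_demo` and the threshold values are illustrative assumptions, not `sysmon_handler`'s defaults. Because only one system monitor can exist per VM, calling `enable/1` on a node where `sysmon_handler` is running would replace that app's monitor.

```erlang
-module(sysmon_bif_demo).
-export([enable/1, disable/0, classify/1]).

%% Make MonitorPid the VM-wide system monitor. This replaces any
%% previously registered monitor and returns the previous settings.
%% Threshold values are illustrative only.
enable(MonitorPid) when is_pid(MonitorPid) ->
    erlang:system_monitor(MonitorPid,
                          [{large_heap, 10000000},  % heap size, in words
                           {long_gc, 50},           % GC duration, in ms
                           busy_port,
                           busy_dist_port]).

%% Turn system monitoring off again.
disable() ->
    erlang:system_monitor(undefined).

%% Map a raw system_monitor message to one of the event kinds above.
classify({monitor, _Pid, large_heap, _Info})      -> large_heap;
classify({monitor, _Pid, long_gc, _Info})         -> long_gc;
classify({monitor, _Pid, busy_port, _Port})       -> busy_port;
classify({monitor, _Pid, busy_dist_port, _Port})  -> busy_dist_port;
classify(_Other)                                  -> unknown.
```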

The problem with `system_monitor` events is that there isn't a mechanism within
the Erlang virtual machine that limits the rate at which the events are
generated. A busy VM can easily create many hundreds of these messages per
second. Some kind of rate-limiting filter is required to avoid further
overloading a system that may already be overloaded.

This app uses two processes for `system_monitor` message handling:

1. A `gen_server` process that acts as a rate-limiting filter.
1. A `gen_event` server that lets flexible, user-defined handlers
   respond to the `system_monitor` events that pass through the
   first-stage filter.
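The README does not describe how the filter counts events, so the following is only a hypothetical illustration of the general idea: a fixed one-second window that lets at most `Limit` events through and counts the rest as suppressed. The module `rate_window_demo` and its functions are invented for this sketch and are not the actual `sysmon_handler_filter` logic; a real filter would keep one window per event category and would more likely flush suppressed counts from a timer than on the next event.

```erlang
-module(rate_window_demo).
-export([new/1, observe/2]).

%% Hypothetical per-second window state.
-record(win, {limit, second = undefined, passed = 0, suppressed = 0}).

%% Allow at most Limit events per second.
new(Limit) when is_integer(Limit), Limit >= 0 ->
    #win{limit = Limit}.

%% Observe one event arriving at second Now.
%% Returns {pass | suppress, SuppressedInPreviousWindow, NewState}.
observe(Now, #win{second = Now} = W) ->
    %% Same second as the current window: just apply the limit.
    admit(W);
observe(Now, #win{suppressed = S} = W) ->
    %% A new second has started: report the old window's suppressed
    %% count and start a fresh window.
    {Decision, 0, W1} = admit(W#win{second = Now, passed = 0, suppressed = 0}),
    {Decision, S, W1}.

admit(#win{limit = L, passed = P} = W) when P < L ->
    {pass, 0, W#win{passed = P + 1}};
admit(#win{suppressed = S} = W) ->
    {suppress, 0, W#win{suppressed = S + 1}}.
```

With a limit of 2, the third and fourth events within the same second are suppressed, and the first event of the next second reports that 2 events were dropped.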

There can be only one `system_monitor` process
----------------------------------------------

The Erlang/OTP documentation is clear on this point: only one process at a time
can receive `system_monitor` messages, and a new call to
`erlang:system_monitor/2` replaces the previous monitor. With the
`sysmon_handler` OTP app, however, multiple parties interested in
`system_monitor` events can each add an event handler to the `sysmon_handler`
event manager.

The event manager process in this application uses the registered name
`sysmon_handler`. To add your handler, use something like
`gen_event:add_sup_handler(sysmon_handler, your_module_name, YourInitialArgs)`.

See the [`gen_event` documentation for
`add_sup_handler/3`](https://www.erlang.org/doc/man/gen_event.html#add_sup_handler-3)
for API details, and the example event handler module in the source repository,
`src/sysmon_handler_example_handler.erl`, for example usage.

Events sent to custom event handlers
------------------------------------

The following events can be sent from the `sysmon_handler`
filtering/rate-limiting process (a.k.a. `sysmon_handler_filter`) to the
event handler process (a.k.a. `sysmon_handler`).

* `{monitor, pid(), atom(), term()}`: These are `system_monitor` messages,
  passed along verbatim as they were received by the `sysmon_handler_filter`
  process. See the reference documentation for `erlang:system_monitor/2` for
  details.
* `{suppressed, proc_events | port_events, Num::integer()}`: These messages
  inform your event handler that `Num` events of a given type (`proc_events`
  or `port_events`) were suppressed in the last second, i.e., their arrival
  rate exceeded the configured rate limit.
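A minimal handler sketch that matches the two event shapes listed above and simply counts them. The module name `my_sysmon_counter`, its `attach/0,1` helpers, and the `counts` call are hypothetical; the repository's `src/sysmon_handler_example_handler.erl` remains the reference example. `attach/0` uses the registered name `sysmon_handler`, while `attach/1` accepts any event manager, which makes the handler easy to try out without the app running.

```erlang
-module(my_sysmon_counter).
-behaviour(gen_event).

%% Hypothetical helpers for attaching this handler.
-export([attach/0, attach/1]).
%% gen_event callbacks.
-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

%% Attach to the sysmon_handler event manager (registered name).
attach() ->
    attach(sysmon_handler).

%% Attach to any event manager; the handler is linked to the caller.
attach(Manager) ->
    gen_event:add_sup_handler(Manager, ?MODULE, []).

init([]) ->
    {ok, #{monitor => 0, suppressed => 0}}.

%% A raw system_monitor message forwarded by the filter.
handle_event({monitor, _PidOrPort, _Type, _Info}, Counts) ->
    {ok, maps:update_with(monitor, fun(N) -> N + 1 end, Counts)};
%% A report that Num events of one kind were dropped in the last second.
handle_event({suppressed, _Kind, Num}, Counts) when is_integer(Num) ->
    {ok, maps:update_with(suppressed, fun(N) -> N + Num end, Counts)};
handle_event(_Other, Counts) ->
    {ok, Counts}.

%% gen_event:call(Manager, my_sysmon_counter, counts) returns the map.
handle_call(counts, Counts) ->
    {ok, Counts, Counts};
handle_call(_Request, Counts) ->
    {ok, {error, unknown_request}, Counts}.

handle_info(_Info, Counts) -> {ok, Counts}.
terminate(_Arg, _Counts) -> ok.
code_change(_OldVsn, Counts, _Extra) -> {ok, Counts}.
```

Because `add_sup_handler/3` links the handler to the calling process, the handler is removed if that process exits, and the caller receives a `{gen_event_EXIT, Handler, Reason}` message if the handler is later removed for some other reason.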

Change Log
----------

| Version | Changes
|----------|-----------------------------------------------------------
| `v1.2.0` | Do not report an error if Erlang distribution is stopped. Require Erlang 21.x.
| `v1.1.0` | Change the cuttlefish default for `heap_size` to 80MiB on 64-bit systems, in line with the default word count limit. Change the cuttlefish default for `garbage_collect` to `50ms`, in line with the default used when it is unset. Change the cuttlefish default for `scheduled_execution` to `50ms`, also in line with the default used when it is unset.
| `v1.0.0` | Initial release.