https://github.com/brendanhay/gamekeeper

Nagios monitoring and Ganglia/Graphite HTTP statistics aggregation for RabbitMQ

# gamekeeper

## Table of Contents

* [Introduction](#introduction)
* [Functionality](#functionality)
  - [Measure](#measure)
  - [Check](#check)
  - [Prune](#prune)
* [Install](#install)
* [Configuration](#configuration)
* [Running](#running)
* [Contribute](#contribute)
* [Licence](#licence)

## Introduction

gamekeeper is a low-resource application that performs several roles for
your RabbitMQ infrastructure:

* Polls a local or remote RabbitMQ HTTP API for metrics, which are then
  delivered to Ganglia, Graphite, or stdout.
* Serves as a Nagios NRPE plugin endpoint for monitoring the health of a node
  or of individual queues.
* Provides node management features such as pruning idle connections and
  inactive queues.

## Functionality

gamekeeper has three modes of operation, each corresponding to a different
subset of functionality and accessible via the following subcommands:

### Measure

The `measure` subcommand will emit a series of metrics from the specified
`--uri` for a number of RabbitMQ/AMQP primitives.

All metrics are prefixed into sinks (Ganglia, Graphite, etc.) with the
identifier: `<host>.rabbit.`

> The `<host>` constant is currently determined by escaping the local hostname, and will be configurable in a future release.
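
As an illustration of the naming scheme above, here is a minimal Python sketch that builds a full metric name from a host-derived prefix, the literal `rabbit` segment, and the metric name. The escaping rule (dots replaced by underscores) and the helper names `escape_host` and `metric_name` are assumptions made for the example; gamekeeper's actual escaping is not documented here and may differ.

```python
import socket
from typing import Optional


def escape_host(hostname: str) -> str:
    # Assumption: dots would split the name into extra path segments in
    # Graphite-style sinks, so this sketch replaces them with underscores.
    return hostname.replace(".", "_")


def metric_name(metric: str, hostname: Optional[str] = None) -> str:
    # Fall back to the local hostname when none is given.
    host = escape_host(hostname if hostname is not None else socket.gethostname())
    return f"{host}.rabbit.{metric}"


print(metric_name("queue.total", "mq01.example.com"))
# mq01_example_com.rabbit.queue.total
```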

**Overview**

* `message.total`
* `message.ready`
* `message.unacked`
* `rate.publish`
* `rate.deliver`
* `rate.redeliver`
* `rate.confirm`
* `rate.ack`

**Connections**

* `connection.total`
* `connection.idle` - Calculated relative to the specified `--days` setting

**Channels**

* `channel.total`
* `channel.publisher` - Number of publisher/ingress channels
* `channel.consumer` - Number of consumer/egress channels
* `channel.duplex` - Number of channels marked as both publishing and consuming
* `channel.inactive`

**Exchanges**

* `exchange.rate.<exchange>` - Message rate per exchange

**Queues**

* `queue.total`
* `queue.idle` - Determined by message residence and flow
* `queue.messages.<queue>` - Ready messages per queue
* `queue.consumers.<queue>` - Consumers per queue
* `queue.memory.<queue>` - Memory usage per queue
* `queue.ingress.<queue>` - Average message ingress per queue
* `queue.egress.<queue>` - Average message egress per queue

**Bindings**

* `binding.total` - Overall number of AMQP bindings

The output sink can be configured to emit to `Stdout,,`,
`Ganglia,<host>,<port>`, or `Graphite,<host>,<port>` using the `--sink`
argument. The underlying [network-metrics](http://github.com/brendanhay/network-metrics) library also
supports writing to `Statsd,<host>,<port>`, but this is not recommended:
the RabbitMQ management plugin already pre-aggregates its statistics, so
passing them through Statsd adds nothing.

By default metrics will be printed to stdout.
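
To make the `SINK,HOST,PORT` format of the `--sink` argument concrete, the following Python sketch parses such a value into its three parts. It is an illustration only, not gamekeeper's parser: the `Sink` type, the `parse_sink` helper, and the validation rules are assumptions, and the Graphite port in the usage line is just an example value.

```python
from typing import NamedTuple, Optional

# Sink types named in this README.
KNOWN_SINKS = {"Stdout", "Ganglia", "Graphite", "Statsd"}


class Sink(NamedTuple):
    kind: str
    host: Optional[str]
    port: Optional[int]


def parse_sink(spec: str) -> Sink:
    # The format is SINK,HOST,PORT; HOST and PORT may be empty, as in "Stdout,,".
    parts = spec.split(",")
    if len(parts) != 3:
        raise ValueError(f"expected SINK,HOST,PORT but got {spec!r}")
    kind, host, port = parts
    if kind not in KNOWN_SINKS:
        raise ValueError(f"unknown sink type {kind!r}")
    return Sink(kind, host or None, int(port) if port else None)


print(parse_sink("Stdout,,"))
print(parse_sink("Graphite,localhost,2003"))
```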

> At time of writing [SoundCloud](http://www.soundcloud.com) emits all
> RabbitMQ metrics to Ganglia specifically

### Check

The `check` subcommand is used to perform a high-level inspection of
either the general node health, or a specific queue's health.

All output is written to `stdout` in the
[Nagios NRPE Plugin](http://nagiosplug.sourceforge.net/developer-guidelines.html)
format.

**Node**

The Distributed Erlang [sname](http://www.erlang.org/doc/reference_manual/distributed.html) of the
RabbitMQ node must be specified via the `--name` argument, so that gamekeeper can calculate the correct HTTP API URI to
request. For example, the node `rabbit@localhost` would result in HTTP requests to `http://localhost:15672/#/nodes/rabbit%40localhost`.
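
The mapping from node name to request URI described above can be sketched in Python as follows. The host is taken from the part after `@`, and the whole node name is percent-encoded into the path. The `node_url` helper and its default `port` parameter are hypothetical and only reproduce the example given in this README; they are not taken from gamekeeper's source.

```python
from urllib.parse import quote


def node_url(node_name: str, port: int = 15672) -> str:
    # An Erlang sname has the form name@host; the host part is the machine to query.
    _, sep, host = node_name.partition("@")
    if not sep or not host:
        raise ValueError(f"expected name@host but got {node_name!r}")
    # safe="" makes quote() encode every reserved character, so "@" becomes "%40".
    return f"http://{host}:{port}/#/nodes/{quote(node_name, safe='')}"


print(node_url("rabbit@localhost"))
# http://localhost:15672/#/nodes/rabbit%40localhost
```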

Warning and critical levels can be specified for both message residence
and memory usage. A single check is performed and the output is combined.

A warning or critical for either message residence or memory usage will
result in the most severe being used as the NRPE exit code and one-line
summary.
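
The "most severe wins" rule can be sketched in Python as below. The exit codes 0, 1, and 2 for OK, WARNING, and CRITICAL follow the standard Nagios plugin convention, and the default thresholds are the node-level defaults listed under Available Flags. Everything else is an assumption for illustration: the `classify` and `check_node` helpers, the use of `>=` for comparisons, and the exact wording of the summary line are not taken from gamekeeper.

```python
from enum import IntEnum
from typing import Tuple


class Status(IntEnum):
    # Standard Nagios plugin exit codes.
    OK = 0
    WARNING = 1
    CRITICAL = 2


def classify(value: float, warn: float, crit: float) -> Status:
    # Assumption: a value equal to a threshold already triggers that level.
    if value >= crit:
        return Status.CRITICAL
    if value >= warn:
        return Status.WARNING
    return Status.OK


def check_node(
    messages: int,
    memory_gb: float,
    message_levels: Tuple[int, int] = (15_000_000, 30_000_000),
    memory_levels: Tuple[float, float] = (4, 8),
) -> Tuple[Status, str]:
    msg = classify(messages, *message_levels)
    mem = classify(memory_gb, *memory_levels)
    worst = max(msg, mem)  # the most severe result decides the exit code
    summary = (
        f"{worst.name} - messages: {messages} ({msg.name}), "
        f"memory: {memory_gb:.2f}GB ({mem.name})"
    )
    return worst, summary


status, line = check_node(messages=20_000_000, memory_gb=2.0)
print(line)
# WARNING - messages: 20000000 (WARNING), memory: 2.00GB (OK)
```

A real plugin would print the summary line and then exit with `int(status)` so that Nagios picks up the severity.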

> The memory usage warning and critical levels are specified in Gigabyte units

**Queue**

Queue checks are the same as the node level check, but local to a specifically
named queue.

> The memory usage warning and critical levels are specified in Megabyte units

### Prune

The `prune` subcommand is intended to be invoked manually, and removes
(via HTTP DELETE) idle connections and unused queues.

This is primarily useful if you do not use AMQP heartbeats and have problems
with dangling load-balancer connections through something like LVS or HAProxy.

> These commands are destructive, please use caution!
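
To illustrate the selection step behind pruning connections, the Python sketch below filters a list of connections by an idle cutoff in days and builds the RabbitMQ HTTP API paths that a `DELETE` request would target (`/api/connections/<name>`). It performs no network calls. The `last_active` field and the `idle_connection_paths` helper are hypothetical; how gamekeeper itself determines idleness is not described in this README.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional
from urllib.parse import quote


def idle_connection_paths(
    connections: List[Dict],
    days: int = 30,
    now: Optional[datetime] = None,
) -> List[str]:
    # Connections whose last activity is older than the cutoff count as idle.
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [
        # Connection names contain spaces and colons, so encode them fully.
        "/api/connections/" + quote(conn["name"], safe="")
        for conn in connections
        if conn["last_active"] < cutoff  # hypothetical field, see above
    ]


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
conns = [
    {"name": "old-client", "last_active": datetime(2023, 12, 1, tzinfo=timezone.utc)},
    {"name": "busy-client", "last_active": datetime(2024, 1, 30, tzinfo=timezone.utc)},
]
print(idle_connection_paths(conns, days=30, now=now))
# ['/api/connections/old-client']
```

Printing the candidate paths before issuing any `DELETE` is a cheap way to review what would be removed.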

## Install

At present, it is assumed the user is familiar with the Haskell ecosystem,
in particular with wrangling cabal-dev to obtain dependencies. I plan to offer pre-built binaries for x86_64 OS X and Linux in the future.

You will need reasonably new versions of GHC and the Haskell Platform which
you can obtain [here](http://www.haskell.org/platform/), then run `make install` in the root directory to compile gamekeeper.

There is also a Chef Cookbook which can be used to manage gamekeeper, if that's how you swing: https://github.com/brendanhay/gamekeeper-cookbook

## Configuration

Command line flags are used to configure gamekeeper. Help for the top-level
program and its subcommands is available via the `--help` switch.

### Available Flags


| Command | Flag | Default | Format | About |
| --- | --- | --- | --- | --- |
| `measure` | `--uri` | `guest@localhost:15672` | `URI` | Address of the RabbitMQ API to poll |
| | `--days` | `30` | `INT` | Number of days before a connection is considered stale |
| | `--sink` | `Stdout,,` | `SINK,HOST,PORT` | Sink options describing the type and host/port combination |
| `check node` | `--name` | | `STR` | An Erlang atom representing the RabbitMQ node name |
| | `--uri` | `guest@localhost:15672` | `URI` | Address of the RabbitMQ API to poll |
| | `--messages` | `15000000,30000000` | `WARN,CRIT` | Message residence thresholds |
| | `--memory` | `4,8` | `WARN,CRIT` | Memory thresholds, in Gigabytes |
| `check queue` | `--name` | | `STR` | The name of the queue to check |
| | `--uri` | `guest@localhost:15672` | `URI` | Address of the RabbitMQ API to poll |
| | `--messages` | `125000,250000` | `WARN,CRIT` | Message residence thresholds |
| | `--memory` | `250,500` | `WARN,CRIT` | Memory thresholds, in Megabytes |
| `prune connections` | `--uri` | `guest@localhost:15672` | `URI` | Address of the RabbitMQ API to poll |
| | `--days` | `30` | `INT` | Number of days before a connection is considered idle |
| `prune queues` | `--uri` | `guest@localhost:15672` | `URI` | Address of the RabbitMQ API to poll |

> There is also a `--verbose` switch which is useful when debugging metric emission to stdout

## Running

After a successful compile, the `./gamekeeper` symlink will be pointing to
the built binary under `./dist`

## Contribute

For any problems, comments or feedback please create an issue [here on GitHub](https://github.com/brendanhay/gamekeeper/issues).

## Licence

gamekeeper is released under the [Mozilla Public License Version 2.0](http://www.mozilla.org/MPL/)