https://github.com/satta/balboa-backend-accumulo

🛑 Experimental balboa backend for Accumulo
https://github.com/satta/balboa-backend-accumulo

accumulo balboa java passive-dns passivedns pdns

Last synced: 6 months ago
JSON representation

🛑 Experimental balboa backend for Accumulo

Host: GitHub
URL: https://github.com/satta/balboa-backend-accumulo
Owner: satta
License: mit
Created: 2020-08-23T13:23:30.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2020-10-28T13:48:02.000Z (almost 5 years ago)
Last Synced: 2025-02-07T19:49:47.904Z (8 months ago)
Topics: accumulo, balboa, java, passive-dns, passivedns, pdns
Language: Java
Homepage:
Size: 15.6 KB
Stars: 1
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: License.txt

Awesome Lists containing this project

README

          # balboa-backend-accumulo

![Java CI](https://github.com/satta/balboa-backend-accumulo/workflows/Java%20CI/badge.svg)

This is a backend for [balboa](https://github.com/DCSO/balboa) that uses

[Apache Accumulo](https://accumulo.apache.org/) as a storage and query engine.

It is quite basic in its feature set and should be considered a starting point or

building block in a more refined setup, most likely involving multiple input

consumer frontends feeding into multiple backend instances, all connecting to one

Accumulo cluster.

## Requirements

 * JDK 8 or later

 * [balboa-backend-java](https://github.com/satta/balboa-backend-java) ([Maven Central](https://search.maven.org/artifact/com.github.satta/balboa-backend-java))

 * accumulo-core API 2.0

 * commons-cli

## Building

A self-contained jar can be built, in the source directory, like this:

```

$ mvn package

```

This should leave a `balboa-backend-accumulo--jar-with-dependencies.jar`

in the `target/` subdirectory. Dependencies will be fetched automatically from

Maven Central.

## Configuration

The jar takes a `-c` command line parameter specifying the path to a

properties file, which needs to contain at least the necessary [Accumulo client

properties](https://accumulo.apache.org/docs/2.x/configuration/client-properties)

needed to connect to the cluster. For example, a simple development setup using

[Uno](https://github.com/apache/fluo-uno) could be accessed with something along the lines of:

```

instance.name=uno

instance.zookeepers=uno

auth.type=password

auth.principal=satta

auth.token=satta

balboa.port=4242

```

The `balboa.port` property defines the local port listened on for msgpack TCP connection

from frontends.

## Accumulo setup

The observation data are stored in three tables, optimized for `rrname`, `rdata` and

reverse `rrname` look-ups (used for suffix queries). We store observations redundantly

reduce the number of indirections.

Please make sure these tables are present and read/writable for the user specified

in the connection details.

### Table `balboa_by_rrname`

| Row ID                       | Column Family | Column Qualifier | Visibility | Value       |

|------------------------------|---------------|------------------|------------|-------------|

| rrname-rsensorid-data-rrtype | count         | count            | public     | LONG VARLEN |

| rrname-rsensorid-data-rrtype | seen          | first            | public     | LONG VARLEN |

| rrname-rsensorid-data-rrtype | seen          | last             | public     | LONG VARLEN |

We use various combiners to aggregate identical observations:

```

setiter -class org.apache.accumulo.core.iterators.user.MaxCombiner -p 11 -t balboa_by_rrname -all      # on seen:last

setiter -class org.apache.accumulo.core.iterators.user.MinCombiner -p 13 -t balboa_by_rrname -all      # on seen:first

setiter -class org.apache.accumulo.core.iterators.user.SummingCombiner -p 12 -t balboa_by_rrname -all  # on count:count

```

These need to be set on the following  other tables as well:

### Table `balboa_by_rdata`

| Row ID                       | Column Family | Column Qualifier | Visibility | Value       |

|------------------------------|---------------|------------------|------------|-------------|

| rdata-sensorid-rrname-rrtype | count         | count            | public     | LONG VARLEN |

| rdata-sensorid-rrname-rrtype | seen          | first            | public     | LONG VARLEN |

| rdata-sensorid-rrname-rrtype | seen          | last             | public     | LONG VARLEN |

### Table `balboa_by_rrname_rev`

| Row ID                            | Column Family | Column Qualifier | Visibility | Value       |

|-----------------------------------|---------------|------------------|------------|-------------|

| rev(rrname)-sensorid-rdata-rrtype | count         | count            | public     | LONG VARLEN |

| rev(rrname)-sensorid-rdata-rrtype | seen          | first            | public     | LONG VARLEN |

| rev(rrname)-sensorid-rdata-rrtype | seen          | last             | public     | LONG VARLEN |

## Example run

This example run uses balboa's `balboa-backend-console` to directly talk to the

backend rather than having to go through the GraphQL frontend.

`rrname` full query:

```

$ balboa-backend-console query -h 127.0.0.1 -p 4242 -r dns.google | head -n 1 | jq

{

  "rrname": "dns.google",

  "rrtype": "A",

  "sensor_id": "foo",

  "rdata": "8.8.4.4",

  "count": 1,

  "first_seen": 1598303837,

  "last_seen": 1598303897

}

```

`rrname` suffix query:

```

$ balboa-backend-console query -h 127.0.0.1 -p 4242 -r %.com.de | head -n 1 | jq

{

  "rrname": "www.jabra.com.de",

  "rrtype": "A",

  "sensor_id": "foo",

  "rdata": "152.199.21.175",

  "count": 1,

  "first_seen": 1603348710,

  "last_seen": 1603348770

}

```

`rdata` query:

```

$ balboa-backend-console query -h 127.0.0.1 -p 4242 -d 9.9.9.10 | jq

{

  "rrname": "dns10.quad9.net",

  "rrtype": "A",

  "sensor_id": "foo",

  "rdata": "9.9.9.10",

  "count": 1,

  "first_seen": 1603892361,

  "last_seen": 1603892421

}

```

## PoC limitations

 * Hard-coded table names and `public` visibility

 * Wildcard support limited to `rrname` queries

 * For `rrname` queries, additional `rdata` and `sensorid` constraints will be matched anywhere in the row

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/satta/balboa-backend-accumulo

Awesome Lists containing this project

README