https://github.com/satta/balboa-backend-accumulo
🛑 Experimental balboa backend for Accumulo
accumulo balboa java passive-dns passivedns pdns
- Host: GitHub
- URL: https://github.com/satta/balboa-backend-accumulo
- Owner: satta
- License: mit
- Created: 2020-08-23T13:23:30.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-10-28T13:48:02.000Z (over 4 years ago)
- Last Synced: 2025-02-07T19:49:47.904Z (4 months ago)
- Topics: accumulo, balboa, java, passive-dns, passivedns, pdns
- Language: Java
- Homepage:
- Size: 15.6 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: License.txt
# balboa-backend-accumulo

This is a backend for [balboa](https://github.com/DCSO/balboa) that uses
[Apache Accumulo](https://accumulo.apache.org/) as a storage and query engine.
It is quite basic in its feature set and should be considered a starting point or
building block in a more refined setup, most likely involving multiple input
consumer frontends feeding into multiple backend instances, all connecting to one
Accumulo cluster.

## Requirements
* JDK 8 or later
* [balboa-backend-java](https://github.com/satta/balboa-backend-java) ([Maven Central](https://search.maven.org/artifact/com.github.satta/balboa-backend-java))
* accumulo-core API 2.0
* commons-cli

## Building
A self-contained jar can be built, in the source directory, like this:
```
$ mvn package
```

This should leave a `balboa-backend-accumulo--jar-with-dependencies.jar`
in the `target/` subdirectory. Dependencies will be fetched automatically from
Maven Central.

## Configuration
The jar takes a `-c` command line parameter specifying the path to a
properties file, which must contain at least the [Accumulo client
properties](https://accumulo.apache.org/docs/2.x/configuration/client-properties)
needed to connect to the cluster. For example, a simple development setup using
[Uno](https://github.com/apache/fluo-uno) could be accessed with something along the lines of:

```
instance.name=uno
instance.zookeepers=uno
auth.type=password
auth.principal=satta
auth.token=satta

balboa.port=4242
```

The `balboa.port` property defines the local port on which the backend listens for
msgpack TCP connections from frontends.

## Accumulo setup
The observation data are stored in three tables, optimized for `rrname`, `rdata` and
reverse `rrname` look-ups (used for suffix queries). We store observations redundantly
to reduce the number of indirections.

Please make sure these tables are present and read/writable for the user specified
in the connection details.

### Table `balboa_by_rrname`
| Row ID | Column Family | Column Qualifier | Visibility | Value |
|------------------------------|---------------|------------------|------------|-------------|
| rrname-sensorid-rdata-rrtype | count | count | public | LONG VARLEN |
| rrname-sensorid-rdata-rrtype | seen | first | public | LONG VARLEN |
| rrname-sensorid-rdata-rrtype | seen | last | public | LONG VARLEN |

We use various combiners to aggregate identical observations:
```
setiter -class org.apache.accumulo.core.iterators.user.MaxCombiner -p 11 -t balboa_by_rrname -all # on seen:last
setiter -class org.apache.accumulo.core.iterators.user.MinCombiner -p 13 -t balboa_by_rrname -all # on seen:first
setiter -class org.apache.accumulo.core.iterators.user.SummingCombiner -p 12 -t balboa_by_rrname -all # on count:count
```

The same combiners need to be set on the other two tables, described below, as well.
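To make the effect of the three combiners concrete, here is a minimal plain-Java sketch that merges repeated observations for the same row ID in the way `SummingCombiner`, `MinCombiner` and `MaxCombiner` would on `count:count`, `seen:first` and `seen:last`. It does not use Accumulo at all; the class names, the `insert` helper and the `|` separator in the row ID are illustrative assumptions, not part of this backend.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CombinerSketch {
    /** Aggregated state for one row ID: count:count, seen:first, seen:last. */
    static final class Observation {
        final long count;
        final long firstSeen;
        final long lastSeen;

        Observation(long count, long firstSeen, long lastSeen) {
            this.count = count;
            this.firstSeen = firstSeen;
            this.lastSeen = lastSeen;
        }

        /** Sum the counts, keep the earliest first-seen and the latest last-seen. */
        Observation merge(Observation other) {
            return new Observation(
                count + other.count,
                Math.min(firstSeen, other.firstSeen),
                Math.max(lastSeen, other.lastSeen));
        }
    }

    /** Hypothetical helper: write an observation, aggregating with any existing entry. */
    static void insert(Map<String, Observation> table, String rowId, Observation obs) {
        table.merge(rowId, obs, Observation::merge);
    }

    public static void main(String[] args) {
        Map<String, Observation> table = new LinkedHashMap<>();
        String rowId = "dns.google|foo|8.8.4.4|A"; // separator is an assumption

        insert(table, rowId, new Observation(1, 1598303837L, 1598303897L));
        insert(table, rowId, new Observation(2, 1598303000L, 1598304000L));

        Observation o = table.get(rowId);
        // prints "3 1598303000 1598304000"
        System.out.println(o.count + " " + o.firstSeen + " " + o.lastSeen);
    }
}
```

In Accumulo the merge happens inside the tablet servers at scan and compaction time, so writers only ever append new observations and never need to read the current value first.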
### Table `balboa_by_rdata`
| Row ID | Column Family | Column Qualifier | Visibility | Value |
|------------------------------|---------------|------------------|------------|-------------|
| rdata-sensorid-rrname-rrtype | count | count | public | LONG VARLEN |
| rdata-sensorid-rrname-rrtype | seen | first | public | LONG VARLEN |
| rdata-sensorid-rrname-rrtype | seen | last | public | LONG VARLEN |

### Table `balboa_by_rrname_rev`
| Row ID | Column Family | Column Qualifier | Visibility | Value |
|-----------------------------------|---------------|------------------|------------|-------------|
| rev(rrname)-sensorid-rdata-rrtype | count | count | public | LONG VARLEN |
| rev(rrname)-sensorid-rdata-rrtype | seen | first | public | LONG VARLEN |
| rev(rrname)-sensorid-rdata-rrtype | seen | last | public | LONG VARLEN |

## Example run
This example run uses balboa's `balboa-backend-console` to directly talk to the
backend rather than having to go through the GraphQL frontend.

`rrname` full query:
```
$ balboa-backend-console query -h 127.0.0.1 -p 4242 -r dns.google | head -n 1 | jq
{
  "rrname": "dns.google",
  "rrtype": "A",
  "sensor_id": "foo",
  "rdata": "8.8.4.4",
  "count": 1,
  "first_seen": 1598303837,
  "last_seen": 1598303897
}
```

`rrname` suffix query:
```
$ balboa-backend-console query -h 127.0.0.1 -p 4242 -r %.com.de | head -n 1 | jq
{
  "rrname": "www.jabra.com.de",
  "rrtype": "A",
  "sensor_id": "foo",
  "rdata": "152.199.21.175",
  "count": 1,
  "first_seen": 1603348710,
  "last_seen": 1603348770
}
```

`rdata` query:
```
$ balboa-backend-console query -h 127.0.0.1 -p 4242 -d 9.9.9.10 | jq
{
  "rrname": "dns10.quad9.net",
  "rrtype": "A",
  "sensor_id": "foo",
  "rdata": "9.9.9.10",
  "count": 1,
  "first_seen": 1603892361,
  "last_seen": 1603892421
}
```

## PoC limitations
* Hard-coded table names and `public` visibility
* Wildcard support limited to `rrname` queries
* For `rrname` queries, additional `rdata` and `sensorid` constraints will be matched anywhere in the row
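
The restriction of wildcards to `rrname` follows from the table layout: only `rrname` has a reversed index, which turns a suffix match into a prefix scan over sorted row IDs. The following plain-Java sketch illustrates the idea with a `TreeMap` standing in for the sorted `balboa_by_rrname_rev` table. It assumes character-wise reversal for `rev(rrname)` and a NUL byte as field separator; both are assumptions for illustration, and the backend's actual encoding may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReverseIndexSketch {
    static final String SEP = "\u0000"; // assumed field separator

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Row ID in the layout rev(rrname), sensorid, rdata, rrtype. */
    static String revRowId(String rrname, String sensorId, String rdata, String rrtype) {
        return String.join(SEP, reverse(rrname), sensorId, rdata, rrtype);
    }

    /** Answer a "%suffix" query as a prefix scan over the sorted reversed keys. */
    static List<String> suffixQuery(TreeMap<String, String> revTable, String pattern) {
        String prefix = reverse(pattern.substring(1)); // drop the leading '%'
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : revTable.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // keys are sorted, so nothing further can match
            }
            hits.add(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        // The map value holds the original rrname for easy printing.
        TreeMap<String, String> revTable = new TreeMap<>();
        revTable.put(revRowId("www.jabra.com.de", "foo", "152.199.21.175", "A"), "www.jabra.com.de");
        revTable.put(revRowId("dns.google", "foo", "8.8.4.4", "A"), "dns.google");
        revTable.put(revRowId("dns10.quad9.net", "foo", "9.9.9.10", "A"), "dns10.quad9.net");

        // prints "[www.jabra.com.de]"
        System.out.println(suffixQuery(revTable, "%.com.de"));
    }
}
```

Supporting suffix wildcards on `rdata` in the same way would require a fourth table keyed on reversed `rdata`.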