{"id":21511503,"url":"https://github.com/netobserv/flowlogs-pipeline","last_synced_at":"2026-02-09T12:14:58.719Z","repository":{"id":36961514,"uuid":"449844570","full_name":"netobserv/flowlogs-pipeline","owner":"netobserv","description":"Transform flow logs into metrics","archived":false,"fork":false,"pushed_at":"2024-10-24T20:11:42.000Z","size":789652,"stargazers_count":77,"open_issues_count":71,"forks_count":23,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-10-25T08:24:36.145Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/netobserv.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-19T20:26:23.000Z","updated_at":"2024-10-20T15:59:18.000Z","dependencies_parsed_at":"2023-12-21T19:12:23.622Z","dependency_job_id":"1a856070-8318-4bba-baf5-05f50f7d3267","html_url":"https://github.com/netobserv/flowlogs-pipeline","commit_stats":null,"previous_names":["netobserv/flowlogs2metrics"],"tags_count":50,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/netobserv%2Fflowlogs-pipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/netobserv%2Fflowlogs-pipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/netobserv%2Fflowlogs-pipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/netobserv%2Fflowlogs-pipeline/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/netobserv","download_url":"https://codeload.github.com/netobserv/flowlogs-pipeline/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247353746,"owners_count":20925329,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-23T22:14:29.324Z","updated_at":"2026-02-09T12:14:58.712Z","avatar_url":"https://github.com/netobserv.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n[![pull request](https://github.com/netobserv/flowlogs-pipeline/actions/workflows/pull_request.yml/badge.svg)](https://github.com/netobserv/flowlogs-pipeline/actions/workflows/pull_request.yml)\n[![push image to quay.io](https://github.com/netobserv/flowlogs-pipeline/actions/workflows/push_image.yml/badge.svg)](https://github.com/netobserv/flowlogs-pipeline/actions/workflows/push_image.yml)\n[![codecov](https://codecov.io/gh/netobserv/flowlogs-pipeline/branch/main/graph/badge.svg?token=KMZKG6PRS9)](https://codecov.io/gh/netobserv/flowlogs-pipeline)\n[![Go Report Card](https://goreportcard.com/badge/github.com/netobserv/flowlogs-pipeline)](https://goreportcard.com/report/github.com/netobserv/flowlogs-pipeline)\n\n# Overview\n\n**Flow-Logs Pipeline** (a.k.a. 
FLP) is an **observability tool** that **[consumes](./pkg/pipeline/ingest/)** logs from various inputs, **[transforms](./pkg/pipeline/transform/)** them and **[exports](./pkg/pipeline/write/)** logs to **[loki](https://grafana.com/oss/loki/)** and / or time series metrics to **[prometheus](https://prometheus.io/)**.\n\n![Animated gif](docs/images/animation.gif)\n\nFLP can consume:\n- raw **network flow-logs** in their original format \n([NetFlow v5,v9](https://en.wikipedia.org/wiki/NetFlow) or [IPFIX](https://en.wikipedia.org/wiki/IP_Flow_Information_Export)) \n- [eBPF agent](https://github.com/netobserv/netobserv-ebpf-agent) flows in binary format (protobuf+gRPC)\n- Kafka entries in JSON format\n- A simple file\n\nFLP decorates the metrics and the transformed logs with **context**, \nallowing visualization layers and analytics frameworks to present **network insights** to SREs, cloud operators and network experts.\n\nIt also allows defining mathematical transformations to generate condensed metrics that encapsulate network domain knowledge.\n\nThe FLP pipeline module is built on top of [gopipes](https://github.com/netobserv/gopipes), providing customizability and parallelism.\n\nIn addition, along with Prometheus and its ecosystem tools such as Thanos, Cortex etc., \nFLP provides an efficient scalable multi-cloud solution for comprehensive network analytics that can rely **solely on metrics data-source**.\n\nDefault network metrics are documented here [docs/metrics.md](docs/metrics.md).  
\nOperational metrics are documented here [docs/operational-metrics.md](docs/operational-metrics.md).\n\n\u003e note: operational metrics are exported only using prometheus   \n\u003cbr\u003e\n\u003cbr\u003e\n\n\u003e Note: prometheus eco-system tools such as Alert Manager can be used with FLP to generate alerts and provide big-picture insights.\n\n\n![Data flow](docs/images/data_flow.drawio.png \"Data flow\")\n\n# Usage\n\n\u003c!---AUTO-flowlogs-pipeline_help---\u003e\n```bash\nTransform, persist and expose flow-logs as network metrics  \n  \nUsage:  \n  flowlogs-pipeline [flags]  \n  \nFlags:  \n      --config string              config file (default is $HOME/.flowlogs-pipeline)  \n      --dynamicParameters string   json of configmap location for dynamic parameters  \n      --health.address string      Health server address (default \"0.0.0.0\")  \n      --health.port int            Health server port (default: disable health server)   \n  -h, --help                       help for flowlogs-pipeline  \n      --log-level string           Log level: debug, info, warning, error (default \"error\")  \n      --metricsSettings string     json for global metrics settings  \n      --parameters string          json of config file parameters field  \n      --pipeline string            json of config file pipeline field  \n      --profile.port int           Go pprof tool port (default: disabled)\n```\n\u003c!---END-AUTO-flowlogs-pipeline_help---\u003e\n\n\u003e Note: for API details refer to [docs/api.md](docs/api.md).\n\u003e \n## Configuration generation\n\nflowlogs-pipeline network metrics configuration ( `--config` flag) can be generated automatically using \nthe `confGenerator` utility. `confGenerator` aggregates information from multiple user provided network *metric \ndefinitions* into flowlogs-pipeline configuration. 
More details on `confGenerator` can be found \nin [docs/confGenerator.md](docs/confGenerator.md).\n \nTo generate flowlogs-pipeline configuration execute:  \n```shell\nmake generate-configuration\nmake dashboards\n```\n\n## Deploy into OpenShift (OCP) with prometheus, loki and grafana\nTo deploy FLP on OCP perform the following steps:\n1. Verify that `kubectl` works with the OCP cluster\n```shell\nkubectl get namespace openshift\n```\n2. Deploy FLP with all dependent components (into `default` namespace)\n```shell\nkubectl config set-context --current --namespace=default\nIMAGE_ORG=netobserv make ocp-deploy\n```\n\n3. Use a web-browser to access grafana dashboards (the end-point address is exposed by the script) and observe metrics and logs  \n\n## Deploy with Kind and netflow-simulator (for development and exploration)\nThese instructions deploy an FLP development and exploration environment with [kind](https://kind.sigs.k8s.io/) and [netflow-simulator](https://hub.docker.com/r/networkstatic/nflow-generator),\ntested on Ubuntu 20.04 and Fedora 34.\n1. Make sure the following commands are installed and can be run from the current shell:\n   - make\n   - go (version 1.19)\n   - docker\n2. To deploy the full simulated environment which includes a kind cluster with FLP, Prometheus, Grafana, and\n   netflow-simulator, run (note that depending on your user permissions, you may have to run this command under sudo):\n    ```shell\n    make local-deploy\n    ```\n   If the command is successful, the metrics will get generated and can be observed by running (note that depending\n   on your user permissions, you may have to run this command under sudo):\n    ```shell\n    kubectl logs -l app=flowlogs-pipeline -f\n    ```\n    The metrics you see upon deployment are default and can be modified through configuration described [later](#Configuration).\n\n# Technology\n\nFLP is a framework. The main FLP object is the **pipeline**. 
FLP **pipeline** can be configured (see\n[Configuration section](#Configuration)) to extract the flow-log records from a source in a standard format such as NetFLow or IPFIX, apply custom processing, and output the result as metrics (e.g., in Prometheus format).\n\n# Architecture\n\nThe pipeline is constructed of a sequence of stages. Each stage is classified into one of the following types:\n- **ingest** - obtain flows from some source, one entry per line\n- **transform** - convert entries into a standard format; can include multiple transform stages\n- **write** - provide the means to write the data to some target, e.g. loki, standard output, etc\n- **extract** - derive a set of metrics from the imported flows\n- **encode** - make the data available in appropriate format (e.g. prometheus)\n\nThe first stage in a pipeline must be an **ingest** stage.\nEach stage (other than an **ingest** stage) specifies the stage it follows.\nMultiple stages may follow from a particular stage, thus allowing the same data to be consumed by multiple parallel pipelines.\nFor example, multiple **transform** stages may be performed and the results may be output to different targets.\n\nA configuration file consists of two sections.\nThe first section describes the high-level flow of information between the stages, giving each stage a name and building the graph of consumption and production of information between stages.\nThe second section provides the definition of specific configuration parameters for each one of the named stages.\nA full configuration file with the data consumed by  two different transforms might look like the following.\n\n```yaml\npipeline:\n  - name: ingest1\n  - name: generic1\n    follows: ingest1\n  - name: write1\n    follows: generic1\n  - name: generic2\n    follows: ingest1\n  - name: write2\n    follows: generic2\nparameters:\n  - name: ingest1\n    ingest:\n      type: file_loop\n      file:\n        filename: hack/examples/ocp-ipfix-flowlogs.json\n    
    decoder:\n          type: json\n  - name: generic1\n    transform:\n      type: generic\n      generic:\n        policy: replace_keys\n        rules:\n          - input: Bytes\n            output: v1_bytes\n          - input: DstAddr\n            output: v1_dstAddr\n          - input: Packets\n            output: v1_packets\n          - input: SrcPort\n            output: v1_srcPort\n  - name: write1\n    write:\n      type: stdout\n  - name: generic2\n    transform:\n      type: generic\n      generic:\n        policy: replace_keys\n        rules:\n          - input: Bytes\n            output: v2_bytes\n          - input: DstAddr\n            output: v2_dstAddr\n          - input: Packets\n            output: v2_packets\n          - input: SrcPort\n            output: v2_srcPort\n  - name: write2\n    write:\n      type: stdout\n```\nIt is expected that the **ingest** module will receive flows every so often, and this ingestion event will then trigger the rest of the pipeline. So, it is the responsibility of the **ingest** module to provide the timing of when (and how often) the pipeline will run.\n\n# Configuration\n\nIt is possible to configure flowlogs-pipeline using command-line-parameters, configuration file, or any combination of those options.\n\n\nFor example:\n1. Using configuration file:\n\n```yaml\nlog-level: info\npipeline:\n  - name: ingest_file\n  - name: write_stdout\n    follows: ingest_file\nparameters:\n  - name: ingest_file\n    ingest:\n      type: file\n      file:\n        filename: hack/examples/ocp-ipfix-flowlogs.json\n        decoder:\n          type: json\n  - name: write_stdout\n    write:\n      type: stdout\n```\n- execute\n\n\n`./flowlogs-pipeline --config \u003cconfigFile\u003e`\n\n2. 
Using command line parameters:\n \n`./flowlogs-pipeline --pipeline \"[{\\\"name\\\":\\\"ingest1\\\"},{\\\"follows\\\":\\\"ingest1\\\",\\\"name\\\":\\\"write1\\\"}]\" --parameters \"[{\\\"ingest\\\":{\\\"file\\\":{\\\"filename\\\":\\\"hack/examples/ocp-ipfix-flowlogs.json\\\"},\\\"decoder\\\":{\\\"type\\\":\\\"json\\\"},\\\"type\\\":\\\"file\\\"},\\\"name\\\":\\\"ingest1\\\"},{\\\"name\\\":\\\"write1\\\",\\\"write\\\":{\\\"type\\\":\\\"stdout\\\"}}]\"`\n\nOptions included in the command line override the options specified in the config file.\n\n`flowlogs-pipeline --log-level debug --pipeline \"[{\\\"name\\\":\\\"ingest1\\\"},{\\\"follows\\\":\\\"ingest1\\\",\\\"name\\\":\\\"write1\\\"}]\" --config \u003cconfigFile\u003e`\n\nSupported options are provided by running:\n\n```\nflowlogs-pipeline --help\n```\n\n# Syntax of portions of the configuration file\n\n## Supported stage types\n\n### Transform\nDifferent types of inputs come with different sets of keys.\nThe transform stage allows changing the names of the keys and deriving new keys from old ones.\nMultiple transforms may be specified, and they are applied in the **order of specification** (using the **follows** keyword).\nThe output from one transform becomes the input to the next transform.\n\n### Transform Generic\n\nThe generic transform module maps the input json keys into another set of keys.\nThis allows to perform subsequent operations using a uniform set of keys.\nIn some use cases, only a subset of the provided fields are required.\nUsing the generic transform, we may specify those particular fields that interest us.\nSpecify `policy: replace_keys` to use only the newly specified keys.\nTo include the original keys and values in addition to those specified in the `rules`,\nspecify `policy: preserve_original_keys`.\n\nThe rule `multiplier` takes the input field, multiplies it by the provided value, and\nplaces the result in the output field.\nThis is useful to use when provided with only a sample of the 
flow logs (e.g. 1 out of 20),\nand some of the variables need to be adjusted accordingly.\nIf `multiplier` is not set or if it is set to 0, then the input field is simply copied to the output field.\n\nFor example, suppose we have a flow log with the following syntax:\n```\n{\"Bytes\":20800,\"DstAddr\":\"10.130.2.2\",\"DstPort\":36936,\"Packets\":400,\"Proto\":6,\"SequenceNum\":1919,\"SrcAddr\":\"10.130.2.13\",\"SrcHostIP\":\"10.0.197.206\",\"SrcPort\":3100,\"TCPFlags\":0,\"TimeFlowStart\":0,\"TimeReceived\":1637501832}\n```\n\nSuppose further that we are only interested in fields with source/destination addresses and ports, together with bytes and packets transferred.\nThe YAML specification for these parameters would look like this:\n\n```yaml\nparameters:\n  - name: transform1\n    transform:\n      type: generic\n      generic:\n        policy: replace_keys\n        rules:\n          - input: Bytes\n            output: bytes\n            multiplier: 20\n          - input: DstAddr\n            output: dstAddr\n          - input: DstPort\n            output: dstPort\n          - input: Packets\n            output: packets\n            multiplier: 20\n          - input: SrcAddr\n            output: srcAddr\n          - input: SrcPort\n            output: srcPort\n          - input: TimeReceived\n            output: timestamp\n```\n\nEach field specified by `input` is translated into a field specified by the corresponding `output`.\nOnly those specified fields are saved for further processing in the pipeline.\nFurther stages in the pipeline should use these new field names.\nThis mechanism allows us to translate from any flow-log layout to a standard set of field names.\n\nIn the above example, the `bytes` and `packets` fields have a multiplier of 20.\nThis may be done in case only a sampling of the flow logs is provided, in this case 1 in 20,\nso that these fields need to be scaled accordingly.\n\nIf the `input` and `output` fields are identical, then that field 
is simply passed to the next stage.\nFor example:\n```yaml\npipeline:\n  - name: transform1\n    follows: \u003csomething\u003e\n  - name: transform2\n    follows: transform1\nparameters:\n  - name: transform1\n    transform:\n      type: generic\n      generic:\n        policy: replace_keys\n        rules:\n          - input: DstAddr\n            output: dstAddr\n          - input: SrcAddr\n            output: srcAddr\n  - name: transform2\n    transform:\n      type: generic\n      generic:\n        policy: replace_keys\n        rules:\n          - input: dstAddr\n            output: dstIP\n          - input: dstAddr\n            output: dstAddr\n          - input: srcAddr\n            output: srcIP\n          - input: srcAddr\n            output: srcAddr\n```\nBefore the first transform suppose we have the keys `DstAddr` and `SrcAddr`.\nAfter the first transform, we have the keys `dstAddr` and `srcAddr`.\nAfter the second transform, we have the keys `dstAddr`, `dstIP`, `srcAddr`, and `srcIP`.\n\nTo maintain all the old keys and values and simply add the key `dstAddr` (derived from `DstAddr`), use the following:\n```yaml\nparameters:\n  - name: transform1\n    transform:\n      type: generic\n      generic:\n        policy: preserve_original_keys\n        rules:\n          - input: DstAddr\n            output: dstAddr\n```\n\n### Transform Filter\n\nThe filter transform module allows setting rules to remove complete flow logs from the output, or just remove specific keys and values from logs.\n\nFor example, suppose we have a flow log with the following syntax:\n```json\n{\n  \"Bytes\":20800,\n  \"DstAddr\":\"10.130.2.2\",\n  \"DstPort\":36936,\n  \"Packets\":400,\n  \"Proto\":6,\n  \"SequenceNum\":1919,\n  \"SrcAddr\":\"10.130.2.13\",\n  \"SrcHostIP\":\"10.0.197.206\",\n  \"SrcPort\":3100,\n  \"TCPFlags\":0,\n  \"TimeFlowStart\":0,\n  \"TimeReceived\":1637501832\n}\n```\n\nThe below configuration will skip that log, removing it from the 
output.\n\n```yaml\nparameters:\n  - name: filter1\n    transform:\n      type: filter\n      filter:\n        rules:\n        - type: remove_entry_if_exists\n          removeEntry:\n            input: TCPFlags\n```\n\n- `type: remove_entry_if_doesnt_exist` reverses the logic and will not remove the above example entry.\n- `type: remove_field` keeps the entry but changes its content, removing the `TCPFlags` key and value.\n- `type: remove_entry_if_equal` removes the entry if the specified field exists and is equal to the specified value.\n- `type: remove_entry_if_not_equal` removes the entry if the specified field exists and is not equal to the specified value.\n\n#### Transform Filter: query language\n\nAlternatively, a query language allows to filter flows, keeping entries rather than removing them.\n\n```\n(srcnamespace=\"netobserv\" OR (srcnamespace=\"ingress\" AND dstnamespace=\"netobserv\")) AND srckind!=\"service\"\n```\n\n[See here](./docs/filtering.md) for more information about this language.\n\n### Transform Network\n\n`transform network` provides specific functionality that is useful for transformation of network flow-logs:\n\n1. Resolve subnet from IP addresses\n1. Resolve known network service names from port numbers and protocols\n1. Compute geo-location from IP addresses\n1. 
Resolve kubernetes information from IP addresses\n\nExample configuration:\n\n```yaml\nparameters:\n  - name: transform1\n    transform:\n      type: network\n      network:\n        kubeConfig:\n          configPath: /tmp/config\n        rules:\n          - type: add_subnet\n            add_subnet:\n              input: srcIP\n              output: srcSubnet\n              subnet_mask: /24\n          - type: add_service\n            add_service:\n              input: dstPort\n              output: service\n              protocol: protocol\n          - type: add_location\n            add_location:\n              input: dstIP\n              output: dstLocation\n          - type: add_kubernetes\n            kubernetes:\n              ipField: srcIP\n              output: srcK8S\n```\n\nThe rule `add_subnet` generates a new field named `srcSubnet` with the \nsubnet of `srcIP` calculated based on prefix length from the `parameters` field. \n\nThe rule `add_service` generates a new field named `service` with the known network \nservice name of `dstPort` port and `protocol` protocol. Unrecognized ports are ignored. \n\u003e Note: `protocol` can be either network protocol name or number  \n\u003e   \n\u003e Note: optionally supports custom network services resolution by defining configuration parameters \n\u003e `servicesFile` and `protocolsFile` with paths to custom services/protocols files respectively  \n\nThe rule `add_location` generates new fields with the geo-location information. It uses the [IP2Location LITE database](https://lite.ip2location.com/) for that purpose. 
All the geo-location fields will be named by prefixing the `output` value to their names in the IP2Location DB (e.g., `CountryName`, `CountryLongName`, `RegionName`, `CityName` , `Longitude` and `Latitude`).\n\nThe rule `add_kubernetes` generates new fields with kubernetes information by\nmatching the `ipField` value (`srcIP` in the example above) with kubernetes `nodes`, `pods` and `services` IPs.\nAll the kubernetes fields will be named by appending `output` value\n(`srcK8S` in the example above) to the kubernetes metadata field names\n(e.g., `Namespace`, `Name`, `Type`, `HostIP`, `OwnerName`, `OwnerType` )\n\nIn addition, if the `parameters` value is not empty, fields with kubernetes labels \nwill be generated, and named by appending `parameters` value to the label keys.   \n\nIf `assignee` is set to `otel` then the output fields of `add_kubernetes` will be produced in opentelemetry format.\n\n\u003e Note: kubernetes connection is done using the first available method: \n\u003e 1. configuration parameter `kubeConfig.configPath` (in the example above `/tmp/config`) or\n\u003e 2. using `KUBECONFIG` environment variable\n\u003e 3. using local `~/.kube/config`\n\n\u003e Note: above example describes the most common available transform network `Type` options\n\n\u003e Note: above transform is essential for the `aggregation` phase  \n\n### Aggregates\n\nAggregates are used to define the transformation of flow-logs from textual/json format into\nnumeric values to be exported as metrics. 
Aggregates are dynamically created based\non defined values from fields in the flow-logs and on mathematical functions to be performed\non these values.\nThe specification of the aggregates details is placed in the `extract` stage of the pipeline.\n\nFor Example, assuming set of flow-logs, with single sample flow-log that looks like:\n```\n{\"srcIP\":   \"10.0.0.1\",\n\"dstIP\":   \"20.0.0.2\",\n\"level\":   \"error\",\n\"value\":   \"7\",\n\"message\": \"test message\"}\n```\n\nIt is possible to define aggregates per `srcIP` or per `dstIP` of per the tuple `srcIP`x`dstIP`\nto capture the `sum`, `min`, `avg` etc. of the values in the field `value`.\n\nFor example, configuration record for aggregating field `value` as\naverage for `srcIP`x`dstIP` tuples will look like this:\n\n```yaml\npipeline:\n  - name: aggregate1\n    follows: \u003csomething\u003e\nparameters:\n  - name: aggregate1\n    extract:\n      type: aggregates\n      aggregates:\n        - name: \"Average key=value for (srcIP, dstIP) pairs\"\n          by:\n            - \"dstIP\"\n            - \"srcIP\"\n          operation: \"avg\"\n          operationKey: \"value\"\n```\n\nThe output fields of the aggregates stage are:\n- `name`\n- `operation`\n- `operation_key`\n- `by`\n- `aggregate`\n- `total_value`: the total aggregate value\n- `total_count`: the total count\n- `recent_raw_values`: a slice with the raw values of the recent batch\n- `recent_op_value`: the aggregate value of the recent batch\n- `recent_count`: the count of flowlogs in the recent batch\n\nThese fields are used by the next stage (for example `prom` encoder).\nThe pipeline processes flowlogs in batches.\nThe output fields with `recent_` prefix are related to the recent batch.\nThey are needed when exposing metrics in Prometheus using Counters and Histograms.\nPrometheus Counters API accepts the delta amount to be added to the counter and not the total value as in Gauges.\nIn this case, `recent_op_value` and `recent_count` should be 
used as the `valueKey`.\nThe API of Histograms accepts the sample value, so it could be added to the appropriate bucket.\nIn this case, we are interested in the raw values of the records in the aggregation group.\nNo aggregate operation is needed and it should be set `raw_values`. The `valueKey` should be set to `recent_raw_values`.\n\n**Note**: `recent_raw_values` is filled only when the operation is `raw_values`.\n\n### Connection tracking\n\nThe connection tracking module allows grouping flow logs with common properties (i.e. same connection) and calculate \nuseful statistics.\nThe input of the module is flow-log records and the output is connection records and the flow-log records with an\nadditional hash id field to correlate with the connection records.\nThere are 4  output records types:\n1. **New connection**: indicates that a new connection is detected. i.e. the input contains a flow-log that doesn't\nbelong to any of the tracked connections.\n2. **Heartbeat**: a periodic report of the connection statistics for long connections.\n3. **End connection**: indicates that a connection has ended. A connection is considered ended once the \ntimeout since the latest flow-log of the connection has elapsed or a flow log of `FIN_ACK` has been received.\n4. 
**Flow log**: a copy of the input flow log with the additional `_RecordType` and `_HashId` fields.\n\nThe configuration can suppress any of the output types.\n\nThe configuration of the module allows defining how to group flow-logs into connections.\nThere is an option to group flow-logs into unidirectional connections or bidirectional connections.\nThe difference is that in unidirectional setting, flow-logs from A to B are grouped separately from flow-logs from B to A.\nWhile, in bidirectional setting, they are grouped together.\n\nBidirectional setting requires defining both `fieldGroupARef` and `fieldGroupBRef` sections to allow the connection\ntracking module to identify which set of fields can swap values and still be considered as the same connection.\nThe pairs of fields that can swap are determined by their order in the fieldGroup.\nIn the example below, `SrcAddr` and `DstAddr` are first in their fieldGroup, so they are swappable.\nThe same is true for `SrcPort` and `DstPort` which are second.\n\nThe configuration example below defines a bidirectional setting. 
So flow-logs that have the values of `SrcAddr` and `SrcPort` \nswapped with `DstAddr` and `DstPort` are grouped together as long as they have the same `Proto` field.\nFor example, the following first 2 flow-logs are grouped together into the same connection.\nWhile the third flow-log forms a new connection (because its `Proto` field differs from the first 2).\n```json\n{\"SrcAddr\":\"10.0.0.1\", \"SrcPort\":1234, \"DstAddr\":\"10.0.0.2\", \"DstPort\":80, \"Proto\":6, \"Bytes\":100, \"TimeReceived\": 1661430100}\n{\"SrcAddr\":\"10.0.0.2\", \"SrcPort\":80, \"DstAddr\":\"10.0.0.1\", \"DstPort\":1234, \"Proto\":6, \"Bytes\":200, \"TimeReceived\": 1661430200}\n{\"SrcAddr\":\"10.0.0.1\", \"SrcPort\":1234, \"DstAddr\":\"10.0.0.2\", \"DstPort\":80, \"Proto\":17, \"Bytes\":300, \"TimeReceived\": 1661430300}\n```\n\nA typical configuration might look like:\n```yaml\nparameters:\n- name: extract_conntrack\n  extract:\n    type: conntrack\n    conntrack:\n      keyDefinition:\n        fieldGroups:\n        - name: src\n          fields:\n          - SrcAddr\n          - SrcPort\n        - name: dst\n          fields:\n          - DstAddr\n          - DstPort\n        - name: protocol\n          fields:\n          - Proto\n        hash:\n          fieldGroupRefs:\n          - protocol\n          fieldGroupARef: src\n          fieldGroupBRef: dst\n      outputRecordTypes:\n      - newConnection\n      - endConnection\n      - heartbeat\n      - flowLog\n      outputFields:\n      - name: Bytes_total\n        operation: sum\n        input: Bytes\n      - name: Bytes\n        operation: sum\n        splitAB: true\n      - name: numFlowLogs\n        operation: count\n      - name: TimeFlowStart\n        operation: min\n        input: TimeReceived\n      - name: TimeFlowEnd\n        operation: max\n        input: TimeReceived\n      scheduling:\n      - selector: # UDP connections\n          Proto: 17\n        endConnectionTimeout: 5s\n        heartbeatInterval: 40s\n        
terminatingTimeout: 5s\n      - selector: {} # Default group\n        endConnectionTimeout: 10s\n        heartbeatInterval: 30s\n        terminatingTimeout: 5s\n      tcpFlags:\n        fieldName: Flags\n        detectEndConnection: true\n        swapAB: true\n```\n\nA possible output would look like:\n```json\n{\n    \"_RecordType\": \"endConnection\",\n    \"_HashId\": \"3e8ba98164baecaf\",\n    \"_IsFirst\": true,\n    \"SrcAddr\": \"10.0.0.1\",\n    \"SrcPort\": 1234,\n    \"DstAddr\": \"10.0.0.2\",\n    \"DstPort\": 80,\n    \"Proto\": 6,\n    \"Bytes_AB\": 100,\n    \"Bytes_BA\": 200,\n    \"Bytes_total\": 300,\n    \"numFlowLogs\": 2,\n    \"TimeFlowStart\": 1661430100,\n    \"TimeFlowEnd\": 1661430200\n}\n\n{\n    \"_RecordType\": \"flowLog\",\n    \"_HashId\": \"bca4c313a1ad1b1c\",\n    \"SrcAddr\": \"10.0.0.1\",\n    \"SrcPort\": 1234,\n    \"DstAddr\": \"10.0.0.2\",\n    \"DstPort\": 80,\n    \"Proto\": 17,\n    \"Bytes\": 300,\n    \"TimeReceived\": 1661430300\n}\n```\n\n#### Connection tracking metrics\n\nThe following table shows the possible values of the `classification` label in the `conntrack_input_records` operational metric.\n\n| Classification  | Reason                                                                                     |\n|-----------------|--------------------------------------------------------------------------------------------|\n| `discarded`     | layer2 protocols like ARP, non-transport protocols like ICMPv4/6 and too many connections  |\n| `rejected`      | when an error occurs while calculating the connection tracking hash                        |\n| `duplicate`     | for duplicate flows                                                                        |\n| `newConnection` | when new connection tracking flow is created                                               |\n\nNotice that all output records contain `_RecordType` and `_HashId` fields.\nOutput fields that set `splitAB: true` (like in `Bytes`) are split into 2 
fields `Bytes_AB` and `Bytes_BA` which \naggregate values separately based on direction A-\u003eB and B-\u003eA respectively.\nWhen `splitAB` is absent, its default value is `false`.\n\nThe boolean field `_IsFirst` exists only in records of type `newConnection`, `heartbeat` and `endConnection`.\nIt is set to true only on the first record of the connection.\nThe `_IsFirst` field is useful in cases where `newConnection` records are not output (to reduce the number of output records)\nand there is a need to count the total number of connections: simply count the records with `_IsFirst=true`.\n\nThe configuration allows defining scheduling groups. That is, defining different timeouts based on connection key fields' values.\nThe order of the defined groups is important since the group of a connection is determined by the first matching group.\nThe last group must have an empty selector indicating a match-all rule serving as a default group for connections that \ndon't match any of the other groups. There can't be more than one default group.\n\nThe TCP flags section in the configuration allows utilizing the TCP flags data collected in the flow logs.\nIt has the following features that could be enabled (by default, they aren't enabled):\n1. Ending connections when the `FIN_ACK` flag is set, instead of waiting for the `endConnectionTimeout`.\n2. 
Swapping source and destination of a connection when `SYN_ACK` is set on the first flow log.\nThe source and destination of a connection are determined by the first received flow log of the connection.\nIf the first received flow log happens to be in the opposite direction (server -\u003e client), either because of sampling or because flow logs arrive out of order,\nthen the connection's source and destination end up reversed.\nIn the special case where the first received flow log has the `SYN_ACK` flag,\nwe can assume that it is the second step of the TCP handshake and that\nits direction is from the server (source) to the client (destination), so we can swap them in the connection to make the client the source and the server the destination.\n\n\n### Timebased TopK\n\nIt is sometimes desirable to return only a subset of records, such as those connections that use the most bandwidth.\nThis information is often relevant only for recently reported records.\nThis stage enables the reporting of records for the top (or bottom) K entries that have recently been processed.\nThe specification of the Timebased TopK details is placed in the `extract` stage of the pipeline.\n\nFor example, assume a set of flow-logs in which a single sample flow-log looks like:\n```json\n{\n    \"srcIP\": \"10.0.0.1\",\n    \"dstIP\":  \"20.0.0.2\",\n    \"srcSubnet\": \"10.0.0.0/16\",\n    \"bytes\":  4096\n}\n```\n\nIt is possible to request the entries, indexed by subnet, with the highest number of bytes.\nThere may be multiple records with the same index (e.g. 
same srcIP or same subnet, as the case may be).\nThe time interval over which to select the TopK may be specified.\nThe operation to perform on the multiple entries of the same index that fall within that time interval may also be specified.\nThe allowed operations are: `sum`, `min`, `max`, `avg`, `diff`, `last`.\nTo obtain the bottom K entries instead of the top K entries, set `reversed` to `true`.\n\nA sample configuration record looks like this:\n\n```yaml\npipeline:\n  - name: timebased1\n    follows: \u003csomething\u003e\nparameters:\n  - name: timebased1\n    extract:\n      type: timebased\n      timebased:\n        rules:\n          - name: \"Top 3 Sum of bytes per source subnet over last 10 seconds\"\n            operation: sum\n            operationKey: bytes\n            recordKeys: srcSubnet\n            topK: 3\n            reversed: false\n            timeInterval: 10s\n```\n\nThe output fields of this stage are:\n- `name`: the name of the rule.\n- `index_key`: the fields specified in the rule upon which to index, comma separated. Each of these keys is appended to the output with its corresponding value.\n- `operation`: the operation of the rule. 
The result of the operation is stored in the output field named by `operationKey`.\n\nExample output:\n```json\n{\n   \"name\":\"Top 3 Sum of bytes per source subnet over last 10 seconds\",\n   \"index_key\":\"srcSubnet\",\n   \"operation\":\"sum\",\n   \"srcSubnet\":\"10.0.0.0/16\",\n   \"bytes\":1234\n}\n```\n\nThese fields are used by the next stage (for example, the `prom` encoder).\n\n### Prometheus encoder\n\nThe Prometheus encoder specifies which metrics to export to Prometheus and which labels should be associated with those metrics.\nFor example, we may want to report the number of bytes and packets for the reported flows.\nFor each reported metric, we may specify a different set of labels.\nEach metric may be renamed from its internal name.\nThe internal metric name is specified as `valueKey` and the exported name is specified as `name`.\nA prefix for all exported metrics may be specified, and this prefix is prepended to the `name` of each specified metric.\n\n```yaml\nparameters:\n  - name: prom1\n    encode:\n      type: prom\n      prom:\n        prefix: test_\n        metrics:\n          - name: Bytes\n            type: gauge\n            valueKey: bytes\n            labels:\n              - srcAddr\n              - dstAddr\n              - srcPort\n          - name: Packets\n            type: counter\n            valueKey: packets\n            labels:\n              - srcAddr\n              - dstAddr\n              - dstPort\n```\n\nIn this example, the `bytes` metric is reported with the labels `srcAddr`, `dstAddr` and `srcPort`.\nEach different combination of label values is a distinct gauge reported to Prometheus.\nThe name of the Prometheus gauge is set to `test_Bytes` by concatenating the prefix with the metric name.\nThe `packets` metric is very similar. 
It makes use of the `counter` Prometheus type, which adds reported values\nto a Prometheus counter.\n\n### Loki writer\n\nThe Loki writer persists flow-logs into [Loki](https://github.com/grafana/loki). The flow-logs are sent with a defined\ntenant ID and with a set of static labels and dynamic labels taken from the record fields.\nThe example below sends flow-logs into tenant `theTenant` with labels\nfrom the `foo` and `bar` fields\nand a static label with key `job` and value `flowlogs-pipeline`.\nAdditional parameters such as `url` and `batchWait` are described in\nthe Loki writer API, [docs/api.md](docs/api.md).\n\n```yaml\nparameters:\n  - name: write_loki\n    write:\n      type: loki\n      loki:\n        tenantID: theTenant\n        url: http://loki.default.svc.cluster.local:3100\n        staticLabels:\n          job: flowlogs-pipeline\n        batchWait: 1m\n        labels:\n          - foo\n          - bar\n```\n\n\u003e Note: to view Loki flow-logs in Grafana, use the `Explore` tab and choose the `loki` datasource. 
In the `Log Browser` enter `{job=\"flowlogs-pipeline\"}` and press `Run query`.\n\n### Object Store encoder\n\nThe object store encoder exports flows into an object store using the S3 API.\nFlow logs received during a time interval are batched and stored in a single object.\nThe configuration provides the URL of the object store, credentials to access the object store, the bucket in the object store into which the objects should be placed, and parameters (key/value pairs) to be stored as metadata of the created objects.\nObject names are constructed according to the following format:\n```\n\u003cbucket\u003e/\u003caccount\u003e/year={xxxx}/month={yy}/day={zz}/hour={hh}/stream-id={stream-id}/{sequence-number}\n```\n\nThe `{stream-id}` is derived from the time flowlogs-pipeline started to run.\nThe syntax of a sample configuration file is as follows:\n\n```yaml\nparameters:\n  - name: encodeS3\n    encode:\n      type: s3\n      s3:\n        endpoint: 1.2.3.4:9000\n        bucket: bucket1\n        account: account1\n        accessKeyId: accessKey1\n        secretAccessKey: secretAccessKey1\n        writeTimeout: 60s\n        batchSize: 100\n        objectHeaderParameters:\n          key1: val1\n          key2: val2\n          key3: val3\n          key4: val4\n```\n\nThe key/value pairs in `objectHeaderParameters` may contain arbitrary configuration information that the administrator wants to save as metadata for the produced objects, such as `tenant_id` or `network_interface_id`.\nThe content of the object consists of object header fields followed by the actual flow logs.\nThe object header contains the following fields: `version`, `capture_start_time`, `capture_end_time`, `number_of_flow_logs`, plus all the fields provided in the configuration under `objectHeaderParameters`.\n\nIf no flow logs arrive within the `writeTimeout` period, then an object is created with no flows.\nAn object is created either when we have accumulated `batchSize` flow 
logs or when `writeTimeout` has passed.\n\n### Metrics Settings\n\nSome global metrics settings may be set in the configuration file.\nFor example:\n\n```yaml\nmetricsSettings:\n  suppressDefaultMetrics: true\n  prefix: flp_operational_\n  port: 9102\n```\n\nFLP metrics are reported to a Prometheus client interface.\nIn addition, there are default metrics reported by `Go`, which are also directed to the Prometheus client interface.\nThe port on which these metrics are made available is specified in the `port` configuration parameter.\nIf a `prefix` is specified, it is prepended to each of the operational metrics generated by FLP.\nA different `prefix` may be specified on an `encode prom` stage to be prepended to the Prometheus metrics defined in that stage.\nThe `suppressDefaultMetrics` parameter may be set to `true` in order to suppress the reporting of the default Prometheus metrics, such as `Go` and process metrics.\n\n# Development\n\n## Build\n\n- Clone this repository from GitHub onto a local machine (Linux/X86):\n  `git clone git@github.com:netobserv/flowlogs-pipeline.git`\n- Change directory into flowlogs-pipeline:\n  `cd flowlogs-pipeline`\n- Build the code:\n  ```bash\n  # compile project\n  make build\n\n  # build the default image (quay.io/netobserv/flowlogs-pipeline:main):\n  make image-build\n\n  # push the default image (quay.io/netobserv/flowlogs-pipeline:main):\n  make image-push\n\n  # build and push on your own quay.io account (quay.io/myuser/flowlogs-pipeline:dev):\n  IMAGE_ORG=myuser VERSION=dev make images\n\n  # build and push on a different registry\n  IMAGE=dockerhub.io/myuser/plugin:tag make images\n  ```\n\nFLP uses a `Makefile` to build, test and deploy. The output of `make help` follows:\n\n\u003c!---AUTO-makefile_help---\u003e\n```bash\n  \nUsage:  \n  make \u003ctarget\u003e  \n  \nGeneral  \n  help                  Display this help.  
\n  prereqs               Check if prerequisites are met, and install missing dependencies  \n  prereqs-kind          Check if prerequisites are met for running kind, and install missing dependencies  \n  vendors               Check go vendors  \n  \nDevelop  \n  lint                  Lint the code  \n  compile               Compile main flowlogs-pipeline and config generator  \n  build                 Build flowlogs-pipeline executable and update the docs  \n  docs                  Update flowlogs-pipeline documentation  \n  clean                 Clean  \n  tests-unit            Unit tests  \n  coverage-report       Generate coverage report  \n  coverage-report-html  Generate HTML coverage report  \n  tests-fast            Fast unit tests (no race tests / coverage)  \n  tests-e2e             End-to-end tests  \n  tests-all             All tests  \n  benchmarks            Benchmark  \n  run                   Run  \n  \nImages  \n  image-build           Build MULTIARCH_TARGETS images  \n  image-push            Push MULTIARCH_TARGETS images  \n  manifest-build        Build MULTIARCH_TARGETS manifest  \n  manifest-push         Push MULTIARCH_TARGETS manifest  \n  extract-binaries      Extract all MULTIARCH_TARGETS binaries  \n  goyacc                Regenerate filters query langage  \n  \nkubernetes  \n  deploy                Deploy the image  \n  undeploy              Undeploy the image  \n  deploy-loki           Deploy loki  \n  undeploy-loki         Undeploy loki  \n  deploy-prometheus     Deploy prometheus  \n  undeploy-prometheus   Undeploy prometheus  \n  deploy-grafana        Deploy grafana  \n  undeploy-grafana      Undeploy grafana  \n  deploy-netflow-simulator  Deploy netflow simulator  \n  undeploy-netflow-simulator  Undeploy netflow simulator  \n  \nkind  \n  create-kind-cluster   Create cluster  \n  delete-kind-cluster   Delete cluster  \n  kind-load-image       Load image to kind  \n  \nmetrics  \n  generate-configuration  Generate metrics configuration  
\n  \nEnd2End  \n  local-deploy          Deploy locally on kind (with simulated flowlogs)  \n  local-cleanup         Undeploy from local kind  \n  local-redeploy        Redeploy locally (on current kind)  \n  ocp-deploy            Deploy to OCP  \n  ocp-cleanup           Undeploy from OCP  \n  dev-local-deploy      Deploy locally with simulated netflows  \n  \nshortcuts helpers  \n  build-image           Build MULTIARCH_TARGETS images  \n  push-image            Push MULTIARCH_TARGETS images  \n  build-manifest        Build MULTIARCH_TARGETS manifest  \n  push-manifest         Push MULTIARCH_TARGETS manifest  \n  images                Build and push MULTIARCH_TARGETS images and related manifest\n```\n\u003c!---END-AUTO-makefile_help---\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnetobserv%2Fflowlogs-pipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnetobserv%2Fflowlogs-pipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnetobserv%2Fflowlogs-pipeline/lists"}