{"id":13582012,"url":"https://github.com/cloudflare/flow-pipeline","last_synced_at":"2025-04-04T06:08:38.956Z","repository":{"id":39612207,"uuid":"123599434","full_name":"cloudflare/flow-pipeline","owner":"cloudflare","description":"A set of tools and examples to run a flow-pipeline (sFlow, NetFlow)","archived":false,"fork":false,"pushed_at":"2024-11-04T10:56:55.000Z","size":79,"stargazers_count":181,"open_issues_count":4,"forks_count":37,"subscribers_count":18,"default_branch":"master","last_synced_at":"2025-03-28T05:12:07.955Z","etag":null,"topics":["clickhouse","cloudflare","docker","goflow","kafka","netflow","protobuf","sflow"],"latest_commit_sha":null,"homepage":"https://www.cloudflare.com","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cloudflare.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-03-02T15:58:12.000Z","updated_at":"2025-03-08T16:02:27.000Z","dependencies_parsed_at":"2024-10-03T21:41:22.834Z","dependency_job_id":"d8d5b490-5e1f-4274-b975-8e2d90501fbe","html_url":"https://github.com/cloudflare/flow-pipeline","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudflare%2Fflow-pipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudflare%2Fflow-pipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudflare%2Fflow-pipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/cloudflare%2Fflow-pipeline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cloudflare","download_url":"https://codeload.github.com/cloudflare/flow-pipeline/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247128751,"owners_count":20888235,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clickhouse","cloudflare","docker","goflow","kafka","netflow","protobuf","sflow"],"created_at":"2024-08-01T15:02:23.044Z","updated_at":"2025-04-04T06:08:38.929Z","avatar_url":"https://github.com/cloudflare.png","language":"Go","readme":"# flow-pipeline\n\nThis repository contains a set of tools and examples for [GoFlow](https://github.com/cloudflare/goflow),\na NetFlow/IPFIX/sFlow collector by [Cloudflare](https://www.cloudflare.com).\n\n## Start a flow pipeline\n\nThe compose directory contains startup files for an example pipeline, including:\n* GoFlow: a NetFlow/sFlow collector\n* A mock collector\n* Kafka/Zookeeper\n* A database (Postgres/Clickhouse)\n* An inserter, which inserts the flows into the database (for Postgres)\n\nIt will listen on port 6343/UDP for sFlow and 2055/UDP for NetFlow.\n\nThe protobuf provided in this repository is a light version of\nthe original GoFlow one. 
Only a handful of fields will be inserted.\n\nA basic pipeline looks like this:\n\n```\n\n\n\n                   +------+         +-----+\n     sFlow/NetFlow |goflow+---------\u003eKafka|\n                   +------+         +-----+\n                                       |\n                                       +--------------+\n                      Topic: flows     |              |\n                                       |              |\n                                 +-----v----+       +-v---------+\n                                 | inserter |       |new service|\n                                 +----------+       +-----------+\n                                      |\n                                      |\n                                   +--v--+\n                                   |  DB |\n                                   +-----+\n\n```\n\nYou can add a _processor_ that enriches the data\nby consuming from Kafka and re-injecting it into Kafka\nor directly into the database.\n\nFor instance, IP addresses can be mapped to countries, ASNs\nor customer information.\n\nOne suggestion is to extend the GoFlow protobuf with new fields.\n\n## Run a mock insertion\n\nA mock insertion replaces the GoFlow decoding part. 
A _mocker_ generates\nprotobuf messages and sends them to Kafka.\n\nClone the repository, then run the following (for Postgres):\n\n```\n$ cd compose\n$ docker-compose -f docker-compose-postgres-mock.yml up\n```\n\nWait a minute for all the components to start.\n\nYou can connect to the local Grafana at http://localhost:3000 (admin/admin) to look at the flows being collected.\n\n## Run a GoFlow insertion\n\nIf you want to send sFlow/NetFlow/IPFIX to GoFlow, run the following:\n\nUsing Postgres:\n```\n$ cd compose\n$ docker-compose -f docker-compose-postgres-collect.yml up\n```\n\nUsing Clickhouse (see next section):\n```\n$ cd compose\n$ docker-compose -f docker-compose-clickhouse-collect.yml up\n```\n\nKeep in mind this is a development/prototype setup.\nSome components will likely not be able to process more than a few\nthousand rows per second.\nYou will likely have to tweak configuration settings\nand the number of workers.\n\nIn a production setup, GoFlow was able to process more than 100k flows\nper second and insert them into a Clickhouse database.\n\n## About the Clickhouse setup\n\nIf you choose to visualize in Grafana, you will need a\n[Clickhouse Data source plugin](https://grafana.com/grafana/plugins/vertamedia-clickhouse-datasource).\nYou can connect to the compose Grafana, which has the plugin installed.\n\nThe insertion is handled natively by Clickhouse:\n* A table with a [Kafka Engine](https://clickhouse.tech/docs/en/operations/table_engines/kafka/) consumes the topic.\n* Messages are decoded using the [Protobuf format](https://clickhouse.tech/docs/en/interfaces/formats/#protobuf).\n\nNote: the protobuf messages need to be written with their lengths.\n\nClickhouse will connect to Kafka periodically and fetch the content. 
Materialized views\nallow storing the data persistently and aggregating over fields.\n\nTo connect to the database, run the following:\n```\n$ docker exec -ti compose_db_1 clickhouse-client\n```\n\nOnce in the client CLI, a handful of tables are available:\n* `flows` is directly connected to Kafka; it fetches from the current offset\n* `flows_raw` contains the materialized view of `flows`\n* `flows_5m` contains 5-minute aggregates per ASN\n\nExample commands:\n```\n:) DESCRIBE flows_raw\n\nDESCRIBE TABLE flows_raw\n\n┌─name───────────┬─type────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐\n│ Date           │ Date            │              │                    │         │                  │                │\n│ TimeReceived   │ DateTime        │              │                    │         │                  │                │\n│ TimeFlowStart  │ DateTime        │              │                    │         │                  │                │\n│ SequenceNum    │ UInt32          │              │                    │         │                  │                │\n│ SamplingRate   │ UInt64          │              │                    │         │                  │                │\n│ SamplerAddress │ FixedString(16) │              │                    │         │                  │                │\n│ SrcAddr        │ FixedString(16) │              │                    │         │                  │                │\n│ DstAddr        │ FixedString(16) │              │                    │         │                  │                │\n│ SrcAS          │ UInt32          │              │                    │         │                  │                │\n│ DstAS          │ UInt32          │              │                    │         │                  │                │\n│ EType          │ UInt32          │              │                    │         │                  │                │\n│ Proto          │ 
UInt32          │              │                    │         │                  │                │\n│ SrcPort        │ UInt32          │              │                    │         │                  │                │\n│ DstPort        │ UInt32          │              │                    │         │                  │                │\n│ Bytes          │ UInt64          │              │                    │         │                  │                │\n│ Packets        │ UInt64          │              │                    │         │                  │                │\n└────────────────┴─────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘\n\n:) SELECT Date,TimeReceived,IPv6NumToString(SrcAddr), IPv6NumToString(DstAddr), Bytes, Packets FROM flows_raw;\n\nSELECT\n    Date,\n    TimeReceived,\n    IPv6NumToString(SrcAddr),\n    IPv6NumToString(DstAddr),\n    Bytes,\n    Packets\nFROM flows_raw\n\n┌───────Date─┬────────TimeReceived─┬─IPv6NumToString(SrcAddr)─┬─IPv6NumToString(DstAddr)─┬─Bytes─┬─Packets─┐\n│ 2020-03-22 │ 2020-03-22 21:26:38 │ 2001:db8:0:1::80         │ 2001:db8:0:1::20         │   105 │      63 │\n│ 2020-03-22 │ 2020-03-22 21:26:38 │ 2001:db8:0:1::c2         │ 2001:db8:0:1::           │   386 │      43 │\n│ 2020-03-22 │ 2020-03-22 21:26:38 │ 2001:db8:0:1::6b         │ 2001:db8:0:1::9c         │   697 │      29 │\n│ 2020-03-22 │ 2020-03-22 21:26:38 │ 2001:db8:0:1::81         │ 2001:db8:0:1::           │  1371 │      54 │\n│ 2020-03-22 │ 2020-03-22 21:26:39 │ 2001:db8:0:1::87         │ 2001:db8:0:1::32         │   123 │      23 │\n\n```\n\nTo look at aggregates, optimize the table first (optimizing runs the summing operation).\nThe Nested structure allows keeping sums per key (in our case, per Ethernet type).\n\n```\n:) OPTIMIZE TABLE flows_5m;\n\nOPTIMIZE TABLE flows_5m\n\nOk.\n\n:) SELECT * FROM flows_5m WHERE SrcAS = 65001;\n\nSELECT *\nFROM flows_5m\nWHERE SrcAS = 
65001\n\n┌───────Date─┬────────────Timeslot─┬─SrcAS─┬─DstAS─┬─ETypeMap.EType─┬─ETypeMap.Bytes─┬─ETypeMap.Packets─┬─ETypeMap.Count─┬─Bytes─┬─Packets─┬─Count─┐\n│ 2020-03-22 │ 2020-03-22 21:25:00 │ 65001 │ 65000 │ [34525]        │ [2930]         │ [152]            │ [4]            │  2930 │     152 │     4 │\n│ 2020-03-22 │ 2020-03-22 21:25:00 │ 65001 │ 65001 │ [34525]        │ [1935]         │ [190]            │ [3]            │  1935 │     190 │     3 │\n│ 2020-03-22 │ 2020-03-22 21:25:00 │ 65001 │ 65002 │ [34525]        │ [4820]         │ [288]            │ [6]            │  4820 │     288 │     6 │\n```\n\n**Regarding the storage of IP addresses:**\nAt the moment, the Clickhouse table does not perform any transformation of the addresses before insertion.\nThe bytes are inserted into a `FixedString(16)` regardless of the family (IPv4, IPv6).\nIn the dashboards, the function `IPv6NumToString(SrcAddr)` is used.\n\nFor example, **192.168.1.1** will end up being **101:a8c0::**\n```sql\nWITH toFixedString(reinterpretAsString(ipv4), 16) AS ipv4c\nSELECT\n    '192.168.1.1' AS ip,\n    IPv4StringToNum(ip) AS ipv4,\n    IPv6NumToString(ipv4c) AS ipv6\n\n┌─ip──────────┬───────ipv4─┬─ipv6───────┐\n│ 192.168.1.1 │ 3232235777 │ 101:a8c0:: │\n└─────────────┴────────────┴────────────┘\n```\n\nTo convert it back:\n```sql\nWITH IPv6StringToNum(ip) AS ipv6\nSELECT\n    '101:a8c0::' AS ip,\n    reinterpretAsUInt32(ipv6) AS ipv6c,\n    IPv4NumToString(ipv6c) AS ipv4\n\n┌─ip─────────┬──────ipv6c─┬─ipv4────────┐\n│ 101:a8c0:: │ 3232235777 │ 192.168.1.1 │\n└────────────┴────────────┴─────────────┘\n```\n\nThis can be used, for instance, to display either IPv4 or IPv6 in a single query:\n```sql\nSELECT\n  if(EType = 0x800, IPv4NumToString(reinterpretAsUInt32(SrcAddr)), IPv6NumToString(SrcAddr)) AS SrcIP\n```\n\nThis will be fixed in a future dashboard/db schema version.\n\n## Information and roadmap\n\nThis repository is an example and does not offer any warranties. 
I try to update it whenever I can.\nContributions are welcome.\n\nThe main purpose is to let users get started quickly with a basic system.\nThis should not be used in production.\n\nI received requests to publish the Flink aggregator source code as you may have seen it\nbeing used in GoFlow presentations.\nUnfortunately, we moved entirely towards Clickhouse, and the old code has not been updated in a while.\nIt may get published at some point, but this is currently a low priority.\n\n## Issue troubleshooting\n\nThe compose files don't pin the containers to specific versions. You will likely need to `down` in order to clean the setup (volumes, network), `pull` to resynchronize images like GoFlow, and `build` to rebuild components like the inserter.\n\n```bash\n$ docker-compose -f some-yaml-listed-above.yml down\n$ docker-compose -f some-yaml-listed-above.yml pull\n$ docker-compose -f some-yaml-listed-above.yml build\n$ docker-compose -f some-yaml-listed-above.yml up\n```\n","funding_links":[],"categories":["Go","Integrations"],"sub_categories":["ETL and Data Processing"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudflare%2Fflow-pipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcloudflare%2Fflow-pipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudflare%2Fflow-pipeline/lists"}