{"id":13650529,"url":"https://github.com/criteo/netprobify","last_synced_at":"2025-06-22T08:36:41.929Z","repository":{"id":35327962,"uuid":"213343539","full_name":"criteo/netprobify","owner":"criteo","description":"Network probing tool crafted for datacenters (but not only)","archived":false,"fork":false,"pushed_at":"2025-03-06T04:35:58.000Z","size":354,"stargazers_count":33,"open_issues_count":4,"forks_count":12,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-03-13T17:05:08.264Z","etag":null,"topics":["datacenter","infrastructure","monitoring","network","probing","scapy"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/criteo.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-10-07T09:29:26.000Z","updated_at":"2024-12-19T10:31:12.000Z","dependencies_parsed_at":"2024-01-03T05:09:46.012Z","dependency_job_id":"29ec9e8e-d43b-444e-8375-42bd36b6f185","html_url":"https://github.com/criteo/netprobify","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/criteo/netprobify","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/criteo%2Fnetprobify","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/criteo%2Fnetprobify/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/criteo%2Fnetprobify/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/criteo%2F
netprobify/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/criteo","download_url":"https://codeload.github.com/criteo/netprobify/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/criteo%2Fnetprobify/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261263272,"owners_count":23132553,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datacenter","infrastructure","monitoring","network","probing","scapy"],"created_at":"2024-08-02T02:00:37.576Z","updated_at":"2025-06-22T08:36:36.915Z","avatar_url":"https://github.com/criteo.png","language":"Python","readme":"![build](https://travis-ci.org/criteo/netprobify.svg?branch=master)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\n# Requirements\n\nPython \u003e= 3.8\n\n# What is netprobify?\n\nnetprobify is a tool to probe destinations using various protocols/methods.\n\nUsing scapy makes the tool easy to extend as well as adding new kinds of probe.\n\nThe tool is designed to scale by using multiprocessing.\n\nAlso as it uses scapy, no sockets are actually opened.\n\n# Usecases\n\nAt Criteo, netprobify is used to provide metrics to all services provided by the network teams: datacenter, WAN, internet.\n\nFor example:\n- datacenter: we probe all of our Top of Racks using UDPunreachable probing mode\n- WAN: full mesh between our datacenters in TCPsyn probing mode\n- internet: probing common and strategic targets using TCPsyn, UDPunreachable, ICMPping\n\n# 
How to use netprobify\n\n## How to run it\n\nTo run netprobify you can:\n- run `sudo netprobify_start.py` from the source code\n- or build a PEX and use it (see the `How to build` section below)\n\nIf you plan to run it in production, using the \"stable\" branch is recommended.\n\n## How to configure it\n\nTo configure netprobify, you need to add a `netprobify.yaml` configuration file.\n\nAll the details regarding the configuration can be found in `netprobify/schema_config.yaml`.\n\nYou can find a quick example below:\n```\n---\nglobal:\n  probe_name: \"lab\"\n  interval: 30\n  nb_proc: 8\n  verbose: 0\n  logging_level: INFO\n  prometheus_port: 8000\n  dns_update_interval: 300\n  reload_conf_interval: 0\n\ngroups:\n  standard:\n    src_port_a: 65100\n    src_port_z: 65199\n\ntargets:\n  google_ipv4:\n    description: \"google_search\"\n    type: \"TCPsyn\"\n    destination: \"google.com\"\n    address_family: \"ipv4\"\n    dst_port: 443\n    nb_packets: 100\n    timeout: 1\n  google_ipv6:\n    description: \"google_search\"\n    type: \"TCPsyn\"\n    destination: \"google.com\"\n    address_family: \"ipv6\"\n    dst_port: 443\n    nb_packets: 100\n    timeout: 1\n  bing:\n    description: \"bing_search\"\n    type: \"TCPsyn\"\n    destination: \"bing.com\"\n    dst_port: 443\n    nb_packets: 100\n    timeout: 1\n```\n\nThen, you just have to scrape the result using Prometheus. 
In this example, you will need to scrape the host on port 8000.\n\n## Prometheus basic authentication\n\nFor security purposes, netprobify can secure the /metrics endpoint using HTTP basic authentication.\nTo set this up, you need to set the `PROM_USER` and `PROM_PASSWORD` environment variables.\n\n## Prometheus alerts rules\n\nBelow are example Prometheus alert rules for netprobify.\n\nRaise an alert when loss ratio is above 0.1%:\n\u003e tcpsyn_loss_ratio{probe_name=\"lab\"} * 100 \u003e 0.1\n\nSame but only if the probe is actually sending packets:\n\u003e tcpsyn_loss_ratio{probe_name=\"lab\"} * 100 \u003e 0.1 and on(probe_name) sum by(probe_name) (increase(tcpsyn_sent_total{probe_name=\"lab\"}[10m])) \u003e 0\n\nRaise an alert if the latency is above 100 milliseconds:\n\u003e tcpsyn_round_trip_seconds{probe_name=\"lab\",percentile=\"95\"} * 1000 \u003e 100\n\nRaise an alert if the probe is taking too long to probe all the targets (more than 90 seconds):\n\u003e app_iteration_time_seconds{probe_name=\"lab\"} \u003e 90\n\nRaise an alert if netprobify is not running (or not scraped by Prometheus):\n\u003e up{instance=~\"lab\",job=\"netprobify\"} == 0\n\nRaise an alert if the probe is not sending any TCPsyn packets:\n\u003e sum by(probe_name) (increase(tcpsyn_sent_total{probe_name=\"lab\"}[10m])) == 0\n\nRaise an alert if the probe reloaded with a bad configuration:\n\u003e app_reload_conf_failed_status{probe_name=\"lab\"} \u003e 0\n\n## Grafana examples\n\n![netprobify workflow](https://raw.githubusercontent.com/criteo/netprobify/master/images/grafana_probing.png)\n\n![netprobify workflow](https://raw.githubusercontent.com/criteo/netprobify/master/images/grafana_application_health.png)\n\n# How to build\n\n## During development phase\n\n1. Create a virtualenv so that local packages do not clash with your system\n   * `python3 -m venv .venv`\n   * `source .venv/bin/activate`\n2. 
Once in your venv, install all the dependencies\n   * `pip install -r requirements/netprobify.txt`\n   * `pip install -r requirements/tests.txt`\n   * `pip install -e .`\n3. Run your program\n   * `sudo netprobify`\n\n## How to run the tests\n\n1. Run the command `tox`. It will run the tests, code coverage, and linter for python3.\n\n## Build an executable\n\n1. Get out of your virtualenv by running in your shell\n   - `deactivate`\n2. Run the command `tox -e bundle`. It will build the pex\n3. You will find your executable in dist/netprobify\n\n# Architecture\n\n## Workflow\n\n![netprobify workflow](https://raw.githubusercontent.com/criteo/netprobify/master/images/netprobify-workflow.png)\n\n## Probes\n\nnetprobify can probe a host using an IP address, or a hostname, or a subnet.\nHowever, pinging a subnet will aggregate the results, and not expose metrics\nper host.\n\nIf a hostname is used to define the probe, the DNS resolution will be done\nat the interval defined in the config file (global or in the target definition).\n\nAll probe types can be configured with a payload size.\n\n### TCPsyn\n\nThis probe is using the TCPsyn stealth:\n- send a TCP SYN\n- wait for a response (TCP SYN or ICMP)\n- send a TCP RST to close the connection\n- calculate the latency between the TCP SYN and the first response.\n\nTo avoid collisions, a seq id is defined using a global counter.\nThat way, even if a target is defined twice and run at the same time,\nthe tool will be able to match the response packets with the correct sent packet.\n\n### ICMP\n\nThis probe is using ICMP echo request. 
It is a basic ping.\n\n### UDPunreachable\n\nThe goal of the UDPunreachable probe is to target a closed UDP port.\nIt waits for an ICMP Destination Unreachable (Port unreachable).\n\nIt can be useful to target network devices when TCPsyn stealth doesn't work.\nThe advantage over ICMP is that UDP probes can exercise ECMP paths by changing the source port.\n\nTo avoid collisions, a unique ID is set at the IP level for each packet sent.\n\nIt works out of the box on Arista and Juniper devices.\n\nIf you are targeting other devices, you should make sure there is no rate-limit applied\nto ICMP Destination Unreachable.\n\nBy default, on Linux:\n- icmp_ratemask = 6168\n- icmp_ratelimit = 1000\n\nYou can either completely deactivate the rate-limit, or simply deactivate the rate-limit for\nICMP Destination Unreachable.\n\nTo do so, you just have to set icmp_ratemask to 6160.\n\nMore details about icmp_ratemask:\n\nicmp_ratemask - INTEGER\n\tMask made of ICMP types for which rates are being limited.\n\tSignificant bits: IHGFEDCBA9876543210\n\tDefault mask:     0000001100000011000 (6168)\n\n\tBit definitions (see include/linux/icmp.h):\n\t\t0 Echo Reply\n\t\t3 Destination Unreachable *\n\t\t4 Source Quench *\n\t\t5 Redirect\n\t\t8 Echo Request\n\t\tB Time Exceeded *\n\t\tC Parameter Problem *\n\t\tD Timestamp Request\n\t\tE Timestamp Reply\n\t\tF Info Request\n\t\tG Info Reply\n\t\tH Address Mask Request\n\t\tI Address Mask Reply\n\nsource: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt\n\n## Group notion\n\nA group has several parameters. The most important ones are:\n- src_port_a\n- src_port_z\n\nTogether, they define a range. For each TCPsyn target associated to a group,\npackets will be generated. The number of packets sent per group is configured\nvia the nb_packets parameter in the target definition. 
It will change the source port\nfor each packet by using round robin in the range defined in the group.\n\nBy default, all targets are associated to all groups.\nBut you can change this behavior with parameters in config.yaml\n\n- groups: permit_target_auto_register\n  - default: true\n  - if false, the targets will not automatically be in the group\n- targets: auto_register_to_groups\n  - default: true\n  - if false, the target will not be automatically in any group\n- targets: explicit_groups/register_to\n  - default: none\n  - explicitly associate a target to a group even if permit_target_auto_register\n    is set to false\n- targets: explicit_groups/exclude_from\n  - default: none\n  - explicitly remove the target from a group (useful when permit_target_auto_register\n    is set to true)\n\n## Threshold\n\nThe thresholds are exposed in Prometheus with the right label,\nso you can match it with a metric and then create an alert.\n\nThe value unit must match the metric you want to monitor.\n\nExample:\n- Latency in seconds\n- Loss in percentage\n\nExample of Prometheus alert using the threshold metrics:\n\nRaise an alert if the latency is above the threshold defined in the netprobify configuration file:\n\u003e tcpsyn_round_trip_seconds{probe_name=\"lab\",percentile=\"95\"} * 1000 \u003e on(destination, probe_name) threshold{alert_level=\"paging\",type=\"latency\"} * 1000\n\n## Dynamic inventories\n\nDynamic inventories are custom modules loaded automatically.\nThe goal is to set targets dynamically based on dynamic sources such as a CMDB, an API, etc.\n\nTo load a dynamic inventory, you have to add a Python module in the dynamic_inventories directory.\n\nThe module must contain a \"start\" method with the following parameters:\n- targets: dict shared among all processes (main process and dynamic inventories)\n           Each module should register its targets in \"targets[module_name]\"\n           The minimal targets parameters are defined in 
schema_config.yaml\n- module_name\n- logging_level\n\nAll modules are started only at the netprobify startup in a dedicated subprocess.\nSo, you may want the module to have an infinite loop.\n\n## Other parameters\n\nAll parameters are defined and described in schema_config.yaml\n\n# Known limitations\n\n## BPF filters on IPv6 upper-layer protocols\n\nDue to an inherited limitation from libpcap (see https://github.com/the-tcpdump-group/libpcap/issues/600),\nnetprobify is not able to filter a specific subset of TCP and UDP packets. This will impact performance,\nespecially when you receive real traffic from a target you try to probe: netprobify will also receive this\ntraffic and will have to do more work to identify traffic related to probing (which could lead to false results).\n","funding_links":[],"categories":["Tools"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcriteo%2Fnetprobify","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcriteo%2Fnetprobify","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcriteo%2Fnetprobify/lists"}