{"id":35640880,"url":"https://github.com/compscifutures/apmonitor","last_synced_at":"2026-04-07T02:05:17.415Z","repository":{"id":327907638,"uuid":"1101889663","full_name":"CompSciFutures/APMonitor","owner":"CompSciFutures","description":"On-prem/LAN availability monitoring with realtime guarantees \u0026 decaying alert pacing.  Multithreaded high speed availability checking for PING, TCP/UDP, QUIC \u0026 HTTP/S resources incl. SSL/TLS cert. pinning. Integrates w/Site24x7 heartbeat monitoring for failover alerts + Slack \u0026 Pushover webhooks. Thread safe, reentrant, easily modifiable.","archived":false,"fork":false,"pushed_at":"2026-02-11T06:07:38.000Z","size":6879,"stargazers_count":2,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-11T06:54:34.259Z","etag":null,"topics":["availability","availability-and-monitoring","discord","h3","heartbeat-monitor","http","http3","https","lan","monitoring","on-premise","pagerduty","pushover","quic","realtime","servermasters","site24x7","slack","ssl-pinning","tcp"],"latest_commit_sha":null,"homepage":"https://blog.andrewprendergast.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CompSciFutures.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-22T12:34:55.000Z","updated_at":"2026-02-11T06:07:42.000Z","dependencies_parsed_at":null,"dependency_job_id":"adfbf3be-bbd3-4a2a-9b81-ac91ed8f3c21","htm
l_url":"https://github.com/CompSciFutures/APMonitor","commit_stats":null,"previous_names":["compscifutures/apmonitor"],"tags_count":21,"template":false,"template_full_name":null,"purl":"pkg:github/CompSciFutures/APMonitor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompSciFutures%2FAPMonitor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompSciFutures%2FAPMonitor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompSciFutures%2FAPMonitor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompSciFutures%2FAPMonitor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CompSciFutures","download_url":"https://codeload.github.com/CompSciFutures/APMonitor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompSciFutures%2FAPMonitor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29669842,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-21T00:11:43.526Z","status":"ssl_error","status_checked_at":"2026-02-20T23:52:33.807Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["availability","availability-and-monitoring","discord","h3","heartbeat-monitor","http","http3","https","lan","monitoring","on-premise","pagerduty","pushover","quic","realtime","servermasters","site24x7","slack","ssl-pinning","tcp"],"created_at":"2026-01-05T11:18:14.395Z","updated_at":"2026-04-01T17:12:27.213Z","avatar_url":"https://github.com/CompSciFutures.png","language":"Python","funding_links":["https://www.paypal.com/donate/?hosted_button_id=WN472NX5XC5CJ"],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"images/APMonitor-logo.png\" alt=\"APMonitor logo\" width=\"128\" height=\"128\" style=\"vertical-align: text-bottom;\"\u003e\n\n# `APMonitor.py` - A Hands-Off Layer 2 \u0026 4 On-Premises Monitoring Tool with Alert Delivery Guarantees\n\n***Built for NOCs and OT/ICS Sensor Networks***: This is an on-prem monitoring tool written completely in very clear Python-only code (so you can modify it) and is designed to work on a LAN for on-prem availability monitoring of resources that aren't necessarily connected to The Internet, and/or where the on-prem monitoring itself is also required to have availability guarantees.\n\nIt is particularly suited to availability monitoring of embedded devices, with heartbeats and alerts delivered on schedule to within +/- 10 secs. 
It's designed primarily for firewalls, switches, routers, hubs, environmental sensors \u0026 #OT / #ICS systems, but works with normal servers \u0026amp; services as well.\n\nIt supports multi-threaded availability checking of monitored resources for high-speed, near-realtime performance, if that is what you need (see the `-t` command line option). The default operation mode is single-threaded, which keeps the logs clear and runs comfortably on small systems like a Raspberry Pi.\n\nIt also supports pacing of monitoring alarms using a decaying curve that delivers alert notifications quickly at the start, then slows down notifications over time.\n\n`APMonitor.py` (APMonitor) is primarily designed to work in tandem with [Site24x7](https://site24x7.com) and integrates very well with their \"[Heartbeat Monitoring](https://www.site24x7.com/help/heartbeat/)\".\n\nTo achieve **guaranteed always-on monitoring service levels**, simply set up local availability monitors in your config, [sign up for a Pro Plan at Site24x7](https://www.site24x7.com/site24x7-pricing.html), then use the `heartbeat_url` and `heartbeat_every_n_secs` configuration options in `APMonitor.py` to ping a [Heartbeat Monitoring](https://www.site24x7.com/help/heartbeat/) URL endpoint at [Site24x7](https://site24x7.com) when the monitored resource is up. This ensures that when a heartbeat doesn't arrive from APMonitor, monitoring alerts fall back to Site24x7, and when both are working you have second-opinion availability monitoring reporting.\n\n**The service level guarantee works as follows:** If the resource is down, `APMonitor.py` won't hit the [Heartbeat Monitoring](https://www.site24x7.com/help/heartbeat/) endpoint URL, and Site24x7 will then send an alert about the missed heartbeat without the need for any additional dependencies on-prem/on-site. 
So the entire machine `APMonitor.py` is running on can fall over, and you still get availability monitoring alerts sent, with all the benefits of having on-prem monitoring on your local network behind your firewall.\n\nYou can quickly sign up for a [Site24x7.com Lite or Pro Plan](https://www.site24x7.com/site24x7-pricing.html) for \\$10-\\$50 USD per month, then set up a bunch of [Heartbeat Monitoring](https://www.site24x7.com/help/heartbeat/) URL endpoints that work with `APMonitor.py` rather easily.\n\n**Note: Heartbeat Monitoring is not available on their Website Monitoring plans. You need an 'Infrastructure Monitoring' or 'All-In-One' plan for it to work correctly.**\n\nAPMonitor also integrates well with [Slack](https://slack.com/) and [Pushover](https://pushover.net/) via webhook URL endpoints, and supports email notifications via SMTP.\n\nAPMonitor is a neat way to guarantee your on-prem availability monitoring will always let you know about an outage and to avoid putting resources onto the net that don't need to be.\n\n\u003cb\u003eAndrew (AP) Prendergast\u003c/b\u003e\u003cbr /\u003e\nhttps://linktr.ee/CompSciFutures\u003cbr /\u003e\nMaster of Science\u003cbr /\u003e\n\n\u003ci\u003eEx-ServerMasters\u003cbr/\u003e\nEx-Googler\u003cbr /\u003e\nEx-Xerox PARC/PARK\u003cbr/\u003e\nEx-Intel Foundry\u003cbr/\u003e\nEx Chief Scientist @ Clemenger BBDO / Omnicom\u003c/i\u003e\n\n\u003ci\u003e[ACM](https://acm.org/), [IEEE](https://ieee.org) \u0026 [INFORMS](https://informs.org) member.\u003c/i\u003e\n\n[![buy-me-a-coffee.png](images/buy-me-a-coffee.png)](https://www.paypal.com/donate/?hosted_button_id=WN472NX5XC5CJ)\n\n\u003ci\u003eIf you find APMonitor.py useful in your NOC, for monitoring your IOT/ICS devices,\nor would like email / telephone support, please consider\n\u003ca href=\"https://www.paypal.com/donate/?hosted_button_id=WN472NX5XC5CJ\"\u003ea regular donation via Buy me a coffee\u003c/a\u003e,\nso I can keep improving it.\u003cbr /\u003e\n\nTelephone 
Support: +61497222775\u003cbr /\u003e\nSupport email: hello@enertium.org\u003cbr /\u003e\n\u003c/i\u003e\n\n# Quickstart\n\nTo run APMonitor with one or more configuration files and auto-derived statefiles under `/var/tmp/APMonitor/`:\n```bash\n./APMonitor.py test-apmonitor-config.yaml --generate-rrds\n./APMonitor.py site1.yaml site2.yaml --generate-mrtg-config\n```\n\nTo properly set up `APMonitor.py`:\n\n1. Spin up Debian Linux on a VM or PC on a Card/PC on a Chip (e.g., a Raspberry Pi) - optional but recommended\n\n    A dedicated machine is recommended because control of `/var/www/html` is taken over when installing the MRTG web interface.\n\n2. Install APMonitor (to spin up `APMonitor.py` in `systemctl` as `apmonitor.service`)\n    ```bash\n    sudo make install\n    ```\n\n3. Install the MRTG web interface (to spin up an NGINX webserver for MRTG charts in `systemctl` as `apmonitor-nginx.service`)\n    ```bash\n    sudo make installmrtg\n    ```\n\n4. Edit `/usr/local/etc/apmonitor-config.yaml`\n\n   See \u003ca href=\"#apmonitorpy-yamljson-site-configuration-options\"\u003eConfiguration Options\u003c/a\u003e for site file configuration details.\n\n5. Test the config (using `./APMonitor.py --test-config /usr/local/etc/apmonitor-config.yaml`):\n    ```bash\n    sudo make test-config\n    ```\n\n6. Start monitoring:\n    ```bash\n    sudo make enable\n    ```\n\n   **Note:** Statefiles are stored under `/var/tmp/APMonitor/` by default, e.g. `/var/tmp/APMonitor/apmonitor-config.statefile.json` for a default install. 
The `-s` flag overrides this for single-config invocations only.\n\nThat's it!\n\n\u003e [!WARNING]\n\u003e If you are upgrading to the 1.3.x stream: This is a schema change release stream that contains RRD \u0026 config YAML schema changes that require existing RRD files to be deleted and recreated before upgrading.\n\u003e APMonitor will auto-heal existing RRDs on first run when `--generate-rrds` or `--generate-mrtg-config` is specified.\n\u003e\n\u003e To do a full upgrade, change your YAML to replace `type: snmp` with `type: ports`, then execute something similar to this command:\n\u003e\n\u003e ```bash\n\u003e cp tellusion-apmonitor-config.yaml /usr/local/etc/apmonitor-config.yaml; \\\n\u003e make install; make installmrtg; \\\n\u003e rm /var/tmp/apmonitor-statefile.rrd/*\n\u003e ```\n\n\n# Expected Output with \u003ca href=\"https://github.com/CompSciFutures/APMonitor?tab=readme-ov-file#mrtgrrd-integration-for-performance-graphing\"\u003eMRTG/RRD Integration Enabled\u003c/a\u003e\n\nInstalling MRTG with `make install; make installmrtg` will spin up a small, lightweight NGINX web server with FastCGI (via `rc.d`) on http://localhost:888/, which looks like this:\n\n![mrtg-availability.png](images/mrtg-availability.png)\n\nThis layout is specifically designed for the now commonly available 4K Ultra HD (3840x2160 16:9 2160p) screens. 
It's not uncommon to see modern NOCs with an array of these on the wall at eye height when someone is sitting down.\nInstead of just having CCTV, you can now add some proper network telemetry and instrumentation, say with one YAML site file per screen, on the top row of screens.\n\nClicking on the heading associated with a set of ports will provide more L2/L3 information (depending on what's available via SNMP):\n\n\u003cimg src=\"images/L2L3-detail-page.png\" width=\"650\" /\u003e\n\nNote the NGINX/FastCGI combination means we don't need to keep a machine chewing on itself generating charts anymore - they are now generated on demand in near-realtime and extremely efficiently. The only I/O is the RRD files, which under the hood operate very much like the older MRTG text file format.\n\nI chose RRD because it's a rather good format for warehousing fixed-frequency (regularly sampled) time-series data, and it's still compatible with Tier 1 NOCs.\n\nIf you want to work with this data directly, consider looking at \u003ca href=\"https://librosa.org/doc/latest/index.html\"\u003eLibROSA\u003c/a\u003e from NYU's Fourier Lab team.\nIt is designed for working with Frequency Domain/Time Domain data and has a rather nifty spectrogram visualisation which might be relevant to you, amongst other things.\nSee the \u003ca href=\"https://www.youtube.com/watch?v=MhOdbtPhbLU\"\u003elaunch lecture given at SciPy\u003c/a\u003e for more information.\n\nYou might also want to look at \u003ca href=\"https://nixtla.io\"\u003enixtla.io\u003c/a\u003e or R's seasonal decomposition function called `stl`. 
Nixtla is more advanced and I've \u003ca href=\"https://x.com/CompSciFutures/status/2033814554430607794?s=20\"\u003eposted on 𝕏 about it here\u003c/a\u003e.\n\n\n# Design Philosophy \u0026amp; Provenance\n\nOnce upon a time, I was well known in data center circles along Highway 101 in Silicon Valley for carrying in my back pocket a super lightweight pure C/libc cross-platform availability monitoring tool with no dependencies whatsoever called `APMonitor.c`. I'd graciously provide the source code to anyone who asked.\n\nThis is a rebuild of that project with enhanced features, starting with a Python prototype.\n\nThe design philosophy centers on simplicity and elegance: a single, unified source file containing the main execution flow for a 100% on-premises/LAN availability monitoring tool with guaranteed alerts and intelligent pacing.\n\nKey Features:\n\n- Near-realtime programming so heartbeats and alerts arrive when they say they are going to (+/- 10 secs)\n- Multithreaded high-speed availability checking for PING, TCP, UDP, QUIC, HTTP(S), and SNMP resources\n- SSL/TLS certificate checking and pinning so you can use self-signed certificates safely on the LAN\n- SNMP monitoring for network device interface bandwidth, I/O statistics, and TCP retransmit metrics\n- Host performance monitoring (CPU, memory, disk I/O, swap, interrupts) per *System Performance Tuning* by Musumeci \u0026 Loukides (O'Reilly)\n- Integration with Site24x7/PagerDuty heartbeat monitoring for high-availability second-opinion and failover alerting\n- Integration with Slack and Pushover webhooks for notifications, plus standard email support\n- Smart notification pacing: rapid alerts initially, then gradually decreasing frequency for extended outages\n- Multi-site monitoring: for multiple single panes of glass, pass multiple config files on the command line; each runs concurrently as an independent subprocess with its own statefile, RRD database, and MRTG index\n- Runs on everything from Raspberry Pi 
to enterprise systems\n- Super accurate, high-frequency monitoring for real-time / embedded / heartbeat monitored environments\n- Thread-safe, reentrant, and easily modifiable\n- GPL 3.0 free open source always, so you know there are no backdoors\n\n## Alternatives\n\nIf lightweight or realtime guarantees aren't important to you, and you want something more feature-packed,\nconsider these alternatives:\n\n- Uptime Kuma\n- Statping\n- UptimeRobot (a hosted service rather than on-prem)\n- Paessler PRTG\n\nAPMonitor is simple, minimalist, elegant and lightweight and comes from a reliable line of heritage so you can spin\nit up fast as a second-opinion monitoring tool with little more than a `make install`. If you want something more\nsophisticated that's less focused on realtime programming or elegant simplicity, take a look at those very capable\nalternatives.\n\n# Relevance to the 12 Pillars of Information Security\n\nNB: This tool is useful for implementing the second \u0026amp; third pillars (Availability \u0026amp; System Integrity)\nfrom the 12 Pillars of Information Security, for Necessary, Sufficient \u0026 Complete Security:\n\n\u003cimg src=\"images/The-Pillars-of-Information-Security.png\" width=\"500\" /\u003e\n\nAlso be mindful of the Attack Surface Kill-Switch Riddle:\n\n![The-attack-surface-kill-switch-riddle.png](images/The-attack-surface-kill-switch-riddle.png)\n\nTo address this riddle, you should try to configure your machines \u0026 devices so that even if they are shut down or halted in some way,\nthe Ethernet MAC address can still be read at Layer 2 so you can still receive alerts like this:\n\n\u003cimg src=\"images/port-change-notification.png\" width=\"500\" /\u003e\n\nNB. Be careful that your definition of \"Kill Switched\" is well defined and tested before the time comes to use it.\nE.g., downing a port never works long term; it's merely advisory and something one does as they walk across the floor to unplug the cable from a switch.\nOr is it, if you have this? 
YMMV.\n\nSee DOI [10.13140/RG.2.2.12609.84321](https://doi.org/10.13140/RG.2.2.12609.84321) and associated [LinkedIn post](https://www.linkedin.com/feed/update/urn:li:activity:7331490410197905409/) for more information on the Pillars of Information Security. It borrows from a piece of work I did back when #PARC needed me to work on #BookMasters in the digital era.\n\n## Recommended configurations for addressing the first pillar: Physical Security\n\nUsing `APMonitor.py` to address Availability \u0026 System Integrity can help with maintaining Physical Security. Here are some tips from the trenches on keeping server equipment secure.\n\n### Removing SIM Cards from Inner Range T4000 remote monitored alarm devices\n\nInner Range has become a dominant force in access control and alarm systems in IDCs, offices and high-end homes around the western world in recent times.\nWhat installers don't tell you is that these devices are full of vendor backdoors. The best way to address this is to remove the device's 3G/4G path to your monitoring station over The Internets entirely and put it onto your LAN so it goes through normal governance, risk and compliance like all other devices.\n\nNB: Know this: in addition to vendor backdoors, every remote monitored alarm is a reverse shell. That's just how it is.\n\nSteps for securing your T4000 and Inner Range devices from Vendor Backdoors:\n\n1. Block all direct communications with Inner Range from your IOT network:\n\n    You do not want your T4000, Inception or Integriti devices communicating with the \u003ca href=\"https://www.skytunnel.com.au/info\"\u003edefault IPs associated with Inner Range which are published here\u003c/a\u003e.\n\n2. 
Remove the SIMs from your T4000 so all traffic routes through your availability-monitored network:\n\n    A boxed T4000 unit:\n\n    \u003cimg src=\"images/T4000-boxed.jpeg\" width=\"500\" /\u003e\n\n    A T4000 unit with its SIMs removed:\n\n    \u003cimg src=\"images/T4000-sim-strated.jpeg\" width=\"500\" /\u003e\n\n    This will stop it from talking to home base with reverse shells and vendor backdoors.\n\n3. Plug the GigE adapter from your IOT network into the T4000 (grey cable in the picture above).\n\nNB: Removing the SIMs breaks the circuit that allows the device to communicate wirelessly.\n\nNNB: This is a valid enterprise-grade T4000 configuration.\n\n\n### Using Chinese made pin entry locks with protective covers\n\nAll locks can be picked, and all high-security registered key systems can have additional keys cut by the police\nor anyone persuasive enough (read: vendor backdoors \u0026 $$$ respectively) to get a locksmith to make a spare key.\nI've seen it happen to server rooms several times over the years.\n\nTo get around the problem, we combine normal physical locks with \u003ca href=\"https://www.ebay.com/sch/159907/i.html?_nkw=electronic+pin+door+lock\u0026_salic=45\u0026LH_LocatedIn=1\u0026mkcid=1\u0026mkrid=711-53200-19255-0\u0026siteid=0\u0026campid=5339147142\u0026customid=\u0026toolid=10001\u0026mkevt=1\"\u003eChinese made electronic pin locks from eBay\u003c/a\u003e,\nbut they all suffer from the same weakness: they can be circumvented with a credit card or knife, and this video shows how easy it is:\n\n\u003cimg src=\"physical-security/IMG_0944.gif\" width=\"500\" height=\"888\" /\u003e\u003cbr /\u003e\u003cbr /\u003e\n\nTo address the problem, we get a metal fab to manufacture a protective plate to cover the lock so it can't be so easily circumvented:\n\n\u003cimg src=\"physical-security/striker-plate-cover.jpeg\" width=\"500\" /\u003e\n\nHere is the same video for a lock with a plate installed - can't open it now:\n\n\u003cimg 
src=\"physical-security/IMG_4424.gif\" width=\"500\" height=\"888\" /\u003e\u003cbr /\u003e\u003cbr /\u003e\n\nAnd here are the basic plans to get a metal fab to create a Protective Striker Cover Plate for you:\n\n[![PDF preview](physical-security/Striker-Plate-Cover-CAD-design.png)](physical-security/Striker-Plate-Cover-CAD-design.pdf)\n\nFor maximum security, try to customize the lip that covers the front of the door to be as wide as possible without\nbumping into the actual lock (marked as 35.0 and 19.3 in the CAD diagram).\n\n### Using a span port + tcpdump to analyse IOT traffic for security devices\n\nSometimes we just want to know what a device or an IOT network is communicating with on The Internets. Here is how it's done.\nFirst you need to slurp up some packets using tcpdump + spans, then analyse them using tshark and sed/awk/grep, as follows.\n\nSteps to monitor TCP/IP connectivity by a device:\n\n1. Set up your IOT switch so that all traffic over the uplink port is spanned onto a secondary port (most managed switches support this - see your switch's manual for how to set up a span).\n\n    NB: `APMonitor.py` may take this input as a live feed in future, so get used to working with spans and taps.\n\n2. Plug a Linux box into the span port and dump the traffic on the port using `tcpdump` into daily `.pcap` files:\n\n    ```bash\n    apt install tcpdump wireshark tshark\n    # -G 86400 starts a new strftime-named capture file every 24 hours; -C 10240 also rolls over at ~10 GB (units of 1,000,000 bytes)\n    # -Z ap drops privileges to the user 'ap' after opening the interface - substitute your own unprivileged user\n    tcpdump -i eno1 \\\n        -nn -e -v -t --print --immediate-mode -l \\\n        -G 86400 -Z ap -w %Y%m%d-%H%M%S-eno1.pcap -W 90 -C 10240\n    ```\n\n3. 
Run this script over the `.pcap` files:\n\n    ```\n    ls *.pcap | \\\n    xargs -I {} tshark -r {} -d tcp.port==40844,http -d tcp.port==40844,tls -Y '(eth.addr==00:11:b9:06:93:fe or eth.addr==00:11:b9:09:04:ff) and (ip or ipv6)' -T fields -e eth.src -e eth.dst -e ip.version -e ip.proto -e ip.src -e ip.dst -e tcp.srcport -e tcp.dstport -e udp.srcport -e udp.dstport -e http.host -e tls.handshake.extensions_server_name \u003e /tmp/tshark_output.txt\n    \n    awk -F'\\t' '\n    # Pass 1: Build lookup table\n    NR==FNR {\n        ip = ($1 == \"00:11:b9:06:93:fe\" || $1 == \"00:11:b9:09:04:ff\") ? $6 : $5;\n        http_host = $11;\n        tls_sni = $12;\n        if ((http_host || tls_sni) \u0026\u0026 !app_hosts[ip]) {\n            app_hosts[ip] = http_host ? http_host : tls_sni;\n            print \"added: \" ip \" = \" app_hosts[ip] \u003e \"/dev/stderr\";\n        }\n        next;\n    }\n    # Pass 2: Use lookup table\n    {\n        mac = ($1 == \"00:11:b9:06:93:fe\" || $1 == \"00:11:b9:09:04:ff\") ? $1 : $2;\n        ip = ($1 == \"00:11:b9:06:93:fe\" || $1 == \"00:11:b9:09:04:ff\") ? $6 : $5;\n        proto = ($4 == \"6\") ? \"tcp\" : ($4 == \"17\") ? \"udp\" : $4;\n        src_port = $7 ? $7 : $9;\n        dst_port = $8 ? $8 : $10;\n        remote_port = ($1 == \"00:11:b9:06:93:fe\" || $1 == \"00:11:b9:09:04:ff\") ? dst_port : src_port;\n        app_host = (app_hosts[ip] ? 
app_hosts[ip] : \"-\");\n        if (remote_port) print mac \"\\t\" ip \"\\t\" remote_port \"/\" proto \"\\t\" app_host;\n    }\n    ' /tmp/tshark_output.txt /tmp/tshark_output.txt | \\\n    sort | uniq -c | \\\n    awk '{print $1 \"\\t\" $2 \"\\t\" $3 \"\\t\" $4 \"\\t\" $5}' | \\\n    while IFS=$'\\t' read count mac ip port_proto app_host; do\n        hostname=$(host $ip 2\u003e/dev/null | awk '{print $NF}' | sed 's/\\.$//')\n        port=$(echo $port_proto | cut -d/ -f1)\n        proto=$(echo $port_proto | cut -d/ -f2)\n        service=$(getent services \"$port/$proto\" 2\u003e/dev/null | awk '{print $1}')\n        echo \"$count $mac $ip $port_proto ${service:-unknown} $app_host $hostname\"\n    done \u0026\u0026 rm /tmp/tshark_output.txt\n    ```\n   \n    Which for a T4000 should generate output such as the following:\n\n    ```\n    added: 142.251.2.109 = smtp.gmail.com\n    added: 74.125.137.108 = smtp.gmail.com\n    added: 74.125.137.109 = smtp.gmail.com\n    added: 142.251.2.108 = smtp.gmail.com\n    added: 142.250.101.108 = smtp.gmail.com\n    added: 142.250.141.108 = smtp.gmail.com\n    added: 142.250.141.109 = smtp.gmail.com\n    added: 142.250.101.109 = smtp.gmail.com\n    added: 212.227.81.55 = ipv4.connman.net\n    added: 172.67.221.214 = irmsg.vizdynamics.com\n    added: 104.21.67.116 = irmsg.vizdynamics.com\n    201 00:11:b9:06:93:fe 137.116.114.112 40844/tcp unknown - 3(NXDOMAIN)\n    16 00:11:b9:06:93:fe 192.168.68.1 67/udp bootps - 3(NXDOMAIN)\n    5382 00:11:b9:06:93:fe 23.101.229.107 40844/tcp unknown - 3(NXDOMAIN)\n    11 00:11:b9:06:93:fe 255.255.255.255 67/udp bootps - 3(NXDOMAIN)\n    2 00:11:b9:06:93:fe 9.9.9.9 53/udp domain - dns9.quad9.net\n    12 00:11:b9:09:04:ff 104.21.67.116 443/tcp https irmsg.vizdynamics.com 3(NXDOMAIN)\n    16 00:11:b9:09:04:ff 115.70.68.136 123/udp ntp - 115-70-68-136.ip4.exetel.com.au\n    12 00:11:b9:09:04:ff 119.18.6.37 123/udp ntp - smtp.juneks.com.au\n    31 00:11:b9:09:04:ff 129.250.35.251 123/udp ntp - 
y.ns.gin.ntt.net\n    3 00:11:b9:09:04:ff 129.250.35.251,192.168.68.204 40756/1,17 unknown - 3(NXDOMAIN)\n    18 00:11:b9:09:04:ff 13.55.50.68 123/udp ntp - ec2-13-55-50-68.ap-southeast-2.compute.amazonaws.com\n    46700 00:11:b9:09:04:ff 137.116.114.112 40844/tcp unknown - 3(NXDOMAIN)\n    34 00:11:b9:09:04:ff 139.180.160.82 123/udp ntp - syd.clearnet.pw\n    6 00:11:b9:09:04:ff 139.99.135.247 123/udp ntp - vps-b7eaeed7.vps.ovh.ca\n    76 00:11:b9:09:04:ff 142.250.101.108 587/tcp submission smtp.gmail.com dz-in-f108.1e100.net\n    230 00:11:b9:09:04:ff 142.250.101.109 587/tcp submission smtp.gmail.com dz-in-f109.1e100.net\n    2065 00:11:b9:09:04:ff 142.250.141.108 587/tcp submission smtp.gmail.com dd-in-f108.1e100.net\n    1500 00:11:b9:09:04:ff 142.250.141.109 587/tcp submission smtp.gmail.com dd-in-f109.1e100.net\n    380 00:11:b9:09:04:ff 142.251.2.108 587/tcp submission smtp.gmail.com dl-in-f108.1e100.net\n    1600 00:11:b9:09:04:ff 142.251.2.109 587/tcp submission smtp.gmail.com dl-in-f109.1e100.net\n    15719 00:11:b9:09:04:ff 149.112.112.112 53/udp domain - dns.quad9.net\n    54 00:11:b9:09:04:ff 150.107.75.115 123/udp ntp - time.pickworth.net\n    16 00:11:b9:09:04:ff 159.196.178.7 123/udp ntp - 3(NXDOMAIN)\n    37 00:11:b9:09:04:ff 159.196.3.239 123/udp ntp - 159-196-3-239.9fc403.mel.nbn.aussiebb.net\n    16 00:11:b9:09:04:ff 159.196.45.149 123/udp ntp - record\n    20 00:11:b9:09:04:ff 162.159.200.1 123/udp ntp - time.cloudflare.com\n    24 00:11:b9:09:04:ff 162.159.200.123 123/udp ntp - time.cloudflare.com\n    32 00:11:b9:09:04:ff 167.179.162.50 123/udp ntp - 167-179-162-50.a7b3a2.bne.nbn.aussiebb.net\n    16 00:11:b9:09:04:ff 172.105.179.71 123/udp ntp - 172-105-179-71.ip.linodeusercontent.com\n    100218 00:11:b9:09:04:ff 172.67.221.214 443/tcp https irmsg.vizdynamics.com 3(NXDOMAIN)\n    20826 00:11:b9:09:04:ff 172.67.221.214 80/tcp http irmsg.vizdynamics.com 3(NXDOMAIN)\n    6 00:11:b9:09:04:ff 180.150.8.191 123/udp ntp - 
bitburger.simonrumble.com\n    11 00:11:b9:09:04:ff 192.168.68.1 123/udp ntp - 3(NXDOMAIN)\n    1 00:11:b9:09:04:ff 192.168.68.1,192.168.68.203 34051/1,17 unknown - 3(NXDOMAIN)\n    1 00:11:b9:09:04:ff 192.168.68.1,192.168.68.203 35951/1,17 unknown - 3(NXDOMAIN)\n    1 00:11:b9:09:04:ff 192.168.68.1,192.168.68.203 36204/1,17 unknown - 3(NXDOMAIN)\n    1 00:11:b9:09:04:ff 192.168.68.1,192.168.68.203 38036/1,17 unknown - 3(NXDOMAIN)\n    1 00:11:b9:09:04:ff 192.168.68.1,192.168.68.203 40942/1,17 unknown - 3(NXDOMAIN)\n    1 00:11:b9:09:04:ff 192.168.68.1,192.168.68.203 44065/1,17 unknown - 3(NXDOMAIN)\n    1 00:11:b9:09:04:ff 192.168.68.1,192.168.68.203 48603/1,17 unknown - 3(NXDOMAIN)\n    1 00:11:b9:09:04:ff 192.168.68.1,192.168.68.203 55896/1,17 unknown - 3(NXDOMAIN)\n    1 00:11:b9:09:04:ff 192.168.68.1,192.168.68.204 42573/1,17 unknown - 3(NXDOMAIN)\n    1 00:11:b9:09:04:ff 192.168.68.1,192.168.68.204 52984/1,17 unknown - 3(NXDOMAIN)\n    1 00:11:b9:09:04:ff 192.168.68.1,192.168.68.204 57294/1,17 unknown - 3(NXDOMAIN)\n    31 00:11:b9:09:04:ff 192.168.68.1 67/udp bootps - 3(NXDOMAIN)\n    6 00:11:b9:09:04:ff 194.195.249.28 123/udp ntp - ap-southeast-2.clearnet.pw\n    50 00:11:b9:09:04:ff 203.12.5.225 123/udp ntp - my.blockbluemedia.com\n    24 00:11:b9:09:04:ff 203.14.0.250 123/udp ntp - tic.ntp.telstra.net\n    50 00:11:b9:09:04:ff 212.227.81.55 80/tcp http ipv4.connman.net ipv4.connman.net\n    48 00:11:b9:09:04:ff 220.158.215.20 123/udp ntp - 220-158-215-20.broadband.telesmart.co.nz\n    99 00:11:b9:09:04:ff 224.0.0.251 5353/udp mdns - mdns.mcast.net\n    6187 00:11:b9:09:04:ff 23.101.229.107 40844/tcp unknown - 3(NXDOMAIN)\n    1 00:11:b9:09:04:ff 239.255.255.250 1902/udp unknown - 3(NXDOMAIN)\n    38 00:11:b9:09:04:ff 255.255.255.255 67/udp bootps - 3(NXDOMAIN)\n    48 00:11:b9:09:04:ff 27.124.125.250 123/udp ntp - ntp1.ds.network\n    6 00:11:b9:09:04:ff 45.124.53.221 123/udp ntp - ns1.adelaidewebsites.com.au\n    8 00:11:b9:09:04:ff 67.219.100.202 
123/udp ntp - mel.clearnet.pw\n    494 00:11:b9:09:04:ff 74.125.137.108 587/tcp submission smtp.gmail.com dy-in-f108.1e100.net\n    643 00:11:b9:09:04:ff 74.125.137.109 587/tcp submission smtp.gmail.com dy-in-f109.1e100.net\n    70 00:11:b9:09:04:ff 82.165.8.211 80/tcp http - 3(NXDOMAIN)\n    15739 00:11:b9:09:04:ff 9.9.9.9 53/udp domain - dns9.quad9.net\n    ```\n\n4. Inspect the list, go through each host/protocol, and build a whitelist of what you want to allow.\n\n# Recommended configuration for real-time environments\n\nTo put APMonitor into near-realtime mode so that it checks resources multiple times per second, use these global settings:\n\n- Dial up threads with `-t 15` on the command line or `max_threads: 15` in the site config,\n- set `max_retries` to `1`, and\n- dial down `max_try_secs` to `10` or `15` seconds\n\nfor real-time environments.\n\nNB: If you are running `APMonitor.py` out of `systemd` with a default install, `max_threads` defaults to `20` when not specified.\n\n\u003e [!WARNING]\n\u003e You need to make sure your configs have enough threads to finish in \u003c\u003c 10 seconds to get near-realtime performance.\n\u003e Make sure `max_threads` \u0026 `max_try_secs` are configured appropriately. 
Also note that separate site configs are executed\n\u003e in parallel as subprocesses, so any down monitors in one site do not slow down monitors in other sites, regardless of settings.\n\u003e\n\u003e Note that what usually slows down a site configuration is monitors that are down:\n\u003e you need enough threads to cover the maximum number of down monitors at any one time, on average.\n\u003e We say 'on average' because, once a site config has been running for a while,\n\u003e not all monitors are polled simultaneously.\n\n# Recommended configuration for securing IOT/OT/ICS networks\n\n***IOT is not supposed to be a thing*** - to compensate **if you have an NVR**, you need L2 monitoring of MAC address changes for each OT/ICS device such as cameras, NVRs \u0026 security computers on your IOT network.\n\nUse Layer 2 Port MAC Change Monitoring,\n\u003ca href=\"https://github.com/CompSciFutures/APMonitor?tab=readme-ov-file#https-monitor-with-certificate-pinning\"\u003eLayer 4 HTTPS Self-Signed Certificate Pinning\u003c/a\u003e and\n\u003ca href=\"https://github.com/CompSciFutures/APMonitor#single-port-mac-pinning-monitor\"\u003eLayer 2 MAC Address Pinning\u003c/a\u003e so your network can't be tampered with.\n\nTo avoid \u003ca href=\"https://x.com/search?q=%23VendorBackdoors\u0026src=typed_query\"\u003evendor backdoors\u003c/a\u003e, disable IPv6 and stop your IOT devices from communicating directly with The Internets, except with whitelisted addresses for purposes you specify (don't whitelist any cloud admin reverse shells).\n\n\n# Recommended configuration of Site24x7 Heartbeat Monitor Thresholds for HA Availability Monitoring\n\nYou do need to configure Site24x7's Heartbeat Monitoring to achieve high-availability second-opinion availability monitoring.\n\nAs an exemplar, for the following monitored 
resource:\n```yaml\nmonitors:\n  - type: http\n    name: home-nas\n    address: https://192.168.1.12/api/bump\n    expect: \"(C) COPYRIGHT 2005, Super NAS Storage Inc.\"\n    ssl_fingerprint: a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890\n    heartbeat_url: https://plus.site24x7.com/hb/your-unique-heartbeat-id/homenas\n    heartbeat_every_n_secs: 300\n```\n\nSet up Site24x7 as follows:\n\n![site24x7-heartbeat-settings.png](images/site24x7-heartbeat-settings.png)\n\nThis will send a heartbeat to [Site24x7](https://site24x7.com) every 5 minutes, and Site24x7 will raise an alarm whenever a heartbeat\ndoesn't arrive or arrives more than 1 minute early or late (i.e., if the heartbeat doesn't arrive or is \u003e 60 seconds out).\nThis ensures availability monitoring keeps functioning even when either APMonitor or Site24x7 is down.\n\nThis also means you don't need to expose internal LAN network resources to The Internets.\n\nAPMonitor's near-realtime capabilities will deliver heartbeats to within +/- 10 secs, so if you want high-precision alerts,\nconfigure Site24x7 to raise an alarm if heartbeats do not arrive exactly 5 minutes apart, +/- 10 secs.\n\nTo see the accuracy, configure Site24x7 as follows:\n\n![site24x7-realtime-heartbeat-settings.png](images/site24x7-realtime-heartbeat-settings.png)\n\nSite24x7 will record the error in its dashboard for anything that is more than +/- 1000 ms out,\nso you can keep a record of how accurate the near-realtime heartbeat timing is.\n\nSee Site24x7 docs for more info:\n- [Heartbeat Monitoring](https://www.site24x7.com/help/heartbeat/)\n- [Thresholds configuration](https://www.site24x7.com/help/admin/configuration-profiles/threshold-and-availability/server-monitor.html)\n\nNB: \"+/- 10 secs\" means your errors should be measurable in 10ths of a minute. Once Mercator Queues are added, this will\ndrop down to \"+/- 1 sec\" or possibly \"+/- 100 ms\", depending on how well Python performs with high-speed realtime\nprogramming. 
A workaround in the meantime is to make your number of threads equal to the number of monitored\nresources, which is not always practical and is not required in most settings.\n\n# Recommended configuration for 'Hands-Off' alarm notification pacing\n\nIf you want to avoid connecting to the monitoring server to hush alarms as they happen, while still receiving\nUP notifications as soon as things return to normal, consider alarm notification pacing, so that\nrecently down resources generate more frequent messages, whilst long outages are notified less frequently. To enable:\n\n- Set `notify_every_n_secs` to `3600` seconds (i.e., 1 hour), and\n- Set `after_every_n_notifications` to `8`,\n\nwhich will slow alarms down to one per hour after 8 notifications.\n\nAn alternate config for monitored resources that have long outages is as follows:\n\n- Set `notify_every_n_secs` to `43200` (i.e., 12 hours), and\n- Set `after_every_n_notifications` to `6`,\n\nwhich will slow alarms down to one every 12 hours after 6 notifications, so after a few days you will get at most one alarm whilst asleep.\n\nTo see how alarm pacing sends notifications frequently at first and then progressively spaces them out, use the example calculations spreadsheet [20151122 Reminder Timing with Quadratic Bezier Curve.xlsx](devnotes/20151122%20Reminder%20Timing%20with%20Quadratic%20Bezier%20Curve.xlsx) to experiment with various configuration scenarios:\n\n![Screenshot_of_Reminder_Timing_simulator.png](images/Screenshot_of_Reminder_Timing_simulator.png)\n\nNote that alarm pacing can be set globally in the `site:` section of the config, and overridden per monitored resource in the `monitors:` section.\n\n# Recommended configuration for running multiple site configurations \u0026 panes of glass\n\nAPMonitor supports monitoring multiple sites from a single service instance by passing multiple configuration files on the command 
line. Each config file is processed as an independent site with its own statefile, RRD database, and MRTG index page under `/var/www/html/mrtg/\u003csite-name\u003e/`.\n\nThis is useful for running multiple single panes of glass out of one monitoring box.\n\nIf you are running multiple single panes of glass out of one computer, consider buying a \u003ca href=\"https://www.ebay.com/sch/i.html?_nkw=usb+air+mouse\u0026_sacat=51086\u0026_from=R40\u0026_trksid=p2334524.m570.l1313\u0026_odkw=air+mouse\u0026_osacat=51086\u0026mkcid=1\u0026mkrid=711-53200-19255-0\u0026siteid=0\u0026campid=5339147142\u0026customid=\u0026toolid=10001\u0026mkevt=1\"\u003eUSB Air Mouse or three\u003c/a\u003e till you find one that works well for you, like this one:\n\n\u003cimg src=\"images/air-mouse.png\" width=\"500\" /\u003e\n\n## How it works\n\nWhen multiple config files are specified, APMonitor spawns one subprocess per config file and runs them concurrently, joining all subprocesses before exiting. Each subprocess:\n\n- Derives its own statefile automatically from the config filename under `/var/tmp/APMonitor/` (e.g. 
`apmonitor-config.yaml` → `/var/tmp/APMonitor/apmonitor-config.statefile.json`)\n- Writes its MRTG index and detail pages to `/var/www/html/mrtg/\u003csite-name\u003e/` where `\u003csite-name\u003e` is derived from `site.name` in the config\n- Maintains completely independent monitoring state, notification history, and RRD data\n\n## Systemd service configuration\n\nEdit `/etc/systemd/system/apmonitor.service` to list all config files on the `ExecStart` line:\n```\n[Unit]\nDescription=APMonitor Network Resource Monitor\nAfter=network.target\n\n[Service]\nType=simple\nExecStart=/bin/bash -c 'while true; do /usr/local/bin/APMonitor.py -t 20 -vv /usr/local/etc/apmonitor-config.yaml /usr/local/etc/site2-config.yaml /usr/local/etc/site3-config.yaml --generate-mrtg-config; sleep 10; done'\nRestart=always\nRestartSec=10\nUser=monitoring\nStandardOutput=journal\nStandardError=journal\n\n[Install]\nWantedBy=multi-user.target\n```\n\nIt is useful to keep a commented-out single-site `ExecStart` line for quick debugging:\n```\n#ExecStart=/bin/bash -c 'while true; do /usr/local/bin/APMonitor.py -vv /usr/local/etc/apmonitor-config.yaml --generate-mrtg-config; sleep 10; done'\n```\n\nAfter editing the service file, reload systemd and restart the service:\n```bash\nsudo systemctl daemon-reload\nsudo systemctl restart apmonitor.service\n```\n\nNote that `make install` will preserve a customized `ExecStart` line on subsequent installs — it only writes the default if no service file exists yet.\n\n## Statefiles and MRTG output\n\nEach config file produces its own set of derived files. 
Statefiles are stored under `/var/tmp/APMonitor/` (mode 755, no group or world write access) and MRTG output is written into a per-site subdirectory of the MRTG working directory:\n\n| Config file | Statefile | MRTG index |\n|---|---|---|\n| `apmonitor-config.yaml` | `/var/tmp/APMonitor/apmonitor-config.statefile.json` | `http://\u003chost\u003e:888/mrtg/HomeLab/` |\n| `site2-config.yaml` | `/var/tmp/APMonitor/site2-config.statefile.json` | `http://\u003chost\u003e:888/mrtg/TellusionLab/` |\n| `site3-config.yaml` | `/var/tmp/APMonitor/site3-config.statefile.json` | `http://\u003chost\u003e:888/mrtg/OfficeLab/` |\n\nThe MRTG subdirectory name comes from `site.name` in each config file (sanitised to a filesystem-safe string), not from the config filename. The statefile name is always derived from the config filename stem.\n\n## Default state file location\n\nOn Unix-like systems, APMonitor stores all statefiles under `/var/tmp/APMonitor/`:\n\n- Directory is created automatically with mode `755` (no group or world write access, so www-data cannot modify statefiles)\n- Persists across reboots (unlike `/tmp`)\n- All sibling files (`.json`, `.json.new`, `.json.old`, `.mrtg.cfg`, `.rrd/`) live in this directory\n\nThe `-s/--statefile` flag overrides this for single-config invocations. It is not valid when multiple config files are specified.\n\n## Migrating statefiles from older versions\n\nIf upgrading from a version that stored statefiles in `/var/tmp/` directly, run:\n```bash\nsudo make migrate\n```\n\nThis performs a two-phase migration:\n\n1. Renames `apmonitor-statefile.*` → `apmonitor-config.statefile.*` in `/var/tmp/` (legacy name fix)\n2. Moves all `apmonitor-*.statefile.*` files and `.rrd` directories from `/var/tmp/` into `/var/tmp/APMonitor/`\n\nThe service is stopped before migration and restarted afterwards. 
If a destination file already exists it is skipped with a warning rather than overwritten.\n\n## Threading with multiple sites\n\nThe `-t` flag sets the number of monitor-checking threads **per site**, not globally. With three sites and `-t 20`, up to 60 threads may be active concurrently across all subprocesses. Size `-t` based on the largest single site's monitor count rather than the total across all sites.\n\n## Notes\n\n- `-s/--statefile` is not valid when multiple config files are specified — each site always derives its own statefile automatically from the config filename.\n- `make install` writes a default single-site `ExecStart`. Edit it manually after installation to add additional config files — subsequent `make install` runs will preserve your customized `ExecStart`.\n- `make test-config` only tests the default config at `$(CONFIG_DIR)/apmonitor-config.yaml`. Test additional configs directly: `APMonitor.py --test-config /usr/local/etc/site2-config.yaml`.\n\n# Recommended configuration for SNMP monitoring on Debian Linux\n\nTo enable SNMP monitoring on a Debian host so that APMonitor can poll it, install and configure `snmpd` with a read-only community string restricted to your APMonitor machine.\n\n## Install\n```bash\nsudo apt install snmpd snmp\n```\n\n## Configure `/etc/snmp/snmpd.conf`\n\nReplace the default config with the following minimal read-only configuration:\n```\n# Listen on all interfaces (lock to a specific IP if preferred)\nagentAddress udp:161\n\n# Read-only community, restricted to your APMonitor host only\n# Replace 192.168.1.50 with the IP of your APMonitor machine\nrocommunity YourCommunityString 192.168.1.50\n\n# Optional: identify the device\nsysLocation \"Server Room Rack 3\"\nsysContact \"admin@example.com\"\nsysName \"my-debian-host\"\n```\n\n## Enable and restart\n```bash\nsudo systemctl restart snmpd\nsudo systemctl enable snmpd\n```\n\n## Firewall\n\nIf the host runs a firewall, allow UDP port 161 from your APMonitor machine 
only:\n```bash\n# ufw\nsudo ufw allow from 192.168.1.50 to any port 161 proto udp\n\n# iptables\nsudo iptables -A INPUT -s 192.168.1.50 -p udp --dport 161 -j ACCEPT\n```\n\n## Test from your APMonitor host\n```bash\nsnmpwalk -v 2c -c YourCommunityString 192.168.1.x\n```\n\n## Notes\n\n- `rocommunity` is the read-only directive — the absence of any `rwcommunity` line is what keeps access strictly read-only.\n- Locking the source IP to your APMonitor machine is the primary access control on a LAN. Do not use `default` or `0.0.0.0/0` unless there is no alternative.\n- Change `YourCommunityString` to something non-obvious — `public` is the first string any scanner tries.\n- SNMPv3 with authentication and encryption is the correct choice for hosts on networks you do not fully trust. For a closed LAN behind a firewall, SNMPv2c with a non-default community string and source IP restriction is workable.\n\n## APMonitor configuration\n\nOnce `snmpd` is running, add a `ports` monitor pointing at the host:\n```yaml\n- type: ports\n  name: my-debian-ports\n  address: \"snmp://192.168.1.x\"\n  community: \"YourCommunityString\"\n  check_every_n_secs: 300\n```\n\nFor host performance monitoring (CPU, memory, disk I/O), use `type: host` instead:\n```yaml\n- type: host\n  name: my-debian-host\n  address: \"snmp://192.168.1.x\"\n  community: \"YourCommunityString\"\n  check_every_n_secs: 300\n```\n\n# MRTG/RRD Integration for Performance Graphing\n\nAPMonitor integrates with MRTG (Multi Router Traffic Grapher) and RRDtool to provide historical performance graphs of resource availability and response times. 
This integration enables trend analysis, capacity planning, and visual monitoring dashboards.\n\n## Quick Start\n\nInstall MRTG and related dependencies:\n```bash\nsudo make installmrtg\n```\n\nThis installs nginx on port 888, fcgiwrap for CGI support, and sets up the MRTG web interface.\n\nEnable RRD data collection by running APMonitor with `--generate-mrtg-config`:\n```bash\n./APMonitor.py -vv apmonitor-config.yaml --generate-mrtg-config\n```\n\nAccess graphs at `http://localhost:888/mrtg/\u003csite-name\u003e/` or `http://\u003cyour-ip\u003e:888/mrtg/\u003csite-name\u003e/`.\n\n## How It Works\n\nWhen `--generate-mrtg-config` is specified:\n\n1. **RRD Collection Enabled**: APMonitor records response times and availability status to RRDtool databases\n2. **MRTG Config Generated**: Creates a `.mrtg.cfg` file derived from the statefile path\n3. **Site Subdirectory Created**: MRTG output (index.html, detail pages) is written to `/var/www/html/mrtg/\u003csite-name\u003e/` where `\u003csite-name\u003e` is sanitised from `site.name` in the config\n4. **Web Interface Updated**: Updates `mrtg-rrd.cgi.pl` with the new config path and generates `index.html`\n5. 
**Continuous Updates**: Subsequent runs update RRD files and regenerate the index with latest metrics and outage state\n\n**Output file locations:**\n- Statefile: `/var/tmp/APMonitor/\u003cconfig-stem\u003e.statefile.json`\n- MRTG config: `/var/tmp/APMonitor/\u003cconfig-stem\u003e.statefile.mrtg.cfg`\n- RRD databases:\n  - Availability monitors: `/var/tmp/APMonitor/\u003cconfig-stem\u003e.statefile.rrd/\u003cmonitor\u003e-availability.rrd`\n  - SNMP monitors: `/var/tmp/APMonitor/\u003cconfig-stem\u003e.statefile.rrd/\u003cmonitor\u003e-snmp.rrd`\n- MRTG index: `/var/www/html/mrtg/\u003csite-name\u003e/index.html`\n- Detail pages: `/var/www/html/mrtg/\u003csite-name\u003e/\u003ctype\u003e-\u003cmonitor\u003e-detail.html`\n- Web interface: `http://localhost:888/mrtg/\u003csite-name\u003e/`\n\n## Command Options\n\nGenerate MRTG config with default base working directory (`/var/www/html/mrtg`):\n```bash\n./APMonitor.py apmonitor-config.yaml --generate-mrtg-config\n```\n\nSpecify a custom base working directory (site subdirectory is always appended):\n```bash\n./APMonitor.py apmonitor-config.yaml --generate-mrtg-config /var/www/html/graphs\n```\n\n## RRD Data Collection\n\n### Availability Monitors (ping, http, quic, tcp, udp)\n\nEach availability monitor's RRD file tracks two metrics:\n\n- **`response_time`** (GAUGE, milliseconds): Time taken for check to complete\n  - Range: 0 to unlimited\n  - Value: `U` (unknown) when check fails\n\n- **`is_up`** (GAUGE, boolean): Service availability\n  - `100` = service up\n  - `0` = service down\n\n### SNMP Monitors (port, ports, host)\n\nAll SNMP-family monitors (`port`, `ports`, `host`) use a single unified RRD schema per device. 
The schema is divided into per-interface DS pairs (used by `ports`/`port` only) followed by four groups of fixed DS: aggregate network DS (used by `ports`/`port`; stored as `U` for `host`), system resource DS (all types), host performance DS (used by `host`; stored as `U` for `ports`/`port`), and tamper/network capacity DS (used by `ports`; stored as `U` for `port`/`host`).\n\n**Filename**: `/var/tmp/APMonitor/\u003cconfig-stem\u003e.statefile.rrd/\u003cmonitor-name\u003e-snmp.rrd`\n\n**Per-Interface Data Sources** (one pair per discovered interface, COUNTER — `ports`/`port` only):\n\n- **`if{index}_in`**: Inbound bytes for interface at ifIndex `{index}` (IF-MIB::ifInOctets)\n- **`if{index}_out`**: Outbound bytes for interface at ifIndex `{index}` (IF-MIB::ifOutOctets)\n\nDS names use the raw ifIndex integer (e.g., `if1_in`, `if2_out`), not the interface description string. DS order is stable — interfaces are sorted numerically by ifIndex at both create and update time.\n\n**Fixed Aggregate Network Data Sources** (COUNTER — `ports`/`port` populated, `host` stores `U`):\n\n- **`tcp_retrans`**: Global TCP retransmit segment counter (TCP-MIB::tcpRetransSegs) — `ports` only\n- **`total_bits_in`**: Sum of inbound octets × 8 across all interfaces\n- **`total_bits_out`**: Sum of outbound octets × 8 across all interfaces\n- **`total_pkts_in`**: Sum of all inbound packets (unicast + multicast + broadcast) across all interfaces\n- **`total_pkts_out`**: Sum of all outbound packets across all interfaces\n- **`total_errors_in`**: Sum of inbound interface errors across all interfaces (IF-MIB::ifInErrors)\n- **`total_errors_out`**: Sum of outbound interface errors across all interfaces (IF-MIB::ifOutErrors)\n- **`total_pkts_ucast`**: Total unicast packets in+out combined across all interfaces\n- **`total_pkts_bmcast`**: Total broadcast+multicast packets in+out combined across all interfaces\n\n**System Resource Data Sources** (GAUGE — all types):\n\n- **`cpu_load`**: CPU utilization percentage, range 0–100. 
Sourced from vendor-specific OIDs (Cisco/HP/Juniper/Ubiquiti) with HOST-RESOURCES-MIB::hrProcessorLoad as fallback. Stored as `U` if unavailable.\n- **`memory_pct`**: Memory utilization percentage, range 0–100. Sourced from vendor-specific OIDs with HOST-RESOURCES-MIB::hrStorage as fallback. Stored as `U` if unavailable.\n\n**Fixed Host Performance Data Sources** (COUNTER/GAUGE — `host` populated, `ports`/`port` store `U`):\n\n- **`context_switches`** (COUNTER): Raw context switch counter (UCD-SNMP-MIB::ssRawContexts)\n- **`swap_io`** (COUNTER): Raw swap pages in + out combined (UCD-SNMP-MIB::ssRawSwapIn + ssRawSwapOut)\n- **`disk_read`** (COUNTER): Disk read bytes summed across all block devices (UCD-DISKIO-MIB::diskIOReadX)\n- **`disk_write`** (COUNTER): Disk write bytes summed across all block devices (UCD-DISKIO-MIB::diskIOWriteX)\n- **`disk_space_pct`** (GAUGE): Root filesystem utilization percentage 0–100 (HOST-RESOURCES-MIB::hrStorage `/` entry). Also persisted to statefile for display in MRTG index and detail page headers.\n- **`swap_used`** (GAUGE): Swap space used in bytes (HOST-RESOURCES-MIB::hrStorage virtual memory entry, with UCD-SNMP-MIB::memTotalSwap − memAvailSwap as fallback)\n- **`interrupts`** (COUNTER): Raw hardware interrupt counter (UCD-SNMP-MIB::ssRawInterrupts)\n\n**Fixed Tamper/Network Capacity Data Sources** (GAUGE — `ports` only, `port`/`host` store `U`):\n\n- **`ports_up_count`**: Count of interfaces with oper=up\n- **`nvram_flash_bytes`**: Sum of used bytes across NVRAM/flash hrStorage entries\n- **`mac_count`**: Count of learned FDB entries via Q-BRIDGE-MIB\n- **`arp_count`**: Count of ARP entries via ipNetToPhysicalTable / ipNetToMediaTable\n\n**Total fixed DS count: 22** (11 network/system + 7 host performance + 4 tamper/network). 
Expected DS count for auto-heal check = `(2 × interface_count) + 22`.\n\n**MRTG Targets generated per monitor type:**\n\n| Target suffix | DS pair | Monitor types | Description |\n|---|---|---|---|\n| `-bandwidth` | `total_bits_in` / `total_bits_out` | `ports`, `port` | Total bandwidth in/out (bits) |\n| `-packets` | `total_pkts_in` / `total_pkts_out` | `ports`, `port` | Total packets in/out |\n| `-packets-type` | `total_pkts_ucast` / `total_pkts_bmcast` | `ports`, `port` | Unicast vs broadcast+multicast |\n| `-errors` | `total_errors_in` / `total_errors_out` | `ports`, `port` | Interface errors in/out |\n| `-retransmits` | `tcp_retrans` / `tcp_retrans` | `ports` only | TCP retransmits (single line) |\n| `-system` | `cpu_load` / `memory_pct` | `ports` only | CPU \u0026 memory utilization |\n| `-tamper` | `ports_up_count` / `nvram_flash_bytes` | `ports` only | Active ports \u0026 NVRAM/flash bytes |\n| `-network` | `mac_count` / `arp_count` | `ports` only | Learned MACs \u0026 ARP entries |\n| `-system1` | `cpu_load` / `context_switches` | `host` | CPU \u0026 Load |\n| `-system2` | `memory_pct` / `swap_io` | `host` | Memory \u0026 Paging |\n| `-system3` | `disk_read` / `disk_write` | `host` | Disk I/O (Disk Use % in PageTop) |\n| `-system4` | `swap_used` / `interrupts` | `host` | System Thrashing |\n\n**Notes:**\n- COUNTER type automatically calculates per-second rates and handles 32/64-bit wraparound.\n- All interfaces for a device are stored in a single RRD for atomic updates. 
If the interface list changes, stale DS entries remain in the RRD unused — the RRD is never recreated on interface list change alone.\n- If the discovered interface count grows such that the expected DS count exceeds what was created, APMonitor auto-heals by deleting and recreating the RRD on the next run.\n- `disk_space_pct` is stored in the RRD as a GAUGE DS and also persisted to the statefile so that `generate_mrtg_config()` and `generate_mrtg_index()` can embed the live value (e.g., `Disk Use: 73.4%`) in MRTG PageTop headers and index cell headings without a live SNMP poll at generation time. Displays as `Disk Use: N/A` until the first successful poll.\n- UCD-SNMP-MIB host performance metrics (context switches, swap I/O, disk I/O, interrupts) are Linux `net-snmp` specific. On network devices (Cisco, HP, Juniper, Ubiquiti), these DS will store `U`.\n\n### RRD Retention Policy\n\n| Time Range | Resolution | MRTG Standard Rows | APMonitor Default |\n|---|---|---|---|\n| High-resolution recent | Native step | 1 day native | 31 days native |\n| Short-term | 5-minute | 600 (~2 days) | 18600 (~64 days) |\n| Medium-term | 30-minute | 600 (~12.5 days) | 18600 (~387 days) |\n| Long-term | 1-hour | — | 43830 (~5 years) |\n| Historical | 1-day | 732 (~2 years) | 22692 (~62 years) |\n\n\u003e [!WARNING]\n\u003e Be careful if upgrading to the 1.3.x stream. This release contains RRD schema changes that require existing RRD files to be deleted and recreated before upgrading. 
APMonitor will auto-heal existing RRDs on first run when `--generate-rrds` or `--generate-mrtg-config` is specified.\n\nTo use custom retention, modify the row constants in `create_rrd_rras()`:\n```python\nrows_1day_native  = 86400 // step_secs * 31  # 31 days at native resolution\nrows_2days_5min   = 18600                     # ~64 days at 5-min\nrows_12days_30min = 18600                     # ~387 days at 30-min\nrows_5years_1hour = 43830                     # ~5 years at 1-hour\nrows_2years_daily = 22692                     # ~62 years at 1-day\n```\n\n## Working with RRD Files Directly\n```bash\n# Query availability RRD database info\nrrdtool info /var/tmp/APMonitor/apmonitor-config.statefile.rrd/monitor-name-availability.rrd\n\n# Query SNMP RRD database info\nrrdtool info /var/tmp/APMonitor/apmonitor-config.statefile.rrd/switch-snmp.rrd\n\n# Run APMonitor with MRTG \u0026 RRD enabled\n./APMonitor.py -vv apmonitor-config.yaml --generate-mrtg-config\n\n# Check when the RRD was created\nls -la /var/tmp/APMonitor/apmonitor-config.statefile.rrd/tellusion-gw-availability.rrd\n\n# Dump RRD info to see its structure\nrrdtool info /var/tmp/APMonitor/apmonitor-config.statefile.rrd/tellusion-gw-availability.rrd | head -50\n\n# Check the last update timestamp\nrrdtool lastupdate /var/tmp/APMonitor/apmonitor-config.statefile.rrd/tellusion-gw-availability.rrd\n\n# Fetch the last 300 seconds\nrrdtool fetch /var/tmp/APMonitor/apmonitor-config.statefile.rrd/tellusion-gw-availability.rrd AVERAGE -s end-300 -e now\n\n# Fetch SNMP interface data\nrrdtool fetch /var/tmp/APMonitor/apmonitor-config.statefile.rrd/switch-snmp.rrd AVERAGE -s end-3600 -e now\n```\n\n**References:**\n- [MRTG-RRD Documentation](https://directory.fsf.org/wiki/Mrtg-rrd)\n- [mrtg-rrd.cgi FAQ](https://web.archive.org/web/20081228131907/http://www.fi.muni.cz:80/~kas/mrtg-rrd/cvsweb.cgi/FAQ?rev=HEAD)\n- *System Performance Tuning*, 2nd Ed. — Gian-Paolo D. 
Musumeci \u0026 Mike Loukides (O'Reilly) — the canonical reference for the host performance metrics collected by `type: host`\n\n**Note:** RRD data collection is disabled by default. Run with `--generate-mrtg-config` once to enable, then continue normal monitoring to collect historical data.\n\n# `APMonitor.py` YAML/JSON Site Configuration Options\n\nAPMonitor uses a YAML or JSON configuration file to define the site being monitored and the resources to check. The configuration consists of two main sections: site-level settings that apply globally, and per-monitor settings that define individual resources to check.\n\n## Complete Example Configuration\n\nHere's a complete example showing all available configuration options:\n```yaml\nsite:\n  name: \"HomeLab\"\n\n  email_server:\n    smtp_host: \"smtp.gmail.com\"\n    smtp_port: 587\n    smtp_username: \"alerts@example.com\"\n    smtp_password: \"app_password_here\"\n    from_address: \"alerts@example.com\"\n    use_tls: true\n\n  outage_emails:\n    - email: \"admin@example.com\"\n      email_outages: true\n      email_recoveries: true\n      email_reminders: true\n    - email: \"manager@example.com\"\n      email_outages: yes\n      email_recoveries: yes\n      email_reminders: no\n\n  outage_webhooks:\n    - endpoint_url: \"https://api.pushover.net/1/messages.json\"\n      request_method: POST\n      request_encoding: JSON\n      request_prefix: \"token=your_app_token\u0026user=your_user_key\u0026message=\"\n      request_suffix: \"\"\n\n  max_threads: 1\n  max_retries: 3\n  max_try_secs: 20\n  check_every_n_secs: 60\n  notify_every_n_secs: 600\n  after_every_n_notifications: 1\n\nmonitors:\n  # Single-port MAC-pinning monitor (hidden from MRTG display, monitoring continues)\n  - type: port\n    name: \"switch-port0\"\n    address: snmp://192.168.1.6\n    community: TellusionLab\n    check_every_n_secs: 10\n    notify_every_n_secs: 60\n    after_every_n_notifications: 6\n    port: 0\n    mac: 18:E8:29:45:F8:F7\n 
   always_up: yes\n    display: false\n\n  # Switch port status + SNMP metrics monitoring\n  - type: ports\n    name: office-switch\n    address: \"snmp://192.168.1.6\"\n    community: \"public\"\n    percentile: 95\n    check_every_n_secs: 10\n    notify_every_n_secs: 3600\n    after_every_n_notifications: 1\n\n  # Host performance monitoring (CPU, memory, disk I/O, swap, interrupts)\n  - type: host\n    name: debmon-host\n    address: \"snmp://192.168.1.10\"\n    community: \"public\"\n    check_every_n_secs: 300\n\n  # TCP port check with send/receive\n  - type: tcp\n    name: smtp-server\n    address: \"tcp://mail.example.com:25\"\n    send: \"EHLO apmonitor\\r\\n\"\n    content_type: text\n    expect: \"250\"\n    check_every_n_secs: 60\n\n  # TCP connection-only check\n  - type: tcp\n    name: mysql-db\n    address: \"tcp://192.168.1.100:3306\"\n    check_every_n_secs: 30\n\n  # UDP send with hex data\n  - type: udp\n    name: custom-protocol\n    address: \"udp://192.168.1.200:9999\"\n    send: \"01 02 03 04\"\n    content_type: hex\n    expect: \"OK\"\n    check_every_n_secs: 60\n\n  # UDP send with text data\n  - type: udp\n    name: syslog-collector\n    address: \"udp://192.168.1.50:514\"\n    send: \"\u003c134\u003eAPMonitor: test message\"\n    check_every_n_secs: 300\n\n  - type: ping\n    name: home-fw\n    address: \"192.168.1.1\"\n    check_every_n_secs: 60\n    email: true\n    heartbeat_url: \"https://hc-ping.com/uuid-here\"\n    heartbeat_every_n_secs: 300\n\n  - type: http\n    name: in3245622\n    address: \"http://192.168.1.21/Login?oldUrl=Index\"\n    expect: \"System Name: \u003cb\u003eHomeLab\u003c/b\u003e\"\n    check_every_n_secs: 120\n    notify_every_n_secs: 3600\n    after_every_n_notifications: 5\n    email: yes\n\n  - type: http\n    name: json-api\n    address: \"https://api.example.com/webhook\"\n    send: '{\"event\": \"test\", \"status\": \"ok\"}'\n    content_type: \"application/json\"\n    expect: \"success\"\n\n  - type: 
http\n    name: nvr0\n    address: \"https://192.168.1.12/api/system\"\n    expect: \"nvr0\"\n    ssl_fingerprint: \"a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890\"\n    ignore_ssl_expiry: true\n    email: false\n    heartbeat_url: \"https://plus.site24x7.com/hb/uuid/nvr0\"\n    heartbeat_every_n_secs: 60\n\n  - type: quic\n    name: fast-api\n    address: \"https://192.168.1.50/api/health\"\n    expect: \"ok\"\n    check_every_n_secs: 30\n```\n\n## site: configuration options\n\nThe `site` section defines global settings for the monitoring site.\n\n### Required Fields\n\n- **`name`** (string): The name of the site being monitored. Used in notification messages and as the MRTG output subdirectory name (sanitised to a filesystem-safe string).\n```yaml\nsite:\n  name: \"HomeLab\"\n```\n\n### Optional Fields\n\n- **`email_server`** (object, optional): SMTP server configuration for sending email notifications. Required if `outage_emails` is configured.\n```yaml\nemail_server:\n  smtp_host: \"smtp.gmail.com\"\n  smtp_port: 587\n  smtp_username: \"alerts@example.com\"\n  smtp_password: \"app_password_here\"\n  from_address: \"alerts@example.com\"\n  use_tls: true\n```\n  - **`smtp_host`** (string, required): SMTP server hostname or IP address\n  - **`smtp_port`** (integer, required): SMTP server port (typically 587 for STARTTLS, 465 for implicit TLS/SSL, 25 for unencrypted). Must be between 1 and 65535\n  - **`smtp_username`** (string, optional): SMTP authentication username\n  - **`smtp_password`** (string, optional): SMTP authentication password. Use app-specific passwords for Gmail/Google Workspace\n  - **`from_address`** (string, required): Email address to use in the \"From\" field. Must be a valid email address\n  - **`use_tls`** (boolean, optional): Whether to use TLS/STARTTLS encryption. 
Default: true\n\n**Note**: For Gmail/Google Workspace, you must use an [app-specific password](https://support.google.com/accounts/answer/185833) rather than your account password. Port 587 with `use_tls: true` is the recommended configuration for most SMTP servers.\n\n- **`outage_emails`** (list of objects, optional): Email addresses to notify when resources go down or recover. Requires `email_server` to be configured.\n```yaml\noutage_emails:\n  - email: \"admin@example.com\"\n    email_outages: true\n    email_recoveries: true\n    email_reminders: true\n  - email: \"oncall@example.com\"\n    email_outages: yes\n    email_recoveries: no\n```\n  - **`email`** (string, required): Valid email address\n  - **`email_outages`** (boolean/integer/string, optional): Send email when resource goes down. Default: true\n  - **`email_recoveries`** (boolean/integer/string, optional): Send email when resource recovers. Default: true\n  - **`email_reminders`** (boolean/integer/string, optional): Send email for ongoing outage reminders. 
Default: true\n\n- **`outage_webhooks`** (list of objects, optional): Webhook endpoints to call when resources go down or recover.\n```yaml\noutage_webhooks:\n  - endpoint_url: \"https://api.example.com/alerts\"\n    request_method: POST\n    request_encoding: JSON\n    request_prefix: \"\"\n    request_suffix: \"\"\n```\n  - **`endpoint_url`** (string, required): Valid URL with scheme and host\n  - **`request_method`** (string, required): HTTP method, must be `GET` or `POST`\n  - **`request_encoding`** (string, required): Message encoding format:\n    - `URL`: URL-encode the message (for query parameters or form data)\n    - `HTML`: HTML-escape the message\n    - `JSON`: Send as JSON object with `message` field (POST only)\n    - `CSVQUOTED`: CSV-quote the message for comma-separated values\n  - **`request_prefix`** (string, optional): String to prepend to encoded message (e.g., API tokens, field names)\n  - **`request_suffix`** (string, optional): String to append to encoded message\n\n- **`max_threads`** (integer, optional): Number of concurrent threads for checking resources in parallel. Must be ≥ 1. Default: 1 (single-threaded). Can be overridden by command line `-t` option.\n```yaml\nmax_threads: 1\n```\n\n**Note**: For near-realtime monitoring environments, set `max_threads` to 5-15 to enable parallel checking of multiple resources. Single-threaded mode (1) is recommended for small systems like Raspberry Pi or when log clarity is important. This setting is overridden by the `-t` command line argument if specified.\n\n- **`max_retries`** (integer, optional): Number of times to retry failed checks before marking resource as down. Must be ≥ 1. Default: 3\n```yaml\nmax_retries: 3\n```\n\n**Note**: For near-realtime monitoring, set `max_retries: 1` to reduce detection latency. Higher values (3-5) are better for unstable networks where transient failures are common.\n\n- **`max_try_secs`** (integer, optional): Timeout in seconds for each individual check attempt. 
Must be ≥ 1. Default: 20\n```yaml\nmax_try_secs: 20\n```\n\n- **`check_every_n_secs`** (integer, optional): Default seconds between checks for all monitors. Individual monitors can override this with their own `check_every_n_secs` setting. Must be ≥ 1. Default: 60\n```yaml\ncheck_every_n_secs: 300\n```\n\n**Note**: This sets the baseline check interval for all monitors. Can be overridden per-monitor for resources requiring different check frequencies. When a monitor's configuration changes (detected via SHA-256 checksum), it is checked immediately regardless of this interval.\n\n- **`notify_every_n_secs`** (integer, optional): Default minimum seconds between outage notifications for all monitors. Individual monitors can override this with their own `notify_every_n_secs` setting. Must be ≥ 1. Default: 600\n```yaml\nnotify_every_n_secs: 1800\n```\n\n**Note**: This sets the baseline notification throttling interval. Combined with `after_every_n_notifications`, controls the notification escalation curve for all monitors unless overridden per-monitor.\n\n- **`after_every_n_notifications`** (integer, optional): Default number of notifications after which the notification interval reaches `notify_every_n_secs` for all monitors. Individual monitors can override this with their own `after_every_n_notifications` setting. Must be ≥ 1. Default: 1 (constant notification intervals)\n```yaml\nafter_every_n_notifications: 1\n```\n\n**Note**: When set to a value \u003e 1, notification intervals start shorter and gradually increase following a quadratic Bezier curve until reaching `notify_every_n_secs` after the specified number of notifications. This provides more frequent alerts at the start of an outage when immediate attention is needed, then reduces notification frequency as the outage continues. 
A value of 1 maintains constant notification intervals (original behavior).\n\n- **`alarms`** (boolean/integer/string, optional): Master switch to enable/disable all outage/recovery/reminder notifications for every monitor in this site. Accepts: `true`/`yes`/`on`/`1` (case-insensitive) for enabled, `false`/`no`/`off`/`0` for disabled. Default: true\n```yaml\nalarms: false\n```\n\n**Note**: When set to `false`, no email or webhook notifications are sent for any monitor in the site. Monitoring, state tracking, heartbeats, RRD collection, and MRTG display all continue unaffected. Useful for silencing a site during planned maintenance or initial deployment. Can be overridden per-monitor with a monitor-level `alarms` setting.\n\n## monitors: configuration options\n\nThe `monitors` section is a list of resources to monitor. Each monitor defines what to check and how often.\n\n### Required Fields (All Monitor Types)\n\n- **`type`** (string): Type of check to perform. Must be one of:\n  - `ping`: ICMP ping check\n  - `http`: HTTP/HTTPS endpoint check (supports both HTTP and HTTPS schemes, follows and checks redirect chain for errors)\n  - `quic`: HTTP/3 over QUIC endpoint check (UDP-based, faster than HTTP/HTTPS for high-latency networks)\n  - `tcp`: TCP port connectivity and protocol check\n  - `udp`: UDP datagram send/receive check\n  - `ports`: SNMP network device monitor — collects interface bandwidth/packet/error metrics, TCP retransmits, CPU \u0026 memory, and tracks per-interface oper/admin state and MAC address changes\n  - `port`: SNMP single-port MAC-pinning monitor (pins one switch port to one MAC address; fires alerts on wrong MAC, port down, or MAC absence depending on `always_up`)\n  - `host`: SNMP host performance monitor — collects CPU, memory, disk I/O, swap activity, and hardware interrupt metrics per *System Performance Tuning* (Musumeci \u0026 Loukides, O'Reilly)\n\n\u003e [!NOTE]\n\u003e `type: snmp` has been removed. 
Use `type: ports` for network device monitoring or `type: host` for server performance monitoring.\n\n- **`name`** (string): Unique identifier for this monitor.\n\n- **`address`** (string): Resource to check. Format depends on monitor type:\n  - For `ping`: Valid hostname, IPv4, or IPv6 address\n  - For `http`/`quic`: Full URL with scheme and host\n  - For `tcp`: URL with `tcp://` scheme, hostname/IP, and port (e.g., `tcp://server.example.com:22`)\n  - For `udp`: URL with `udp://` scheme, hostname/IP, and port (e.g., `udp://192.168.1.1:161`)\n  - For `ports`: URL with `snmp://` scheme and hostname/IP (e.g., `snmp://192.168.1.1` or `snmp://192.168.1.1:161`)\n  - For `port`: URL with `snmp://` scheme and hostname/IP — uses SNMP transport, same format as `ports` (e.g., `snmp://192.168.1.6`)\n  - For `host`: URL with `snmp://` scheme and hostname/IP — uses SNMP transport, same format as `ports` (e.g., `snmp://192.168.1.10`)\n\n### Optional Fields (All Monitor Types)\n\n- **`check_every_n_secs`** (integer, optional): Seconds between checks for this resource. Overrides site-level `check_every_n_secs`. Must be ≥ 1. Default: 60 (or site-level setting if configured)\n```yaml\ncheck_every_n_secs: 300\n```\n\n**Note**: When a monitor's configuration changes (any field modification), the monitor is checked immediately on the next run regardless of this interval. Configuration changes are detected via a SHA-256 checksum stored in the state file.\n\n- **`notify_every_n_secs`** (integer, optional): Minimum seconds between outage notifications while the resource remains down. Overrides site-level `notify_every_n_secs`. Must be ≥ 1 and ≥ `check_every_n_secs`. Default: 600 (or site-level setting if configured)\n```yaml\nnotify_every_n_secs: 1800\n```\n\n- **`after_every_n_notifications`** (integer, optional): Number of notifications after which the notification interval reaches `notify_every_n_secs` for this specific monitor. Overrides site-level `after_every_n_notifications`. Can only be specified if `notify_every_n_secs` is present. 
Must be ≥ 1.\n```yaml\nnotify_every_n_secs: 3600\nafter_every_n_notifications: 5\n```\n\n**Behavior**: Notification timing follows a quadratic Bezier curve—intervals start shorter and gradually increase over the first N notifications until reaching the full `notify_every_n_secs` interval. After N notifications, the interval remains constant at `notify_every_n_secs`. This provides aggressive early alerting that tapers off as outages persist.\n\n- **`email`** (boolean/integer/string, optional): Master switch to enable/disable email notifications for this specific monitor. Accepts: `true`/`yes`/`on`/`1` (case-insensitive) for enabled, `false`/`no`/`off`/`0` for disabled. Default: true (enabled if `email_server` configured)\n```yaml\nemail: true\n```\n\n**Note**: When set to `false`, this monitor will not send any email notifications regardless of site-level `outage_emails` configuration. Useful for non-critical resources or during maintenance windows. This is a monitor-level override that takes precedence over all other email settings.\n\n- **`display`** (boolean/integer/string, optional): Controls whether this monitor appears in the MRTG index page. Accepts: `true`/`yes`/`on`/`1` (case-insensitive) for visible, `false`/`no`/`off`/`0` for hidden. Default: true (displayed)\n```yaml\ndisplay: false\n```\n\n**Note**: When set to `false`, the monitor is completely excluded from the MRTG index HTML output and MRTG config file — no graphs are generated and no graph cells appear. Monitoring, alerting, heartbeats, and RRD data collection continue unaffected. Hidden monitors are listed by name in a small audit footer at the bottom of the MRTG index page; if a hidden monitor is down, its name appears in red in that footer so outages remain visible as a detective control. 
Useful for suppressing internal infrastructure monitors (e.g., the APMonitor host itself) that would clutter the dashboard without adding operational value.\n\n- **`alarms`** (boolean/integer/string, optional): Enable/disable all outage/recovery/reminder notifications for this specific monitor. Accepts: `true`/`yes`/`on`/`1` (case-insensitive) for enabled, `false`/`no`/`off`/`0` for disabled. Default: true (or site-level `alarms` setting if configured)\n```yaml\nalarms: false\n```\n\n**Note**: Monitor-level `alarms` overrides site-level `alarms`. When set to `false`, no email or webhook notifications are sent for this monitor. Monitoring, state tracking, heartbeats, RRD collection, and MRTG display all continue unaffected. Useful for silencing noisy or non-critical monitors without removing them from the config.\n\n- **`heartbeat_url`** (string, optional): URL to ping (HTTP GET) when the resource check succeeds. Useful for external monitoring services like Site24x7 or Healthchecks.io. Must be a valid URL with scheme and host.\n```yaml\nheartbeat_url: \"https://hc-ping.com/your-uuid-here\"\n```\n\n- **`heartbeat_every_n_secs`** (integer, optional): Seconds between heartbeat pings. Must be ≥ 1. Can only be specified if `heartbeat_url` is present. If not specified, a heartbeat is sent on every successful check.\n```yaml\nheartbeat_every_n_secs: 300\n```\n\n### HTTP/QUIC Monitor Specific Fields\n\nThese fields are only valid for monitors with `type: http` or `type: quic` (`expect` is also accepted by `tcp` and `udp` monitors; see TCP/UDP Monitor Specific Fields):\n\n- **`expect`** (string, optional): Substring that must appear in the HTTP response body for the check to succeed. If not present, any 200 OK response is considered successful. The check performs a simple string search: if the expected content appears anywhere in the response body, the check passes.\n```yaml\nexpect: \"System Name: \u003cb\u003eHomeLab\u003c/b\u003e\"\n```\n\n**Note**: The `expect` field is string-only for simplicity. It performs exact substring matching (case-sensitive). 
For complex validation scenarios requiring status code checks, header validation, or regex matching, consider using external monitoring tools or extending APMonitor.\n\n- **`ssl_fingerprint`** (string, optional): SHA-256 fingerprint of the expected SSL/TLS certificate (with or without colons). Enables certificate pinning for self-signed certificates. When specified, the certificate is verified before making the HTTP request.\n```yaml\nssl_fingerprint: \"e85260e8f8e85629cfa4d023ea0ae8dd3ce8ccc0040b054a4753c2a5ab269296\"\n```\n\n- **`ignore_ssl_expiry`** (boolean/integer/string, optional): Skip SSL/TLS certificate expiration checking. Accepts: `true`/`1`/`\"yes\"`/`\"ok\"` (case-insensitive) for true, or `false`/`0`/`\"no\"` for false. Useful for development environments or when certificate renewal is managed separately.\n```yaml\nignore_ssl_expiry: true\n```\n\n### HTTP/QUIC POST Request Fields\n\nThese optional fields enable HTTP/QUIC monitors to send POST requests with data:\n\n- **`send`** (string, optional): Data to send in HTTP/QUIC POST request body. When specified, the monitor sends a POST request instead of GET. Data is always UTF-8 encoded.\n```yaml\nsend: '{\"event\": \"test\", \"status\": \"ok\"}'\n```\n\n- **`content_type`** (string, optional): MIME type for the Content-Type header. Can only be specified if `send` is present. This is a raw MIME type string (e.g., `application/json`, `application/x-www-form-urlencoded`, `text/plain`). 
Default: `text/plain; charset=utf-8`\n```yaml\ncontent_type: \"application/json\"\nsend: '{\"event\": \"test\", \"status\": \"ok\"}'\n```\n\n**HTTP JSON POST Example:**\n```yaml\n- type: http\n  name: json-api\n  address: \"https://api.example.com/webhook\"\n  send: '{\"event\": \"test\", \"status\": \"ok\"}'\n  content_type: \"application/json\"\n  expect: \"success\"\n```\n\n**HTTP Form POST Example:**\n```yaml\n- type: http\n  name: form-submit\n  address: \"https://example.com/submit\"\n  send: \"name=test\u0026value=123\"\n  content_type: \"application/x-www-form-urlencoded\"\n  expect: \"received\"\n```\n\n**QUIC POST Example:**\n```yaml\n- type: quic\n  name: text-endpoint\n  address: \"https://fast.example.com/log\"\n  send: \"Test message\"\n  content_type: \"text/plain; charset=utf-8\"\n```\n\n**Note**: HTTP/QUIC monitors without `send` perform GET requests (original behavior). The `content_type` for HTTP/QUIC is a raw MIME type header, unlike TCP/UDP where it specifies encoding format (text/hex/base64).\n\n### TCP/UDP Monitor Specific Fields\n\nThese fields are only valid for monitors with `type: tcp` or `type: udp`:\n\n- **`send`** (string, optional for TCP, **required for UDP**): Data to send to the service. UDP monitors require this parameter because UDP is connectionless and needs application-layer data to verify connectivity.\n```yaml\nsend: \"EHLO apmonitor\\r\\n\"\n```\n\n- **`content_type`** (string, optional): Encoding format for the `send` data. Can only be specified if `send` is present. Valid values:\n  - `text` (default): UTF-8 encoded string\n  - `hex`: Hexadecimal byte string (spaces and colons are stripped)\n  - `base64`: Base64-encoded binary data\n```yaml\ncontent_type: hex\nsend: \"01 02 03 04\"\n```\n\n**Note**: TCP monitors without `send` perform connection-only checks. TCP monitors automatically attempt to receive data after connecting (useful for banner protocols like SSH, SMTP, FTP). 
UDP monitors without `expect` succeed if the packet is sent without socket errors, but cannot verify if the service is actually listening.\n\n- **`expect`** (string, optional): Substring that must appear in the response for the check to succeed. For TCP, this validates the received banner or response. For UDP, this requires a matching response to be received.\n```yaml\nexpect: \"SSH-2.0\"\n```\n\n**UDP Behavior Notes**:\n- **With `expect`**: Real service validation (recommended for SNMP, DNS, NTP) - waits for response and validates content\n- **Without `expect`**: Fire-and-forget (useful for syslog, statsd) - succeeds if packet sends without socket error, cannot detect if port is listening\n- UDP is connectionless, so there's no \"connection established\" signal like TCP's three-way handshake\n\n### Ports Monitor Specific Fields\n\nThe `ports` monitor type polls a managed network switch, router, or Linux host via SNMPv2c. It combines two orthogonal functions in one monitor: it collects bandwidth, packet, error, TCP retransmit, CPU, and memory metrics into RRD (the former `type: snmp` function), and it also tracks the operational and administrative status of every interface plus the set of learned MAC addresses on each port (the original `ports` function), firing one notification per changed interface.\n\n\u003e [!NOTE]\n\u003e `type: ports` subsumes the former `type: snmp`. If you previously used `type: snmp` for bandwidth/metric monitoring, change it to `type: ports`. The only functional difference is that `ports` also performs port state and MAC change detection; for devices where that is not relevant (e.g., a Linux host with no managed switching), the MAC walk will simply return empty results harmlessly.\n\n**Required Fields:**\n- **`type`**: Must be `ports`\n- **`address`**: URL with `snmp://` scheme and hostname/IP — same format as former `snmp` monitors (e.g., `snmp://192.168.1.6`). 
Uses IF-MIB via SNMP transport.\n\n**Optional Fields:**\n\n- **`community`** (string, optional): SNMP community string. Default: `public`\n\n- **`percentile`** (integer, optional): Percentile value to compute and display beneath each MRTG graph (e.g., `95` for 95th percentile billing). Must be an integer between 1 and 99. When specified, the Nth percentile is calculated over the graphed time range and shown in the stats table below each graph alongside Max/Average/Current.\n\n  The 95th percentile is the standard metric for burstable bandwidth (\"95th percentile billing\"). It discards the top 5% of traffic samples so that short bursts do not dominate the figure used for billing or capacity planning.\n```yaml\n- type: ports\n  name: office-switch\n  address: \"snmp://192.168.1.6\"\n  community: \"public\"\n  percentile: 95\n  check_every_n_secs: 300\n```\n\n  **Note**: `percentile` is only valid for `ports` and `port` monitors and has no effect unless `--generate-mrtg-config` is also used.\n\n- **`notify_every_n_secs`** / **`after_every_n_notifications`** (integers, optional): Control the per-interface silence window for port state change alerts. 
Default values from site config apply.\n\n**Monitored MIB Objects:**\n- **IF-MIB::ifDescr** (1.3.6.1.2.1.2.2.1.2) — Interface name/description (single walk shared by metrics and state)\n- **IF-MIB::ifOperStatus** (1.3.6.1.2.1.2.2.1.8) — Operational status\n- **IF-MIB::ifAdminStatus** (1.3.6.1.2.1.2.2.1.7) — Administrative status\n- **IF-MIB::ifInOctets / ifOutOctets** (1.3.6.1.2.1.2.2.1.10/16) — Byte counters per interface\n- **IF-MIB::ifInErrors / ifOutErrors** (1.3.6.1.2.1.2.2.1.14/20) — Error counters per interface\n- **IF-MIB::ifHCIn/OutUcastPkts, ifHCIn/OutMulticastPkts, ifHCIn/OutBroadcastPkts** — 64-bit packet counters\n- **TCP-MIB::tcpRetransSegs** (1.3.6.1.2.1.6.12.0) — Global TCP retransmit counter\n- **Vendor-specific CPU OIDs** (Cisco/HP/Juniper/Ubiquiti) → fallback HOST-RESOURCES-MIB::hrProcessorLoad\n- **Vendor-specific memory OIDs** (Cisco/HP/Juniper/Ubiquiti) → fallback HOST-RESOURCES-MIB::hrStorage\n- **Q-BRIDGE-MIB::dot1qTpFdbPort** (1.3.6.1.2.1.17.7.1.2.2.1.2) — MAC-to-port mappings\n- **Q-BRIDGE-MIB::dot1qTpFdbStatus** (1.3.6.1.2.1.17.7.1.2.2.1.3) — FDB entry status (learned=3 filter)\n\n**MRTG Targets generated:** `-bandwidth`, `-packets`, `-packets-type`, `-errors`, `-retransmits`, `-system`, `-tamper`, `-network` (see MRTG targets table above).\n\n**State Tracking:**\n\nThe state file stores one key per `ports` monitor:\n- `ports_state`: committed baseline — dict of `{if_index: {name, oper, admin, macs}}` per interface; advances to current state on each successful poll\n\n**Field Restrictions:**\n- `expect`, `ssl_fingerprint`, `ignore_ssl_expiry`, `send`, `content_type` are not valid for `ports` monitors\n- `ports` monitors support `heartbeat_url` and `heartbeat_every_n_secs` like other monitor types\n\n**Example Ports Monitor Configuration:**\n```yaml\n- type: ports\n  name: office-switch\n  address: \"snmp://192.168.1.6\"\n  community: \"public\"\n  percentile: 95\n  check_every_n_secs: 30\n  notify_every_n_secs: 3600\n  
after_every_n_notifications: 1\n```\n\n**Sample Notification Output:**\n```\n##### PORT CHANGE: office-switch in HomeLab: GigabitEthernet0/2 oper=down admin=up (was oper=up admin=up) at 2:15 PM #####\n##### PORT MAC CHANGE: office-switch in HomeLab: GigabitEthernet0/1 MAC change appeared=[AA:BB:CC:DD:EE:FF] at 2:22 PM #####\n```\n\n### Host Monitor Specific Fields\n\nThe `host` monitor type polls a Linux host (or any net-snmp compatible device) via SNMPv2c for system performance metrics drawn from UCD-SNMP-MIB and HOST-RESOURCES-MIB. The four MRTG charts generated correspond directly to the canonical performance tuning metrics defined in *System Performance Tuning* by Gian-Paolo D. Musumeci \u0026 Mike Loukides (O'Reilly, 2nd Ed.).\n\n`type: host` uses the same SNMP RRD schema as `ports` and `port`. Network DS (`total_bits_*`, `total_pkts_*`, etc.) are stored as `U` since `host` does not poll interface counters.\n\n**Required Fields:**\n- **`type`**: Must be `host`\n- **`address`**: URL with `snmp://` scheme and hostname/IP (e.g., `snmp://192.168.1.10`)\n\n**Optional Fields:**\n\n- **`community`** (string, optional): SNMP community string. Default: `public`\n\n**MRTG Charts Generated:**\n\n| Slot | DS pair | Title | Description |\n|---|---|---|---|\n| `-system1` | `cpu_load` / `context_switches` | CPU \u0026 Load | CPU utilization % + context switches/sec |\n| `-system2` | `memory_pct` / `swap_io` | Memory \u0026 Paging | Memory utilization % + swap I/O rate |\n| `-system3` | `disk_read` / `disk_write` | Disk I/O | Disk read/write bytes/sec (all devices summed). Disk space utilization % shown in PageTop header as *Disk Use: ##.#%* |\n| `-system4` | `swap_used` / `interrupts` | System Thrashing | Swap used bytes + hardware interrupts/sec |\n\n**Disk Space Display**: The current root filesystem utilization percentage is embedded in the MRTG `-system3` detail page header (PageTop) and in the MRTG index cell heading, e.g., `Disk I/O — Disk Use: 73.4%`. 
The value is read from state (persisted on each successful poll), so it updates on every monitoring cycle without requiring a live SNMP poll at graph generation time. It displays as `Disk Use: N/A` until the first successful poll.\n\n**Monitored MIB Objects:**\n- **HOST-RESOURCES-MIB::hrProcessorLoad** (1.3.6.1.2.1.25.3.3.1.2) — CPU load per core (averaged)\n- **HOST-RESOURCES-MIB::hrStorage** (1.3.6.1.2.1.25.2.3.1.*) — Physical memory, swap, and root filesystem utilization\n- **UCD-SNMP-MIB::ssRawContexts** (1.3.6.1.4.1.2021.11.60.0) — Raw context switch counter\n- **UCD-SNMP-MIB::ssRawSwapIn** (1.3.6.1.4.1.2021.11.62.0) — Raw swap-in counter\n- **UCD-SNMP-MIB::ssRawSwapOut** (1.3.6.1.4.1.2021.11.63.0) — Raw swap-out counter\n- **UCD-SNMP-MIB::ssRawInterrupts** (1.3.6.1.4.1.2021.11.59.0) — Raw hardware interrupt counter\n- **UCD-SNMP-MIB::memTotalReal / memAvailReal** (1.3.6.1.4.1.2021.4.5/6.0) — Memory fallback if hrStorage unavailable\n- **UCD-SNMP-MIB::memTotalSwap / memAvailSwap** (1.3.6.1.4.1.2021.4.3/4.0) — Swap fallback if hrStorage unavailable\n- **UCD-DISKIO-MIB::diskIOReadX** (1.3.6.1.4.1.2021.13.15.1.1.5) — 64-bit disk read bytes per device (walked, summed)\n- **UCD-DISKIO-MIB::diskIOWriteX** (1.3.6.1.4.1.2021.13.15.1.1.6) — 64-bit disk write bytes per device (walked, summed)\n\n**Notes:**\n- The UCD-SNMP-MIB (`ssRaw*`) and UCD-DISKIO-MIB (`diskIO*`) OIDs are specific to Linux `net-snmp`. On network devices these DS store `U`.\n- Disk I/O bytes are summed across all block devices discovered by `diskIOTable`. 
This gives aggregate host I/O throughput rather than a per-device breakdown.\n- hrStorage physical memory and swap are used preferentially; the UCD memTotal/memAvail OIDs are the fallback.\n- The root filesystem is identified by matching hrStorageDescr against `/`, `root`, `c:\\`, or `c:`.\n\n**Field Restrictions:**\n- `expect`, `ssl_fingerprint`, `ignore_ssl_expiry`, `send`, `content_type`, `percentile` are not valid for `host` monitors\n- `host` monitors support `heartbeat_url` and `heartbeat_every_n_secs` like other monitor types\n\n**Example Host Monitor Configuration:**\n```yaml\n- type: host\n  name: debmon-host\n  address: \"snmp://192.168.1.10\"\n  community: \"YourCommunityString\"\n  check_every_n_secs: 300\n  heartbeat_url: \"https://hc-ping.com/uuid-here\"\n  heartbeat_every_n_secs: 600\n```\n\n### Port Monitor Specific Fields\n\nThe `port` monitor type polls a single switch port by ifIndex via SNMPv2c, pinning it to a specific MAC address. It is orthogonal to the `ports` type: `ports` watches all interfaces on a device holistically; `port` watches one interface with a hard MAC binding.\n\n**Required Fields:**\n- **`type`**: Must be `port`\n- **`address`**: URL with `snmp://` scheme and hostname/IP — same format as `ports` (e.g., `snmp://192.168.1.6`)\n- **`port`** (integer): ifIndex of the switch port to monitor. Must be a non-negative integer. This is the raw ifIndex as returned by IF-MIB, not a zero-based port number.\n- **`mac`** (string): Pinned MAC address in `XX:XX:XX:XX:XX:XX` format (case-insensitive). This identifies the device expected on the port.\n\n**Optional Fields:**\n\n- **`community`** (string, optional): SNMP community string. Default: `public`\n\n- **`percentile`** (integer, optional): Percentile value for MRTG graphs. Must be an integer between 1 and 99. See the `ports` monitor for details.\n\n- **`always_up`** (boolean/integer/string, optional): Controls alarm semantics. 
Default: `false`\n\n**Alarm Logic:**\n\n| Condition | `always_up: true` | `always_up: false` |\n|---|---|---|\n| Port oper≠up | Alarm | No alarm |\n| Pinned MAC absent from port | Alarm | No alarm |\n| Wrong MAC present on port | Alarm | Alarm |\n| All clear | Recovery | Recovery |\n\n- **`always_up: true`**: The port must be operationally up AND the pinned MAC must be present AND be the only learned MAC. Any deviation alarms.\n- **`always_up: false`**: Only alarms when a non-pinned MAC is present on the port. Port down and MAC absence are silent (useful for ports that legitimately go idle).\n\n**Recovery:** A recovery notification fires whenever all alarm conditions clear.\n\n**MAC Resolution:**\n\nUses Q-BRIDGE-MIB (RFC 2674) `dot1qTpFdbTable` — the correct table for VLAN-aware managed switches. The classic `dot1dTpFdbTable` (BRIDGE-MIB) returns zero entries on VLAN-aware hardware because its FDB is partitioned per VLAN. MAC walk failure is non-fatal: monitoring continues with `current_mac=None`, which only triggers alarms when `always_up=true` (MAC absent condition).\n\n**State Tracking:**\n\nThe state file stores one key per `port` monitor:\n- `port_state`: dict of `{oper, mac}` from last successful poll — used for observability and future state transition logging\n\n**Field Restrictions:**\n- `expect`, `ssl_fingerprint`, `ignore_ssl_expiry`, `send`, `content_type` are not valid for `port` monitors\n- `port` monitors support `heartbeat_url` and `heartbeat_every_n_secs` like other monitor types\n\n**Example Configuration:**\n```yaml\n- type: port\n  name: \"switch-port0\"\n  address: snmp://192.168.1.6\n  community: TellusionLab\n  check_every_n_secs: 10\n  notify_every_n_secs: 60\n  after_every_n_notifications: 6\n  port: 0\n  mac: 18:E8:29:45:F8:F7\n  always_up: yes\n```\n\nWith `always_up: yes`, this fires an alarm if ifIndex 0 is not oper=up, if `18:E8:29:45:F8:F7` is absent, or if any other MAC is present on that port.\n\n**Sample Notification 
Output:**\n```\n##### NEW OUTAGE: switch-port0 in HomeLab new outage: port ifIndex=0 18:E8:29:45:F8:F7 is down (admin=up) (snmp://192.168.1.6) at 2:15 PM, down for 0 secs #####\n##### NEW OUTAGE: switch-port0 in HomeLab new outage: port ifIndex=0 is up but pinned MAC 18:E8:29:45:F8:F7 absent (snmp://192.168.1.6) at 2:16 PM, down for 0 secs #####\n##### NEW OUTAGE: switch-port0 in HomeLab new outage: port ifIndex=0 wrong MAC: expected 18:E8:29:45:F8:F7, got AA:BB:CC:DD:EE:FF (snmp://192.168.1.6) at 2:17 PM, down for 0 secs #####\n##### RECOVERY: switch-port0 in HomeLab is UP (snmp://192.168.1.6) at 2:18 PM, outage lasted 1 mins 3 secs #####\n```\n\n### Example Configurations\n\n#### **Ping Monitor:**\n```yaml\n- type: ping\n  name: home-gateway\n  address: \"192.168.1.1\"\n  check_every_n_secs: 60\n  heartbeat_url: \"https://hc-ping.com/uuid-here\"\n```\n\n#### **HTTP Monitor with Content Check:**\n```yaml\n- type: http\n  name: web-server\n  address: \"http://192.168.1.100/health\"\n  expect: \"status: ok\"\n  check_every_n_secs: 120\n  notify_every_n_secs: 3600\n```\n\n#### **HTTPS Monitor with Certificate Pinning:**\n```yaml\n- type: http\n  name: nvr0\n  address: \"https://192.168.1.12/api/system\"\n  expect: \"nvr0\"\n  ssl_fingerprint: \"e85260e8f8e85629cfa4d023ea0ae8dd3ce8ccc0040b054a4753c2a5ab269296\"\n  ignore_ssl_expiry: true\n  heartbeat_url: \"https://plus.site24x7.com/hb/uuid/nvr0\"\n  heartbeat_every_n_secs: 60\n```\n\n#### **QUIC Monitor (HTTP/3):**\n```yaml\n- type: quic\n  name: fast-api\n  address: \"https://api.example.com/health\"\n  expect: \"healthy\"\n  check_every_n_secs: 30\n  ssl_fingerprint: \"a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890\"\n```\n\n**Note**: QUIC monitoring uses HTTP/3 over UDP (port 443 by default) and is particularly effective for high-latency networks or when monitoring resources over unreliable connections. 
QUIC provides built-in connection migration and improved performance compared to TCP-based HTTP/2.\n\n#### **TCP Banner Check (SSH):**\n```yaml\n- type: tcp\n  name: ssh-server\n  address: \"tcp://server.example.com:22\"\n  expect: \"SSH-2.0\"\n  check_every_n_secs: 60\n```\n\n#### **TCP Send/Receive (SMTP):**\n```yaml\n- type: tcp\n  name: smtp-server\n  address: \"tcp://mail.example.com:25\"\n  send: \"EHLO apmonitor\\r\\n\"\n  content_type: text\n  expect: \"250\"\n  check_every_n_secs: 60\n```\n\n#### **TCP Connection-Only Check:**\n```yaml\n- type: tcp\n  name: mysql-db\n  address: \"tcp://192.168.1.100:3306\"\n  check_every_n_secs: 30\n```\n\n#### **UDP with Response Validation (DNS):**\n```yaml\n- type: udp\n  name: dns-server\n  address: \"udp://8.8.8.8:53\"\n  send: \"...\" # DNS query packet\n  content_type: hex\n  expect: \"...\" # Expected response\n  check_every_n_secs: 60\n```\n\n#### **UDP Fire-and-Forget (Syslog):**\n```yaml\n- type: udp\n  name: syslog-collector\n  address: \"udp://192.168.1.50:514\"\n  send: \"\u003c134\u003eAPMonitor: test message\"\n  check_every_n_secs: 300\n```\n\n#### **Network Switch with 95th Percentile (formerly `type: snmp`):**\n```yaml\n- type: ports\n  name: office-switch\n  address: \"snmp://192.168.1.6\"\n  community: \"public\"\n  percentile: 95\n  check_every_n_secs: 300\n  heartbeat_url: \"https://hc-ping.com/uuid-switch\"\n  heartbeat_every_n_secs: 600\n```\n\n#### **Host Performance Monitor:**\n```yaml\n- type: host\n  name: debmon-host\n  address: \"snmp://192.168.1.10\"\n  community: \"public\"\n  check_every_n_secs: 300\n```\n\n#### **Switch Port Status + Metrics + MAC Change Monitor:**\n```yaml\n- type: ports\n  name: office-switch\n  address: \"snmp://192.168.1.6\"\n  community: \"public\"\n  check_every_n_secs: 30\n  notify_every_n_secs: 3600\n  after_every_n_notifications: 1\n```\n\n#### **Single Port MAC Pinning Monitor:**\n```yaml\n- type: port\n  name: \"switch-port0\"\n  address: snmp://192.168.1.6\n  
community: TellusionLab\n  check_every_n_secs: 10\n  notify_every_n_secs: 60\n  after_every_n_notifications: 6\n  port: 0\n  mac: 18:E8:29:45:F8:F7\n  always_up: yes\n```\n\n#### **Hidden Monitor (monitoring continues, excluded from MRTG display):**\n```yaml\n- type: port\n  name: \"switch-port0\"\n  address: snmp://192.168.1.6\n  community: TellusionLab\n  port: 0\n  mac: 18:E8:29:45:F8:F7\n  always_up: yes\n  display: false\n```\n\n#### **Silenced Monitor (monitoring and display continue, notifications suppressed):**\n```yaml\n- type: ports\n  name: office-switch\n  address: \"snmp://192.168.1.6\"\n  community: \"public\"\n  alarms: false\n```\n\n### Validation Rules\n\nThe configuration validator enforces these rules:\n\n1. Monitor names must be unique across all monitors\n2. `notify_every_n_secs` must be ≥ `check_every_n_secs` if both are specified\n3. `heartbeat_every_n_secs` can only be specified if `heartbeat_url` exists\n4. `ssl_fingerprint` and `ignore_ssl_expiry` are only valid for HTTP/QUIC monitors; `expect` is valid for HTTP/QUIC and TCP/UDP monitors\n5. `expect` must be a non-empty string if specified\n6. All URLs must include both scheme (http/https/tcp/udp/snmp) and hostname\n7. Email addresses must match standard email format (RFC 5322 simplified)\n8. SSL fingerprints must be valid hexadecimal strings whose length is a power of two\n9. `after_every_n_notifications` can only be specified if `notify_every_n_secs` is present\n10. `outage_emails` can only be specified if `email_server` is configured\n11. If `email_server` is present, `smtp_host`, `smtp_port`, and `from_address` are required\n12. `smtp_username` and `smtp_password` are optional (for servers without authentication)\n13. Email control flags (`email_outages`, `email_recoveries`, `email_reminders`) accept boolean or string values\n14. Monitor-level `email` flag accepts boolean or string values\n15. TCP monitors must use the `tcp://` scheme; UDP monitors must use the `udp://` scheme\n16. TCP/UDP addresses must include hostname/IP and port\n17. 
UDP monitors require `send` parameter\n18. `content_type` can only be specified if `send` is present\n19. `content_type` for TCP/UDP must be one of: text, hex, base64 (for HTTP/QUIC it's a raw MIME type string)\n20. `ssl_fingerprint` and `ignore_ssl_expiry` are not allowed for TCP/UDP monitors\n21. `ports` monitors must use `snmp://` scheme (SNMP transport)\n22. `community` field is optional for `ports`/`port`/`host` monitors and must be a non-empty string if specified\n23. `expect`, `ssl_fingerprint`, `ignore_ssl_expiry`, `send`, and `content_type` are not allowed for `ports` monitors\n24. `ports` monitors support `heartbeat_url` and `heartbeat_every_n_secs` like other monitor types\n25. `percentile` is only valid for `ports` and `port` monitors and must be an integer between 1 and 99\n26. `port` monitors must use `snmp://` scheme (SNMP transport)\n27. `port` monitors require `port` (non-negative integer ifIndex) and `mac` (valid `XX:XX:XX:XX:XX:XX` address)\n28. `always_up` is optional for `port` monitors and accepts boolean or string values\n29. `expect`, `ssl_fingerprint`, `ignore_ssl_expiry`, `send`, `content_type` are not allowed for `port` monitors\n30. `port` monitors support `heartbeat_url` and `heartbeat_every_n_secs` like other monitor types\n31. `host` monitors must use `snmp://` scheme (SNMP transport)\n32. `expect`, `ssl_fingerprint`, `ignore_ssl_expiry`, `send`, `content_type`, `percentile` are not allowed for `host` monitors\n33. `host` monitors support `heartbeat_url` and `heartbeat_every_n_secs` like other monitor types\n34. `type: snmp` is not valid — the validator emits: *\"type 'snmp' is not valid. Did you mean type: ports?\"*\n35. `display` is optional for all monitor types and accepts boolean or string values; when `false`, the monitor is excluded from MRTG index output but monitoring, alerting, heartbeats, and RRD collection continue unaffected; hidden monitors appear in the MRTG index audit footer and render in red when down\n36. 
`alarms` is optional at both site and monitor level; accepts boolean or string values; monitor-level `alarms` overrides site-level `alarms`; when `false`, all outage/recovery/reminder notifications are suppressed while monitoring, state tracking, heartbeats, RRD collection, and MRTG display continue unaffected\n\n# Dependencies\n\nInstall system-wide for production use:\n```bash\nsudo apt install python3-rrdtool librrd-dev python3-dev mrtg rrdtool librrds-perl libsnmp-dev\nsudo pip3 install --break-system-packages PyYAML requests pyOpenSSL urllib3 aioquic rrdtool easysnmp\n```\n\n**Note**:\n- The `aioquic` package is required for QUIC/HTTP3 monitoring support. If you don't plan to use `type: quic` monitors, you can omit this dependency.\n- The `easysnmp` package and `libsnmp-dev` system library are required for SNMP monitoring support. If you don't plan to use `type: ports`, `type: port`, or `type: host` monitors, you can omit these dependencies.\n\n# Example invocations\n```bash\n# Single site, auto-derived statefile\n./APMonitor.py homelab-monitorhosts.yaml\n\n# Single site, explicit statefile\n./APMonitor.py -s /tmp/statefile.json homelab-monitorhosts.yaml\n\n# Multiple sites (concurrent subprocesses, no -s allowed)\n./APMonitor.py site1.yaml site2.yaml site3.yaml --generate-mrtg-config\n\n# Test configuration\n./APMonitor.py --test-config homelab-monitorhosts.yaml\n\n# Test webhooks\n./APMonitor.py --test-webhooks -v homelab-monitorhosts.yaml\n\n# Test emails\n./APMonitor.py --test-emails -v homelab-monitorhosts.yaml\n```\n\n# Command Line Usage\n\nAPMonitor is invoked from the command line with various options to control verbosity, threading, state file location, and testing modes.\n\n## Synopsis\n```\n./APMonitor.py [OPTIONS] \u003cconfig_file\u003e [\u003cconfig_file\u003e ...]\n```\n\n## Command Line Options\n\n- **`config_file`** (required, repeatable): Path to one or more YAML or JSON configuration files. 
When multiple files are specified, each runs as an independent subprocess concurrently. `-s` is not valid with multiple config files.\n\n- **`-v, --verbose`**: Increase verbosity level (can be repeated: `-v`, `-vv`, `-vvv`).\n\n- **`-t, --threads \u003cN\u003e`**: Number of concurrent threads per site for checking resources (default: 1). Overrides `max_threads` in site config.\n\n- **`-s, --statefile \u003cpath\u003e`**: Path to state file. Only valid with a single config file. Default: `/var/tmp/APMonitor/\u003cconfig-stem\u003e.statefile.json`.\n\n- **`--test-config`**: Validate configuration and print a summary of monitors, then exit. Does not check resources or touch the statefile.\n\n- **`--test-webhooks`**: Send a test alert to all configured webhooks, then exit.\n\n- **`--test-emails`**: Send a test alert to all configured email addresses, then exit.\n\n- **`--generate-rrds`**: Enable RRD database creation and updates (implied by `--generate-mrtg-config`).\n\n- **`--generate-mrtg-config [WORKDIR]`**: Generate MRTG config, update `mrtg-rrd.cgi.pl`, write `index.html` and detail pages into `WORKDIR/\u003csite-name\u003e/`. Default WORKDIR: `/var/www/html/mrtg`. 
Implies `--generate-rrds`.\n\n## Common Usage Examples\n\n### Basic Monitoring (Single-Threaded)\n\nRun with default settings, state stored in tmpfs:\n```\n./APMonitor.py -s /tmp/statefile.json monitoring-config.yaml\n```\n\n### Verbose Monitoring for Debugging\n\nShow detailed progress and decision-making:\n```\n./APMonitor.py -v -s /tmp/statefile.json monitoring-config.yaml\n```\n\n### High-Frequency Monitoring (Multiple Threads)\n\nCheck many resources concurrently for near-realtime behavior:\n```\n./APMonitor.py -t 10 -s /tmp/statefile.json monitoring-config.yaml\n```\n\nUse higher thread counts (`-t 5` to `-t 20`) when:\n- Monitoring many independent resources (50+)\n- Resources have long check timeouts\n- Near-realtime alerting is required\n- System has sufficient CPU cores\n\n**Warning**: High thread counts increase lock contention. Test with `-v` to ensure checks aren't blocking each other.\n\n### Test Webhook Configuration\n\nVerify webhooks are configured correctly before production use:\n```\n./APMonitor.py --test-webhooks -v monitoring-config.yaml\n```\n\nThis sends test messages to all configured webhooks with verbose output showing request/response details.\n\n### Test Email Configuration\n\nVerify email settings work correctly:\n```\n./APMonitor.py --test-emails -v monitoring-config.yaml\n```\n\n## Running `APMonitor.py` Continuously\n\nAPMonitor is designed to be run repeatedly rather than as a long-running daemon.\n\n### Option 1: Cron (Recommended for Most Cases)\n```\n* * * * * /path/to/APMonitor.py /path/to/monitoring-config.yaml 2\u003e\u00261 | logger -t apmonitor\n```\n\nNB: PID file locking should keep this under control, in case you get a long-running process.\n\n**Advantages**:\n- Automatic restart if process crashes\n- Built-in scheduling\n- System handles process lifecycle\n- Easy to enable/disable (comment out cron entry)\n\n**Best for**: Production systems, servers with standard monitoring requirements (check intervals ≥ 60 
seconds)\n\n### Option 2: While Loop (For Sub-Minute Monitoring)\n\nRun continuously with short sleep intervals for near-realtime monitoring:\n```\n#!/bin/bash\nwhile true; do\n    ./APMonitor.py -t 5 monitoring-config.yaml\n    sleep 10\ndone\n```\n\nOr as a one-liner:\n```\nwhile true; do ./APMonitor.py -s /tmp/statefile.json monitoring-config.yaml; sleep 30; done\n```\n\n**Advantages**:\n- Sub-minute check intervals\n- Near-realtime alerting\n- Fine control over execution frequency\n\n**Best for**: Development, testing, systems requiring rapid failure detection (check intervals \u003c 60 seconds)\n\n**Note**: Use short sleep intervals (5-30 seconds) combined with per-resource `check_every_n_secs` settings to balance responsiveness and system load. APMonitor's internal scheduling prevents redundant checks even with frequent invocations.\n\n### Systemd Service (Alternative)\n\nFor production deployments requiring process supervision:\n```\n[Unit]\nDescription=APMonitor Network Resource Monitor\nAfter=network.target\n\n[Service]\nType=simple\nExecStart=/bin/bash -c 'while true; do /usr/local/bin/APMonitor.py -vv /usr/local/etc/apmonitor-config.yaml --generate-mrtg-config; sleep 10; done'\nRestart=always\nRestartSec=10\nUser=monitoring\nStandardOutput=journal\nStandardError=journal\n\n[Install]\nWantedBy=multi-user.target\n```\n\n## Default State File Location\n\nAPMonitor automatically selects a platform-appropriate default location for the state file if the `-s/--statefile` option is not specified:\n\n### Linux, macOS, FreeBSD, OpenBSD, NetBSD\n**Default**: `/var/tmp/APMonitor/\u003cconfig-stem\u003e.statefile.json`\n\n- Directory `/var/tmp/APMonitor/` is created automatically with mode `755` (no www-data write access)\n- Persists across system reboots (unlike `/tmp`)\n- All sibling files (`.new`, `.old`, `.mrtg.cfg`, `.rrd/`) live in the same directory\n\n### Windows\n**Default**: `%TEMP%\\APMonitor\\\u003cconfig-stem\u003e.statefile.json`\n\n### Unknown/Other 
Platforms\n**Default**: `./\u003cconfig-stem\u003e.statefile.json`\n\n## Concurrency and Multiple Instances\n\nWhen multiple config files are passed on the command line, APMonitor spawns one subprocess per config and joins all before exiting. Each subprocess runs completely independently with its own statefile, RRD database, lock file, and MRTG output directory. A PID lockfile (hashed from the config path) in `/tmp/` prevents duplicate instances per config.\n\nFor manual multi-instance operation with separate invocations, use separate config files — the config filename determines the statefile path and PID lock, so correct cardinality is enforced automatically:\n```bash\n# Instance 1: Production monitoring\n./APMonitor.py prod-apmonitor-config.yaml --generate-mrtg-config\n\n# Instance 2: Development monitoring\n./APMonitor.py dev-apmonitor-config.yaml --generate-mrtg-config\n```\n\n# Developer Notes for modifying `APMonitor.py`\n\n## State File\n\nAPMonitor uses a JSON state file to persist monitoring data across runs:\n\n- **Location**: `/var/tmp/APMonitor/\u003cconfig-stem\u003e.statefile.json` by default\n- **Format**: JSON with per-resource nested objects containing timestamps, status, and counters\n- **Atomic Updates**: Uses `.new` and `.old` rotation to prevent corruption on crashes\n- **Thread Safety**: Protected by internal lock during concurrent access\n\nThe state file tracks per-resource:\n- `is_up`: Current resource status\n- `last_checked`: When resource was last checked (ISO 8601 timestamp)\n- `last_response_time_ms`: Response time in milliseconds for successful checks\n- `last_notified`: When last notification was sent (ISO 8601 timestamp)\n- `last_alarm_started`: When current/last outage began (ISO 8601 timestamp)\n- `last_successful_heartbeat`: When heartbeat URL last succeeded (ISO 8601 timestamp)\n- `down_count`: Consecutive failed checks\n- `notified_count`: Number of notifications sent for current outage\n- `error_reason`: Last error message\n- 
`last_config_check","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcompscifutures%2Fapmonitor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcompscifutures%2Fapmonitor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcompscifutures%2Fapmonitor/lists"}