{"id":32751935,"url":"https://github.com/alibo/simple-mqtt-network-lab","last_synced_at":"2026-05-18T10:11:10.210Z","repository":{"id":321839003,"uuid":"1085265764","full_name":"alibo/simple-mqtt-network-lab","owner":"alibo","description":"MQTT connectivity lab: throttle, flake, and observe client resilience.","archived":false,"fork":false,"pushed_at":"2025-10-31T20:02:19.000Z","size":376,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-31T21:20:25.305Z","etag":null,"topics":["emqx","mqtt","network","network-debugging","network-simulation","vibecoding"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alibo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-28T20:01:34.000Z","updated_at":"2025-10-31T20:53:04.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/alibo/simple-mqtt-network-lab","commit_stats":null,"previous_names":["alibo/simple-mqtt-network-lab"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/alibo/simple-mqtt-network-lab","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibo%2Fsimple-mqtt-network-lab","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibo%2Fsimple-mqtt-network-lab/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/al
ibo%2Fsimple-mqtt-network-lab/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibo%2Fsimple-mqtt-network-lab/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alibo","download_url":"https://codeload.github.com/alibo/simple-mqtt-network-lab/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibo%2Fsimple-mqtt-network-lab/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33174091,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-18T09:27:30.708Z","status":"ssl_error","status_checked_at":"2026-05-18T09:27:28.300Z","response_time":71,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["emqx","mqtt","network","network-debugging","network-simulation","vibecoding"],"created_at":"2025-11-04T00:00:57.963Z","updated_at":"2026-05-18T10:11:10.203Z","avatar_url":"https://github.com/alibo.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Simple MQTT Lab (Minimal Setup)\n\nThis is a minimal, production‑minded lab to exercise an MQTT v3.1 mobile client under network impairments. 
It includes:\n\n- EMQX 5.8 cluster (3 nodes) behind an HAProxy MQTT gateway.\n- Toxiproxy between the Java client and the gateway (for impairments).\n- A Go backend that consumes driver locations and publishes offers/rides.\n- A Java client (Paho MqttAsyncClient) that publishes driver locations and consumes offers/rides.\n- Rich, structured logs and basic profiling endpoints (pprof for Go, JFR/thread dump for Java).\n\nAll app images use Debian slim bases. No Prometheus, no Grafana, no Streamlit UI.\n\n## Table of Contents\n\n- [Quickstart](#quickstart)\n- [Architecture](#architecture)\n- [Configuration](#configuration)\n- [Toxiproxy Usage (Impairments)](#toxiproxy-usage-impairments)\n- [Network Impairments (tc NetEm)](#network-impairments-tc-netem)\n- [Latency Charts (Per Topic)](#latency-charts-per-topic)\n- [Profiling](#profiling)\n- [MQTT Gateway \u0026 EMQX](#mqtt-gateway--emqx)\n- [MQTT Keepalive \u0026 Reconnect](#mqtt-keepalive--reconnect)\n- [Dual Connections (Pub/Sub)](#dual-connections-pubsub)\n- [Troubleshooting Container (netshoot)](#troubleshooting-container-netshoot)\n- [Operational Notes](#operational-notes)\n- [Message Payload Format](#message-payload-format)\n- [Tests](#tests)\n\n## Quickstart\n\n```bash\ncd simple-mqtt-network-lab\n# Build and run\ndocker compose up --build\n```\n\nServices:\n- HAProxy MQTT gateway: `mqtt-gateway:1883` (host: `localhost:1883`)\n- EMQX dashboard: http://localhost:18083 (admin/public)\n- Toxiproxy API: http://localhost:8474\n  - Proxy preconfigured via `toxiproxy/config.json` (listen `0.0.0.0:18830` → upstream `mqtt-gateway:1883`).\n- Go backend profiling: http://localhost:6060/debug/pprof\n- Java client profiling: http://localhost:6061 (see endpoints below)\n - Network troubleshooting helper: container `network-troubleshooting` shares toxiproxy's netns\n   (tools: tcpdump, mtr, dig, curl, tc, etc.).\n\nLogs are printed to stdout for each service. 
Stop with Ctrl+C; Compose triggers graceful shutdown for the apps.\n\n## Architecture\n\n```\njava-client ──tcp──\u003e toxiproxy:18830 ──tcp──\u003e mqtt-gateway:1883 ──\u003e emqx[1..3]\n  | publish /driver/location (configurable)\n  | subscribe /driver/offer, /driver/ride\n\nbackend    ───────────────tcp──────────────\u003e mqtt-gateway:1883 ──\u003e emqx[1..3]\n  | subscribe /driver/location\n  | publish   /driver/offer (every Y ms), /driver/ride (every Z ms)\n```\n\n## Topics\n- `/driver/location` (Java → Backend)\n- `/driver/offer` (Backend → Java)\n- `/driver/ride` (Backend → Java)\n\n## Configuration\n\nEdit the YAML files under `configs/` (structured, human‑readable). Defaults are sensible and focus on reconnect robustness and observability.\n\n- `configs/backend.yaml` controls the Go backend (publish rates, keepalive, retry, QoS, payload sizes, socket/buffer/inflight, debug).\n- `configs/client.yaml` controls the Java client (publish rate, keepalive, retry, QoS, payload sizes, socket/buffer/inflight, debug).\n\nNeither app hot‑reloads its config; changes take effect only on restart (simple by design). Example snippets (full files provided):\n\n```yaml\n# backend.yaml\nmqtt:\n  host: mqtt-gateway\n  port: 1883\n  client_id: backend-1\n  keepalive_secs: 30\n  protocol_version: 3   # MQTT 3.1\n  clean_session: true    # default\nretry:\n  enabled: true\n  connect_timeout_ms: 5000\n  max_reconnect_interval_ms: 10000\n  ping_timeout_ms: 5000\n  write_timeout_ms: 5000\n## App-level ping/pong removed. 
Built-in MQTT keepalive (PINGREQ/PINGRESP) is used.\npublish:\n  offer_every_ms: 1000\n  ride_every_ms: 2000\nqos:\n  location: 0\n  offer: 0\n  ride: 0\npayload_bytes:\n  offer: 4096\n  ride: 4096\nsocket:\n  tcp_keepalive_secs: 60\n  tcp_nodelay: true\n  read_buffer: 262144\n  write_buffer: 262144\nbuffer_inflight:\n  max_inflight: 64\n  buffer_enabled: true\n  buffer_size: 1000\n  drop_oldest: true\n  persist: false\nlog:\n  debug: false\n```\n\n```yaml\n# client.yaml\nmqtt:\n  host: toxiproxy\n  port: 18830    # Toxiproxy → HAProxy:1883\n  client_id: java-1\n  keepalive_secs: 30\n  protocol_version: 3   # MQTT 3.1\n  clean_session: true    # default\n  # Optional: use two separate MQTT connections\n  # - publisher: publishes /driver/location with client_id \"java-1-pub\"\n  # - subscriber: consumes /driver/offer and /driver/ride with client_id \"java-1-sub\"\n  separate_pubsub_connections: false\nretry:\n  enabled: true\n  automatic_reconnect: true\n  connect_timeout_ms: 5000\n  max_reconnect_delay_ms: 10000\nqos:\n  location: 0\n  offer: 0\n  ride: 0\npayload_bytes:\n  location: 4096\nsocket:\n  tcp_keepalive: true\n  tcp_nodelay: true\n  receive_buffer: 262144\n  send_buffer: 262144\nbuffer_inflight:\n  max_inflight: 64\n  buffer_enabled: true\n  buffer_size: 1000\n  drop_oldest: true\n  persist: false\nlog:\n  debug: false\npublish:\n  location_every_ms: 1000\n```\n\n## Dual Connections (Pub/Sub)\n\nTo compare packet loss and TCP congestion behavior with one vs two MQTT connections, the Java client can split publishing and subscribing over separate connections.\n\n- Enable in `configs/client.yaml` under `mqtt.separate_pubsub_connections: true`.\n- Client IDs derive automatically: `\u003cclient_id\u003e-pub` for publishes to `/driver/location`, `\u003cclient_id\u003e-sub` for subscriptions to `/driver/offer` and `/driver/ride`.\n- Logs include `connected_pub=` and `connected_sub=` in the periodic `[stats]` line to see each connection’s state. 
Per-message logs and the `[publish]` lines remain unchanged for reporting.\n\nSuggested experiment:\n- Single connection: set `separate_pubsub_connections: false`, run a packet loss scenario (e.g., `bash scripts/netem.sh loss 5`) and capture with `bash scripts/latency-report.sh --pre 5 --post 20 -- bash scripts/netem.sh loss 5`.\n- Dual connections: switch to `true`, rebuild `docker compose up --build`, repeat the same impairment and capture.\n- Compare delivered ratios and latency distributions for `/driver/location` vs `/driver/offer` and `/driver/ride` across the two runs.\n\n## Toxiproxy Usage (Impairments)\n\nThe Java client connects to Toxiproxy (`localhost:18830`), which forwards to the gateway. The proxy is created at boot from `toxiproxy/config.json`.\n\nHelper script (recommended):\n- Control the proxy with `bash scripts/mqtt-proxy.sh`:\n  - Down (hard drop): `bash scripts/mqtt-proxy.sh down`\n  - Up (restore): `bash scripts/mqtt-proxy.sh up`\n  - Timeout 5s: `bash scripts/mqtt-proxy.sh timeout 5000`\n  - Half‑open (client view, block server→client): `bash scripts/mqtt-proxy.sh halfdown`\n  - Half‑open (server view, block client→server): `bash scripts/mqtt-proxy.sh halfup`\n  - Blackhole both ways (no FIN/RST): `bash scripts/mqtt-proxy.sh blackhole` (or `blackhole 600000` for 10m)\n  - Latency and jitter: `bash scripts/mqtt-proxy.sh latency 120 40 [down|up|both]` (default jitter=0, both directions)\n  - Clear latency: `bash scripts/mqtt-proxy.sh unlatency`\n  - Bandwidth limit: `bash scripts/mqtt-proxy.sh bandwidth 256kbps [down|up|both]` (use `bps|kbps|mbps` or bytes/s)\n  - Clear bandwidth: `bash scripts/mqtt-proxy.sh unbandwidth`\n  - Approx packet loss: `bash scripts/mqtt-proxy.sh packetloss 20 [down|up|both]` (uses slicer; not real per‑packet drop)\n  - Clear packet loss: `bash scripts/mqtt-proxy.sh unpacketloss`\n  - Status: `bash scripts/mqtt-proxy.sh status`\n  - Env: set `TOXIPROXY_URL` if not `http://localhost:8474` (default).\n\nDirect API 
examples:\n\n- Inspect proxies\n```bash\ncurl -s http://localhost:8474/proxies | jq .\n```\n\n- Create the MQTT proxy (compose already does this via `toxiproxy/config.json`):\n```bash\ncurl -s -X POST http://localhost:8474/proxies \\\n  -H 'Content-Type: application/json' \\\n  -d '{\"name\":\"mqtt\",\"listen\":\"0.0.0.0:18830\",\"upstream\":\"mqtt-gateway:1883\"}'\n```\n\n- Simulate full drop (reset connections instantly)\n```bash\ncurl -s -X POST http://localhost:8474/proxies/mqtt/toxics \\\n  -H 'Content-Type: application/json' \\\n  -d '{\"name\":\"drop\",\"type\":\"reset_peer\",\"stream\":\"downstream\",\"toxicity\":1.0}'\n```\n- Remove the drop toxic\n```bash\ncurl -s -X DELETE http://localhost:8474/proxies/mqtt/toxics/drop\n```\n\n- Pause traffic for 5s (timeout toxic)\n```bash\ncurl -s -X POST http://localhost:8474/proxies/mqtt/toxics \\\n  -H 'Content-Type: application/json' \\\n  -d '{\"name\":\"timeout5s\",\"type\":\"timeout\",\"stream\":\"downstream\",\"attributes\":{\"timeout\":5000}}'\n```\n- Remove the timeout toxic\n```bash\ncurl -s -X DELETE http://localhost:8474/proxies/mqtt/toxics/timeout5s\n```\n\nNote: Toxiproxy 2.5 treats `enabled` as read-only via the REST API; use toxics (above) or delete/recreate the proxy instead. For OS‑level packet impairments, see the dedicated section: [Network Impairments (tc NetEm)](#network-impairments-tc-netem).\n\n### Half‑Open Simulation\n\n- `halfdown` adds a downstream `limit_data` toxic with `bytes=0`, effectively blackholing server→client. The client keeps sending (e.g., PINGREQ, publishes) but never receives responses (PINGRESP, PUBACK). No FIN/RST is sent; the server still sees client traffic until keepalive/app timeouts.\n- `halfup` adds an upstream `limit_data` toxic with `bytes=0`, blackholing client→server. 
The socket stays open but the server won’t see client packets.\n- Use `up` to remove `halfdown`, `halfup`, `timeout`, and `down` toxics.\n\n### Full Blackhole (Both Directions)\n\n- `blackhole [ms]` adds a downstream `timeout_down` and upstream `timeout_up` toxic. With a large timeout (default ~1 year), both directions are blocked without FIN/RST so both ends think the connection is alive until their keepalive or app timeouts fire.\n- Use `up` to remove `timeout_down`/`timeout_up`.\n\n## Network Impairments (tc NetEm)\n\nFor true per‑packet loss/jitter/latency at the OS level, use the NetEm helper. It applies `tc netem` in the Toxiproxy network namespace so all proxied MQTT traffic is affected.\n\n- Helper script: `bash scripts/netem.sh`\n- Defaults: `TARGET=toxiproxy`, `IFACE=eth0`, and it will exec into the `network-troubleshooting` container if present; otherwise it starts a short‑lived helper.\n\nExamples:\n- Show current qdisc: `bash scripts/netem.sh status`\n- 120ms delay with 40ms jitter: `bash scripts/netem.sh delay 120 40`\n- 5% packet loss: `bash scripts/netem.sh loss 5`\n- Combine delay+loss: `bash scripts/netem.sh shape 120 20 2 10`\n- Egress bandwidth limit: `bash scripts/netem.sh rate 512kbps` (accepts `kbps|mbps` or `kbit|mbit`)\n- Combine all (with bandwidth): `bash scripts/netem.sh shape 120 20 2 10 1mbps`\n- Clear NetEm: `bash scripts/netem.sh clear`\n\nNotes:\n- This is real packet impairment below TCP, unlike Toxiproxy's slicer toxic which only fragments streams.\n- Requires NET_ADMIN capability; the `network-troubleshooting` container has it by default.\n- Bandwidth limiting uses a TBF child qdisc under the `netem` root and shapes egress on the target interface. 
It typically affects both directions of proxied flows since traffic in each direction egresses that interface.\n- You can override TBF tuning via env: `TBF_BURST` (default `32kbit`) and `TBF_LATENCY` (default `400ms`).\n\n## Latency Charts (Per Topic)\n\nBoth apps embed `ts=\u003cunix_ms\u003e|seq=\u003cn\u003e|` at the start of payloads. Receivers log per‑message latency as `latency_ms = recv_ts_ms − pub_ts_ms` with sequence. Use the helper to capture a time window and generate CSVs + a PNG‑based HTML report.\n\nRequirements:\n- Python 3\n- gnuplot (for charts)\n  - macOS: `brew install gnuplot`\n  - Ubuntu/Debian: `sudo apt-get update \u0026\u0026 sudo apt-get install -y gnuplot`\n  - Fedora: `sudo dnf install -y gnuplot`\n  - CentOS/RHEL: `sudo yum install -y gnuplot`\n  - Arch: `sudo pacman -S gnuplot`\n  - Alpine: `sudo apk add gnuplot`\n\nHelper script (reporting):\n- `bash scripts/latency-report.sh [--pre N] [--post N] [--] [command ...]`\n\nExamples:\n- Capture 5s before and 10s after a netem change:\n  `bash scripts/latency-report.sh --pre 5 --post 10 -- bash scripts/netem.sh shape 120 20 2 10 1mbps`\n- Capture around a toxiproxy latency change:\n  `bash scripts/latency-report.sh --pre 5 --post 10 -- bash scripts/mqtt-proxy.sh latency 120 40 both`\n\nOutputs (under `captures/latency-\u003cts\u003e/`):\n- `latency_offer.csv`, `latency_ride.csv`, `latency_location.csv` with columns: `seq,latency_ms,pub_ts_ms,recv_ts_ms`.\n- Missing publishes (not received within window): `latency_offer_missing.csv`, `latency_ride_missing.csv`, `latency_location_missing.csv` (seq, pub_ts_ms).\n- Per-second delivery rate CSV per topic: `rate_\u003ctopic\u003e.csv` with columns `second_unix,published,received,delivered_ratio`.\n- Summary files: `summary.json` and `summary.txt` with totals, delivered ratio, and latency stats (min/mean/p50/p95/p99/max) per topic.\n- HTML report: `index.html` summarizes stats and embeds generated PNG charts; requires `gnuplot` to render 
charts.\n\nNotes:\n- The script uses `docker logs --since/--until` to bound the time window. Adjust `--pre`/`--post` to include exactly the period you care about.\n- Open `index.html` to see: latency line charts, missing markers (red), publish vs receive rates, and delivered ratios. Click any image to expand it (full‑screen lightbox). If charts are missing, install `gnuplot` and rerun the report.\n\nWhat the charts mean:\n- Latency vs Seq: per‑message latency for received messages; x‑axis is published sequence.\n- Latency + Missing: same latency line plus red markers at y=0 for publishes with no receive inside the window.\n- Published vs Received per Second: published counts grouped by publish second; received counts by receive second (time on x‑axis).\n- Delivered Ratio per Pub‑Second: for each publish second, delivered/published ∈ [0,1].\n\nScreenshot\n\n![Latency report (sample)](docs/img/latency-report.jpg)\n\n## Profiling\n\n- Go backend\n  - Endpoint: `http://localhost:6060/debug/pprof`\n  - CPU profile 30s: `go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30`\n  - Heap: `curl -s http://localhost:6060/debug/pprof/heap \u003e heap.pb.gz`\n\n- Java client\n  - Base URL: `http://localhost:6061`\n  - Health: `GET /healthz`\n  - Thread dump: `GET /profiling/threads` (text/plain)\n  - Start JFR (60s): `POST /profiling/jfr/start?name=run1\u0026durationSec=60`\n  - Stop JFR: `POST /profiling/jfr/stop?name=run1` (returns path inside container)\n\nNote: JFR requires a JDK (we run on OpenJDK 17 slim). Retrieve the recorded JFR with `docker cp` if needed.\n\n## MQTT Gateway \u0026 EMQX\n\n- HAProxy runs with TCP logging enabled and acts as a front door to EMQX cluster via round‑robin.\n- EMQX 5.8 cluster (3 nodes) is formed with static seeds. 
Dashboard is exposed on http://localhost:18083 (admin/public).\n\n## MQTT Keepalive \u0026 Reconnect\n\n### Keepalive Basics\n- Purpose: Detect half‑open TCP connections without application pings.\n- Mechanism: Client must send any MQTT control packet within the negotiated keepalive interval; if idle, it sends PINGREQ. The broker replies with PINGRESP.\n- Broker side timeout: Most brokers (including EMQX) consider the connection dead after roughly 1.5 × keepalive with no incoming control packets.\n- Client side timeout:\n  - Java: handled internally by Paho; when PINGRESP is not received, `connectionLost()` fires and auto‑reconnect kicks in if enabled.\n  - Go: configured via `retry.ping_timeout_ms` (default 5000 ms). If a PINGRESP is not received within this window, the client treats the connection as lost.\n\nBoth apps use MQTT keepalive only (no app‑level ping/pong). Enable debug logs to see PINGREQ/PINGRESP traces.\n\n### Reconnect Backoff\n- Java client (Paho MqttAsyncClient):\n  - Auto‑reconnect enabled via `retry.automatic_reconnect: true`.\n  - Exponential backoff doubles per attempt and caps at `maxReconnectDelay` (Paho default ≈ 128 s if not set).\n  - Connect timeout controls how long each handshake may take (`retry.connect_timeout_ms`).\n  - Note: Paho Java’s `setMaxReconnectDelay(..)` expects milliseconds (the library default is 128000). 
Our YAML key `retry.max_reconnect_delay_ms` is milliseconds; the example below uses seconds for readability.\n- Go backend (paho.mqtt.golang):\n  - Auto‑reconnect enabled via `retry.enabled: true`.\n  - Exponential backoff with a cap at `retry.max_reconnect_interval_ms`.\n  - `retry.connect_timeout_ms` limits each connect attempt’s handshake; `retry.ping_timeout_ms` bounds how long to wait for PINGRESP.\n\n### Example: Paho Java Auto‑Reconnect Timeline\n\nConfig:\n\nkeepalive = 120 s → connection considered lost after ~180 s (1.5 × KA)\n\nconnect timeout = 30 s\n\nmaxReconnectDelay = default ≈ 128 s (2 min)\n\nBackoff pattern: 1 → 2 → 4 → 8 → 16 → 32 → 64 → 128 → 128 …\n\n⏱ Event Timeline (network down → up to 5 min)\n```\nt =   0 s  | Normal operation\nt = 180 s  | No packets for 1.5×KeepAlive → connectionLost() triggered\n            | Auto-reconnect loop starts\n──────────────────────────────────────────────────────────────────────\nAttempt #1  | delay=0 s   connect timeout=30 s   (180–210 s)\nAttempt #2  | delay=1 s   connect timeout=30 s   (211–241 s)\nAttempt #3  | delay=2 s   connect timeout=30 s   (243–273 s)\nAttempt #4  | delay=4 s   connect timeout=30 s   (277–307 s)\nAttempt #5  | delay=8 s   connect timeout=30 s   (315–345 s)\nAttempt #6  | delay=16 s  connect timeout=30 s   (361–391 s)\nAttempt #7  | delay=32 s  connect timeout=30 s   (423–453 s)\nAttempt #8  | delay=64 s  connect timeout=30 s   (517–547 s)\nAttempt #9+ | delay≈128 s (capped)               (beyond ~9 min if still offline)\n──────────────────────────────────────────────────────────────────────\n```\n\nNotes:\n- With clean sessions enabled (default), subscriptions are re‑issued on reconnect (both apps already do this in their connect handlers).\n- With persistent sessions (`clean_session: false`), the broker retains subscriptions and queued QoS 1/2 messages; both apps still safely resubscribe.\n- Under full blackhole conditions (no FIN/RST), detection depends on keepalive; expect 
`~1×KA` to send PINGREQ and up to `~1.5×KA` for disconnect at the broker side. Client‑side may trigger earlier if `ping_timeout_ms` elapses (Go) or Paho Java detects missing PINGRESPs.\n\n\n## Troubleshooting Container (netshoot)\n\nA persistent `nicolaka/netshoot` container named `network-troubleshooting` shares the network namespace with Toxiproxy for deep inspection.\n\n- Start automatically with compose: `docker compose up -d network-troubleshooting` (included in the default stack)\n- Shell: `docker exec -it network-troubleshooting bash`\n- Common tools available: tcpdump, tshark, dig, nslookup, curl, mtr, arping, tc, ss, iproute2.\n- Capture MQTT proxy traffic to pcap: `docker exec -it network-troubleshooting tcpdump -i eth0 -n port 18830 -w /tmp/mqtt.pcap`\n\nHelper script for capture: `bash scripts/capture.sh`\n- Save MQTT proxy traffic 60s to /tmp/mqtt-proxy.pcap: `bash scripts/capture.sh port 60 mqtt-proxy.pcap`\n- Save custom filter 30s: `bash scripts/capture.sh filter \"host mqtt-gateway\" 30 gw.pcap`\n - Live sniff (Ctrl+C to stop): `bash scripts/capture.sh live` (defaults to both directions)\n- Live to Wireshark via named pipe: `bash scripts/capture.sh live-wireshark` (defaults to both directions and auto-decodes MQTT)\n- List saved pcaps: `bash scripts/capture.sh list`\n- Copy to host: `bash scripts/capture.sh copy mqtt-proxy.pcap ./captures`\n\nPreset filters (aliases):\n- List presets: `bash scripts/capture.sh presets`\n- Capture with preset (30s default): `bash scripts/capture.sh preset cp 60 cp.pcap`\n- Live with preset: `bash scripts/capture.sh live-preset both`\n- Live Wireshark with preset: `bash scripts/capture.sh live-wireshark-preset pg`\n\nPreset names:\n- `cp` (client↔proxy): `port 18830`\n- `pg` (proxy↔gateway): `host mqtt-gateway and port 1883`\n- `both` (cp or pg): `port 18830 or (host mqtt-gateway and port 1883)`\n\nDesign choices: Toxiproxy toxics simulate stream conditions (latency, bandwidth, half‑open, fragmentation). 
For realistic packet loss/reordering/corruption, prefer NetEm.\n\n### Live Capture with Wireshark (from host)\n\nYou can stream packets from the netshoot helper into Wireshark running on your host.\n\nOption A — direct pipe (Linux):\n\n```bash\ndocker exec network-troubleshooting \\\n  tcpdump -i eth0 -U -s 0 -w - \\\n  'port 18830 or (host mqtt-gateway and port 1883)' | \\\n  wireshark -k -i -\n```\n\nOption B — named pipe (Linux/macOS):\n\nTerminal 1 (producer):\n\n```bash\nmkfifo /tmp/mqtt.pipe\ndocker exec network-troubleshooting \\\n  tcpdump -i eth0 -U -s 0 -w - \\\n  'port 18830 or (host mqtt-gateway and port 1883)' \u003e /tmp/mqtt.pipe\n```\n\nTerminal 2 (Wireshark):\n\n- Linux:\n  ```bash\n  wireshark -k -i /tmp/mqtt.pipe\n  ```\n- macOS:\n  ```bash\n  open -a Wireshark --args -k -i /tmp/mqtt.pipe\n  ```\n\nTips:\n- Use display filter `mqtt` or decode ports as MQTT: Analyze → Decode As… → select TCP port 18830/1883 → MQTT.\n- `-U` (unbuffered) and `-s 0` ensure low-latency, full-packet capture.\n- On Docker Desktop (macOS/Windows), capturing on host interfaces won’t see container traffic; the netshoot approach avoids that by sharing toxiproxy’s network namespace.\n\nShortcut: use the helper to set up the FIFO and launch Wireshark (defaults to both directions and sets MQTT decode)\n\n```bash\nbash scripts/capture.sh live-wireshark\n```\n\nNotes:\n- The helper creates a FIFO at `/tmp/mqtt.pipe` (override with `FIFO=/path`), starts tcpdump inside `network-troubleshooting`, and launches Wireshark on the host.\n- It also hints Wireshark to decode TCP ports 18830 and 1883 as MQTT via `-d tcp.port==18830,mqtt` and `-d tcp.port==1883,mqtt`.\n- Close Wireshark to stop capture; the helper cleans up the background tcpdump and removes the FIFO.\n\n## Operational Notes\n\n- Robustness: Both apps use auto‑reconnect, keepalive, inflight/buffers, and structured logs. 
They log connect/reconnect/disconnect with reasons, subscribe acks, publish results, buffer and inflight counts (every 1s), errors, and shutdown.\n- Graceful shutdown: SIGINT/SIGTERM stops publishers, flushes pending publishes, and disconnects cleanly.\n- Socket tuning: TCP keepalive and buffer sizes are configurable; Java uses a custom SocketFactory; Go uses a custom Dialer and adjusts TCP options.\n\n- Debug keepalive visibility: When debug is enabled in YAML (`log.debug: true`), both apps surface Paho client debug logs. This includes MQTT keepalive traces (PINGREQ/PINGRESP) where supported by the client libraries.\n\n- Liveness logging: Both apps log explicit transitions: `connection dead (lost connectivity)` and `connection alive (recovered)`.\n\n## Message Payload Format\n\n- Human‑readable prefixes include timestamp and sequence number for ordering. Payloads are padded with `x` to reach configured sizes when applicable.\n\n  - Java `/driver/location` example prefix: `ts=\u003cunix_ms\u003e|seq=\u003cn\u003e|xxxx...`\n  - Go publishes `/driver/offer` and `/driver/ride` with: `ts=\u003cunix_ms\u003e|seq=\u003cn\u003e|xxxx...`\n\n## Make It Your Own\n\n- Tweak `configs/*.yaml` and rebuild: `docker compose up --build`.\n- Change QoS, payload sizes, or the publish timers to stress buffering/inflight behavior.\n- Use Toxiproxy toxics to simulate handovers/drops and observe logs for backoff, reconnects, and queue dynamics.\n\n## Tests\n\n- Go unit tests: `cd go-backend \u0026\u0026 go test -cover ./...`\n- Go integration test: set `TEST_MQTT_BROKER=tcp://localhost:1883` then run `go test -run Integration ./...`\n  - Use Docker to bring a broker up: `docker compose up -d mqtt-gateway emqx1` (or full stack)\n- Java unit tests: `cd java-client \u0026\u0026 ./gradlew test`\n- Java integration test: set `TEST_MQTT_BROKER=tcp://localhost:1883` then run `./gradlew test`\n  - The integration test is automatically skipped if `TEST_MQTT_BROKER` is not set\n\nNotes:\n- Go 
toolchain is set to `go1.25` and the Dockerfile uses `golang:1.25-bookworm`.\n- YAML parsing uses battle‑tested libraries: `gopkg.in/yaml.v3` (Go) and SnakeYAML (Java).\n- Application‑level ping/pong topics were removed; MQTT keepalive interval defaults to 15s and is configurable via YAML (`mqtt.keepalive_secs`; the example configs above set it to 30).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falibo%2Fsimple-mqtt-network-lab","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falibo%2Fsimple-mqtt-network-lab","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falibo%2Fsimple-mqtt-network-lab/lists"}