{"id":16826771,"url":"https://github.com/retocode/knative-multicontainer-probing","last_synced_at":"2025-07-26T16:08:31.486Z","repository":{"id":217504595,"uuid":"743445267","full_name":"ReToCode/knative-multicontainer-probing","owner":"ReToCode","description":null,"archived":false,"fork":false,"pushed_at":"2024-07-10T07:20:07.000Z","size":98,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-17T19:48:52.324Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ReToCode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-15T08:55:11.000Z","updated_at":"2024-07-10T07:20:10.000Z","dependencies_parsed_at":"2024-01-16T22:31:09.554Z","dependency_job_id":"6731ff98-7a39-4ab6-abc1-a71d38d857a7","html_url":"https://github.com/ReToCode/knative-multicontainer-probing","commit_stats":null,"previous_names":["retocode/knative-multicontainer-probing"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ReToCode/knative-multicontainer-probing","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ReToCode%2Fknative-multicontainer-probing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ReToCode%2Fknative-multicontainer-probing/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ReToCode%2Fknative-multicontainer-probing/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/ReToCode%2Fknative-multicontainer-probing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ReToCode","download_url":"https://codeload.github.com/ReToCode/knative-multicontainer-probing/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ReToCode%2Fknative-multicontainer-probing/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267191045,"owners_count":24050318,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-26T02:00:08.937Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-13T11:18:22.617Z","updated_at":"2025-07-26T16:08:31.465Z","avatar_url":"https://github.com/ReToCode.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Knative multi-container probing\n\n## Setup\n```bash\n# cert-manager, net-certmanager\nkubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.2/cert-manager.yaml\nkubectl wait --for=condition=Established --all crd\nkubectl wait --for=condition=Available -n cert-manager --all deployments\n\n# serving\nko apply --selector knative.dev/crd-install=true -Rf config/core/\nkubectl wait --for=condition=Established --all crd\nko apply -Rf config/core/\n\n# kourier\nko apply -Rf config\n\n# config patches\nkubectl patch configmap/config-network \\\n  
--namespace knative-serving \\\n  --type merge \\\n  --patch '{\"data\":{\"ingress-class\":\"kourier.ingress.networking.knative.dev\"}}'\nkubectl patch configmap/config-domain \\\n  --namespace knative-serving \\\n  --type merge \\\n  --patch \"{\\\"data\\\":{\\\"172.17.0.100.sslip.io\\\":\\\"\\\"}}\"\n  \n# enable request logging\n## Activator and Q-P\nkubectl patch configmap/config-observability \\\n  --namespace knative-serving \\\n  --type merge \\\n  --patch '{\"data\":{\"logging.enable-request-log\":\"true\"}}'\n  \n## Kourier gateway\nkubectl patch configmap/config-kourier \\\n  --namespace knative-serving \\\n  --type merge \\\n  --patch '{\"data\":{\"logging.enable-request-log\":\"true\"}}'\n```\n\n## Testing single containers\n\n```bash\nkubectl apply -f 1-single-container/xx.yaml\n```\n\n```bash\nkubectl apply -f 1-single-container/1-ksvc-default.yaml\nkubectl apply -f 1-single-container/6-ksvc-exec-probe-readiness.yaml\n```\n\n### Cross port probing\n\nIt is possible to test the health on another port than the data-traffic:\n\n```bash\nkubectl apply -f 1-single-container/12-cross-port-readiness.yaml\n\n# The tests will be routed through Queue-Proxy:\n#  - name: SERVING_READINESS_PROBE\n#    value: '{\"httpGet\":{\"port\":8090,\"host\":\"127.0.0.1\",\"scheme\":\"HTTP\",\"httpHeaders\":[{\"name\":\"K-Kubelet-Probe\",\"value\":\"queue\"}]},\"successThreshold\":1}'\n```\n\n### Exec probes\n\n#### Readiness\n\n```bash\nko apply -f 1-single-container/10-ksvc-exec-probe-readiness.yaml\n\n# this creates a healthy server, all is ok\n\n# start failing the exec probes\ncurl -iv http://runtime.default.172.17.0.100.sslip.io/toggleExec\n\n# Knative is happy\nk get ksvc,configuration,revision,king,sks\n\nNAME                                  URL                                            LATESTCREATED   LATESTREADY     READY   REASON\nservice.serving.knative.dev/runtime   http://runtime.default.172.17.0.100.sslip.io   runtime-00001   runtime-00001   True\n\nNAME       
                                 LATESTCREATED   LATESTREADY     READY   REASON\nconfiguration.serving.knative.dev/runtime   runtime-00001   runtime-00001   True\n\nNAME                                         CONFIG NAME   K8S SERVICE NAME   GENERATION   READY   REASON   ACTUAL REPLICAS   DESIRED REPLICAS\nrevision.serving.knative.dev/runtime-00001   runtime                          1            True             0                 1\n\nNAME                                              READY   REASON\ningress.networking.internal.knative.dev/runtime   True\n\nNAME                                                              MODE    ACTIVATORS   SERVICENAME     PRIVATESERVICENAME      READY     REASON\nserverlessservice.networking.internal.knative.dev/runtime-00001   Proxy   3            runtime-00001   runtime-00001-private   Unknown   NoHealthyBackends\n\n# Queue-Proxy readiness is ok\nkubectl exec deployment/curl -n default -it -- curl -iv http://10.42.0.29:8012 -H \"K-Network-Probe: queue\" -H \"K-Kubelet-Probe: value\"\nHTTP/1.1 200 OK\n\n# Kubernetes is not happy, endpoints are removed from private service\nk get endpoints,pod\nNAME                              ENDPOINTS                         AGE\nendpoints/runtime-00001           10.42.0.20:8012,10.42.0.20:8112   2m58s\nendpoints/runtime-00001-private                                     2m58s\n\nNAME                                          READY   STATUS    RESTARTS       AGE\npod/runtime-00001-deployment-c56d87d5-rj5nl   1/2     Running   0              2m58s\n\n# we cannot call the service any longer, as traffic will be sent to the activator (10.42.0.20)\n# the activator holds the requests until the probe is ready again or until you get\nHTTP/1.1 504 Gateway Timeout\nactivator request timeout%\n```\n\n#### Summary\n* This works as expected traffic-wise, even though Knative does not reflect the state properly\n* Queue-Proxy probe is reflecting the \"wrong\" state\n* This only works because Knative treats exec 
readiness probes differently. Normally, activator would send requests to all Pods that pass Queue-Proxy readiness checks. In this case, Queue-Proxy readiness is ok, but traffic is still not forwarded. When an exec readiness probe is present, Activator waits for the endpoints to be populated as \"ready\" (`addresses` field) by K8s.\n* The situation will not resolve itself, requests will be buffered until that probe is good again\n\n\n#### Liveness\n\n```bash\nko apply -f 1-single-container/11-ksvc-exec-probe-liveness.yaml\n\n# this creates a healthy server, all is ok\n\n# start failing the liveness probe\ncurl -iv http://runtime.default.172.17.0.100.sslip.io/toggleExec\n\n# Knative is happy\n\n# Kubernetes is not happy for a short time and will restart the user-container\n\n# Queue-Proxy tries to forward requests, but will error out:\nqueue-proxy {\"severity\":\"ERROR\",\"timestamp\":\"2024-01-18T14:36:05.753218441Z\",\"logger\":\"queueproxy\",\"caller\":\"network/error_handler.go:33\",\"message\":\"error reverse proxying request; sockstat: sockets: used 8\\nTCP: inuse 0 orphan 0 tw 15 alloc 185 mem 0\\nUDP: inuse 0 mem 0\\nUDPLITE: inuse 0\\nRAW: inuse 0\\nFRAG: inuse 0 memory 0\\n\",\"commit\":\"d96dabb-dirty\",\"knative.dev/key\":\"default/runtime-00001\",\"knative.dev/pod\":\"runtime-00001-deployment-7b9c49d676-dlxmt\",\"error\":\"dial tcp 127.0.0.1:8080: connect: connection 
refused\",\"stacktrace\":\"knative.dev/pkg/network.ErrorHandler.func1\\n\\tknative.dev/pkg@v0.0.0-20240115132401-f95090a164db/network/error_handler.go:33\\nnet/http/httputil.(*ReverseProxy).ServeHTTP\\n\\tnet/http/httputil/reverseproxy.go:475\\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\\n\\tknative.dev/serving/pkg/queue/request_metric.go:199\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\\n\\tknative.dev/serving/pkg/queue/handler.go:76\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\\n\\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\\n\\tknative.dev/serving/pkg/http/handler/timeout.go:118\"}\nqueue-proxy {\"httpRequest\": {\"requestMethod\": \"GET\", \"requestUrl\": \"/\", \"requestSize\": \"0\", \"status\": 502, \"responseSize\": \"53\", \"userAgent\": \"curl/8.4.0\", \"remoteIp\": \"10.42.0.20:52782\", \"serverIp\": \"10.42.0.34\", \"referer\": \"\", \"latency\": \"0.001448584s\", \"protocol\": \"HTTP/1.1\"}, \"traceId\": \"]\"}\nqueue-proxy {\"httpRequest\": {\"requestMethod\": \"GET\", \"requestUrl\": \"/\", \"requestSize\": \"0\", \"status\": 502, \"responseSize\": \"53\", \"userAgent\": \"curl/8.4.0\", \"remoteIp\": \"10.42.0.20:52782\", \"serverIp\": \"10.42.0.34\", \"referer\": \"\", \"latency\": \"0.000335542s\", \"protocol\": \"HTTP/1.1\"}, \"traceId\": \"]\"}\nqueue-proxy {\"severity\":\"ERROR\",\"timestamp\":\"2024-01-18T14:36:06.781777885Z\",\"logger\":\"queueproxy\",\"caller\":\"network/error_handler.go:33\",\"message\":\"error reverse proxying request; sockstat: sockets: used 8\\nTCP: inuse 0 orphan 0 tw 15 alloc 185 mem 0\\nUDP: inuse 0 mem 0\\nUDPLITE: inuse 0\\nRAW: inuse 0\\nFRAG: inuse 0 memory 
0\\n\",\"commit\":\"d96dabb-dirty\",\"knative.dev/key\":\"default/runtime-00001\",\"knative.dev/pod\":\"runtime-00001-deployment-7b9c49d676-dlxmt\",\"error\":\"dial tcp 127.0.0.1:8080: connect: connection refused\",\"stacktrace\":\"knative.dev/pkg/network.ErrorHandler.func1\\n\\tknative.dev/pkg@v0.0.0-20240115132401-f95090a164db/network/error_handler.go:33\\nnet/http/httputil.(*ReverseProxy).ServeHTTP\\n\\tnet/http/httputil/reverseproxy.go:475\\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\\n\\tknative.dev/serving/pkg/queue/request_metric.go:199\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\\n\\tknative.dev/serving/pkg/queue/handler.go:76\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\\n\\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\\n\\tknative.dev/serving/pkg/http/handler/timeout.go:118\"}\nqueue-proxy {\"httpRequest\": {\"requestMethod\": \"GET\", \"requestUrl\": \"/\", \"requestSize\": \"0\", \"status\": 502, \"responseSize\": \"53\", \"userAgent\": \"curl/8.4.0\", \"remoteIp\": \"10.42.0.20:52782\", \"serverIp\": \"10.42.0.34\", \"referer\": \"\", \"latency\": \"0.00029575s\", \"protocol\": \"HTTP/1.1\"}, \"traceId\": \"]\"}\nqueue-proxy {\"severity\":\"ERROR\",\"timestamp\":\"2024-01-18T14:36:07.815057125Z\",\"logger\":\"queueproxy\",\"caller\":\"network/error_handler.go:33\",\"message\":\"error reverse proxying request; sockstat: sockets: used 8\\nTCP: inuse 0 orphan 0 tw 15 alloc 185 mem 0\\nUDP: inuse 0 mem 0\\nUDPLITE: inuse 0\\nRAW: inuse 0\\nFRAG: inuse 0 memory 0\\n\",\"commit\":\"d96dabb-dirty\",\"knative.dev/key\":\"default/runtime-00001\",\"knative.dev/pod\":\"runtime-00001-deployment-7b9c49d676-dlxmt\",\"error\":\"dial tcp 127.0.0.1:8080: connect: connection 
refused\",\"stacktrace\":\"knative.dev/pkg/network.ErrorHandler.func1\\n\\tknative.dev/pkg@v0.0.0-20240115132401-f95090a164db/network/error_handler.go:33\\nnet/http/httputil.(*ReverseProxy).ServeHTTP\\n\\tnet/http/httputil/reverseproxy.go:475\\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\\n\\tknative.dev/serving/pkg/queue/request_metric.go:199\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\\n\\tknative.dev/serving/pkg/queue/handler.go:76\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\\n\\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\\n\\tknative.dev/serving/pkg/http/handler/timeout.go:118\"}\nqueue-proxy {\"httpRequest\": {\"requestMethod\": \"GET\", \"requestUrl\": \"/\", \"requestSize\": \"0\", \"status\": 502, \"responseSize\": \"53\", \"userAgent\": \"curl/8.4.0\", \"remoteIp\": \"10.42.0.20:52782\", \"serverIp\": \"10.42.0.34\", \"referer\": \"\", \"latency\": \"0.000280208s\", \"protocol\": \"HTTP/1.1\"}, \"traceId\": \"]\"}\nqueue-proxy {\"severity\":\"ERROR\",\"timestamp\":\"2024-01-18T14:36:08.847556037Z\",\"logger\":\"queueproxy\",\"caller\":\"network/error_handler.go:33\",\"message\":\"error reverse proxying request; sockstat: sockets: used 8\\nTCP: inuse 0 orphan 0 tw 15 alloc 185 mem 0\\nUDP: inuse 0 mem 0\\nUDPLITE: inuse 0\\nRAW: inuse 0\\nFRAG: inuse 0 memory 0\\n\",\"commit\":\"d96dabb-dirty\",\"knative.dev/key\":\"default/runtime-00001\",\"knative.dev/pod\":\"runtime-00001-deployment-7b9c49d676-dlxmt\",\"error\":\"dial tcp 127.0.0.1:8080: connect: connection 
refused\",\"stacktrace\":\"knative.dev/pkg/network.ErrorHandler.func1\\n\\tknative.dev/pkg@v0.0.0-20240115132401-f95090a164db/network/error_handler.go:33\\nnet/http/httputil.(*ReverseProxy).ServeHTTP\\n\\tnet/http/httputil/reverseproxy.go:475\\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\\n\\tknative.dev/serving/pkg/queue/request_metric.go:199\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\\n\\tknative.dev/serving/pkg/queue/handler.go:76\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\\n\\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\\n\\tknative.dev/serving/pkg/http/handler/timeout.go:118\"}\nqueue-proxy {\"severity\":\"ERROR\",\"timestamp\":\"2024-01-18T14:36:09.897340832Z\",\"logger\":\"queueproxy\",\"caller\":\"network/error_handler.go:33\",\"message\":\"error reverse proxying request; sockstat: sockets: used 8\\nTCP: inuse 0 orphan 0 tw 15 alloc 185 mem 0\\nUDP: inuse 0 mem 0\\nUDPLITE: inuse 0\\nRAW: inuse 0\\nFRAG: inuse 0 memory 0\\n\",\"commit\":\"d96dabb-dirty\",\"knative.dev/key\":\"default/runtime-00001\",\"knative.dev/pod\":\"runtime-00001-deployment-7b9c49d676-dlxmt\",\"error\":\"dial tcp 127.0.0.1:8080: connect: connection 
refused\",\"stacktrace\":\"knative.dev/pkg/network.ErrorHandler.func1\\n\\tknative.dev/pkg@v0.0.0-20240115132401-f95090a164db/network/error_handler.go:33\\nnet/http/httputil.(*ReverseProxy).ServeHTTP\\n\\tnet/http/httputil/reverseproxy.go:475\\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\\n\\tknative.dev/serving/pkg/queue/request_metric.go:199\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\\n\\tknative.dev/serving/pkg/queue/handler.go:76\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\\n\\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\\n\\tknative.dev/serving/pkg/http/handler/timeout.go:118\"}\n\n# The user sees\n* Connection #0 to host runtime.default.172.17.0.100.sslip.io left intact\n*   Trying 172.17.0.100:80...\n* Connected to runtime.default.172.17.0.100.sslip.io (172.17.0.100) port 80\n\u003e GET / HTTP/1.1\n\u003e Host: runtime.default.172.17.0.100.sslip.io\n\u003e User-Agent: curl/8.4.0\n\u003e Accept: */*\n\u003e\n\u003c HTTP/1.1 502 Bad Gateway\nHTTP/1.1 502 Bad Gateway\n\u003c content-length: 53\ncontent-length: 53\n\u003c content-type: text/plain; charset=utf-8\ncontent-type: text/plain; charset=utf-8\n\u003c date: Thu, 18 Jan 2024 14:36:09 GMT\ndate: Thu, 18 Jan 2024 14:36:09 GMT\n\u003c x-content-type-options: nosniff\nx-content-type-options: nosniff\n\u003c x-envoy-upstream-service-time: 1\nx-envoy-upstream-service-time: 1\n\u003c server: envoy\nserver: envoy\n\n\u003c\ndial tcp 127.0.0.1:8080: connect: connection refused\n```\n\n#### Summary\n* Exec liveness probes do have a race-condition\n* As Queue-Proxy is not aware of the (restarting) state of the User-Container, it tries to send traffic to a closed socket\n* For a short period of time, this causes errors to be 
propagated to the caller outside the system\n* This can be avoided when the readiness probe fails at the same time as the liveness probe, because traffic is then removed during the restart\n\n\n## Testing LivenessProbes with multiple containers\n\n* Disable the webhook validation upfront\n\n```bash\nkubectl apply -f 2-multi-container/2-ksvc-default-liveness.yaml\n```\n\n```bash\nko apply -f 2-multi-container/4-ksvc-liveness-toggle.yaml\n\n# toggle the main container's liveness to false\ncurl  -iv http://test-probe.default.172.17.0.100.sslip.io/toggleLive\n\n# Check the Queue-Proxy's readiness probe\nkubectl exec deployment/curl -n default -it -- curl -iv http://10.42.0.18:8012 -H \"K-Network-Probe: queue\" -H \"K-Kubelet-Probe: value\"\nHTTP/1.1 200 OK\n\n# K8s will restart the first container, but Knative will not know about this\nStream closed EOF for default/test-probe-00001-deployment-78cbfd5cb6-tmsfm (first-container)\nqueue-proxy {\"severity\":\"ERROR\",\"timestamp\":\"2024-01-19T07:10:12.650384688Z\",\"logger\":\"queueproxy\",\"caller\":\"network/error_handler.go:33\",\"message\":\"error reverse proxying request; sockstat: sockets: used 9\\nTCP: inuse 0 orphan 0 tw 25 alloc 168 mem 0\\nUDP: inuse 0 mem 256\\nUDPLITE: inuse 0\\nRAW: inuse 0\\nFRAG: inuse 0 memory 0\\n\",\"commit\":\"d96dabb-dirty\",\"knative.dev/key\":\"default/test-probe-00001\",\"knative.dev/pod\":\"test-probe-00001-deployment-78cbfd5cb6-tmsfm\",\"error\":\"dial tcp 127.0.0.1:8080: connect: connection 
refused\",\"stacktrace\":\"knative.dev/pkg/network.ErrorHandler.func1\\n\\tknative.dev/pkg@v0.0.0-20240115132401-f95090a164db/network/error_handler.go:33\\nnet/http/httputil.(*ReverseProxy).ServeHTTP\\n\\tnet/http/httputil/reverseproxy.go:475\\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\\n\\tknative.dev/serving/pkg/queue/request_metric.go:199\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\\n\\tknative.dev/serving/pkg/queue/handler.go:76\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\\n\\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\\n\\tknative.dev/serving/pkg/http/handler/timeout.go:118\"}\nqueue-proxy {\"httpRequest\": {\"requestMethod\": \"GET\", \"requestUrl\": \"/\", \"requestSize\": \"0\", \"status\": 502, \"responseSize\": \"53\", \"userAgent\": \"curl/8.4.0\", \"remoteIp\": \"10.42.0.20:52686\", \"serverIp\": \"10.42.0.36\", \"referer\": \"\", \"latency\": \"0.000348248s\", \"protocol\": \"HTTP/1.1\"}, \"traceId\": \"]\"}\nqueue-proxy {\"severity\":\"ERROR\",\"timestamp\":\"2024-01-19T07:10:13.67737702Z\",\"logger\":\"queueproxy\",\"caller\":\"network/error_handler.go:33\",\"message\":\"error reverse proxying request; sockstat: sockets: used 9\\nTCP: inuse 0 orphan 0 tw 25 alloc 168 mem 0\\nUDP: inuse 0 mem 256\\nUDPLITE: inuse 0\\nRAW: inuse 0\\nFRAG: inuse 0 memory 0\\n\",\"commit\":\"d96dabb-dirty\",\"knative.dev/key\":\"default/test-probe-00001\",\"knative.dev/pod\":\"test-probe-00001-deployment-78cbfd5cb6-tmsfm\",\"error\":\"dial tcp 127.0.0.1:8080: connect: connection 
refused\",\"stacktrace\":\"knative.dev/pkg/network.ErrorHandler.func1\\n\\tknative.dev/pkg@v0.0.0-20240115132401-f95090a164db/network/error_handler.go:33\\nnet/http/httputil.(*ReverseProxy).ServeHTTP\\n\\tnet/http/httputil/reverseproxy.go:475\\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\\n\\tknative.dev/serving/pkg/queue/request_metric.go:199\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\\n\\tknative.dev/serving/pkg/queue/handler.go:76\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\\n\\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\\n\\tknative.dev/serving/pkg/http/handler/timeout.go:118\"}\nqueue-proxy {\"httpRequest\": {\"requestMethod\": \"GET\", \"requestUrl\": \"/\", \"requestSize\": \"0\", \"status\": 502, \"responseSize\": \"53\", \"userAgent\": \"curl/8.4.0\", \"remoteIp\": \"10.42.0.20:52686\", \"serverIp\": \"10.42.0.36\", \"referer\": \"\", \"latency\": \"0.000290081s\", \"protocol\": \"HTTP/1.1\"}, \"traceId\": \"]\"}\nqueue-proxy {\"severity\":\"ERROR\",\"timestamp\":\"2024-01-19T07:10:14.726464988Z\",\"logger\":\"queueproxy\",\"caller\":\"network/error_handler.go:33\",\"message\":\"error reverse proxying request; sockstat: sockets: used 9\\nTCP: inuse 0 orphan 0 tw 25 alloc 168 mem 0\\nUDP: inuse 0 mem 256\\nUDPLITE: inuse 0\\nRAW: inuse 0\\nFRAG: inuse 0 memory 0\\n\",\"commit\":\"d96dabb-dirty\",\"knative.dev/key\":\"default/test-probe-00001\",\"knative.dev/pod\":\"test-probe-00001-deployment-78cbfd5cb6-tmsfm\",\"error\":\"dial tcp 127.0.0.1:8080: connect: connection 
refused\",\"stacktrace\":\"knative.dev/pkg/network.ErrorHandler.func1\\n\\tknative.dev/pkg@v0.0.0-20240115132401-f95090a164db/network/error_handler.go:33\\nnet/http/httputil.(*ReverseProxy).ServeHTTP\\n\\tnet/http/httputil/reverseproxy.go:475\\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\\n\\tknative.dev/serving/pkg/queue/request_metric.go:199\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\\n\\tknative.dev/serving/pkg/queue/handler.go:76\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\\n\\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\\n\\tknative.dev/serving/pkg/http/handler/timeout.go:118\"}\nqueue-proxy {\"httpRequest\": {\"requestMethod\": \"GET\", \"requestUrl\": \"/\", \"requestSize\": \"0\", \"status\": 502, \"responseSize\": \"53\", \"userAgent\": \"curl/8.4.0\", \"remoteIp\": \"10.42.0.20:52686\", \"serverIp\": \"10.42.0.36\", \"referer\": \"\", \"latency\": \"0.000768828s\", \"protocol\": \"HTTP/1.1\"}, \"traceId\": \"]\"}\nqueue-proxy {\"severity\":\"ERROR\",\"timestamp\":\"2024-01-19T07:10:15.766776681Z\",\"logger\":\"queueproxy\",\"caller\":\"network/error_handler.go:33\",\"message\":\"error reverse proxying request; sockstat: sockets: used 9\\nTCP: inuse 0 orphan 0 tw 25 alloc 168 mem 0\\nUDP: inuse 0 mem 256\\nUDPLITE: inuse 0\\nRAW: inuse 0\\nFRAG: inuse 0 memory 0\\n\",\"commit\":\"d96dabb-dirty\",\"knative.dev/key\":\"default/test-probe-00001\",\"knative.dev/pod\":\"test-probe-00001-deployment-78cbfd5cb6-tmsfm\",\"error\":\"dial tcp 127.0.0.1:8080: connect: connection 
refused\",\"stacktrace\":\"knative.dev/pkg/network.ErrorHandler.func1\\n\\tknative.dev/pkg@v0.0.0-20240115132401-f95090a164db/network/error_handler.go:33\\nnet/http/httputil.(*ReverseProxy).ServeHTTP\\n\\tnet/http/httputil/reverseproxy.go:475\\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\\n\\tknative.dev/serving/pkg/queue/request_metric.go:199\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\\n\\tknative.dev/serving/pkg/queue/handler.go:76\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\\n\\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\\n\\tknative.dev/serving/pkg/http/handler/timeout.go:118\"}\nqueue-proxy {\"httpRequest\": {\"requestMethod\": \"GET\", \"requestUrl\": \"/\", \"requestSize\": \"0\", \"status\": 502, \"responseSize\": \"53\", \"userAgent\": \"curl/8.4.0\", \"remoteIp\": \"10.42.0.20:52686\", \"serverIp\": \"10.42.0.36\", \"referer\": \"\", \"latency\": \"0.000718329s\", \"protocol\": \"HTTP/1.1\"}, \"traceId\": \"]\"}\nqueue-proxy {\"severity\":\"ERROR\",\"timestamp\":\"2024-01-19T07:10:16.807792449Z\",\"logger\":\"queueproxy\",\"caller\":\"network/error_handler.go:33\",\"message\":\"error reverse proxying request; sockstat: sockets: used 9\\nTCP: inuse 0 orphan 0 tw 25 alloc 168 mem 0\\nUDP: inuse 0 mem 256\\nUDPLITE: inuse 0\\nRAW: inuse 0\\nFRAG: inuse 0 memory 0\\n\",\"commit\":\"d96dabb-dirty\",\"knative.dev/key\":\"default/test-probe-00001\",\"knative.dev/pod\":\"test-probe-00001-deployment-78cbfd5cb6-tmsfm\",\"error\":\"dial tcp 127.0.0.1:8080: connect: connection 
refused\",\"stacktrace\":\"knative.dev/pkg/network.ErrorHandler.func1\\n\\tknative.dev/pkg@v0.0.0-20240115132401-f95090a164db/network/error_handler.go:33\\nnet/http/httputil.(*ReverseProxy).ServeHTTP\\n\\tnet/http/httputil/reverseproxy.go:475\\nknative.dev/serving/pkg/queue.(*appRequestMetricsHandler).ServeHTTP\\n\\tknative.dev/serving/pkg/queue/request_metric.go:199\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ProxyHandler.func3\\n\\tknative.dev/serving/pkg/queue/handler.go:76\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/queue/sharedmain.mainHandler.ForwardedShimHandler.func4\\n\\tknative.dev/serving/pkg/queue/forwarded_shim.go:54\\nnet/http.HandlerFunc.ServeHTTP\\n\\tnet/http/server.go:2136\\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\\n\\tknative.dev/serving/pkg/http/handler/timeout.go:118\"}\nqueue-proxy {\"httpRequest\": {\"requestMethod\": \"GET\", \"requestUrl\": \"/\", \"requestSize\": \"0\", \"status\": 502, \"responseSize\": \"53\", \"userAgent\": \"curl/8.4.0\", \"remoteIp\": \"10.42.0.20:52686\", \"serverIp\": \"10.42.0.36\", \"referer\": \"\", \"latency\": \"0.000729954s\", \"protocol\": \"HTTP/1.1\"}, \"traceId\": \"]\"}\n\n# Depending on the timing, it is possible that errors propagate to the caller:\n* Connection #0 to host test-probe.default.172.17.0.100.sslip.io left intact\n*   Trying 172.17.0.100:80...\n* Connected to test-probe.default.172.17.0.100.sslip.io (172.17.0.100) port 80\n\u003c HTTP/1.1 502 Bad Gateway\nHTTP/1.1 502 Bad Gateway\n\u003c server: envoy\nserver: envoy\n\u003e\ndial tcp 127.0.0.1:8080: connect: connection refused\n```\n\nThe same test with the liveness-probe on the second container\n\n```bash\n# use a curl pod to deactivate different probes\nkubectl exec deployment/curl -n default -it -- curl -iv http://\u003cpod-ip\u003e:8090/toggleLive\n\n# Logs\n# K8s will restart the sidecar container\nsecond-container Liveness probe called, responding 
with:  false\nfirst-container Liveness probe called, responding with:  true\nsecond-container Liveness probe called, responding with:  false\nfirst-container Liveness probe called, responding with:  true\nqueue-proxy {\"severity\":\"INFO\",\"timestamp\":\"2024-01-19T07:12:41.367340204Z\",\"logger\":\"queueproxy\",\"caller\":\"sharedmain/handlers.go:107\",\"message\":\"Attached drain handler from user-container\u0026{GET /wait-for-drain HTTP/1.1 1 1 map[Accept:[*/*] Accept-Encoding:[gzip] User-Agent:[kube-lifecycle/1.28]] {} \u003cnil\u003e 0 ] false 10.42.0.36:8022 map] map] \u003cnil\u003e map] 10.42.0.1:40470 /wait-for-drain \u003cnil\u003e \u003cnil\u003e \u003cnil\u003e 0x4000467130}\",\"commit\":\"d96dabb-dirty\",\"knative.dev/key\":\"default/test-probe-00001\",\"knative.dev/pod\":\"test-probe-00001-deployment-78cbfd5cb6-tmsfm\"}\nStream closed EOF for default/test-probe-00001-deployment-78cbfd5cb6-tmsfm (second-container)\nfirst-container Liveness probe called, responding with:  true\nfirst-container Liveness probe called, responding with:  true\n\n# Now it depends on what the sidecar actually does. If it is important for the request-path, users could see errors as well\n# We also seem to have an issue here, K8s attaches the wait-for-drain hook, but our pod is immediately terminated anyway.\n```\n\n### Summary\n\n* Liveness probes cannot be enabled without interference. We need to:\n  * Add the additional header also for additional probes\n  * Investigate why the wait-for-drain hook is not working (or not holding the SIGTERM long enough)\n* Also, there can be race conditions:\n  * Queue-Proxy (and other Knative components) will not know about the user-container (UC) not being live and being restarted. We'll see `HTTP/1.1 503 Service Unavailable` when calling the Knative Service\n  * The same applies for sidecars. Queue-Proxy (and other Knative components) will not know about this. 
Depending on what the sidecar does, this can cause issues\n  * Users are required to also fail ReadinessProbes to make sure traffic is removed from a restarting service. Note: there are still no timing guarantees here, but this is the same issue as with vanilla K8s workloads\n\n\n## Testing ReadinessProbes with multiple containers\n\n```bash\nko apply -f 2-multi-container/5-ksvc-readiness-toggle.yaml\n\n# toggle readiness in main container\nkubectl exec deployment/curl -n default -it -- curl -iv http://10.42.0.37:8080/toggleReady\n\n# Knative Service is not ready, as we are waiting for Endpoints\nk get ksvc\nNAME         URL                                               LATESTCREATED      LATESTREADY   READY     REASON\ntest-probe   http://test-probe.default.172.17.0.100.sslip.io   test-probe-00001                 Unknown   RevisionMissing\n\nk get configuration\nNAME         LATESTCREATED      LATESTREADY   READY     REASON\ntest-probe   test-probe-00001                 Unknown\n\nk get king\nNo resources found in default namespace.\n\n# Knative Service returns a 404\ncurl -iv http://test-probe.default.172.17.0.100.sslip.io\nHTTP/1.1 404 Not Found\n\n# set the second container to ready\nkubectl exec deployment/curl -n default -it -- curl -iv http://10.42.0.37:8090/toggleReady\n\n# everything works now as expected\n\n# set the first container to not ready\nkubectl exec deployment/curl -n default -it -- curl -iv http://10.42.0.37:8080/toggleReady\n\n# QP knows about it and starts polling again\nqueue-proxy context deadline exceeded\nfirst-container Readiness probe called, responding with:  false\nfirst-container Readiness probe called, responding with:  false\n\n# QP's own readiness-probe starts to fail:\nkubectl exec deployment/curl -n default -it -- curl -iv http://10.42.0.37:8012 -H \"K-Network-Probe: queue\" -H \"K-Kubelet-Probe: value\"\nHTTP/1.1 503 Service Unavailable\n\n# Knative will only reflect the error in the SKS\nk get configuration,ksvc,king,sks\n\nNAME    
                                       LATESTCREATED      LATESTREADY        READY   REASON\nconfiguration.serving.knative.dev/test-probe   test-probe-00001   test-probe-00001   True\n\nNAME                                     URL                                               LATESTCREATED      LATESTREADY        READY   REASON\nservice.serving.knative.dev/test-probe   http://test-probe.default.172.17.0.100.sslip.io   test-probe-00001   test-probe-00001   True\n\nNAME                                                 READY   REASON\ningress.networking.internal.knative.dev/test-probe   True\n\nNAME                                                                 MODE    ACTIVATORS   SERVICENAME        PRIVATESERVICENAME         READY     REASON\nserverlessservice.networking.internal.knative.dev/test-probe-00001   Proxy   3            test-probe-00001   test-probe-00001-private   Unknown   NoHealthyBackends\n\n# But K8s has removed the endpoints:\nk get endpoints -n default test-probe-00001-private\nNAME                       ENDPOINTS   AGE\ntest-probe-00001-private               7m19s\n\n# So traffic is now sent to the activator, which logs the following and holds requests until the timeout is reached OR\n# the pod becomes ready again\n{\"severity\":\"WARNING\",\"timestamp\":\"2024-01-16T14:53:44.641328597Z\",\"logger\":\"activator\",\"caller\":\"net/revision_backends.go:342\",\"message\":\"Failed probing pods\",\"commit\":\"d96dabb-dirty\",\"knative.dev/controller\":\"activator\",\"knative.dev/pod\":\"activator-865458fff9-5fgpf\",\"knative.dev/key\":\"default/test-probe-00001\",\"curDests\":{\"ready\":\"\",\"notReady\":\"10.42.0.18:8012\"},\"error\":\"error roundtripping http://10.42.0.18:8012/healthz: context deadline exceeded\"}\n{\"severity\":\"INFO\",\"timestamp\":\"2024-01-16T14:53:44.641432431Z\",\"logger\":\"activator\",\"caller\":\"net/revision_backends.go:328\",\"message\":\"Need to reprobe pods who became 
non-ready\",\"commit\":\"d96dabb-dirty\",\"knative.dev/controller\":\"activator\",\"knative.dev/pod\":\"activator-865458fff9-5fgpf\",\"knative.dev/key\":\"default/test-probe-00001\",\"IPs\":{\"keys\":\"10.42.0.18:8012\"}}\n{\"severity\":\"INFO\",\"timestamp\":\"2024-01-16T14:53:44.641752724Z\",\"logger\":\"activator\",\"caller\":\"net/throttler.go:331\",\"message\":\"Updating Revision Throttler with: clusterIP = \u003cnil\u003e, trackers = 0, backends = 0\",\"commit\":\"d96dabb-dirty\",\"knative.dev/controller\":\"activator\",\"knative.dev/pod\":\"activator-865458fff9-5fgpf\",\"knative.dev/key\":\"default/test-probe-00001\"}\n{\"severity\":\"INFO\",\"timestamp\":\"2024-01-16T14:53:44.641780141Z\",\"logger\":\"activator\",\"caller\":\"net/throttler.go:323\",\"message\":\"Set capacity to 0 (backends: 0, index: 0/1)\",\"commit\":\"d96dabb-dirty\",\"knative.dev/controller\":\"activator\",\"knative.dev/pod\":\"activator-865458fff9-5fgpf\",\"knative.dev/key\":\"default/test-probe-00001\"}\n\n# So client requests will hang and eventually time out (or work again once the probes are healthy again).\n```\n\nFor the same test with the second container we have:\n\n```bash\n# do the same as above, until everything is ready and works\n\n# make the second container not ready\nkubectl exec deployment/curl -n default -it -- curl -iv http://10.42.0.18:8090/toggleReady\n\n# now it gets interesting: K8s removes the endpoint because the Pod is not fully ready\n# Traffic will be sent to the activator\nk get endpoints -n default\nNAME                       ENDPOINTS                         AGE\nkubernetes                 192.168.5.1:6443                  42d\ntest-probe-00001           10.42.0.17:8012,10.42.0.17:8112   14m\ntest-probe-00001-private                                     14m\n\n# Knative does not know internally that something is wrong, but because K8s fails the pod status,\n# SKS thinks there are no healthy backends\nk get configuration,ksvc,king,sks\n\nNAME                        
                   LATESTCREATED      LATESTREADY        READY   REASON\nconfiguration.serving.knative.dev/test-probe   test-probe-00001   test-probe-00001   True\n\nNAME                                     URL                                               LATESTCREATED      LATESTREADY        READY   REASON\nservice.serving.knative.dev/test-probe   http://test-probe.default.172.17.0.100.sslip.io   test-probe-00001   test-probe-00001   True\n\nNAME                                                 READY   REASON\ningress.networking.internal.knative.dev/test-probe   True\n\nNAME                                                                 MODE    ACTIVATORS   SERVICENAME        PRIVATESERVICENAME         READY     REASON\nserverlessservice.networking.internal.knative.dev/test-probe-00001   Proxy   3            test-probe-00001   test-probe-00001-private   Unknown   NoHealthyBackends\n\n# But traffic will still work, which it should not\ncurl -iv http://test-probe.default.172.17.0.100.sslip.io\nHTTP/1.1 200 OK\n\n# because Activator only checks QP health, and QP does not know about the additional container or the pod's overall health.\n```\n\n### Summary\n* Readiness probes on a single container work as expected\n* Initially, deployments must become ready at least once for the `Revision` to progress. Without a revision, no `KIngress` is generated. In other words, KService initialization depends on a full initial K8s ready state\n* On additional containers, readiness probes currently do not work, as Knative and the ingress layer are not aware of the additional checks\n* The state we represent in the SKS is correct, but it does not change routing.\n* Activator knows where to send requests, even when the endpoints of the private service are not populated\n  * Activator uses the `notReadyAddresses` field in `Endpoints` to do its own probing. 
If the main probe on QP is ok, activator will forward requests to that pod, even when Kubernetes does not populate it in `Endpoints` and is failing its overall `Pod` readiness.\n  * We definitely need to aggregate the readiness of every container in Queue-Proxy to make this consistent\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fretocode%2Fknative-multicontainer-probing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fretocode%2Fknative-multicontainer-probing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fretocode%2Fknative-multicontainer-probing/lists"}