Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sapcc/go-pmtud
go-pmtud is a simplified implementation of cloudflare/pmtud in Go.
https://github.com/sapcc/go-pmtud
bpg ecmp kubernetes path-mtu-discovery
Last synced: 11 days ago
JSON representation
go-pmtud is a simplified implementation of cloudflare/pmtud in Go.
- Host: GitHub
- URL: https://github.com/sapcc/go-pmtud
- Owner: sapcc
- License: apache-2.0
- Created: 2019-12-19T17:14:32.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2024-11-07T01:00:58.000Z (14 days ago)
- Last Synced: 2024-11-07T02:16:48.471Z (13 days ago)
- Topics: bpg, ecmp, kubernetes, path-mtu-discovery
- Language: Go
- Homepage:
- Size: 17.2 MB
- Stars: 8
- Watchers: 34
- Forks: 0
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# go-pmtud
[![CI](https://github.com/sapcc/go-pmtud/actions/workflows/ci.yaml/badge.svg)](https://github.com/sapcc/go-pmtud/actions/workflows/ci.yaml)
[![Go Report Card](https://goreportcard.com/badge/github.com/sapcc/go-pmtud)](https://goreportcard.com/report/github.com/sapcc/go-pmtud)`go-pmtud` is a simplified implementation of [cloudflare/pmtud](https://github.com/cloudflare/pmtud) in Go.
## Problem
Using ECMP (Equal Cost Multi Path) on bare metal Kubernetes clusters makes load sharing of traffic possible (e.g. by using service addresses of type `ExternalIP`).
Hosts (and Pods) try to leverage full MTU size that is derived from their interface configuration (e.g. 9000 bytes).
If `(1)` MTU is smaller somewhere in the path between sender and receiver and `(2)` packet has a DF (do-not-fragment) bit set, router sends ICMP Destination Unreachable message (type 3 code 4 message) to sender (originator of too large packets).
In case of ECMP it may not reach the original sender, thus breaking the communication.
More details in this blog post by Cloudflare: [Path MTU discovery in practice](https://blog.cloudflare.com/path-mtu-discovery-in-practice/).
go-pmtud replicates ICMP Destination Unreachable packets to all nodes in same Kubernetes cluster, so that the sender gets awareness that it has to use smaller packets for a particular destination.
## Concept
1. ICMP Destination Unreachable message (type 3 code 4 message) packets are filtered and sent to specific NFlog group by iptables.
2. go-pmtud replicates ICMP packets to all nodes in same Kubernetes cluster, so that the sender pod gets awareness that it has to use smaller packets for particular destination.
```
Exec into the pod:# ip route get 192.168.100.10
192.168.100.10 via 192.100.0.1 dev eth0 src 192.100.0.50
cache expires 484sec mtu 9000 <<<< connection is failing# ip route get 192.168.100.10
192.168.100.10 via 192.100.0.1 dev eth0 src 192.100.0.50
cache expires 484sec mtu 8996 <<<< correct MTU information, connection is working
```## Build
Build from source:
```
go mod download
go build -v -o /go-pmtud cmd/go-pmtud/main.go
```Build a Docker image:
```
docker build -t go-pmtud .
```## go-pmtud options
Following options are available:
1. peers - resend ICMP frag-needed packets to this peer list.
2. iface - interface that listens for ICMP packets and resends them to other peers.
3. nodename - node hostname, used for metric label.
4. nflog-group - NFLOG group, set to 33 in our case.
5. metrics-port - Port for Prometheus metrics (30040 by default).
6. ttl - TTL of replicated ICMP packets.
7. ignore-networks - Do not resend ICMP frag-needed packets originated from specified networksIf `iface` is empty, it finds out the outgoing interface based on the default route.
## Example - go-pmtud Daemonset
go-pmtud can run as a [Daemonset](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/), [example](https://github.com/sapcc/helm-charts/blob/master/system/go-pmtud).
Example values.yaml:
```
images:
iptables:
repository: sapcc/iptables
tag: v20191226161919
pmtud:
repository: sapcc/go-pmtud
tag: latestiptables:
nflogGroup: 33
ignoreSourceNetworks: 192.168.100.0/24pmtud:
ttl: 10
metricsPort: 30040
interface: eth0
peers: 192.168.100.2, 192.168.100.3, 192.168.100.4, 192.168.100.5, 192.100.0.50
```## Example - iptables and NFlog
There is an iptables rule on each node that redirects ICMP Destination Unreachable` packets to NFlog group nr. 33:
`iptables -t raw -D PREROUTING -i -p icmp -m icmp --icmp-type 3/4 --j NFLOG --nflog-group 33`
Important: we need ignore packets from summarized source networks of all nodes in the local cluster to avoid re-sending loops. Use `ignore-networks` option for this.
This means a node will not re-send already retransmitted ICMP messages. It will only resend messages that are usually originated by routers on the path.## License
This project is licensed under the Apache2 License - see the [LICENSE](LICENSE) file for details