{"id":20331133,"url":"https://github.com/comcast/ravel","last_synced_at":"2025-10-07T12:42:27.047Z","repository":{"id":38816924,"uuid":"233913039","full_name":"Comcast/Ravel","owner":"Comcast","description":null,"archived":false,"fork":false,"pushed_at":"2024-08-27T21:20:08.000Z","size":54778,"stargazers_count":9,"open_issues_count":6,"forks_count":4,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-04-11T21:07:22.790Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Comcast.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-01-14T18:55:54.000Z","updated_at":"2024-12-03T12:12:44.000Z","dependencies_parsed_at":"2024-06-19T22:51:59.499Z","dependency_job_id":"a6dad053-661d-4ab5-aed0-c9d3139f678b","html_url":"https://github.com/Comcast/Ravel","commit_stats":null,"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Comcast%2FRavel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Comcast%2FRavel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Comcast%2FRavel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Comcast%2FRavel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Comcast","download_url":"https://codeload.github.com/Comcast/Ravel/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https:/
/github.com","kind":"github","repositories_count":248480434,"owners_count":21110937,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T20:18:53.452Z","updated_at":"2025-10-07T12:42:22.015Z","avatar_url":"https://github.com/Comcast.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"#  Ravel Cluster Load Balancer\n\n![Ravel Logo](ravel_logo.png?raw=true)\n\nRavel is a high-performance cluster load balancer for bare-metal deployments of Kubernetes. It supports L2 direct-server-reply load balancing using LVS, as well as L3 load balancing through a BGP integration with GoBGP.\n\nRavel features include:\n\n- Dynamically updating configuration\n- Multiple persistent VIP addresses\n- Shared VIP addresses across multiple services\n- High availability with sub-millisecond failover\n- IPV4 load balancing\n- IPV6 load balancing\n- TCP \u0026 UDP traffic\n- Direct traffic injection to Kubernetes service chains\n- Direct reply mode for cluster-width egress bandwidth\n- Semantic configuration via configmap\n- Port forwarding\n- Operational metrics\n- Per-VIP usage Statistics\n- Default service configurations for unclaimed VIP addresses\n- In-cluster load balancing. 
No separate tier required\n- Automatic removal of Unschedulable or NotReady nodes from backends\n- Automatic updates to inbound load balancing rules in response to Configmap changes\n- Clean up on exit\n\nComing soon for Ravel:\n\n- Kubernetes LoadBalancer controller support\n- [MTU support](https://github.com/Comcast/Ravel/pull/20)\n- [Daemonset mode](https://github.com/Comcast/Ravel/pull/16)\n- Linux dev environment with minikube and a local binary\n- Public CI\n- Go modules\n\n## Architecture\n\nThe general idea is to make a set of kubernetes pods that can all do the same work\nlook like they're a single, very powerful machine,\nby getting the rest of the world to see a single IP address (a Virtual IP address, a \"VIP\")\nthat can be used to access any of the pods.\nKubernetes has services that programmers can use to group the pods by the work they do.\nBecause there's a limited number of IP addresses, RDEI assigns each service a port in the VIP.\nThere aren't enough IP addresses to provide every service with its own exclusive VIP.\nYou'd also still need to assign a TCP port number.\nThis drives the balancing to be done with a VIP:port per load balanced service.\n\nA load balancer has to respond to 2 sets of dynamic inputs:\n\n1. Respond to kubernetes changes - A set of VIPs, pods moving, services appearing and disappearing.\n2. Respond to RDEI - kubernetes services getting matched to a virtual IP address (VIP) and port number.\n\nThe ultimate goal is getting packets to their desired destination,\na pod that can do the work.\n\nThis particular load balancer has 3 tasks that it has to do to respond to inputs,\nand achieve its goal.\n\n1. Get packets arriving from a load balanced VIP:port to a pod that matches.\nKubernetes pod and service information is the key piece here,\nalong with what VIP:port matches what service.\n2. 
Get packets from the rest of the world arriving with a load balanced VIP:port\nto a compute node that's running a pod or pods in the service that matches that VIP:port.\nVIP:port to service, from RDEI, compute node and which services and pods run on compute nodes\nfrom kubernetes API are the key inputs.\n3. Tell a router that some machine(s) can and should receive packets and TCP connections\nfor particular VIPs. This falls out of the other two requirements,\nand the way Internet Protocol is routed.\n\nWe did this with 2 levels or tiers,\none that directs traffic from VIP:port to a compute node that runs appropriate pods,\nand one tier that gets packets from VIP:port to an appropriate pod on the same machine.\nThe first level is a \"director\", the second is a \"realserver\", borrowing terminology\nfrom the IPVS project.\n\n\n### Attract Packets and connections\n\nThis load balancer has two \"directors\", one that can use ARP to tell the top-of-rack-router\nthat its machine can receive packets for a list of IP addresses (the VIPs),\nand one that uses a subsidiary BGP daemon ([gobgpd](https://osrg.github.io/gobgp/))\nto tell the top-of-rack-router what VIPs can be received.\n\nThe advantage of using ARP is its simplicity.\nARP is traditionally how a router figures out which MAC address matches a destination IP address.\nARP is a simple, well-known protocol with user level APIs and administrative utilities.\nThe director process can issue gratuitous ARPs easily without extra configuration.\n\nThe disadvantage of using ARP to tell a router which VIPs can be handled, is that by design, \nARP matches 1 MAC address to 1 IP address. There is really no way to have \"multi-headed\" load\nbalancers using ARP to attract packets. 
A secondary disadvantage is that the load balancer director\nmachine has to be on the same subnet as the top-of-rack-router, and so do VIPs.\nSince ARP uses MAC broadcast addresses, the director machine has to be in the same broadcast\ndomain as the top-of-rack router.\n\nUsing Border Gateway Protocol (BGP) also has pros and cons.\nBGP is a complicated protocol.\nIt needs a process/thread to continually tell the router it's alive,\nand the connection is up.\nIt requires more configuration,\nlike what's the IP address of the router and what Address Space the VIP is in,\nand maybe even more,\nif you want to do graceful shutdown,\nand graceful restart.\n\nUsing BGP to attract packets has advantages.\nThe machine(s) don't have to be in the same subnet or broadcast domain as the VIPs or the router.\nA director could possibly run in an adjacent rack.\nVIPs can be arbitrary, instead of having to be in the same subnet as the IP addresses of the compute nodes.\n\nMore than one machine can be a director:\nBGP doesn't make a strict association between 1 IP address and 1 MAC address.\nUsing BGP means that multiple director machines can work in parallel.\nThis gives us much less disruptive fail-over.\nRather than having to detect that a single gratuitously-ARP-ing machine has failed,\nand either needs to be restarted,\nor the service moved elsewhere,\nsome fraction of ongoing TCP connections fail, the top-of-rack router adjusts its connection hashing,\nand the other director machine(s) get more connections.\n\nThe value relative to the ARP-based director is horizontal scaling.\nThe cost is another pod running `gobgpd`,\nsome extra config files and command line options,\nand the extra knowledge to [administer and debug](TROUBLESHOOTING.md) `gobgpd`.\n\n### Get packets arriving from anywhere to a compute node\n\nThis load balancer uses [IPVS](http://www.linuxvirtualserver.org/software/ipvs.html)\nto distribute packets to compute nodes.\nThe program is a \"director\", either ARP 
or BGP based packet attracting\n(`rkt` container running `kube2ipvs director` or `kube2ipvs bgp` respectively).\nThe compute nodes that are routed-to should run pod(s) that are members of a service matched to a VIP:port in RDEI.\nBoth the ARP-using and BGP-using directors calculate IPVS rules based on what kubernetes API\ntells them is the list of nodes in the cluster, the pods running in the cluster,\nwhat service the pods are in, and the VIP:port that matches a given service from RDEI.\nThe IPVS rules receive packets from a VIP:port, send them to the IP address of a compute node.\n\nThe director process creates/deletes/edits IPVS rules.\nThe Linux kernel follows the rules,\nrouting the packets and tracking the connections.\nThe director process can exit, the Linux kernel will keep routing the packets \u0026 etc.\nThe director process exec's an `ipvsadm` process to either create or delete rules\nthat make the kernel route VIP:port to various compute node IP addresses.\nThe code goes to great effort to not have small intervals where no rules are in place:\nit either deletes or adds rules, and it edits rules where the \"weight\"\n(derived from number of pods running on a machine) might have changed.\n\n### Get packets arriving from a VIP:port to a pod\n\nFinally, the last step: getting packets with a VIP:port source address to a pod that\ncan handle them.\nEvery compute node in a RDEI kubernetes cluster has a \"realserver\" process running on it.\nThat's a `rkt` container running `kube2ipvs realserver`.\nThis particular pod listens to Kubernetes API for pods and endpoints, and to RDEI for VIPs, ports and services.\nIt combines information from them so that it can write `iptables` rules\nthat send packets from a VIP:port to a pod that can handle the packets.\nInformation from RDEI determines if a pod can \"handle\" packets from a VIP:port.\nThis gets complicated because the compute nodes also run 
[Calico](https://docs.projectcalico.org/v2.0/getting-started/kubernetes/),\nwhich sets up inscrutable `iptables` rules to allow intra-cluster communication\non 192.168.x.y addresses.\nThe `iptables` rules that the realserver sets up direct packets from a VIP:port to a pod's IP address,\na 192.168.a.b address assigned by Kubernetes (or maybe Docker) when the pod starts.\n\nThe realserver does the same sort of work as a director does,\nexcept it does iptables rules, not IPVS rules.\nIt only adds or deletes rules,\nnever leaving an interval where no `iptables` rules exist.\nOn a machine with more than a single pod for a load balanced service,\nthe realserver adds a probability so that multiple pods on a node get their fair share of\nconnections, if not actual CPU-consuming load.\nIf the pod count on a compute node changes, these probabilities get re-calculated.\nFinally, the realserver generates a rule\nthat ends up using the Calico rule for \"SNAT\", \"Source Network Address Translation\".\nThis sets the *source* IP address and port of any packets returning from pod to client,\nto the VIP:port being load balanced.\nMAC address remains that of the compute node.\nThe compute node sends packets returning to clients directly to them - Direct Server Return.\nThis boosts the return bandwidth from pods to the clients calling on them:\nany data returned does not have to return through the director's IPVS system.\nSince most client requests are small amounts of bytes relative to the returned data,\nthe system works.\n\u003c!-- -A RAVEL-MASQ -j MARK --set-xmark 0x4000/0x4000 --\u003e\n\n## Statistics\n\nThe RDEI Load Balancer emits metrics about its internal state and optionally emits metrics about the traffic that is being load balanced for each configured VIP.\n\n\n```\n    # HELP rdei_lb_channel_depth is a gauge denoting the number of inbound clusterconfig objects in the configchan. 
a value greater than 1 indicates a potential slowdown or deadlock\n    # TYPE rdei_lb_channel_depth gauge\n    rdei_lb_channel_depth{lb=\"realserver\",seczone=\"green-786-10.54.213.128_25\"} 0\n\n    --\n\n    # HELP rdei_lb_cluster_config_info contains the current cluster config and a sha has of the config\n    # TYPE rdei_lb_cluster_config_info gauge\n    rdei_lb_cluster_config_info{date=\"2019-02-15T00:11:40Z\",info=\"\u003ccurrent-config\u003e\",lb=\"realserver\",seczone=\"green-786-10.54.213.128_25\",sha=\"PcBJZC0Xt/PH+HUFyK0SPQecQuA=\"} 1\n\n    --\n\n    # HELP rdei_lb_flows_count a counter to measure the increase in active tcp and udp connections\n    # TYPE rdei_lb_flows_count counter\n    rdei_lb_flows_count{lb=\"realserver\",namespace=\"cadieuxtest1\",port=\"8012\",port_name=\"http\",protocol=\"TCP\",service=\"nginx\",vip=\"10.54.213.247\"} 0\n\n    --\n\n    # HELP rdei_lb_info version information for rdei lb\n    # TYPE rdei_lb_info gauge\n    rdei_lb_info{arch=\"linux/amd64\",buildDate=\"2019-02-14T23:57:47Z\",commit=\"a7f58c20ae765ca07bcaa0d7a32158c068702799\",configName=\"kube2ipvs\",configNamespace=\"platform-load-balancer\",goVersion=\"go1.11.2\",lb=\"realserver\",seczone=\"green-786-10.54.213.128_25\",startTime=\"2019-02-15T00:11:43Z\",version=\"0.0.0\"} 0\n\n    --\n\n    # HELP rdei_lb_iptables_chain_size is twi guages, one for the inbound/calculated chain size, and one for the configured size.\n    # TYPE rdei_lb_iptables_chain_size gauge\n    rdei_lb_iptables_chain_size{kind=\"applied\",lb=\"bgp\",seczone=\"green-786-10.54.213.128_25\"} 26\n\n    --\n\n    # HELP rdei_lb_iptables_latency_microseconds is a histogram denoting the amount of time it takes to perform various iptables operations. 
labels for operation save|restore|flush and for outcome error|success\n    # TYPE rdei_lb_iptables_latency_microseconds histogram\n    rdei_lb_iptables_latency_microseconds_bucket{attempts=\"0\",lb=\"bgp\",operation=\"flush\",outcome=\"success\",seczone=\"green-786-10.54.213.128_25\",le=\"100\"} 0\n\n    --\n\n    # HELP rdei_lb_iptables_operation_count is a count of operations performed against iptables and the status\n    # TYPE rdei_lb_iptables_operation_count counter\n    rdei_lb_iptables_operation_count{attempts=\"0\",lb=\"bgp\",operation=\"flush\",outcome=\"success\",seczone=\"green-786-10.54.213.128_25\"} 2\n\n    --\n\n    # HELP rdei_lb_reconfigure_count is a count of reconfiguration events with labels denoting a success|error|noop\n    # TYPE rdei_lb_reconfigure_count counter\n    rdei_lb_reconfigure_count{lb=\"realserver\",outcome=\"complete\",seczone=\"green-786-10.54.213.128_25\"} 1\n\n    --\n\n    # HELP rdei_lb_reconfigure_latency_microseconds is a histogram denoting the amount of time an end-to-end reconfiguration took, split out by labels on the outcome.\n    # TYPE rdei_lb_reconfigure_latency_microseconds histogram\n    rdei_lb_reconfigure_latency_microseconds_bucket{lb=\"realserver\",outcome=\"complete\",seczone=\"green-786-10.54.213.128_25\",le=\"100\"} 0\n\n    --\n\n    # HELP rdei_lb_rx_bytes a counter to measure the bytes received\n    # TYPE rdei_lb_rx_bytes counter\n    rdei_lb_rx_bytes{lb=\"realserver\",namespace=\"cadieuxtest1\",port=\"8012\",port_name=\"http\",protocol=\"TCP\",service=\"nginx\",vip=\"10.54.213.247\"} 0\n\n    --\n\n    # HELP rdei_lb_tcp_state_count A counter variable that measures protocol, port name, namespace, service, state events like rst or synack, and counts for respective event types\n    # TYPE rdei_lb_tcp_state_count counter\n    rdei_lb_tcp_state_count{lb=\"realserver\",namespace=\"cadieuxtest1\",port=\"8012\",port_name=\"http\",protocol=\"TCP\",service=\"nginx\",state_event=\"fin\",vip=\"10.54.213.247\"} 
0\n\n    --\n\n    # HELP rdei_lb_tx_bytes a counter to measure the bytes transmitted\n    # TYPE rdei_lb_tx_bytes counter\n    rdei_lb_tx_bytes{lb=\"realserver\",namespace=\"cadieuxtest1\",port=\"8012\",port_name=\"http\",protocol=\"TCP\",service=\"nginx\",vip=\"10.54.213.247\"} 0\n\n    --\n\n    # HELP rdei_lb_watch_backoff_duration returns the current value of the watch backoff duration. a non-1s duration indicates that the backoff is present and the load balancer is unable to communicate with the api server\n    # TYPE rdei_lb_watch_backoff_duration gauge\n    rdei_lb_watch_backoff_duration{lb=\"realserver\",seczone=\"green-786-10.54.213.128_25\"} 1\n\n    --\n\n    # HELP rdei_lb_watch_cluster_config_count is a count of how often a cluster config is regenerated, broken out by event - noop|publis|error\n    # TYPE rdei_lb_watch_cluster_config_count counter\n    rdei_lb_watch_cluster_config_count{event=\"noop\",lb=\"realserver\",seczone=\"green-786-10.54.213.128_25\"} 108553\n\n    --\n\n    # HELP rdei_lb_watch_data_count is a count of data inbound from the kuberntes watch events, broken out by endpoint\n    # TYPE rdei_lb_watch_data_count counter\n    rdei_lb_watch_data_count{endpoint=\"configmaps\",lb=\"realserver\",seczone=\"green-786-10.54.213.128_25\"} 88\n\n    --\n\n    # HELP rdei_lb_watch_init_count is a count of watch init events.\n    # TYPE rdei_lb_watch_init_count counter\n    rdei_lb_watch_init_count{lb=\"realserver\",seczone=\"green-786-10.54.213.128_25\"} 27\n\n    --\n\n    # HELP rdei_lb_watch_init_latency_microseconds is a histogram denoting the amount of time it took to reestablish all of the watches\n    # TYPE rdei_lb_watch_init_latency_microseconds histogram\n    rdei_lb_watch_init_latency_microseconds_bucket{lb=\"realserver\",seczone=\"green-786-10.54.213.128_25\",le=\"100\"} 0\n\n```\n\n\n\n## TODOS:\n\n- rename the stats-enable flag to stats-pcap-enable\n- add validation for the various 
subcommands\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcomcast%2Fravel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcomcast%2Fravel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcomcast%2Fravel/lists"}