{"id":18976292,"url":"https://github.com/quickwit-oss/chitchat","last_synced_at":"2025-05-14T10:10:03.449Z","repository":{"id":39866950,"uuid":"433280934","full_name":"quickwit-oss/chitchat","owner":"quickwit-oss","description":"Cluster membership protocol with failure detection inspired by Cassandra and DynamoDB","archived":false,"fork":false,"pushed_at":"2025-03-19T10:05:59.000Z","size":512,"stargazers_count":299,"open_issues_count":26,"forks_count":49,"subscribers_count":17,"default_branch":"main","last_synced_at":"2025-04-13T04:55:57.438Z","etag":null,"topics":["anti-entropy","gossip-protocol","membership"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/quickwit-oss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-11-30T03:34:27.000Z","updated_at":"2025-04-03T15:15:07.000Z","dependencies_parsed_at":"2023-12-22T06:33:41.521Z","dependency_job_id":"97c8153d-ada2-4358-a448-a11c82a9ebe6","html_url":"https://github.com/quickwit-oss/chitchat","commit_stats":{"total_commits":65,"total_committers":9,"mean_commits":7.222222222222222,"dds":0.6153846153846154,"last_synced_commit":"cfaa7bbaa800c68d05548ad7dcf9666a1e62d329"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quickwit-oss%2Fchitchat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quickwit-oss%2Fchitchat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/quickwit-oss%2Fchitchat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quickwit-oss%2Fchitchat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/quickwit-oss","download_url":"https://codeload.github.com/quickwit-oss/chitchat/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254120170,"owners_count":22017953,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anti-entropy","gossip-protocol","membership"],"created_at":"2024-11-08T15:23:30.798Z","updated_at":"2025-05-14T10:10:03.419Z","avatar_url":"https://github.com/quickwit-oss.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# chitchat\n\nThis crate is used at the core of Quickwit for\n- cluster membership\n- failure detection\n- sharing configuration and extra metadata values\n\nThe idea of relying on scuttlebutt reconciliation and phi-accrual detection is borrowed from Cassandra, itself borrowing it from DynamoDB.\n\nAn anti-entropy gossip algorithm called scuttlebutt is in charge of spreading\na common state to all nodes.\n\nThis state is actually divided into namespaces, one associated with each node.\nLet's call each of them a node state.\n\nA node can only edit its own node state.\n\nRather than sending the entire state, the algorithm makes it possible to\nonly transfer updates or deltas of the state.\nIn addition, a delta can be partial in order to fit a UDP packet.\n\nAll nodes keep updating a heartbeat key,\nso that any node should keep receiving updates 
about\nany live node.\n\nNot receiving any update about a node for a given amount of time can therefore be\nregarded as a sign of failure. Rather than using a hard threshold,\nwe use phi-accrual detection to dynamically compute a threshold.\n\nWe also abuse `chitchat` in Quickwit and use it like a reliable broadcast,\nwith some caveats.\n\n# References\n\n- ScuttleButt paper: https://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf\n- Phi Accrual failure detection: https://www.researchgate.net/publication/29682135_The_ph_accrual_failure_detector\n- Cassandra details:\nhttps://www.youtube.com/watch?v=FuP1Fvrv6ZQ\n- https://docs.datastax.com/en/articles/cassandra/cassandrathenandnow.html\n- https://github.com/apache/cassandra/blob/f5fb1b0bd32b5dc7da13ec66d43acbdad7fe9dbf/src/java/org/apache/cassandra/gms/Gossiper.java#L1749\n\n# Heartbeat\n\nIn order to get a constant flow of updates to feed into phi-accrual detection,\nchitchat's node state includes a key-value pair called `heartbeat`. The heartbeat of a given node starts at 0 and is incremented once after each round of gossip that the node initiates.\n\nNodes then report all heartbeat updates to a phi-accrual detector to\nassess the liveness of that node. Liveness is a local concept. Every single\nnode computes its own view of the liveness of all other nodes.\n\n# KV deletion\n\nThe deletion of a KV is just another type of mutation: it is\nassociated with a version, and replicated using the same mechanism as a KV update.\n\nThe library will then interpret this versioned tombstone before exposing KVs to\nthe user.\n\nTo avoid keeping deleted KVs indefinitely, the library includes a GC mechanism.\nEvery tombstone is associated with a monotonic timestamp.\nThis timestamp is local: it is computed on the given node and never shared with other nodes.\n\nAll tombstoned KVs with a timestamp older than a given `marked_for_deletion_grace_period` will be deleted upon delete operations. 
(Note that for a given KV, GC can happen at different\ntimes on different nodes.)\n\nThis yields the following problem. If a node was disconnected for more than\n`marked_for_deletion_grace_period`, it could have missed the deletion of a KV and never become aware of it.\n\nTo address this problem, nodes keep a record of the version of the last KV they\nhave GCed. Here is how it works:\n\nLet's assume a Node A sends a Syn message to a Node B. The digest expresses that A wants updates about Node N with a version strictly greater than `V`.\nNode B will compare the version `V` of the digest with its `max_gc_version` for the node N.\n\nIf `V \u003e max_gc_version`, Node B knows that no GC has impacted KVs with a version above `V`. It can safely emit a normal delta to A.\n\nIf, however, `V` is not greater than `max_gc_version`, a GC could have been executed. Instead of sending a delta to Node A, Node B will instruct A to reset its state.\n\nNode A will then wipe whatever information it has about N, and will start syncing from a blank state.\n\n# Node deletion\n\nIn Quickwit, we also use chitchat as a \"reliable broadcast with caveats\".\nThe idea of reliable broadcast is that an emitted message is supposed\nto eventually be received by all or none of the correct nodes. Here, a node is called \"correct\" if it does not fail at any point during its execution.\n\nOf course, if the emitter starts failing before emitting its message, one cannot expect the message to reach anyone.\nHowever, if at least one correct node receives the message, it will\neventually reach all correct nodes (assuming the node stays correct).\n\nFor this reason, we keep emitting KVs from dead nodes too.\n\nTo avoid keeping the state of dead nodes indefinitely, we make\na very important trade-off.\n\nIf a node has been marked as dead for more than `DEAD_NODE_GRACE_PERIOD`, we assume that its state can be safely removed from the system. 
The grace period is\ncomputed from the last time we received an update from the dead node.\n\nSimply deleting the state is of course not an option. After `DEAD_NODE_GRACE_PERIOD / 2`, we will mark the dead node as `ScheduledForDeletion`.\n\nWe first stop sharing data about nodes in the `ScheduledForDeletion` state,\nand stop listing them in our digest.\n\nWe also ignore any updates received about the dead node. For simplification, we do not even keep track of the last update received. Eventually, all the nodes of the cluster will have marked the dead node as `ScheduledForDeletion`.\n\nAfter another `DEAD_NODE_GRACE_PERIOD / 2` has elapsed since the last update received, we delete the dead node state.\n\nIt is important to set `DEAD_NODE_GRACE_PERIOD` to a value such that `DEAD_NODE_GRACE_PERIOD / 2` is much greater than the time it takes to detect a faulty node.\n\nNote that we are here breaking the reliable broadcast nature of chitchat.\nNew nodes joining after `DEAD_NODE_GRACE_PERIOD`, for instance, will never know about the state of the dead node.\n\nAlso, if a node was disconnected from the cluster for more than `DEAD_NODE_GRACE_PERIOD / 2` and reconnects, it is likely to spread information\nabout the dead node again. Worse, it might not know about the deletion\nof some specific KVs and spread them again.\n\nThe chitchat library does not include any mechanism to prevent this from happening. 
The stale KVs should, however, eventually get deleted (after a bit more than `DEAD_NODE_GRACE_PERIOD`) if the node is really dead.\n\nIf the node is alive, it should be able to fix everyone's state via a reset or a regular delta.\n\n\u003c!--\nAlternative, more concise naming / explanation:\n\nNode deletion\n\nHeartbeats are fed into a phi-accrual detector.\nThe detector tells live nodes apart from failed nodes.\nFailed nodes are GCed after GC_GRACE_PERIOD.\nReliable broadcast\n\nIn order to ensure reliable broadcast, we must propagate info about failed nodes for some time shorter than GC_GRACE_PERIOD before deleting them.\nTo do so, failed nodes are split into two categories: zombie and dead.\nFirst, upon failure, failed nodes become zombie nodes, and we keep sharing data about them.\nAfter ZOMBIE_GRACE_PERIOD, zombie nodes transition to dead nodes, and we stop sharing data about them.\nZOMBIE_GRACE_PERIOD is set to GC_GRACE_PERIOD / 2\n--\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquickwit-oss%2Fchitchat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquickwit-oss%2Fchitchat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquickwit-oss%2Fchitchat/lists"}