{"id":15678268,"url":"https://github.com/adammck/ranger","last_synced_at":"2025-05-07T02:29:04.064Z","repository":{"id":39707841,"uuid":"417486250","full_name":"adammck/ranger","owner":"adammck","description":"Generic range-based sharding prototype","archived":false,"fork":false,"pushed_at":"2023-07-18T01:19:55.000Z","size":1038,"stargazers_count":21,"open_issues_count":7,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-07T02:28:51.505Z","etag":null,"topics":["distributed-systems","sharding"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/adammck.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-15T12:09:10.000Z","updated_at":"2025-03-24T02:37:55.000Z","dependencies_parsed_at":"2024-06-19T06:12:08.325Z","dependency_job_id":"169d0634-12a1-4707-b33f-dc6b3f3dabec","html_url":"https://github.com/adammck/ranger","commit_stats":{"total_commits":418,"total_committers":5,"mean_commits":83.6,"dds":0.009569377990430672,"last_synced_commit":"ba277721da5bda6b160068885a7eec8bb37633fc"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adammck%2Franger","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adammck%2Franger/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adammck%2Franger/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adammck%2Franger/mani
fests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/adammck","download_url":"https://codeload.github.com/adammck/ranger/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252801243,"owners_count":21806282,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributed-systems","sharding"],"created_at":"2024-10-03T16:19:20.754Z","updated_at":"2025-05-07T02:29:04.017Z","avatar_url":"https://github.com/adammck.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Ranger\n\nRanger is a generic range-based sharding framework, inspired by Facebook's\n[Shard Manager][sm] and Google's [Slicer][slcr]. It provides a client library\n(rangelet) for sharded services to embed, a controller (rangerd) to assign key\nranges to them, and a client (rangerctl) to interact with the cluster. It's\ndesigned in particular to support stateful workloads, which need to move around\nlarge amounts of data in order to rebalance, but should be useful to stateless\nworkloads wanting key affinity, too.\n\nThe goal of Ranger is for generic automatic sharding to be the default choice:\nit should be easier to build and operate an automatically sharded service using\nRanger than one which is manually sharded. 
I'm working on it because I'm\nacquainted with many systems which would benefit from such a thing existing.\n\n## Examples\n\n- [cache](examples/cache)\n- [key-value store](examples/kv)\n\n## Usage\n\n### Terminology\n\nThe terms used by Ranger are standard, but necessarily vague:\n\n- **Key**: The atomic unit of sharding, which cannot be split further. An opaque\n  string of arbitrary bytes.\n- **Keyspace**: The set of every possible key, from negative to positive\n  infinity.\n- **Range**: A set of keys from _start_ (inclusive) to _end_ (exclusive) within\n  a keyspace.\n- **Node**: Any service implementing the [Node interface](#interface) and\n  being discoverable by the controller.\n- **Placement**: An instance of a Range on a Node. Depending on the replication\n  config, might be the only instance of a range, or might be one of many.\n\n### Interface\n\nServices implement the Node interface, via [Go](pkg/api/node.go) or\n[gRPC](pkg/proto/node.proto#L9):\n\n- `Prepare(RangeMeta, []Parent) error`  \n  Prepare to own the given range. This can take as long as necessary, while the\n  node allocates relevant data structures, fetches state, replays logs, warms up\n  caches, etc. `Meta` contains the range ID, start key, and end key.\n- `Activate(RangeID) error`  \n  Accept ownership of the given range. This will only be received after the node\n  has finished preparing or (in error conditions) after deactivating the range.\n  It should be fast and unlikely to fail.\n- `Deactivate(RangeID) error`  \n  Relinquish ownership of the given range. This will only be received for active\n  ranges. It should be fast and unlikely to fail, and where possible, easy to\n  roll back.\n- `Drop(RangeID) error`  \n  Forget the given range, and release any resources associated with it. 
All of\n  the keys in the range have been successfully activated on some other node(s),\n  so this node can safely discard it.\n- `GetLoadInfo(RangeID) (LoadInfo, error)`  \n  Return standard information about how much load the given range is exerting on\n  the node, and optionally suggest where to split it.\n\nThe `RangeMeta`, `Parent`, `RangeID`, and `LoadInfo` types are pretty simple,\nand can be found in the [pkg/api](pkg/api) package. Once a service implements\nthese methods (and makes itself discoverable), the Ranger controller will take\ncare of assigning ranges of keys, including the tricky workflows of moving,\nsplitting, and joining ranges, and automatically recovering when individual\nsteps fail.\n\nThis is a Go interface, but it's all gRPC+protobufs under the hood. There are no\nother implementations today, but it's a goal to avoid doing anything which would\nmake it difficult to implement Rangelets in other languages.\n\n### Example\n\n```golang\n// locking and error handling omitted for brevity\n// see examples dir for more complete examples\n\nfunc main() {\n  // Pointer, because the methods below have pointer receivers; the map must be\n  // initialized before Prepare writes to it.\n  node := \u0026MyService{data: map[api.RangeID]*PerRangeData{}}\n  srv := grpc.NewServer(opts...)\n  rglt := rangelet.New(node, srv)\n  srv.Serve()\n}\n\ntype MyService struct {\n  // Values are pointers so that fields can be updated in place.\n  data map[api.RangeID]*PerRangeData\n}\n\ntype PerRangeData struct {\n  hits map[api.Key]uint64\n  writable bool\n}\n\nfunc (s *MyService) Prepare(meta api.Meta, parents []api.Parent) error {\n  data := \u0026PerRangeData{\n    hits: map[api.Key]uint64{},\n  }\n  // Load data here; maybe from a recent S3 snapshot.\n  s.data[meta.Ident] = data\n  return nil\n}\n\nfunc (s *MyService) Activate(rID api.RangeID) error {\n  // Complete data load here; maybe fetch delta from S3 or other node.\n  s.data[rID].writable = true\n  return nil\n}\n\nfunc (s *MyService) Deactivate(rID api.RangeID) error {\n  // Prepare for range to be activated on some other node; maybe stop accepting\n  // writes and push delta since last snapshot to S3.\n  s.data[rID].writable = false\n  return 
nil\n}\n\nfunc (s *MyService) Drop(rID api.RangeID) error {\n  // Discard data here; it's been activated on some other node.\n  delete(s.data, rID)\n  return nil\n}\n\nfunc (s *MyService) GetLoadInfo(rID api.RangeID) (api.LoadInfo, error) {\n  // Return stats about a range, so controller can decide when to split/join it.\n  return api.LoadInfo{\n    Keys: len(s.data[rID].hits),\n  }, nil\n}\n```\n\n### Client\n\nRanger includes a command line client, `rangerctl`, which is a thin wrapper\naround the gRPC interface to the controller. This is currently the primary\nmeans of inspecting and balancing data across a cluster.\n\n```console\n$ ./rangerctl -h\nUsage: ./rangerctl [-addr=host:port] \u003caction\u003e [\u003cargs\u003e]\n\nAction and args must be one of:\n  - ranges\n  - range \u003crangeID\u003e\n  - nodes\n  - node \u003cnodeID\u003e\n  - move \u003crangeID\u003e [\u003cnodeID\u003e]\n  - split \u003crangeID\u003e \u003cboundary\u003e [\u003cnodeID\u003e] [\u003cnodeID\u003e]\n  - join \u003crangeID\u003e \u003crangeID\u003e [\u003cnodeID\u003e]\n\nFlags:\n  -addr string\n        controller address (default \"localhost:5000\")\n  -request\n        print gRPC request instead of sending it\n```\n\nHere are some typical examples of using `rangerctl` to move data around. The\noutputs are shown from an [R1](pkg/ranje/replication_config.go#L34) service,\nwhich only wants a single active replica of each key. Production services\ngenerally want more than that.\n\n**Show the ident of each node known to the controller**:  \n(Nodes can identify themselves however they like, since actual communication\nhappens via host:port, not ident. 
But idents generally come straight from\nservice discovery.)\n\n```console\n$ rangerctl nodes | jq -r '.nodes[].node.ident'\nfoo\nbar\nbaz\n```\n\n**Move range 101 to node bar**:  \n(See [the docs](docs/move.md) to find out what happens when any of these steps\nfail.)\n\n```console\n$ rangerctl move 101 bar\nR101-P1: PsPending -\u003e PsInactive\nR101-P0: PsActive -\u003e PsInactive\nR101-P1: PsInactive -\u003e PsActive\nR101-P0: PsInactive -\u003e PsDropped\n```\n\n**Split range 101 at key `beefcafe`**:  \n(As above, see [the docs](docs/split.md) for info on failure and recovery. This\none is tricky!)\n\n```console\n$ rangerctl split 101 beefcafe\nR101: RsActive -\u003e RsSubsuming\nR102: nil -\u003e RsActive\nR102-P0: PsPending -\u003e PsInactive\nR103-P0: PsPending -\u003e PsInactive\nR101-P0: PsActive -\u003e PsInactive\nR102-P0: PsInactive -\u003e PsActive\nR103-P0: PsInactive -\u003e PsActive\nR101-P0: PsInactive -\u003e PsDropped\nR101: RsSubsuming -\u003e RsObsolete\n```\n\n## Design\n\n![ranger-diagram-v1](https://user-images.githubusercontent.com/19543/167534758-82124dab-c12e-4920-869c-63165160dffb.png)\n\nHere's how the controller works, at a high level.  \nThe main components are:\n\n- **Keyspace**: Stores the desired state of ranges and placements. Provides an\n  interface to create new ranges by splitting and joining. (Ranges cannot\n  currently be destroyed; only obsoleted, in case the history is needed.)\n  Provides an interface to create and destroy placements, in order to designate\n  which node(s) each range should be placed on.\n- **Roster**: Watches (external) service discovery to maintain a list of nodes\n  (on which ranges can be placed). Relays messages from other components (e.g.\n  the orchestrator) to the nodes. Periodically probes those nodes to monitor\n  their health, and the state of the ranges placed on them. 
For now, provides an\n  interface for other components to find a node suitable for range placement.\n- **Orchestrator**: Reconciles the difference between the desired state (from\n  the keyspace) and the current state (from the roster), somewhat like a\n  Kubernetes controller.\n- **Rangelet**: Runs inside of nodes. Receives RPCs from the roster, and calls\n  methods of the rangelet.Node interface to notify nodes of changes to the set\n  of ranges placed on them. Provides some useful helper methods to simplify node\n  development.\n- **Balancer**: External component. Simple implementation(s) provided, but can\n  be replaced for more complex services. Fetches state of nodes, ranges, and\n  placements from orchestrator, and sends split and join RPCs in order to spread\n  ranges evenly across nodes.\n\nBoth **Persister** and **Discovery** are simple interfaces to pluggable storage\nsystems. Only Consul is supported for now, but adding support for other systems\n(e.g. ZooKeeper, etcd) should be easy enough in future.\n\nThe **green boxes** are storage nodes. These are implemented entirely (except\nthe rangelet) by the service owner, to perform the _actual work_ that Ranger is\nsharding and balancing. Services may receive their data via HTTP or RPC, and so\nmay provide a client library to route requests to the appropriate node(s), or\nmay forward requests between themselves. (Ranger doesn't provide any help with\nthat part today, but likely will in future.) Alternatively, services may pull\nrelevant work from e.g. a message queue.\n\nFor example node implementations, see the [examples](/examples) directory.  \nFor more complex examples, read the _Slicer_ and _Shard Manager_ papers.\n\n### State Machines\n\nRanger has three state machines: [RangeState][rs], [PlacementState][ps], and\n[RemoteState][ns].\n\n#### RangeState\n\nRanges are simple. 
They are born Active, become Subsuming when they are split or\njoined, and then become Obsolete once the split/join operation is completed.\nThere are no backwards transitions; to keep the keyspace history linear, once a\nrange begins being subsumed, there is no turning back. (But note that the\ntransition may take as long as necessary, and the _placements_ may be rolled\nback to recover from failures. But they will eventually be rolled forwards again\nto complete the operation.)\n\n```mermaid\nstateDiagram-v2\n    direction LR\n    [*] --\u003e RsActive\n    RsActive --\u003e RsSubsuming\n    RsSubsuming --\u003e RsObsolete\n    RsObsolete --\u003e [*]\n```\n\nThese states are owned by the Keyspace in the controller, and persisted across\nrestarts by the Persister. It would be a catastrophe to lose the range state.\n\n#### PlacementState\n\nPlacements are more complex, because this state machine is really the core of\nRanger. To maximize availability and adherence to the replication config, the\nKeyspace, Orchestrator, and Actuator components carefully coordinate these\nstate changes and convey them to the remote nodes via their Rangelets.\n\n```mermaid\nstateDiagram-v2\n    direction LR\n    [*] --\u003e PsPending\n    PsPending --\u003e PsMissing\n    PsInactive --\u003e PsMissing\n    PsActive --\u003e PsMissing\n    PsMissing --\u003e PsDropped\n    PsPending --\u003e PsInactive: Prepare\n    PsInactive --\u003e PsActive: Activate\n    PsActive --\u003e PsInactive: Deactivate\n    PsInactive --\u003e PsDropped: Drop\n    PsDropped --\u003e [*]\n```\n\nNote that the PsMissing state is an odd one here, because most other states can\ntransition into it with no command RPC (Activate, Drop, etc) being involved. It\nhappens when a placement is expected to be in a state, but the Rangelet reports\nthat the node doesn't have it. 
This may be because of a bug or whatever, but the\norchestrator responds by moving the placement into PsMissing so it can be\nreplaced.\n\n#### RemoteState\n\nIn addition to the controller-side placement state, the Roster keeps track of\nthe **remote state** of each placement, which is whatever the Rangelet says it\nis. This one isn't a real state machine: there's no enforcement at all, so any\nstate can transition into any other. (The normal/expected transitions are shown\nbelow.) We allow this mostly to ensure that the controller can handle unexpected\ntransitions, e.g. if a node unilaterally decides to drop a placement because\nit's overloaded.\n\n```mermaid\nstateDiagram-v2\n    direction LR\n    [*] --\u003e NsPreparing: Prepare\n    NsPreparing --\u003e NsInactive\n    NsInactive --\u003e NsActivating: Activate\n    NsActivating --\u003e NsActive\n    NsActive --\u003e NsDeactivating: Deactivate\n    NsDeactivating --\u003e NsInactive\n    NsInactive --\u003e NsDropping: Drop\n    NsDropping --\u003e NsNotFound\n    NsNotFound --\u003e [*]\n```\n\nNote that the remote states are a superset of placement states, but include\nintermediate states like `NsActivating`. These are used to signal back to the\ncontroller that, for example, the Rangelet has called the `Activate` method but\nit hasn't returned yet.\n\nThese states are owned by the Rangelet on each node, and are reported back to\nthe controller in response to command RPCs (Prepare, Activate, etc) and periodic\nstatus probes. They're cached by the Roster, and are not currently persisted\nbetween controller restarts.\n\n## Development\n\nTo run the tests:\n\n```sh\n$ bin/test.sh\n```\n\nOr use [act][] to run the [CI checks][ci] locally.\n\n## Related Work\n\nI've taken ideas from most of these systems. I'll expand this doc soon to\nclarify what came from each. 
But for now, here are some links:\n\n- [Shard Manager][sm] (Facebook, 2021)\n- [Monarch](https://www.vldb.org/pvldb/vol13/p3181-adams.pdf) (Google, 2020)\n- [Service Fabric](https://dl.acm.org/doi/pdf/10.1145/3190508.3190546) (Microsoft, 2018)\n- [Slicer][slcr] (Google, 2016)\n- [Ringpop](https://ringpop.readthedocs.io/en/latest/index.html) (Uber, 2016)\n- [Helix](https://sci-hub.ru/10.1145/2391229.2391248) (LinkedIn, 2012)\n\n## License\n\nMIT\n\n[sm]: https://dl.acm.org/doi/pdf/10.1145/3477132.3483546\n[slcr]: https://www.usenix.org/system/files/conference/osdi16/osdi16-adya.pdf\n[rs]: pkg/api/range_state.go\n[ps]: pkg/api/placement_state.go\n[ns]: pkg/api/remote_state.go\n[act]: https://github.com/nektos/act\n[ci]: https://github.com/adammck/ranger/blob/master/.github/workflows/go.yml\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadammck%2Franger","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadammck%2Franger","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadammck%2Franger/lists"}