{"id":18319314,"url":"https://github.com/anza-xyz/validator-lab","last_synced_at":"2025-04-05T21:33:33.062Z","repository":{"id":229974188,"uuid":"778084146","full_name":"anza-xyz/validator-lab","owner":"anza-xyz","description":"Deploy and test your new Agave validator features in a kubernetes-based cluster  ","archived":false,"fork":false,"pushed_at":"2024-08-16T22:03:38.000Z","size":2555,"stargazers_count":8,"open_issues_count":6,"forks_count":5,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-03-21T12:57:37.083Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://www.anza.xyz/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/anza-xyz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-27T03:28:22.000Z","updated_at":"2025-01-10T02:13:38.000Z","dependencies_parsed_at":"2024-03-27T07:24:54.385Z","dependency_job_id":"44dee482-6b39-4de2-a23b-061c729f148b","html_url":"https://github.com/anza-xyz/validator-lab","commit_stats":null,"previous_names":["anza-xyz/validator-lab"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anza-xyz%2Fvalidator-lab","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anza-xyz%2Fvalidator-lab/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anza-xyz%2Fvalidator-lab/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anza-xyz%2Fvalidator-lab/manifests","owner_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/owners/anza-xyz","download_url":"https://codeload.github.com/anza-xyz/validator-lab/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247406076,"owners_count":20933803,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-05T18:13:00.490Z","updated_at":"2025-04-05T21:33:31.333Z","avatar_url":"https://github.com/anza-xyz.png","language":"Rust","funding_links":[],"categories":["Validator Tools and Resources"],"sub_categories":["Notes"],"readme":"# Validator Lab\n### Deploy Validator Clusters for Testing\n\n#### About\nIn Validator Lab we can deploy and test new validator features quickly and easily. Validator Lab will take your modified validator code and deploy a cluster of validators running in Kubernetes pods on nodes all around the world. This allows us to spin up and tear down over a thousand validators with ease and with little user intervention.\n\n### Disclaimer:\n- This library is a work in progress. It will be built over a series of PRs. See [PROGRESS.md](PROGRESS.md) for the roadmap and progress.\n\n## How to run\n\n### Setup\nEnsure you have the proper permissions to connect to the Monogon Kubernetes endpoint. Reach out to Leo on Slack if you need the key (you do if you haven't asked him in the past).\n\nFrom your local build host, log in to Docker for pushing/pulling repos. Currently we just use the user's own Docker login. 
This will likely change in the future.\n```\ndocker login\n```\n\nCreate a namespace for your cluster:\n```\nkubectl create ns \u003cnamespace\u003e\n```\n\n### Run\n#### Build Agave from local agave repo\n```\ncargo run --bin cluster --\n    -n \u003cnamespace\u003e\n    --local-path \u003cpath-to-local-agave-monorepo\u003e\n    --cluster-data-path \u003cpath-to-directory-to-store-cluster-accounts-genesis-etc\u003e\n```\n\n#### Build specific Agave release\n```\ncargo run --bin cluster --\n    -n \u003cnamespace\u003e\n    --release-channel \u003cagave-version: e.g. v1.17.28\u003e # note: MUST include the \"v\"\n    --cluster-data-path \u003cpath-to-directory-to-store-cluster-accounts-genesis-etc\u003e\n```\n\n#### Build specific Agave commit\n```\ncargo run --bin cluster --\n    -n \u003cnamespace\u003e\n    --commit \u003cgit commit: e.g. 8db8e60c48ab064c88a76013597f99c9eb25ed74\u003e # must be full string\n    --github-username \u003cgithub username: e.g. gregcusack\u003e\n    --repo-name \u003crepository to build: e.g. solana | agave. default: solana\u003e\n    --cluster-data-path \u003cpath-to-directory-to-store-cluster-accounts-genesis-etc\u003e\n```\n\n#### Note on `--cluster-data-path`:\n`--cluster-data-path` can just be an empty directory. It will be used to store:\n1) Validator, client, rpc, and faucet account(s)\n2) Genesis\n3) Validator, client, and rpc Dockerfiles\n\nAfter deploying a cluster with a bootstrap, 2 clients, 2 validators, and 3 rpc nodes all running v1.18.13, your `\u003ccluster-data-path\u003e` directory will look something like:\n\n![Cluster Data Path Directory](cluster_data_path_tree.png)\n\n#### Build from Local Repo and Configure Genesis and Bootstrap and Validator Image\nExample:\n```\ncargo run --bin cluster -- \n    -n \u003cnamespace\u003e \n    --local-path /home/sol/solana\n    --cluster-data-path /home/sol/validator-lab-build\n    --num-validators \u003cnumber-of-non-bootstrap-voting-validators\u003e\n    # genesis config. 
Optional: Many of these have defaults\n    --hashes-per-tick \u003chashes-per-tick\u003e\n    --faucet-lamports \u003cfaucet-lamports\u003e\n    --bootstrap-validator-sol \u003cvalidator-sol\u003e\n    --bootstrap-validator-stake-sol \u003cvalidator-stake\u003e\n    --max-genesis-archive-unpacked-size \u003csize in bytes\u003e\n    --target-lamports-per-signature \u003clamports-per-signature\u003e\n    --slots-per-epoch \u003cslots-per-epoch\u003e\n    # docker config\n    --registry \u003cdocker-registry\u003e        # e.g. gregcusack \n    --base-image \u003cbase-image\u003e           # e.g. ubuntu:20.04\n    --image-name \u003cdocker-image-name\u003e    # e.g. cluster-image\n    # validator config\n    --full-rpc\n    --internal-node-sol \u003cSol\u003e\n    --internal-node-stake-sol \u003cSol\u003e\n    # kubernetes config\n    --cpu-requests \u003ccores\u003e\n    --memory-requests \u003cmemory\u003e\n    # deploy with clients\n    -c \u003cnum-clients\u003e\n    --client-type \u003cclient-type e.g. tpu-client\u003e\n    --client-to-run \u003ctype-of-client e.g. bench-tps\u003e\n    --client-wait-for-n-nodes \u003cwait-for-N-nodes-to-converge-before-starting-client\u003e\n    --bench-tps-args \u003cbench-tps-args e.g. 
tx-count=25000\u003e\n```\n\n#### Client bench-tps-args\nClient accounts are funded on deployment of the client.\n\nCommand Examples:\n\nFor client versions \u003e 1.17.0 and \u003c 2.0.0:\n```\n--bench-tps-args 'tx-count=5000 keypair-multiplier=4 threads=16 num-lamports-per-account=200000000 sustained tpu-connection-pool-size=8 thread-batch-sleep-ms=0'\n```\n\nFor client versions \u003e= 2.0.0:\n```\n--bench-tps-args 'tx-count=5000 keypair-multiplier=4 threads=16 num-lamports-per-account=200000000 sustained tpu-connection-pool-size=8 thread-batch-sleep-ms=0 commitment-config=processed'\n```\n\n## Metrics\n1) Set up the metrics database:\n```\n./init-metrics -c \u003cdatabase-name\u003e -u \u003cmetrics-username\u003e\n# enter password when prompted\n```\n2) Add the following to your `cluster` command from above:\n```\n--metrics-host https://internal-metrics.solana.com # need the `https://` here\n--metrics-port 8086\n--metrics-db \u003cdatabase-name\u003e            # from (1)\n--metrics-username \u003cmetrics-username\u003e   # from (1)\n--metrics-password \u003cmetrics-password\u003e   # from (1)\n```\n\n#### RPC Nodes\nYou can add RPC nodes. These sit behind a load balancer, which distributes load across all RPC nodes and the bootstrap. Set the number of RPC nodes with:\n```\n--num-rpc-nodes \u003cnum-nodes\u003e\n```\n\n## Heterogeneous Clusters\nYou can deploy a cluster with heterogeneous validator versions.\nFor example, say you want to deploy a cluster with the following nodes:\n* 1 bootstrap, 3 validators, 1 rpc-node, and 1 client running some agave-repo local commit\n* 5 validators and 4 rpc nodes running v1.18.15\n* 20 clients running v1.18.14\n\nEach set of validators and clients is deployed individually by version. 
However, they all run in the same cluster.\n\n1) Deploy a local cluster as normal:\n   * Specify how many validators, rpc nodes, and clients you want running your local agave commit\n```\ncargo run --bin cluster -- -n \u003cnamespace\u003e --registry \u003cregistry\u003e --local-path /home/sol/solana --num-validators 3 --num-rpc-nodes 1 --cluster-data-path /home/sol/validator-lab-build/ --num-clients 1 --client-type tpu-client --client-to-run bench-tps --bench-tps-args 'tx-count=5000 threads=4 thread-batch-sleep-ms=0'\n```\n2) Deploy a set of 5 validators and 4 rpc nodes running a different validator version (e.g. v1.18.15)\n    * Must pass in `--no-bootstrap` so we don't recreate the genesis and deploy another bootstrap\n```\ncargo run --bin cluster -- -n \u003cnamespace\u003e --registry \u003cregistry\u003e --release-channel v1.18.15 --num-validators 5 --num-rpc-nodes 4 --cluster-data-path /home/sol/validator-lab-build/ --no-bootstrap\n```\n3) Deploy the final set of clients running v1.18.14. These 20 clients will load the cluster you deployed in (1) and (2)\n    * Must pass in `--no-bootstrap` so we don't recreate the genesis and deploy another bootstrap\n```\ncargo run --bin cluster -- -n \u003cnamespace\u003e --registry \u003cregistry\u003e --release-channel v1.18.14 --cluster-data-path /home/sol/validator-lab-build/ --num-clients 20 --client-type tpu-client --client-to-run bench-tps --bench-tps-args 'tx-count=10000 threads=16 thread-batch-sleep-ms=0' --no-bootstrap\n```\n\nFor steps (2) and (3), when using `--no-bootstrap`, we assume that the directory at `--cluster-data-path \u003cdirectory\u003e` has the correct genesis, bootstrap identity, and faucet account stored. These are all created in step (1).\n\nNote: We can't deploy heterogeneous clusters across v1.17 and v1.18 due to feature differences. We hope to fix this in the future. 
One approach would be a way to define explicitly which features to enable.\n\n## Querying the RPC from outside the cluster\nThe cluster now has an external IP/port that can be queried to reach the cluster RPC. The external RPC port will be logged during cluster boot, e.g.:\n```\nDeploying Load Balancer Service with external port: 30000\n```\n1) Get any one of the node IPs in the cluster. Querying the RPC works with any node IP in the cluster, including nodes that are NOT running any of your pods:\n```\nkubectl get nodes -o wide\n```\n2) Run your query, e.g.:\n```\ncurl -X POST \\\n-H \"Content-Type: application/json\" \\\n-d '{\n    \"jsonrpc\": \"2.0\",\n    \"id\": 1,\n    \"method\": \"getClusterNodes\"\n    }' \\\nhttp://\u003cnode-ip\u003e:\u003cexternal-port\u003e\n```\nNote: you can deploy a client through validator-lab or completely separately, and have it send transactions to or query this RPC at `http://\u003cnode-ip\u003e:\u003cexternal-port\u003e`. \n\n## Generic Clients\nBring your own client and deploy it in a Validator Lab cluster!\nAll you need is a containerized version of your client in an accessible Docker registry. \n\nKey points/steps:\n1) [Containerize your client](#Containerize-your-Client)\n2) Any client accounts should be built into the client container image\n3) Client arguments are passed in similarly to how they are passed into the bench-tps client. For the generic client, use `--generic-client-args`. \n\nFor example, let's assume we have a client that sends spam and takes the following arguments:\n```\n/home/solana/spammer-executable --target-node \u003ckubernetes_domain_name\u003e:\u003cport\u003e --thread-sleep-ms \u003cms-between-spam-batches\u003e --spam-mode \u003cclient-specific-mode\u003e\n```\nwhere `\u003ckubernetes_domain_name\u003e:\u003cport\u003e` is the domain name and port of the Kubernetes service running the validator you want to target. 
See: [Node Naming Conventions](#kubernetes_domain_name)\n\nWhen we go to deploy the generic client, we deploy it in a similar manner to how we deploy the bench-tps client:\n```\ncargo run --bin cluster -- -n \u003cnamespace\u003e\n...\ngeneric-client --docker-image \u003cclient-docker-image\u003e --executable-path \u003cpath-to-executable-in-docker-image\u003e --delay-start \u003cseconds-after-cluster-is-deployed-before-deploying-client\u003e --generic-client-args 'target-node=\u003ckubernetes_domain_name\u003e:\u003cport\u003e thread-sleep-ms=\u003cms-between-spam-batches\u003e spam-mode=\u003cclient-specific-mode\u003e' \n```\n\n4) Any flag or value the client needs that is cluster specific should be read in from an environment variable. For example, say the client requires the following arguments:\n```\n/home/solana/spammer-executable --target-node \u003ckubernetes_domain_name\u003e:\u003cport\u003e --shred-version \u003cversion\u003e\n```\nShred-version is cluster specific; it is not known when you deploy a cluster. 
Modify the shred-version argument in the client code to read in the environment variable `SHRED_VERSION` from the host.\nExample:\n```\nuse std::env;\n\n// Read the cluster's shred version from the environment; fall back to \"0\" if unset\nlet default_shred_version = env::var(\"SHRED_VERSION\").unwrap_or_else(|_| \"0\".to_string());\n...\n.arg(\n    Arg::with_name(\"shred_version\")\n        .long(\"shred-version\")\n        .takes_value(true)\n        .default_value(\u0026default_shred_version)\n        .help(\"Shred version of cluster to spam\"),\n)\n...\n```\nWhen you deploy a cluster with your client, leave the `--shred-version` argument out since it will be read from the environment variable:\n```\ncargo run --bin cluster -- -n \u003cnamespace\u003e\n...\ngeneric-client --docker-image \u003cclient-docker-image\u003e --executable-path \u003cpath-to-executable-in-docker-image\u003e --delay-start \u003cseconds-after-cluster-is-deployed-before-deploying-client\u003e --generic-client-args 'target-node=\u003cip:port\u003e' \n```\n\nThe following environment variables are available to each non-bootstrap pod:\n```\nNAMESPACE                   # cluster namespace\nBOOTSTRAP_RPC_ADDRESS       # rpc address of bootstrap node\nBOOTSTRAP_GOSSIP_ADDRESS    # gossip address of bootstrap node\nBOOTSTRAP_FAUCET_ADDRESS    # faucet address of bootstrap node\nSHRED_VERSION               # cluster shred version\n```\n^ More environment variables to come!\n\n\u003ca name=\"kubernetes_domain_name\"\u003e\u003c/a\u003e\n### Node Naming Conventions in Kubernetes\nSay you want to launch your client and send transactions to a specific validator. Kubernetes makes it easy to identify deployed nodes via `\u003ckubernetes_domain_name\u003e:\u003cport\u003e`. Node naming conventions:\n```\n\u003cnode-name\u003e-service.\u003cnamespace\u003e.svc.cluster.local:\u003cport\u003e\n```\ne.g. 
bootstrap validator RPC port can be reached with:\n```\nbootstrap-validator-service.\u003cnamespace\u003e.svc.cluster.local:8899\n```\nand a standard validator can be reached with:\n```\nvalidator-service-\u003c8-char-commit-or-version\u003e-\u003cvalidator-index\u003e.\u003cnamespace\u003e.svc.cluster.local:\u003cport\u003e\n```\nexamples:\n```\n# w/ commit\nvalidator-service-bd1a5dfb-7.greg.svc.cluster.local:8001\n# or with version\nvalidator-service-v1.18.16-4.greg.svc.cluster.local:8001\n```\nSay you want to deploy your client with `--target-node \u003cvalidator-4\u003e` which is running v1.18.16:\n```\ncargo run --bin cluster -- -n \u003cnamespace\u003e\n...\ngeneric-client --docker-image \u003cregistry\u003e/\u003cimage-name\u003e:\u003ctag\u003e --executable-path \u003cpath-to-executable-in-docker-image\u003e --delay-start \u003cseconds-after-cluster-is-deployed-before-deploying-client\u003e --generic-client-args 'target-node=validator-service-v1.18.16-4.greg.svc.cluster.local:8001' \n```\n\n## Kubernetes Cheatsheet\nCreate namespace:\n```\nkubectl create ns \u003cnamespace\u003e\n```\n\nDelete namespace:\n```\nkubectl delete ns \u003cnamespace\u003e\n```\n\nGet running pods:\n```\nkubectl get pods -n \u003cnamespace\u003e\n```\n\nGet pod logs:\n```\nkubectl logs -n \u003cnamespace\u003e \u003cpod-name\u003e\n```\n\nExec into pod:\n```\nkubectl exec -it -n \u003cnamespace\u003e \u003cpod-name\u003e -- /bin/bash\n```\n\nGet information about pod:\n```\nkubectl describe pod -n \u003cnamespace\u003e \u003cpod-name\u003e\n```\n\n## Containerize your Client\n### Dockerfile Template\n```\nFROM ubuntu:22.04\nRUN apt-get update \u0026\u0026 apt-get install -y iputils-ping curl vim \u0026\u0026 \\\n    rm -rf /var/lib/apt/lists/* \u0026\u0026 \\\n    useradd -ms /bin/bash solana \u0026\u0026 \\\n    adduser solana sudo\n\nUSER solana\nCOPY --chown=solana:solana ./target/release/\u003cclient-executable\u003e /home/solana/\nCOPY --chown=solana:solana ./client-accounts/ 
/home/solana/client-accounts/\nRUN chmod +x /home/solana/\u003cclient-executable\u003e\nWORKDIR /home/solana\n```\n\n### Build client image\n```\ncd \u003cclient-directory\u003e\ndocker build -t \u003cregistry\u003e/\u003cimage-name\u003e:\u003ctag\u003e -f \u003cpath-to-Dockerfile\u003e/Dockerfile \u003ccontext-path\u003e\n\n# e.g.\ncd client-spam/\ndocker build -t test-registry/client-spam:latest -f docker/Dockerfile .\n```\n\n### Push client image to registry\n```\ndocker push \u003cregistry\u003e/\u003cimage-name\u003e:\u003ctag\u003e\n\n# e.g.\ndocker push test-registry/client-spam:latest\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanza-xyz%2Fvalidator-lab","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanza-xyz%2Fvalidator-lab","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanza-xyz%2Fvalidator-lab/lists"}