{"id":13439372,"url":"https://github.com/jepsen-io/jepsen","last_synced_at":"2025-12-12T01:08:35.854Z","repository":{"id":8024797,"uuid":"9433979","full_name":"jepsen-io/jepsen","owner":"jepsen-io","description":"A framework for distributed systems verification, with fault injection","archived":false,"fork":false,"pushed_at":"2025-05-07T21:51:35.000Z","size":47071,"stargazers_count":7038,"open_issues_count":67,"forks_count":729,"subscribers_count":191,"default_branch":"main","last_synced_at":"2025-05-13T11:09:23.837Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jepsen-io.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"contributing.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2013-04-14T19:20:27.000Z","updated_at":"2025-05-11T22:29:56.000Z","dependencies_parsed_at":"2024-06-18T13:59:31.997Z","dependency_job_id":"6b62f11e-596a-4d40-878f-7f72c4897c70","html_url":"https://github.com/jepsen-io/jepsen","commit_stats":{"total_commits":2578,"total_committers":128,"mean_commits":20.140625,"dds":0.3083785880527541,"last_synced_commit":"9ae4ce3ffe1dc23c19a123bb9df9183c40f91cde"},"previous_names":["aphyr/jepsen"],"tags_count":40,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jepsen-io%2Fjepsen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jepsen-io%2Fjepsen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jepsen-io%2Fjepsen/releases","manifests_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/jepsen-io%2Fjepsen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jepsen-io","download_url":"https://codeload.github.com/jepsen-io/jepsen/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253929367,"owners_count":21985802,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T03:01:13.366Z","updated_at":"2025-12-12T01:08:30.803Z","avatar_url":"https://github.com/jepsen-io.png","language":"Clojure","funding_links":[],"categories":["HarmonyOS","Clojure","Uncategorized","分布式开发","Papers","\u003ca name=\"Clojure\"\u003e\u003c/a\u003eClojure"],"sub_categories":["Windows Manager","Uncategorized","Verification of Distributed Systems"],"readme":"# Jepsen\n\nBreaking distributed systems so you don't have to.\n\nJepsen is a Clojure library. A test is a Clojure program which uses the Jepsen\nlibrary to set up a distributed system, run a bunch of operations against that\nsystem, and verify that the history of those operations makes sense. Jepsen has\nbeen used to verify everything from eventually-consistent commutative databases\nto linearizable coordination systems to distributed task schedulers. It can\nalso generate graphs of performance and availability, helping you characterize\nhow a system responds to different faults. 
See\n[jepsen.io](https://jepsen.io/analyses) for examples of the sorts of analyses\nyou can carry out with Jepsen.\n\n[![Clojars Project](https://img.shields.io/clojars/v/jepsen.svg)](https://clojars.org/jepsen)\n[![Build Status](https://travis-ci.com/jepsen-io/jepsen.svg?branch=main)](https://travis-ci.com/jepsen-io/jepsen)\n\n## Design Overview\n\nA Jepsen test runs as a Clojure program on a *control node*. That program uses\nSSH to log into a bunch of *db nodes*, where it sets up the distributed system\nyou're going to test using the test's pluggable *os* and *db*.\n\nOnce the system is running, the control node spins up a set of logically\nsingle-threaded *processes*, each with its own *client* for the distributed\nsystem. A *generator* generates new operations for each process to perform.\nProcesses then apply those operations to the system using their clients. The\nstart and end of each operation is recorded in a *history*. While performing\noperations, a special *nemesis* process introduces faults into the system--also\nscheduled by the generator.\n\nFinally, the DB and OS are torn down. Jepsen uses a *checker* to analyze the\ntest's history for correctness, and to generate reports, graphs, etc. The test,\nhistory, analysis, and any supplementary results are written to the filesystem\nunder `store/\u003ctest-name\u003e/\u003cdate\u003e/` for later review. Symlinks to the latest\nresults are maintained at each level for convenience.\n\n## Documentation\n\nThis [tutorial](doc/tutorial/index.md) walks you through writing a Jepsen test\nfrom scratch.\n\nFor reference, see the [API documentation](http://jepsen-io.github.io/jepsen/).\n\nAn independent translation is available in [Chinese](https://jaydenwen123.gitbook.io/zh_jepsen_doc/).\n\n## Setting up a Jepsen Environment\n\nSo, you've got a Jepsen test, and you'd like to run it! Or maybe you'd like to\nstart learning how to write tests. 
You've got several options:\n\n### AWS\n\nIf you have an AWS account, you can launch a full Jepsen cluster---control and\nDB nodes---from the [AWS\nMarketplace](https://aws.amazon.com/marketplace/pp/Jepsen-LLC-Jepsen/B01LZ7Y7U0).\nClick \"Continue to Subscribe\", \"Continue to Configuration\", and choose\n\"CloudFormation Template\". You can choose the number of nodes you'd like to\ndeploy, adjust the instance types and disk sizes, and so on. These are full\nVMs, which means they can test clock skew.\n\nThe AWS marketplace clusters come with an hourly fee (generally $1/hr/node),\nwhich helps fund Jepsen development.\n\n### LXC\n\nYou can set up your DB nodes as LXC containers, and use your local machine as\nthe control node. See the [LXC documentation](doc/lxc.md) for guidelines. This\nmight be the easiest setup for hacking on tests: you'll be able to edit source\ncode, run profilers, etc on the local node. Containers don't have real clocks,\nso you generally can't use them to test clock skew.\n\n### VMs, Real Hardware, etc.\n\nYou should be able to run Jepsen against almost any machines which have:\n\n- A TCP network\n- An SSH server\n- Sudo or root access\n\nEach DB node should be accessible from the control node via SSH: you need to be\nable to run `ssh myuser@some-node`, and get a shell. By default, DB nodes are\nnamed n1, n2, n3, n4, and n5, but that (along with SSH username, password,\nidentity files, etc) is all definable in your test, or at the CLI. 
The account\nyou use on those boxes needs sudo access to set up DBs, control firewalls, etc.\n\nBE ADVISED: tests may mess with clocks, add apt repos, run killall -9 on\nprocesses, and generally break things, so you shouldn't, you know, point Jepsen\nat your prod machines unless you like to live dangerously, or you wrote the\ntest and know exactly what it's doing.\n\nNOTE: Most Jepsen tests are written with more specific requirements in\nmind---like running on Debian, using `iptables` for network manipulation, etc.\nSee the specific test code for more details.\n\n### Docker (Unsupported)\n\nThere is a [Docker Compose setup](/docker) for running a Jepsen cluster on a\nsingle machine. Sadly the Docker platform has been something of a moving\ntarget; this environment tends to break in new and exciting ways on various\nplatforms every few months. If you're a Docker whiz and can get this going\nreliably on Debian \u0026 OS X that's great--pull requests would be a big help.\n\nLike other containers Docker containers don't have real clocks--that means you\ngenerally can't use them to test clock skew.\n\n### Setting Up Control Nodes\n\nFor AWS and Docker installs, your control node comes preconfigured with all the\nsoftware you'll need to run most Jepsen tests. If you build your own control\nnode (or if you're using your local machine as a control node), you'll need a\nfew things:\n\n- A [JVM](https://openjdk.java.net/install/)---version 1.8 or higher.\n- JNA, so the JVM can talk to your SSH.\n- [Leiningen](https://leiningen.org/): a Clojure build tool.\n- [Gnuplot](http://www.gnuplot.info/): how Jepsen renders performance plots.\n- [Graphviz](https://graphviz.org/): how Jepsen renders transactional anomalies.\n\nOn Debian, try:\n\n```\nsudo apt install openjdk-17-jdk libjna-java gnuplot graphviz\n```\n\n... to get the basic requirements in place. 
Debian's Leiningen packages are\nancient, so [download lein from the web instead](https://leiningen.org/).\n\n## Running a Test\n\nOnce you've got everything set up, you should be able to run `cd aerospike;\nlein test`, and it'll spit out something like:\n\n```clj\nINFO  jepsen.core - Analysis invalid! (ﾉಥ益ಥ）ﾉ ┻━┻\n\n{:valid? false,\n :counter\n {:valid? false,\n  :reads\n  [[190 193 194]\n   [199 200 201]\n   [253 255 256]\n   ...}}\n```\n\n## Working With the REPL\n\nJepsen tests emit `.jepsen` files in the `store/` directory. You can use these\nto investigate a test at the REPL. Run `lein repl` in the test directory (which\nshould contain `store...`), then load a test using `store/test`:\n\n```clj\nuser=\u003e (def t (store/test -1))\n```\n\n-1 is the last test run, -2 is the second-to-last. 0 is the first, 1 is the\nsecond, and so on. You can also load a test by its directory name, given as a\nstring. As a handy shortcut, clicking on the title of a test in the web\ninterface will copy its path to the clipboard.\n\n```clj\nuser=\u003e (def t (store/test \"/home/aphyr/jepsen.etcd/store/etcd append etcdctl kill/20221003T124714.485-0400\"))\n```\n\nThese have the same structure as the test maps you're used to working with in\nJepsen, though without some fields that wouldn't make sense to serialize--no\n`:checker`, `:client`, etc.\n\n```clj\njepsen.etcd=\u003e (:name t)\n\"etcd append etcdctl kill\"\njepsen.etcd=\u003e (:ops-per-key t)\n200\n```\n\nThese test maps are also lazy: to speed up working at the REPL, they won't load\nthe history or results until you ask for them. 
Then they're loaded from disk\nand cached.\n\n```clj\njepsen.etcd=\u003e (count (:history t))\n52634\n```\n\nYou can use all the usual Clojure tricks to introspect results and histories.\nHere's an aborted read (G1a) anomaly--we'll pull out the ops which wrote and\nread the aborted read:\n\n```clj\njepsen.etcd=\u003e (def writer (-\u003e t :results :workload :anomalies :G1a first :writer))\n#'jepsen.etcd/writer\njepsen.etcd=\u003e (def reader (-\u003e t :results :workload :anomalies :G1a first :op))\n#'jepsen.etcd/reader\n```\n\nThe writer appended 11 and 12 to key 559, but failed, returning a duplicate key\nerror:\n\n```clj\njepsen.etcd=\u003e (:value writer)\n[[:r 559 nil] [:r 558 nil] [:append 559 11] [:append 559 12]]\njepsen.etcd=\u003e (:error writer)\n[:duplicate-key \"rpc error: code = InvalidArgument desc = etcdserver: duplicate key given in txn request\"]\n```\n\nThe reader, however, observed a value for 559 beginning with 12!\n\n```clj\njepsen.etcd=\u003e (:value reader)\n[[:r 559 [12]] [:r 557 [1]]]\n```\n\nLet's find all successful transactions:\n\n```clj\njepsen.etcd=\u003e (def txns (-\u003e\u003e t :history (filter #(and (= :txn (:f %)) (= :ok (:type %)))) (map :value)))\n#'jepsen.etcd/txns\n```\n\nAnd restrict those to just operations which affected key 559:\n\n```clj\njepsen.etcd=\u003e (-\u003e\u003e txns (filter (partial some (comp #{559} second))) pprint)\n([[:r 559 [12]] [:r 557 [1]]]\n [[:r 559 [12]] [:append 559 1] [:r 559 [12 1]]]\n [[:append 556 32]\n  [:r 556 [1 18 29 32]]\n  [:r 556 [1 18 29 32]]\n  [:r 559 [12 1]]]\n [[:r 559 [12 1]]]\n [[:append 559 9] [:r 557 [1 5]] [:r 558 [1]] [:r 558 [1]]]\n [[:r 559 [12 1 9]] [:r 559 [12 1 9]]]\n [[:append 559 17]]\n [[:r 559 [12 1 9 17]] [:append 558 5]]\n [[:r 559 [12 1 9 17]]\n  [:append 557 22]\n  [:append 559 27]\n  [:r 557 [1 5 12 22]]])\n```\n\nSure enough, no OK appends of 12 to key 559!\n\nYou'll find more functions for slicing-and-dicing tests in `jepsen.store`.\n\n## FAQ\n\n### JSCH 
auth errors\n\nIf you see `com.jcraft.jsch.JSchException: Auth fail`, this means something\nabout your test's `:ssh` map is wrong, or your control node's SSH environment\nis a bit weird.\n\n0. Confirm that you can ssh to the node that Jepsen failed to connect to. Try\n   `ssh -v` for verbose information--pay special attention to whether it uses a\n   password or private key.\n1. If you intend to use a username and password, confirm that they're specified\n   correctly in your test's `:ssh` map.\n2. If you intend to log in with a private key, make sure your SSH agent is\n   running.\n   - `ssh-add -l` should show the key you use to log in.\n   - If your agent isn't running, try launching one with `ssh-agent`.\n   - If your agent shows no keys, you might need to add one with `ssh-add`.\n   - If you're SSHing to a control node, SSH might be forwarding your local\n     agent's keys rather than using those on the control node. Try `ssh -a` to\n     disable agent forwarding.\n\nIf you've SSHed to a DB node already, you might also encounter a jsch bug: it\ndoesn't know how to read hashed `known_hosts` files. Remove all keys for the DB\nhosts from your `known_hosts` file, then run:\n\n```sh\nssh-keyscan -t rsa n1 \u003e\u003e ~/.ssh/known_hosts\nssh-keyscan -t rsa n2 \u003e\u003e ~/.ssh/known_hosts\nssh-keyscan -t rsa n3 \u003e\u003e ~/.ssh/known_hosts\nssh-keyscan -t rsa n4 \u003e\u003e ~/.ssh/known_hosts\nssh-keyscan -t rsa n5 \u003e\u003e ~/.ssh/known_hosts\n```\n\nto add unhashed versions of each node's hostkey to your `~/.ssh/known_hosts`.\n\n### SSHJ auth errors\n\nIf you get an exception like `net.schmizz.sshj.transport.TransportException:\nCould not verify 'ssh-ed25519' host key with fingerprint 'bf:4a:...' for 'n1'\non port 22`, but you're sure you've got the keys in your `~/.ssh/known_hosts`,\nthis is because (I think) SSHJ tries to verify only the ed25519 key and\n*ignores* the RSA key. 
You can add the ed25519 keys explicitly via:\n\n```sh\nssh-keyscan -t ed25519 n1 \u003e\u003e ~/.ssh/known_hosts\n...\n```\n\n## Other Projects\n\nAdditional projects that may be of interest:\n\n- [Jecci](https://github.com/michaelzenz/jecci): A wrapper framework around Jepsen\n- [Porcupine](https://github.com/anishathalye/porcupine): a linearizability checker written in Go.\n- [elle-cli](https://github.com/ligurio/elle-cli): command-line frontend to\n  transactional consistency checkers for black-box databases.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjepsen-io%2Fjepsen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjepsen-io%2Fjepsen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjepsen-io%2Fjepsen/lists"}