{"id":19186827,"url":"https://github.com/juji-io/editscript","last_synced_at":"2025-05-14T11:13:22.493Z","repository":{"id":45174970,"uuid":"123881058","full_name":"juji-io/editscript","owner":"juji-io","description":"A library to diff and patch Clojure/ClojureScript data structures","archived":false,"fork":false,"pushed_at":"2025-01-25T22:45:27.000Z","size":767,"stargazers_count":505,"open_issues_count":14,"forks_count":23,"subscribers_count":20,"default_branch":"master","last_synced_at":"2025-05-13T17:59:51.577Z","etag":null,"topics":["algorithm","clojure","clojurescript-data","data","data-diffing","data-structures","diff","editscript","patch","tree-diffing"],"latest_commit_sha":null,"homepage":"","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"epl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/juji-io.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"huahaiy"}},"created_at":"2018-03-05T07:25:33.000Z","updated_at":"2025-05-04T08:54:08.000Z","dependencies_parsed_at":"2024-11-23T13:00:51.632Z","dependency_job_id":"5ab85800-104b-405a-9d74-ce7681092df0","html_url":"https://github.com/juji-io/editscript","commit_stats":{"total_commits":220,"total_committers":5,"mean_commits":44.0,"dds":"0.018181818181818188","last_synced_commit":"4bdefcdf7aa2551dd6d05bb3db36bce2938b148d"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juji-io%2Feditscript","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juji-io%2Fedit
script/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juji-io%2Feditscript/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juji-io%2Feditscript/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/juji-io","download_url":"https://codeload.github.com/juji-io/editscript/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254129529,"owners_count":22019628,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithm","clojure","clojurescript-data","data","data-diffing","data-structures","diff","editscript","patch","tree-diffing"],"created_at":"2024-11-09T11:16:50.672Z","updated_at":"2025-05-14T11:13:22.443Z","avatar_url":"https://github.com/juji-io.png","language":"Clojure","readme":"\u003cp align=\"center\"\u003e\u003cimg src=\"logo.png\" alt=\"editscript logo\" height=\"140\"\u003e\u003c/img\u003e\u003c/p\u003e\n\u003ch1 align=\"center\"\u003eEditscript\u003c/h1\u003e\n\u003cp align=\"center\"\u003e🔦  Diff and patch for Clojure/Clojurescript data. 
🧩\t\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://cljdoc.org/d/juji/editscript\"\u003e\u003cimg src=\"https://cljdoc.org/badge/juji/editscript\" alt=\"editscript on cljdoc\"\u003e\u003c/img\u003e\u003c/a\u003e\n\u003ca href=\"https://badge.fury.io/js/clj-editscript\"\u003e\u003cimg src=\"https://badge.fury.io/js/clj-editscript.svg\" alt=\"npm\"\u003e\u003c/img\u003e\u003c/a\u003e\n\u003ca href=\"https://clojars.org/juji/editscript\"\u003e\u003cimg src=\"https://img.shields.io/clojars/v/juji/editscript.svg?color=sucess\" alt=\"clojars\"\u003e\u003c/img\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/juji-io/editscript/actions\"\u003e\u003cimg src=\"https://github.com/juji-io/editscript/actions/workflows/build.yml/badge.svg?branch=master\" alt=\"editscript build status\"\u003e\u003c/img\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n## :hear_no_evil: What is it?\n\nEditscript is a library designed to extract the differences between two\nClojure/Clojurescript data structures as an \"editscript\", which represents the\nminimal modification necessary to transform one to another.\n\nCurrently, this library can diff and patch any nested Clojure/Clojurescript data\nstructures consisting of regular maps, vectors, lists, sets and values. Custom\ndata can also be handled if you implement our protocols.\n\n## :satisfied: Status\n\nThis library is stable and has been in production use to power the core product\nof [Juji](https://juji.io) for several years now. 
If you are also using\nEditscript, please drop a line at issue\n[#17](https://github.com/juji-io/editscript/issues/17) so we may make a list of\nusers here:\n\n* [clerk](https://github.com/nextjournal/clerk) uses Editscript to improve the\n  usability of synchronised atoms by sending a minimal diff from the JVM to the\n  browser, achieving 60fps sync for updates from the browser to the JVM and\n  back.\n* [Evident Systems](https://www.evidentsystems.com/) uses Editscript as the main\n  way of evaluating changes within the convergent reference type in their CRDT\n  library, [Converge](https://github.com/evidentsystems/converge).\n* [microdata.no](https://microdata.no) uses Editscript to sync client state to\n  server so users can pick up their work where they left it.\n* [Oche](https://oche.com) uses Editscript to sync game state between client and server.\n* [Streetlinx](https://streetlinx.com) uses Editscript to capture deltas to\n  drive a newsfeed and generate alerts.\n\n## :tada: Usage\n\nSee my [Clojure/north 2020 Talk](https://youtu.be/n-avEZHEHg8): Data Diffing\nBased Software Architecture Patterns.\n\n```Clojure\n(require '[editscript.core :as e])\n\n;; Here are two pieces of data, a and b\n(def a [\"Hello word\" 24 22 {:a [1 2 3]} 1 3 #{1 2}])\n(def b [\"Hello world\" 24 23 {:a [2 3]} 1 3 #{1 2 3}])\n\n;; compute the editscript between a and b using the default options\n(def d (e/diff a b))\n\n;; look at the editscript\n(e/get-edits d)\n;;==\u003e\n;; [[[0] :r \"Hello world\"] [[2] :r 23] [[3 :a 0] :-] [[6 3] :+ 3]]\n\n;; diff using the quick algorithm and diff the strings by character\n;; there are other string diff levels: :word, :line, or :none (default)\n(def d-q (e/diff a b {:algo :quick :str-diff :character}))\n\n(e/get-edits d-q)\n;;==\u003e\n;; [[[0] :s [9 [:+ \"l\"] 1]] [[2] :r 23] [[3 :a 0] :-] [[6 3] :+ 3]]\n\n;; get the edit distance, i.e. number of edits\n(e/edit-distance d)\n;;==\u003e 4\n\n;; get the size of the editscript, i.e. 
number of nodes\n(e/get-size d)\n;;==\u003e 23\n\n;; patch a with the editscript to get back b, so that\n(= b (e/patch a d))\n;;==\u003e true\n(= b (e/patch a d-q))\n;;==\u003e true\n\n```\n\nAn Editscript contains a vector of edits, where each edit is a vector of two or\nthree elements.\n\nThe first element of an edit is the path, similar to the path vector in the\nfunction call `update-in`. However, `update-in` only works for associative data\nstructures (map and vector), whereas the editscript works for map, vector, list\nand set alike.\n\nThe second element of an edit is a keyword representing the edit operation,\nwhich is one of `:-` (deletion), `:+` (addition), `:r` (data replacement) or\n`:s` (string edit).\n\nFor addition and replacement operations, the third element is the value of the new data.\n\n```Clojure\n\n;; get the edits as a plain Clojure vector\n(def v (e/get-edits d))\n\nv\n;;==\u003e\n;;[[[0] :r \"Hello world\"] [[2] :r 23] [[3 :a 0] :-] [[6 3] :+ 3]]\n\n;; the plain Clojure vector can be passed around, stored, or modified as usual,\n;; then be loaded back as a new EditScript\n(def d' (e/edits-\u003escript v))\n\n;; the new EditScript works the same as the old one\n(= b (e/patch a d'))\n;;==\u003e true\n\n```\n\n## :green_book: Documentation\n\nPlease see [API Documentation](https://cljdoc.org/d/juji/editscript/CURRENT) for\nmore details.\n\n## :shopping: Alternatives\n\nDepending on your use cases, different libraries in this space may suit your\nneeds better. The `/bench` folder of this repo contains a benchmark comparing\nthe alternatives. The resulting charts of running [the benchmark](https://juji.io/blog/comparing-clojure-diff-libraries/) are included below:\n\n![Diff time benchmark](bench/diff-time-bench.png)\n![Diff size benchmark](bench/diff-size-bench.png)\n\n[deep-diff2](https://github.com/lambdaisland/deep-diff2) applies Wu et al. 1990\n[3] algorithm by first converting trees into linear structures. 
It is only\nfaster than the A\\* algorithm of Editscript. Its results are the largest in size.\nAlthough unable to achieve optimal tree diffing with this approach, it has some\ninteresting uses, e.g. visualization. So if you want to visualize the\ndifferences, use deep-diff2. deep-diff2 does not do patch.\n\n[clojure.data/diff](https://clojuredocs.org/clojure.data/diff) and\n[differ](https://github.com/Skinney/differ) are similar to the quick algorithm\nof Editscript, in that they all do a naive walk-through of the data, so the\ngenerated diff is not going to be optimal.\n\nclojure.data/diff is good for detecting which parts of the data have been changed\nand how. But it is slow and the results are also large. It does not do patch\neither.\n\ndiffer looks very good by the numbers in the benchmark. It does patch, is fast\nand its results are the smallest (for it doesn't record editing operators).\nUnfortunately, it cuts corners. It fails all the property based tests, even if\nthe tests considered only vectors and maps. Use it if you understand its failing\npatterns and are able to avoid them in your data.\n\nEditscript is designed for data diffing, e.g. data preservation and recovery,\nnot for being looked at by humans. If speed is your primary concern, the quick\nalgorithm of Editscript is the fastest among all the alternatives, and its diff\nsize is reasonably small for the benchmarked data sets. If the diff size is your\nprimary concern, the A\\* algorithm is the only available option that guarantees\noptimal data size, but it is also the slowest.\n\n## :zap: Diffing Algorithms\n\nAs mentioned, the library currently implements two diffing algorithms. The\ndefault algorithm produces diffs that are optimal in the number of editing\noperations and the resulting script size. 
A quick algorithm is also provided,\nwhich does not guarantee optimal results but is very fast.\n\n### A\\* diffing\n\nThis A\\* algorithm aims to achieve optimal diffing in terms of minimal size of\nresulting editscript, useful for storage, query and restoration. This is an\noriginal algorithm that has some unique properties: unlike many other general\ntree diffing algorithms such as Zhang \u0026 Shasha 1989 [4], our algorithm is\nstructure preserving.\n\nRoughly speaking, the edit distance is defined on sub-trees rather than nodes,\nsuch that the ancestor-descendant relationship and tree traversal order are\npreserved, and nodes in the original tree do not split or merge. These\nproperties are useful for diffing and patching Clojure's immutable data\nstructures because we want to leverage structure sharing and use `identical?`\nreference checks. The additional constraints also yield algorithms with better\nrun time performance than the general ones. Finally, these constraints feel\nnatural for a Clojure programmer.\n\nThe structure preserving properties were proposed in Lu 1979 [1] and Tanaka 1995\n[2]. These papers describe diffing algorithms with O(|a||b|) time and space\ncomplexity. We designed an A\\* based algorithm to achieve some speedup. Instead\nof searching the whole editing graph, we typically search a portion of it along\nthe diagonal.\n\nThe implementation is optimized for speed. Currently the algorithm spends most of\nits running time calculating the cost of next steps, perhaps due to the use of a\nvery generic heuristic. A more specialized heuristic for our case should reduce\nthe number of steps considered. For special cases of vectors and lists\nconsisting of leaves only, we also use the quick algorithm below to enhance the\nspeed.\n\nAlthough much slower than the non-optimizing quick algorithm below, the\nalgorithm is practical for common Clojure data that include lots of maps. 
Maps\nand sets do not incur the penalty of a large search space that vectors and\nlists do. For a [drawing data\nset](https://github.com/justsml/json-diff-performance), the diffing time is less\nthan 3ms on a 2014 2.8 GHz Core i5 16GB MacBook Pro.\n\n### Quick diffing\n\nThis quick diffing algorithm simply does a one-pass comparison of two trees so\nit is very fast. For sequence (vector and list) comparison, we implement Wu et\nal. 1990, an algorithm with O(NP) time complexity, where P is the number of\ndeletions if `b` is longer than `a`. The same sequence diffing algorithm is\nalso implemented in [diffit](https://github.com/friemen/diffit). Using their\nbenchmark, our implementation has slightly better performance due to more\noptimizations. Keep in mind that our algorithm also handles nested Clojure data\nstructures. Compared with our A\\* algorithm, our quick algorithm can be up to\ntwo orders of magnitude faster.\n\nThe Wu algorithm does not have replacement operations, and assumes each edit has\na unit cost. These do not work well for tree diffing. Consequently, the quick\nalgorithm does not produce optimal results in terms of script size. In principle,\nsimply changing a pointer to point to `b` instead of `a` produces the fastest\n\"diffing\" algorithm in the world, but that is not very useful. The quick\nalgorithm has a similar problem.\n\nFor instance, when consecutive deletions involving nested elements occur in a\nsequence, the generated editscript can be large. For example:\n\n```Clojure\n(def a [2 {:a 42} 3 {:b 4} {:c 29}])\n(def b [{:a 5} {:b 5}])\n\n(diff a b {:algo :quick})\n;;==\u003e\n;;[[[0] :-]\n;; [[0] :-]\n;; [[0] :-]\n;; [[0 :b] :-]\n;; [[0 :a] :+ 5]\n;; [[1 :c] :-]\n;; [[1 :b] :+ 5]]\n\n(diff a b)\n;;==\u003e\n;; [[[] :r [{:a 5} {:b 5}]]]\n\n```\nIn this case, the quick algorithm seems to delete the original and then add\nnew ones back. The reason is that the quick algorithm does not drill down\n(i.e. 
do replacement) at the correct places. It currently drills down wherever it\ncan. In this particular case, replacing the whole thing produces a smaller diff.\nAn optimizing algorithm is needed if minimal diffs are desired.\n\n## :station: Platform\n\nThe library supports JVM Clojure and Clojurescript. The latter has been tested\nwith node, nashorn, chrome, safari, firefox and lumo. E.g. run our test suite:\n\n```bash\n# Run Clojure tests\nlein test\n\n# Run Clojurescript tests on node.js\nlein doo node\n\n# Run Clojurescript tests on chrome\nlein doo chrome browser once\n\n```\n\n## :bulb: Rationale\n\nAt Juji, we send changes of UI states back to server for persistence [see blog\npost](https://juji.io/blog/this-is-how-we-revamped-the-ui-in-less-than-a-month/).\nSuch a use case requires a good diffing library for nested Clojure data\nstructures to avoid overwhelming our storage systems. I have not found such a\nlibrary in the Clojure ecosystem, so I implemented my own. Hopefully this little\nlibrary could be of some use to further enhance Clojure's unique strength of\n[Data-Oriented\nProgramming](https://livebook.manning.com/#!/book/the-joy-of-clojure-second-edition/chapter-14/1).\n\nEditscript is designed with stream processing in mind. An editscript should be\nconceptualized as a chunk in a potentially endless stream of changes. Individual\neditscripts can combine (concatenate) into a larger editscript. I consider\neditscript a part of a larger data-oriented effort, that tries to elevate\nthe level of abstraction of data from the granularity of characters, bytes or\nlines to that of maps, sets, vectors, and lists. So instead of talking about\nchange streams in bytes, we can talk about change streams in terms of higher\nlevel data structures.\n\n## :roller_coaster: Roadmap\n\nThere are a few things I have some interest in exploring with this library. Of\ncourse, ideas, suggestions and contributions are very welcome.\n\n* Further speed up of the algorithms, e.g. 
better heuristic, hashing, and so on.\n* Globally optimize an editscript stream.\n\n## :green_book: References\n\n[1] Lu, S. 1979, A Tree-to-tree distance and its application to cluster\nanalysis. IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol.\nPAMI-1 No.2. p219-224\n\n[2] Tanaka, E., 1995, A note on a tree-to-tree editing problem. International\n Journal of Pattern Recognition and Artificial Intelligence. p167-172\n\n[3] Wu, S. et al., 1990, An O(NP) Sequence Comparison Algorithm, Information\nProcessing Letters, 35:6, p317-23.\n\n[4] Zhang, K. and Shasha, D. 1989, Simple fast algorithms for the editing\ndistance between trees and related problems. SIAM Journal on Computing,\n18:1245–1262\n\n\n## License\n\nCopyright © 2018-2025 [Juji, Inc.](https://juji.io)\n\nDistributed under the Eclipse Public License either version 1.0 or (at\nyour option) any later version.\n","funding_links":["https://github.com/sponsors/huahaiy"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuji-io%2Feditscript","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjuji-io%2Feditscript","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuji-io%2Feditscript/lists"}