{"id":16915860,"url":"https://github.com/zesterer/vm-perf","last_synced_at":"2025-03-23T17:30:45.381Z","repository":{"id":66134660,"uuid":"556913284","full_name":"zesterer/vm-perf","owner":"zesterer","description":"Performance comparisons between various virtual interpreter implementation strategies","archived":false,"fork":false,"pushed_at":"2024-04-06T22:36:01.000Z","size":40,"stargazers_count":41,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-18T22:10:10.206Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zesterer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-24T18:52:51.000Z","updated_at":"2025-03-17T15:51:33.000Z","dependencies_parsed_at":null,"dependency_job_id":"44651655-80e4-497f-be12-7bccc6aff0cc","html_url":"https://github.com/zesterer/vm-perf","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zesterer%2Fvm-perf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zesterer%2Fvm-perf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zesterer%2Fvm-perf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zesterer%2Fvm-perf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zesterer","download_url":"https://codeload.github.com/zesterer/vm-p
erf/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245140719,"owners_count":20567432,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-13T19:23:07.623Z","updated_at":"2025-03-23T17:30:45.086Z","avatar_url":"https://github.com/zesterer.png","language":"Rust","readme":"# VM Performance Comparison\n\nThis repository exists as an accessible benchmark comparison between various strategies for implementing interpreters.\n\nThe benchmarks are not particularly scientific. Take them with a pinch of salt.\n\n## Benchmarks\n\nBenchmarks were performed on an 8-core (16-thread) AMD Ryzen 7 3700X.\n\n```\ntest bytecode_closures_compile           ... bench:         440 ns/iter (+/- 5)\ntest bytecode_closures_execute           ... bench:     272,928 ns/iter (+/- 6,565)\n\ntest bytecode_compile                    ... bench:         170 ns/iter (+/- 4)\ntest bytecode_execute                    ... bench:     133,028 ns/iter (+/- 18,037)\n\ntest closure_continuations_compile       ... bench:         400 ns/iter (+/- 12)\ntest closure_continuations_execute       ... bench:      38,501 ns/iter (+/- 1,162)\n\ntest closure_stack_continuations_compile ... bench:         407 ns/iter (+/- 13)\ntest closure_stack_continuations_execute ... bench:      55,571 ns/iter (+/- 800)\n\ntest closures_compile                    ... bench:         348 ns/iter (+/- 41)\ntest closures_execute                    ... bench:      80,409 ns/iter (+/- 547)\n\ntest register_closures_compile           ... 
bench:         158 ns/iter (+/- 3)\ntest register_closures_execute           ... bench:      83,567 ns/iter (+/- 3,280)\n\ntest stack_closures_compile              ... bench:         501 ns/iter (+/- 14)\ntest stack_closures_execute              ... bench:     274,446 ns/iter (+/- 3,623)\n\ntest tape_closures_compile               ... bench:         146 ns/iter (+/- 8)\ntest tape_closures_execute               ... bench:     199,621 ns/iter (+/- 1,932)\n\ntest tape_continuations_compile          ... bench:         148 ns/iter (+/- 2)\ntest tape_continuations_execute          ... bench:      42,476 ns/iter (+/- 1,298)\n\ntest walker_compile                      ... bench:           0 ns/iter (+/- 0)\ntest walker_execute                      ... bench:     242,722 ns/iter (+/- 6,891)\n\n\n\ntest rust_execute                        ... bench:      17,104 ns/iter (+/- 4,848)\ntest rust_opt_execute                    ... bench:           1 ns/iter (+/- 0)\n```\n\n`rust_execute` and `rust_opt_execute` are 'standard candles', implemented in native Rust code. The former has very few\noptimisations applied, whereas the latter is permitted to take advantage of the full optimising power of LLVM.\n\nThe fastest technique appears to be [`closure_continuations`](#closure_continuations). It manages to achieve very\nrespectable performance, coming within spitting distance of (deoptimised) native code.\n\n## Setup\n\nEach technique has two stages:\n\n- Compilation: The technique is given an expression AST and is permitted to generate whatever program it needs from it\n\n- Execution: The technique is given the program and told to run it to completion\n\nFor the sake of a fair comparison, I've tried to prevent any technique from taking advantage of the structure of the AST to\nimprove performance.\n\nThe AST provided to the techniques is conceptually simple. The only data types are integers, the only arithmetic\ninstruction is addition, and the only control flow is `while`. 
Locals exist and can be created and mutated. Programs\nalso get provided a series of arguments at execution time to parameterise their execution.\n\n## Techniques\n\n### `walker`\n\nA simple AST walker. Compilation is an identity function. AST evaluation is done by recursively matching on AST nodes.\n\n### `bytecode`\n\nA naive stack 'bytecode' interpreter. Compilation takes the AST and translates it into a list of instructions. Execution\noperates upon the stack, pushing and popping values.\n\n### `closures`\n\nUses simple indirect threading, 'compiling' the entire program into a deeply nested closure. Execution simply evaluates\nthe closure.\n\n### `closure_continuations`\n\nShares much of the simplicity of `closures`, but passes the next instruction to be performed - if any - as a continuation,\nallowing for tail-call optimisation (TCO) to occur in a substantial number of cases.\n\n### `closure_stack_continuations`\n\nJust like `closure_continuations`, except it uses a stack to pass values around. This can improve the ability to perform\ntail-call optimisations (TCO), at the cost of needing to touch memory when manipulating values. It's possible that some\ncombination of both approaches might hit an even nicer sweet spot.\n\n### `bytecode_closures`\n\nA mix between `bytecode` and `closures`. 
The AST is compiled down to a series of instruction-like closures, which are\nthen executed in a loop and indexed via an instruction pointer.\n\n### `stack_closures`\n\nLike `closures`, except intermediate values are maintained on a `Vec` stack rather than the hardware stack of the\nclosures.\n\n### `tape_closures`\n\nLike `closures`, except each closure is permitted no environment at compilation time, and instead fetches it from a tape\nof static data at execution time.\n\n### `register_closures`\n\nLike `closures`, except the two most recently created locals are passed through the closures as arguments, rather\nthan being maintained on the locals stack.\n\n### `tape_continuations`\n\nSimilar to `tape_closures`, except the next function to be executed is called from within the previous, allowing the\ncompiler to perform TCO on the call. This significantly reduces the stack-bashing that\nneeds to occur to set up each function, resulting in a very significant performance boost, at the cost of complexity.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzesterer%2Fvm-perf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzesterer%2Fvm-perf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzesterer%2Fvm-perf/lists"}