{"id":13612594,"url":"https://github.com/segmentio/asm","last_synced_at":"2025-05-14T05:10:46.372Z","repository":{"id":39872034,"uuid":"357698637","full_name":"segmentio/asm","owner":"segmentio","description":"Go library providing algorithms optimized to leverage the characteristics of modern CPUs","archived":false,"fork":false,"pushed_at":"2023-11-07T21:27:52.000Z","size":445,"stargazers_count":888,"open_issues_count":11,"forks_count":37,"subscribers_count":12,"default_branch":"main","last_synced_at":"2025-04-11T00:05:19.017Z","etag":null,"topics":["arm","assembler","assembly","avo","branch-prediction","go","golang","simd","x86"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit-0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/segmentio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-04-13T21:57:03.000Z","updated_at":"2025-04-10T20:26:36.000Z","dependencies_parsed_at":"2022-07-09T12:00:24.385Z","dependency_job_id":"c1a39daa-a8da-4dc4-9683-2363d3a75267","html_url":"https://github.com/segmentio/asm","commit_stats":{"total_commits":294,"total_committers":11,"mean_commits":"26.727272727272727","dds":0.5204081632653061,"last_synced_commit":"d7e16ffc289d92df6e2c3df0ab55c67460825ac6"},"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/segmentio%2Fasm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/segmentio%2Fasm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/segmentio%2Fasm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/segmentio%2Fasm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/segmentio","download_url":"https://codeload.github.com/segmentio/asm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254076850,"owners_count":22010611,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arm","assembler","assembly","avo","branch-prediction","go","golang","simd","x86"],"created_at":"2024-08-01T20:00:32.136Z","updated_at":"2025-05-14T05:10:46.327Z","avatar_url":"https://github.com/segmentio.png","language":"Go","funding_links":[],"categories":["Go","Repositories"],"sub_categories":[],"readme":"# asm ![build status](https://github.com/segmentio/asm/actions/workflows/go.yml/badge.svg) [![GoDoc](https://godoc.org/github.com/segmentio/asm?status.svg)](https://godoc.org/github.com/segmentio/asm)\n\nGo library providing algorithms that use the full power of modern CPUs to get\nthe best performance.\n\n## Motivation\n\nThe cloud makes it easier than ever to access large scale compute capacity,\nand it's become common to run distributed systems deployed across dozens or\nsometimes hundreds of CPUs. Because projects run on so many cores now, program\nperformance and efficiency matter more today than ever before.\n\nModern CPUs are complex machines with performance characteristics that may\nvary by orders of magnitude depending on how they are used. 
Features like\nbranch prediction, instruction reordering, pipelining, or caching are all\ninput variables that determine the compute throughput that a CPU can achieve.\nWhile compilers keep being improved, and often employ micro-optimizations that\nwould be counter-productive for human developers to be responsible for, there\nare limitations to what they can do, and Assembly still has a role to play in\noptimizing algorithms on hot code paths of large scale applications.\n\nSIMD instruction sets offer interesting opportunities for software engineers.\nTaking advantage of these instructions often requires rethinking how the program\nrepresents and manipulates data, which is beyond the realm of optimizations that\ncan be implemented by a compiler. When renting CPU time from a Cloud provider,\nprograms that fail to leverage the full sets of instructions available are\ntherefore paying for features they do not use.\n\nThis package aims to provide such algorithms, optimized to leverage advanced\ninstruction sets of modern CPUs to maximize throughput and take the best\nadvantage of the available compute power. 
Users of the package will find\nfunctions that have often been designed to work on **arrays of values**,\nwhich is where SIMD and branchless algorithms shine.\n\nThe functions in this library have been used in high-throughput production\nenvironments at Segment, and we hope that they will be useful to other developers\nusing Go in performance-sensitive software.\n\n## Usage\n\nThe library is composed of multiple Go packages intended to act as logical\ngroups of functions sharing similar properties:\n\n| Package | Purpose |\n| ------- | ------- |\n| [ascii](ascii) | library of functions designed to work on ASCII inputs |\n| [base64](base64) | standard library compatible base64 encodings |\n| [bswap](bswap) | byte swapping algorithms working on arrays of fixed-size items |\n| [cpu](cpu) | definition of the ABI used to detect CPU features |\n| [mem](mem) | functions operating on byte arrays |\n| [qsort](qsort) | quick-sort implementations for arrays of fixed-size items |\n| [slices](slices) | functions performing computations on pairs of slices |\n| [sortedset](sortedset) | functions working on sorted arrays of fixed-size items |\n\nWhen no assembly version of a function is available for the target platform,\nthe package provides a generic implementation in Go which is automatically\npicked up by the compiler.\n\n## Showcase\n\nSince this library aims to improve the runtime efficiency of Go\nprograms, we compiled a few snapshots of benchmark runs to showcase the\nkind of improvements that these code paths can expect from leveraging\nSIMD and branchless optimizations:\n\n```\ngoos: darwin\ngoarch: amd64\ncpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz\n```\n\n```\npkg: github.com/segmentio/asm/ascii\nname                  old time/op    new time/op     delta\nEqualFoldString/0512     276ns ± 1%       21ns ± 2%    -92.50%  (p=0.008 n=5+5)\n\nname                  old speed      new speed       delta\nEqualFoldString/0512  3.71GB/s ± 1%  49.44GB/s ± 2%  +1232.79%  
(p=0.008 n=5+5)\n```\n\n```\npkg: github.com/segmentio/asm/bswap\nname    old time/op    new time/op     delta\nSwap64    11.2µs ± 1%      0.9µs ± 9%    -92.06%  (p=0.008 n=5+5)\n\nname    old speed      new speed       delta\nSwap64  5.83GB/s ± 1%  73.67GB/s ± 9%  +1162.98%  (p=0.008 n=5+5)\n```\n\n```\npkg: github.com/segmentio/asm/qsort\nname            old time/op    new time/op     delta\nSort16/1000000     269ms ± 2%       46ms ± 3%   -83.08%  (p=0.008 n=5+5)\n\nname            old speed      new speed       delta\nSort16/1000000  59.4MB/s ± 2%  351.2MB/s ± 3%  +491.24%  (p=0.008 n=5+5)\n```\n\n## Maintenance\n\nThe assembly code is generated with [AVO](https://github.com/mmcloughlin/avo),\nand orchestrated by a Makefile which helps maintainers rebuild the assembly\nsource code when the AVO files are modified.\n\nThe repository contains two Go modules; the main module is declared as\n`github.com/segmentio/asm` at the root of the repository, and the second\nmodule is found in the `build` subdirectory.\n\nThe `build` module is used to isolate build dependencies from programs that\nimport the main module. Through this mechanism, AVO does not become a\ndependency of programs using `github.com/segmentio/asm`, keeping the\ndependency management overhead minimal for the users, and allowing\nmaintainers to make modifications to the `build` package.\n\nVersioning of the two modules is managed independently; while we aim to provide\nstable APIs on the main package, breaking changes may be introduced on the\n`build` package more often, as it is intended to be a testing ground for more\nexperimental constructs in the project.\n\n### Requirements\n\nSome packages have dedicated assembly code for both amd64 and arm64. Others (qsort)\nhave it only for amd64. Search for a `.s` file matching your architecture to be sure\nyou are using an assembly-optimized implementation.\n\nThe Go code requires Go 1.17 or above. 
These versions contain significant\nperformance improvements compared to previous Go versions.\n\n`asm` version v1.1.5 and earlier maintain compatibility with Go 1.16.\n\n### purego\n\nPrograms in the `build` module should add the following declaration:\n\n```go\nfunc init() {\n\tConstraintExpr(\"!purego\")\n}\n```\n\nIt instructs AVO to inject the `!purego` tag in the generated files, allowing\nthe libraries to be compiled without any assembly optimizations with a build\ncommand such as:\n\n```\ngo build -tags purego ...\n```\n\nThis is mainly useful to compare the impact of using the assembly optimized\nversions instead of the simpler Go-only implementations.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsegmentio%2Fasm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsegmentio%2Fasm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsegmentio%2Fasm/lists"}