{"id":31737341,"url":"https://github.com/mathaou/jai-simpd","last_synced_at":"2025-10-09T09:12:07.218Z","repository":{"id":212148356,"uuid":"730825596","full_name":"mathaou/jai-simpd","owner":"mathaou","description":"A Simple SIMD Library For Jai","archived":false,"fork":false,"pushed_at":"2023-12-15T17:31:52.000Z","size":763,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2023-12-15T18:29:50.219Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mathaou.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-12-12T18:54:54.000Z","updated_at":"2023-12-15T07:11:16.000Z","dependencies_parsed_at":"2023-12-15T18:40:20.554Z","dependency_job_id":null,"html_url":"https://github.com/mathaou/jai-simpd","commit_stats":null,"previous_names":["mathaou/jai-simpd"],"tags_count":0,"template":null,"template_full_name":null,"purl":"pkg:github/mathaou/jai-simpd","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mathaou%2Fjai-simpd","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mathaou%2Fjai-simpd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mathaou%2Fjai-simpd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mathaou%2Fjai-simpd/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mathaou","download_url":"https://codeload.github.com/mathaou/jai-simpd/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mathaou%2Fjai-simpd/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279001124,"owners_count":26083021,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-09T09:09:20.519Z","updated_at":"2025-10-09T09:12:07.213Z","avatar_url":"https://github.com/mathaou.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Simpd\n\n## Simple SIMD in Jai\n\nAs of compiler version 0.1.078 and the current state of the codebase, here are the average times to complete each op over multiple runs on 2048 random numbers. See `tests/benchmark_convert.jai` to run the benchmark yourself.\n\nBenchmarks were run on an AMD Ryzen 7 7730U, single-threaded. Times are in nanoseconds; lower is better.\n\n| Op | CPU | AVX2 | AVX | SSE | Best |\n| --- | --- | --- | --- | --- | --- |\n| Convert | 7404ns | 2054ns | 2054ns | 2776ns | AVX2 + AVX |\n| Divide | 32250ns | 244839ns | 198692ns | 19256ns | SSE |\n| Add | 11602ns | 359394ns | 358683ns | 6161ns | SSE |\n| Subtract | 11642ns | 356789ns | 357971ns | 6502ns | SSE |\n| Multiply | 12243ns | 597991ns | 714280ns | 9728ns | SSE |\n\nI know that AVX + LLVM is an area the compiler team is currently working on, so I expect these numbers to shift over time. 
In the meantime, unless you're converting, the SSE backing should give you a speed boost.\n\n**SIMD can improve performance compared to loop unrolling, but which backing is fastest depends on the task and the data. Choosing the newest instructions (AVX2) does not guarantee better performance than SSE (the oldest instruction set this library supports): in the benchmarks above, SSE beats both AVX and AVX2 on every arithmetic op, and the AVX and AVX2 backings are slower than the CPU path for those ops. Benchmark your own workload if performance is critical to you. The goal of this library is to provide a reasonable speed increase for common operations overall, not to achieve optimal performance.**\n\n---\n\nAll two-operand functions store their result in the second slice parameter, `dst`.\n\n```\nwe put result here\n~~~~~~~~~~~~~~~~~~~~~~~~v\nsimd_add :: (src: []$T, dst: []T)\n```\n\nOne-operand functions such as `simd_clear` operate **in place**; `simd_convert_to` is the exception and returns its result.\n\n#### Hardware Support Table\n\n\u003e **If an instruction does not have an AVX, AVX2, or SSE backing, `Simpd` will instead use loop unrolling.** To detect a program calling a `Simpd` function on a data type without a hardware-accelerated backing, a metaprogram can catch the `@simd_unsupported_op` note. See `tests/metaprogram_test.jai` for details.\n\nThe API of these functions may be changed to reject unsupported types outright instead of just reporting them as a note to the metaprogram. 
This aspect of the design is under review and open to discussion.\n\n##### Arithmetic\n\n| Instruction | float32 | float64 | i8/u8 | i16/u16 | i32/u32 | i64/u64 |\n| --- | --- | --- | --- | --- | --- | --- |\n| `simd_add :: (src: []$T, dst: []T, simd_type := SIMD_Type.cpu)` | AVX+AVX2+SSE| AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE |\n| `simd_subtract :: (src: []$T, dst: []T, simd_type := SIMD_Type.cpu)` | AVX+AVX2+SSE| AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE |\n| `simd_multiply :: (src: []$T, dst: []T, simd_type := SIMD_Type.cpu)` | AVX+AVX2+SSE| AVX+AVX2+SSE |  | AVX+AVX2+SSE | AVX+AVX2+SSE |  |\n| `simd_divide :: (src: []$T, dst: []T, simd_type := SIMD_Type.cpu)` | AVX+AVX2+SSE| AVX+AVX2+SSE |  | |  |  |\n\u003c!-- | `simd_reciprocal` | AVX+AVX2+SSE| |  | |  |  |\n| `simd_reciprocal_root` | AVX+AVX2+SSE| |  | |  |  |\n| `simd_root` | AVX+AVX2+SSE| |  | |  |  |\n| `simd_average` | | | AVX+AVX2+SSE | AVX+AVX2+SSE |  |  | --\u003e\n\u003c!--\n##### Bit Manipulation\n\n| Instruction | float32 | float64 | i8/u8 | i16/u16 | i32/u32 | i64/u64 |\n| --- | --- | --- | --- | --- | --- | --- |\n| `simd_shift_left` | AVX+AVX2+SSE | AVX+AVX2+SSE | | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE |\n| `simd_shift_right` | AVX+AVX2+SSE | AVX+AVX2+SSE | | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE |\n| `simd_rotate_left` | AVX+AVX2+SSE | AVX+AVX2+SSE | | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE |\n| `simd_rotate_right`| AVX+AVX2+SSE | AVX+AVX2+SSE | | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | --\u003e\n\u003c!--\n##### Comparisons and Validity\n\n| Instruction | float32 | float64 | i8/u8 | i16/u16 | i32/u32 | i64/u64 | Note |\n| --- | --- | --- | --- | --- | --- | --- | --- |\n| `simd_equal` | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE |  Sets destination to 1 if equal, 0 otherwise |\n| `simd_not_equal` | AVX+AVX2+SSE | AVX+AVX2+SSE | | |  |  | Sets destination to 1 if 
dst != src, 0 otherwise |\n| `simd_greater` | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE |  Sets destination to 1 if dst \u003e src, 0 otherwise |\n| `simd_greater_or_equal` | AVX+AVX2+SSE | AVX+AVX2+SSE | | |  |  | Sets destination to 1 if dst \u003e= src, 0 otherwise |\n| `simd_less` | AVX+AVX2+SSE | AVX+AVX2+SSE | | |  |  | Sets destination to 1 if dst \u003c src, 0 otherwise |\n| `simd_less_or_equal` | AVX+AVX2+SSE | AVX+AVX2+SSE | | |  |  | Sets destination to 1 if dst \u003c= src, 0 otherwise |\n| `simd_nan` | AVX+AVX2+SSE | AVX+AVX2+SSE | | |  |  | Sets destination to 1 if dst or src is NaN, 0 otherwise |\n| `simd_valid` | AVX+AVX2+SSE | AVX+AVX2+SSE | | |  |  | Sets destination to 1 if dst and src are not NaN, 0 otherwise |\n| `simd_max` | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | |  Sets destination to the larger of dst and src |\n| `simd_min` | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | AVX+AVX2+SSE | |  Sets destination to the smaller of dst and src | --\u003e\n\n##### Conversions\n\n```simd_convert_to :: (src: []$T1, target: $T2, simd_type := SIMD_Type.cpu) -\u003e dst: []$T2```\n\nRows are the source type and columns the destination type. `✔️` means the conversion is supported by all backings (AVX2, AVX, and SSE); an empty cell means it falls back to loop unrolling.\n\n| src/dst | float64 | float32 | s64 | s32 | s16 | s8 | u64 | u32 | u16 | u8 |\n| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |\n| float64 | No-op | ✔️ | | ✔️ | | | | | | |\n| float32 | ✔️ | No-op | | ✔️ | | | | | | |\n| s64 | | | No-op | | | | | | | |\n| s32 | ✔️ | ✔️ | ✔️ | No-op | | | | | | |\n| s16 | | | ✔️ | ✔️ | No-op | | | | | |\n| s8 | | | ✔️ | ✔️ | ✔️ | No-op | | | | |\n| u64 | | | | | | | No-op | | | |\n| u32 | | | | | | | ✔️ | No-op | | |\n| u16 | | | | | | | ✔️ | ✔️ | No-op | |\n| u8 | | | | | | | ✔️ | ✔️ | ✔️ | No-op |\n\n##### Logical\n\n| Instruction | Any |\n| --- | --- |\n| `simd_clear :: (dst: []$T, simd_type := SIMD_Type.cpu)` | AVX2+AVX+SSE |\n| 
`simd_and :: (src: []$T, dst: []T, simd_type := SIMD_Type.cpu)` | AVX2+AVX+SSE |\n\u003c!-- | `simd_or` | AVX2+AVX+SSE |\n| `simd_xor` | AVX2+AVX+SSE |\n| `simd_nand` | AVX2+AVX+SSE |\n| `simd_copy` | AVX2+AVX+SSE | --\u003e","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmathaou%2Fjai-simpd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmathaou%2Fjai-simpd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmathaou%2Fjai-simpd/lists"}