{"id":13450079,"url":"https://github.com/AdamNiederer/faster","last_synced_at":"2025-03-23T16:30:57.507Z","repository":{"id":37601534,"uuid":"109561324","full_name":"AdamNiederer/faster","owner":"AdamNiederer","description":"SIMD for humans","archived":false,"fork":false,"pushed_at":"2023-08-29T20:41:53.000Z","size":464,"stargazers_count":1574,"open_issues_count":26,"forks_count":51,"subscribers_count":31,"default_branch":"master","last_synced_at":"2025-03-23T06:12:33.504Z","etag":null,"topics":["cross-platform","intrinsics","optimization","simd"],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AdamNiederer.png","metadata":{"files":{"readme":"README.org","changelog":"CHANGELOG.org","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2017-11-05T07:55:44.000Z","updated_at":"2025-03-20T16:05:00.000Z","dependencies_parsed_at":"2024-01-13T17:20:09.325Z","dependency_job_id":"bf914f4c-7109-4465-87b2-5676c0c4c15a","html_url":"https://github.com/AdamNiederer/faster","commit_stats":{"total_commits":297,"total_committers":14,"mean_commits":"21.214285714285715","dds":"0.18518518518518523","last_synced_commit":"0795fa67ec1e94ae18264cc6bd1006e7682509ec"},"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdamNiederer%2Ffaster","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdamNiederer%2Ffaster/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdamNiederer%2Ffaster/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/AdamNiederer%2Ffaster/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AdamNiederer","download_url":"https://codeload.github.com/AdamNiederer/faster/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245130704,"owners_count":20565697,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cross-platform","intrinsics","optimization","simd"],"created_at":"2024-07-31T07:00:29.674Z","updated_at":"2025-03-23T16:30:57.478Z","avatar_url":"https://github.com/AdamNiederer.png","language":"Rust","readme":"* faster\n  #+BEGIN_HTML\n    \u003cdiv\u003e\n      \u003ca href=\"https://crates.io/crates/faster\"\u003e\n        \u003cimg src=\"https://img.shields.io/crates/v/faster.svg\" alt=\"crates.io\" /\u003e\n      \u003c/a\u003e\n      \u003ca href=\"https://travis-ci.org/AdamNiederer/faster\"\u003e\n        \u003cimg src=\"https://travis-ci.org/AdamNiederer/faster.svg?branch=master\" alt=\"Build Status\"/\u003e\n      \u003c/a\u003e\n    \u003c/div\u003e\n  #+END_HTML\n\n** SIMD for Humans\nEasy, powerful, portable, absurdly fast numerical calculations. 
Includes static\ndispatch with inlining based on your platform and vector types, zero-allocation\niteration, vectorized loading/storing, and support for uneven collections.\n\nIt looks something like this:\n#+BEGIN_SRC rust\n  use faster::*;\n  \n  let lots_of_3s = (\u0026[-123.456f32; 128][..]).simd_iter()\n      .simd_map(f32s(0.0), |v| {\n          f32s(9.0) * v.abs().sqrt().rsqrt().ceil().sqrt() - f32s(4.0) - f32s(2.0)\n      })\n      .scalar_collect();\n#+END_SRC\n\nWhich is analogous to this scalar code:\n#+BEGIN_SRC rust\n  let lots_of_3s = (\u0026[-123.456f32; 128][..]).iter()\n      .map(|v| {\n          9.0 * v.abs().sqrt().sqrt().recip().ceil().sqrt() - 4.0 - 2.0\n      })\n      .collect::\u003cVec\u003cf32\u003e\u003e();\n#+END_SRC\n\nThe vector size is entirely determined by the machine you're compiling for - it\nattempts to use the largest vector size supported by your machine, and works on\nany platform or architecture (see below for details).\n\nCompare this to traditional explicit SIMD:\n#+BEGIN_SRC rust\n  use std::mem::transmute;\n  use stdsimd::{f32x4, f32x8};\n\n  let lots_of_3s = \u0026mut [-123.456f32; 128][..];\n\n  if cfg!(all(not(target_feature = \"avx\"), target_feature = \"sse\")) {\n      for ch in lots_of_3s.chunks_mut(4) {\n          let mut v = f32x4::load(ch, 0);\n          let scalar_abs_mask = unsafe { transmute::\u003cu32, f32\u003e(0x7fffffff) };\n          let abs_mask = f32x4::splat(scalar_abs_mask);\n          // There isn't actually an absolute value intrinsic for floats - you\n          // have to look at the IEEE 754 spec and mask off the sign bit\n          v = unsafe { _mm_and_ps(v, abs_mask) };\n          v = unsafe { _mm_sqrt_ps(v) };\n          v = unsafe { _mm_rsqrt_ps(v) };\n          v = unsafe { _mm_ceil_ps(v) };\n          v = unsafe { _mm_sqrt_ps(v) };\n          v = unsafe { _mm_mul_ps(v, f32x4::splat(9.0)) };\n          v = unsafe { _mm_sub_ps(v, f32x4::splat(4.0)) };\n          v = unsafe { _mm_sub_ps(v, f32x4::splat(2.0)) };\n          v.store(ch, 0);\n 
     }\n  } else if cfg!(all(not(target_feature = \"avx512f\"), target_feature = \"avx\")) {\n      for ch in lots_of_3s.chunks_mut(8) {\n          let mut v = f32x8::load(ch, 0);\n          let scalar_abs_mask = unsafe { transmute::\u003cu32, f32\u003e(0x7fffffff) };\n          let abs_mask = f32x8::splat(scalar_abs_mask);\n          v = unsafe { _mm256_and_ps(v, abs_mask) };\n          v = unsafe { _mm256_sqrt_ps(v) };\n          v = unsafe { _mm256_rsqrt_ps(v) };\n          v = unsafe { _mm256_ceil_ps(v) };\n          v = unsafe { _mm256_sqrt_ps(v) };\n          v = unsafe { _mm256_mul_ps(v, f32x8::splat(9.0)) };\n          v = unsafe { _mm256_sub_ps(v, f32x8::splat(4.0)) };\n          v = unsafe { _mm256_sub_ps(v, f32x8::splat(2.0)) };\n          v.store(ch, 0);\n      }\n  }\n#+END_SRC\nEven with all of that boilerplate, this still only supports x86-64 machines with\nSSE or AVX - and you have to look up each intrinsic to ensure it's usable for\nyour compilation target.\n** Upcoming Features\nA rewrite of the iterator API is upcoming, as well as internal changes to better\nmatch the direction Rust is taking with explicit SIMD.\n** Compatibility\nFaster currently supports any architecture with floating point support, although\nhardware acceleration is only enabled on machines with x86's vector extensions.\n** Performance\nHere are some extremely unscientific benchmarks which, at least, suggest that this\nis rarely worse than scalar iterators. Even on ancient CPUs, a lot of\nperformance can be extracted out of SIMD.\n\n#+BEGIN_SRC shell\n  $ RUSTFLAGS=\"-C target-cpu=ivybridge\" cargo bench # host is ivybridge; target has AVX\n  test tests::base100_enc_scalar    ... bench:       1,307 ns/iter (+/- 45)\n  test tests::base100_enc_simd      ... bench:         332 ns/iter (+/- 10)\n  test tests::determinant2_scalar   ... bench:         486 ns/iter (+/- 8)\n  test tests::determinant2_simd     ... bench:         215 ns/iter (+/- 3)\n  test tests::determinant3_scalar   ... 
bench:         389 ns/iter (+/- 6)\n  test tests::determinant3_simd     ... bench:         209 ns/iter (+/- 3)\n  test tests::map_fill_simd         ... bench:         835 ns/iter (+/- 12)\n  test tests::map_scalar            ... bench:       6,963 ns/iter (+/- 117)\n  test tests::map_simd              ... bench:         879 ns/iter (+/- 18)\n  test tests::map_uneven_simd       ... bench:         884 ns/iter (+/- 10)\n  test tests::nop_scalar            ... bench:          49 ns/iter (+/- 0)\n  test tests::nop_simd              ... bench:          34 ns/iter (+/- 0)\n  test tests::reduce_scalar         ... bench:       6,905 ns/iter (+/- 107)\n  test tests::reduce_simd           ... bench:         839 ns/iter (+/- 13)\n  test tests::reduce_uneven_simd    ... bench:         838 ns/iter (+/- 11)\n  test tests::zip_nop_scalar        ... bench:         824 ns/iter (+/- 18)\n  test tests::zip_nop_simd          ... bench:         231 ns/iter (+/- 5)\n  test tests::zip_scalar            ... bench:         901 ns/iter (+/- 29)\n  test tests::zip_simd              ... bench:       1,128 ns/iter (+/- 12)\n\n  $ RUSTFLAGS=\"-C target-cpu=x86-64\" cargo bench # host is ivybridge; target has SSE2\n  test tests::base100_enc_scalar    ... bench:         760 ns/iter (+/- 11)\n  test tests::base100_enc_simd      ... bench:         492 ns/iter (+/- 2)\n  test tests::determinant2_scalar   ... bench:         477 ns/iter (+/- 3)\n  test tests::determinant2_simd     ... bench:         277 ns/iter (+/- 1)\n  test tests::determinant3_scalar   ... bench:         380 ns/iter (+/- 3)\n  test tests::determinant3_simd     ... bench:         285 ns/iter (+/- 2)\n  test tests::map_fill_simd         ... bench:       1,797 ns/iter (+/- 8)\n  test tests::map_scalar            ... bench:       7,237 ns/iter (+/- 51)\n  test tests::map_simd              ... bench:       1,879 ns/iter (+/- 12)\n  test tests::map_uneven_simd       ... 
bench:       1,878 ns/iter (+/- 9)\n  test tests::nop_scalar            ... bench:          47 ns/iter (+/- 0)\n  test tests::nop_simd              ... bench:          34 ns/iter (+/- 0)\n  test tests::reduce_scalar         ... bench:       7,021 ns/iter (+/- 39)\n  test tests::reduce_simd           ... bench:       1,801 ns/iter (+/- 8)\n  test tests::reduce_uneven_simd    ... bench:       1,734 ns/iter (+/- 9)\n  test tests::zip_nop_scalar        ... bench:         803 ns/iter (+/- 9)\n  test tests::zip_nop_simd          ... bench:         257 ns/iter (+/- 1)\n  test tests::zip_scalar            ... bench:         988 ns/iter (+/- 6)\n  test tests::zip_simd              ... bench:         629 ns/iter (+/- 5)\n\n  $ RUSTFLAGS=\"-C target-cpu=pentium\" cargo bench # host is ivybridge; this only runs the polyfills!\n  test tests::bench_determinant2_scalar ... bench:         427 ns/iter (+/- 2)\n  test tests::bench_determinant2_simd   ... bench:         402 ns/iter (+/- 1)\n  test tests::bench_determinant3_scalar ... bench:         354 ns/iter (+/- 1)\n  test tests::bench_determinant3_simd   ... bench:         593 ns/iter (+/- 1)\n  test tests::bench_map_scalar          ... bench:       7,195 ns/iter (+/- 28)\n  test tests::bench_map_simd            ... bench:       6,271 ns/iter (+/- 22)\n  test tests::bench_map_uneven_simd     ... bench:       6,288 ns/iter (+/- 22)\n  test tests::bench_nop_scalar          ... bench:          38 ns/iter (+/- 0)\n  test tests::bench_nop_simd            ... bench:          69 ns/iter (+/- 0)\n  test tests::bench_reduce_scalar       ... bench:       7,004 ns/iter (+/- 17)\n  test tests::bench_reduce_simd         ... bench:       6,063 ns/iter (+/- 17)\n  test tests::bench_reduce_uneven_simd  ... bench:       6,107 ns/iter (+/- 11)\n  test tests::bench_zip_nop_scalar      ... bench:         623 ns/iter (+/- 2)\n  test tests::bench_zip_nop_simd        ... bench:         289 ns/iter (+/- 1)\n  test tests::bench_zip_scalar          ... 
bench:         972 ns/iter (+/- 3)\n  test tests::bench_zip_simd            ... bench:         621 ns/iter (+/- 3)\n#+END_SRC\n","funding_links":[],"categories":["Rust","Cool"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAdamNiederer%2Ffaster","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAdamNiederer%2Ffaster","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAdamNiederer%2Ffaster/lists"}