Projects in Awesome Lists tagged with avx512

https://github.com/simdjson/simdjson

Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks

aarch64 arm arm64 avx2 avx512 c-plus-plus clang clang-cl cpp11 gcc-compiler json json-parser json-pointer loongarch neon simd sse42 vs2019 x64

Last synced: 29 Sep 2024

https://github.com/hjlebbink/asm-dude

Visual Studio extension for assembly syntax highlighting and code completion in assembly files and the disassembly window

assembler assembly assembly-language-programming avx2 avx512 code-completion disassembly masm nasm syntax-highlighting visual-studio visual-studio-extension x86-64

Last synced: 30 Sep 2024

https://github.com/google/highway

Performance-portable, length-agnostic SIMD with runtime dispatch

avx avx-512 avx-instructions avx2 avx512 intrinsics neon simd simd-instructions simd-intrinsics simd-library simd-parallelism simd-programming sse42 wasm

Last synced: 30 Sep 2024

https://github.com/oneapi-src/onednn

oneAPI Deep Neural Network Library (oneDNN)

aarch64 amx avx512 bfloat16 cpp deep-learning deep-neural-networks library oneapi onednn openmp performance sycl tbb vnni x64 x86-64 xe-architecture

Last synced: 30 Sep 2024

https://github.com/intel/mkl-dnn

oneAPI Deep Neural Network Library (oneDNN)

aarch64 amx avx512 bfloat16 cpp deep-learning deep-neural-networks library oneapi onednn openmp performance sycl tbb vnni x64 x86-64 xe-architecture

Last synced: 31 Jul 2024

https://github.com/simd-everywhere/simde

Implementations of SIMD instruction sets for systems which don't natively support them.

altivec arm arm64 avx avx2 avx512 fma gfni mmx neon powerpc simd simd-intrinsics sse sse2 sse3 sse41 sse42 ssse3 vectorization

Last synced: 01 Aug 2024

https://github.com/xtensor-stack/xsimd

C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)

avx avx512 c-plus-plus-11 cpp mathematical-functions neon simd simd-instructions simd-intrinsics sse sve vectorization

Last synced: 30 Sep 2024

https://github.com/ermig1979/simd

C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX (AltiVec) and VSX (Power7) for PowerPC, NEON for ARM.

altivec amx arm avx avx512 c-plus-plus haar-cascade image-processing lbp machine-learning neon neural-network powerpc simd simd-library sse vsx

Last synced: 25 Sep 2024

https://github.com/kfrlib/kfr

Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)

audio audio-processing avx avx512 clang cplusplus cplusplus-14 cplusplus-17 cpp14 cpp17 cxx dft digital-signal-processing discrete-fourier-transform dsp fast-fourier-transform fft header-only simd

Last synced: 01 Oct 2024

https://github.com/vcdevel/vc

SIMD Vector Classes for C++

avx avx2 avx512 c-plus-plus cpp cpp11 cpp14 cpp17 data-parallel neon parallel parallel-computing portable simd simd-instructions simd-programming simd-vector sse vectorization

Last synced: 30 Sep 2024

https://github.com/p12tic/libsimdpp

Portable header-only C++ low level SIMD library

altivec avx2 avx512 msa neon simd sse vsx

Last synced: 01 Oct 2024

https://github.com/SnellerInc/sneller

World's fastest log analysis: λ + SQL + JSON + S3

avx512 go high-performance indexless json log query-engine s3 schemaless serverless simd sql vectorized

Last synced: 31 Jul 2024

https://github.com/minio/sha256-simd

Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM. On AVX512 it provides up to an 8x improvement (over 3 GB/s per core). SHA Extensions give a performance boost of close to 4x over native.

arm assembly avx avx-instructions avx512 golang intel plan9

Last synced: 29 Sep 2024

https://github.com/kimwalisch/primesieve

🚀 Fast prime number generator

avx512 eratosthenes math neon prime-numbers primes sieve sieve-of-eratosthenes sse

Last synced: 30 Jul 2024

https://github.com/intel/x86-simd-sort

C++ template library for high performance SIMD based sorting algorithms

argsort avx2 avx512 partialsort quickselect quicksort sort x86

Last synced: 31 Jul 2024

https://github.com/libxsmm/libxsmm

Library for specialized dense and sparse matrix operations, and deep learning primitives.

amx avx avx2 avx512 bfloat16 blas convolution fortran intel jit machine-learning matrix matrix-multiplication simd sparse sse tensor transpose vector

Last synced: 30 Jul 2024

https://github.com/ashvardanian/SimSIMD

Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, and C, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐

arm-neon arm-sve assembly avx2 avx512 blas blas-libraries distance-calculation distance-measures float16 information-retrieval metrics neon numpy scipy simd simd-instructions similarity-measures similarity-search vector-search

Last synced: 31 Jul 2024

https://github.com/agenium-scale/nsimd

Agenium Scale vectorization library for CPUs and GPUs

aarch64 avx avx2 avx512 cpp20 cpp20-library cuda hpc neon neon128 rocm simd simd-instructions simd-library simd-programming sse2 sse42 sve vectorization-library

Last synced: 29 Sep 2024

https://github.com/WojciechMula/sse-popcount

SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html

aarch64 arm-neon avx2 avx512 popcount sse

Last synced: 31 Jul 2024

https://github.com/WojciechMula/toys

Storage for my snippets, toy programs, etc.

avx2 avx512 sse string-algorithms

Last synced: 31 Jul 2024

https://github.com/RRZE-HPC/OSACA

Open Source Architecture Code Analyzer

aarch64 arm64v8 assembly avx avx2 avx512 critical-path hpc in-core latency loop-carried-dependency neon out-of-order performance-analysis performance-modeling port-mapping python sve throughput x86

Last synced: 12 Aug 2024

https://github.com/powturbo/Turbo-Base64

Turbo Base64 - Fastest Base64 SIMD:SSE/AVX2/AVX512/Neon/Altivec - Faster than memcpy!

arm avx avx2 avx512 base64 base64-decoding base64-encoding benchmark encoding encoding-library library neon simd sse

Last synced: 04 Aug 2024

https://github.com/WojciechMula/base64-avx512

Code for paper "Base64 encoding and decoding at almost the speed of a memory copy"

avx512 base64 simd

Last synced: 03 Aug 2024

https://github.com/minio/md5-simd

Accelerate aggregated MD5 hashing performance up to 8x for AVX512 and 4x for AVX2. Useful for server applications that need to compute many MD5 sums in parallel.

assembly avx2 avx512 golang hashing md5 performance simd