Projects in Awesome Lists tagged with bfloat16

https://github.com/uxlfoundation/onednn

oneAPI Deep Neural Network Library (oneDNN)

aarch64 amx avx512 bfloat16 cpp deep-learning deep-neural-networks library oneapi onednn openmp performance sycl tbb vnni x64 x86-64 xe-architecture

Last synced: 02 Jul 2026

https://github.com/oneapi-src/oneDNN

oneAPI Deep Neural Network Library (oneDNN)

aarch64 amx avx512 bfloat16 cpp deep-learning deep-neural-networks library oneapi onednn openmp performance sycl tbb vnni x64 x86-64 xe-architecture

Last synced: 29 Mar 2025

https://github.com/uxlfoundation/oneDNN

oneAPI Deep Neural Network Library (oneDNN)

aarch64 amx avx512 bfloat16 cpp deep-learning deep-neural-networks library oneapi onednn openmp performance sycl tbb vnni x64 x86-64 xe-architecture

Last synced: 15 Mar 2025

https://github.com/ashvardanian/simsimd

Up to 200x Faster Dot Products & Similarity Metrics for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for AVX2, AVX-512, NEON, SVE, & SVE2 📐

arm-neon arm-sve assembly avx2 avx512 bfloat16 blas blas-libraries distance-calculation float16 information-retrieval metrics neon numpy scipy simd simd-instructions similarity-measures similarity-search vector-search

Last synced: 13 May 2025

https://github.com/ashvardanian/SimSIMD

Up to 200x Faster Dot Products & Similarity Metrics for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for AVX2, AVX-512, NEON, SVE, & SVE2 📐

arm-neon arm-sve assembly avx2 avx512 bfloat16 blas blas-libraries distance-calculation float16 information-retrieval metrics neon numpy scipy simd simd-instructions similarity-measures similarity-search vector-search

Last synced: 23 Mar 2025

https://github.com/hfp/libxsmm

Library for specialized dense and sparse matrix operations, and deep learning primitives.

amx avx avx2 avx512 bfloat16 blas convolution fortran intel jit machine-learning matrix matrix-multiplication simd sparse sse tensor transpose vector

Last synced: 21 Oct 2025

https://github.com/libxsmm/libxsmm

Library for specialized dense and sparse matrix operations, and deep learning primitives.

amx avx avx2 avx512 bfloat16 blas convolution fortran intel jit machine-learning matrix matrix-multiplication simd sparse sse tensor transpose vector

Last synced: 14 May 2025

https://github.com/juliamath/bfloat16s.jl

Julia implementation for the BFloat16 number type

bfloat16 julia math

Last synced: 04 Apr 2025

https://github.com/shibatch/tlfloat

C++ template library for floating point operations

arbitrary-precision bfloat16 constexpr cplusplus cpp20 cross-platform cuda elementary-functions float128 float256 floating-point half-precision heapless ieee754 library math octuple-precision quadruple-precision templates

Last synced: 14 Jul 2025

https://github.com/afterdusk/flop

IEEE 754-style floating-point converter

bfloat16 floating-point floating-point-conversion fp16 ieee-754 tensorfloat

Last synced: 08 May 2025

https://github.com/aahouzi/llama2-chatbot-cpu

A LLaMA2-7b chatbot with memory that runs on CPU, optimized using smooth quantization, 4-bit quantization, or Intel® Extension for PyTorch with bfloat16.

4-bit-cpu bfloat16 chatbot chatbot-memory chatgpt cpu huggingface int8 intel ipex langchain llama llama2 meta meta-ai neural-compression numa optimization smooth-quantization streamlit