Projects in Awesome Lists tagged with bfloat16
A curated list of projects in awesome lists tagged with bfloat16.
https://github.com/uxlfoundation/onednn
oneAPI Deep Neural Network Library (oneDNN)
aarch64 amx avx512 bfloat16 cpp deep-learning deep-neural-networks library oneapi onednn openmp performance sycl tbb vnni x64 x86-64 xe-architecture
Last synced: 11 Dec 2025
https://github.com/oneapi-src/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
aarch64 amx avx512 bfloat16 cpp deep-learning deep-neural-networks library oneapi onednn openmp performance sycl tbb vnni x64 x86-64 xe-architecture
Last synced: 29 Mar 2025
https://github.com/uxlfoundation/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
aarch64 amx avx512 bfloat16 cpp deep-learning deep-neural-networks library oneapi onednn openmp performance sycl tbb vnni x64 x86-64 xe-architecture
Last synced: 15 Mar 2025
https://github.com/ashvardanian/simsimd
Up to 200x Faster Dot Products & Similarity Metrics for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, & SVE2
arm-neon arm-sve assembly avx2 avx512 bfloat16 blas blas-libraries distance-calculation float16 information-retrieval metrics neon numpy scipy simd simd-instructions similarity-measures similarity-search vector-search
Last synced: 13 May 2025
https://github.com/ashvardanian/SimSIMD
Up to 200x Faster Dot Products & Similarity Metrics for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, & SVE2
arm-neon arm-sve assembly avx2 avx512 bfloat16 blas blas-libraries distance-calculation float16 information-retrieval metrics neon numpy scipy simd simd-instructions similarity-measures similarity-search vector-search
Last synced: 23 Mar 2025
https://github.com/hfp/libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
amx avx avx2 avx512 bfloat16 blas convolution fortran intel jit machine-learning matrix matrix-multiplication simd sparse sse tensor transpose vector
Last synced: 21 Oct 2025
https://github.com/libxsmm/libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
amx avx avx2 avx512 bfloat16 blas convolution fortran intel jit machine-learning matrix matrix-multiplication simd sparse sse tensor transpose vector
Last synced: 14 May 2025
https://github.com/juliamath/bfloat16s.jl
Julia implementation for the BFloat16 number type
Last synced: 04 Apr 2025
https://github.com/shibatch/tlfloat
C++ template library for floating point operations
arbitrary-precision bfloat16 constexpr cplusplus cpp20 cross-platform cuda elementary-functions float128 float256 floating-point half-precision heapless ieee754 library math octuple-precision quadruple-precision templates
Last synced: 14 Jul 2025
https://github.com/afterdusk/flop
IEEE 754-style floating-point converter
bfloat16 floating-point floating-point-conversion fp16 ieee-754 tensorfloat
Last synced: 08 May 2025
https://github.com/aahouzi/llama2-chatbot-cpu
A LLaMA2-7b chatbot with memory running on CPU, and optimized using smooth quantization, 4-bit quantization or Intel® Extension for PyTorch with bfloat16.
4-bit-cpu bfloat16 chatbot chatbot-memory chatgpt cpu huggingface int8 intel ipex langchain llama llama2 meta meta-ai neural-compression numa optimization smooth-quantization streamlit
Last synced: 20 Mar 2025
https://github.com/kerneltuner/kernel_float
CUDA/HIP header-only library for writing vectorized and low-precision (16 bit, 8 bit) GPU kernels
bfloat16 cpp cuda floating-point gpu half-precision header-only-library hip kernel-tuner low-precision mixed-precision performance reduced-precision vectorization
Last synced: 12 Apr 2025
https://github.com/puzzlef/vector-sum
Comparison of vector element sum using various data types.
bfloat16 experiment float sequential single-threaded sum vector
Last synced: 06 Apr 2025
https://github.com/imciner2/chopblas
Basic linear algebra routines implemented using the chop rounding function
arithmetic bfloat16 half-precision matlab matrix rounding stochastic-rounding
Last synced: 28 Apr 2025
https://github.com/sigurd4/custom_float
Customizable floating point types, with all standard floating point operations implemented from scratch.
bfloat16 float floating-point ieee754 tensorfloat
Last synced: 15 Jun 2025
https://github.com/puzzlef/pagerank-datatype
Comparison of PageRank algorithm using various datatypes.
bfloat16 csr experiment float graph pagerank pull sequential single-threaded
Last synced: 06 Apr 2025
https://github.com/dev-samuelkb/bf16
Visual Brainfuck runtime for interactive games. Compile easily on Linux with SDL2 support. Join the project on GitHub!
ai audio-generation bfloat16 binary16 crates dia dialogue-tts fastapi float16 fp64 ieee754 python rust rust-embedded speech-synthesis-api tts tts-api voice-cloning
Last synced: 24 Jun 2025