Projects in Awesome Lists tagged with fp8
A curated list of projects in awesome lists tagged with fp8.
https://github.com/NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
cuda deep-learning fp4 fp8 gpu jax machine-learning python pytorch
Last synced: 16 Nov 2025
https://github.com/nvidia/transformerengine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
cuda deep-learning fp8 gpu jax machine-learning python pytorch
Last synced: 24 Feb 2026
https://github.com/azure/ms-amp
Microsoft Automatic Mixed Precision Library
amp deep-learning fp8 gpu mixed-precision pytorch transformer
Last synced: 07 Apr 2025
https://github.com/aredden/flux-fp8-api
Flux diffusion model implementation that uses quantized FP8 matmul, with the remaining layers using faster half-precision accumulate; roughly 2x faster on consumer devices.
diffusion fast-inference flux fp8 pytorch quantization
Last synced: 19 Sep 2025
https://github.com/graphcore-research/jax-scalify
JAX Scalify: end-to-end scaled arithmetics
Last synced: 17 Jan 2026
https://github.com/zerfoo/zerfoo
Pure Go machine learning framework. Train, run, and serve ML models with go build. Zero CGo.
autodiff deep-learning distributed-training float16 float8 fp16 fp8 go golang graph-ml machine-learning ml-framework neural-network onnx transformer
Last synced: 13 Apr 2026
https://github.com/murrellgroup/microfloats.jl
Slow, low-precision floating point types
floating-point fp4 fp6 fp8 microfloat microscaling minifloat
Last synced: 12 Feb 2026
https://github.com/theogravity/dual-rtx-6000-blackwell-qwen3.6-27b-fp8
Optimized vLLM setup for Qwen3.6-27B-FP8 on dual RTX PRO 6000 Blackwell (192 GB GDDR7, no NVLink): config, benchmark sweep results, and a custom chat template with thinking mode off by default.
benchmark blackwell fp8 llm-inference local-llm multi-token-prediction qwen3 rtx-pro-6000 speculative-decoding vllm
Last synced: 11 May 2026
https://github.com/pathcosmos/frankenstallm
Korean 3B LLM (pure Transformer) pretrained from scratch on 8× NVIDIA B200 GPUs with SFT + ORPO alignment
flash-attention fp8 gguf gqa korean-llm nvidia-b200 orpo pretraining sft transformer
Last synced: 29 May 2026
https://github.com/umangyadav/py_fp8
Enumeration of FP8 dtypes in Python
fp8 fp8e4m3 fp8e4m3fnuz fp8e5m2 fp8e5m2fnuz
Last synced: 17 Jun 2025
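
For readers unfamiliar with the format names in the tags above, here is a minimal sketch of how a single byte maps to a value in the two common FP8 encodings, E4M3 (4 exponent bits, 3 mantissa bits, bias 7, no infinities) and E5M2 (5 exponent bits, 2 mantissa bits, bias 15, IEEE-style infinities and NaNs). The function names are hypothetical and are not taken from any project listed here; the fnuz variants use different biases and special values and are not covered.

```python
import math


def decode_fp8(byte, exp_bits, man_bits, bias, ieee_like):
    """Decode one FP8 byte to a Python float (illustrative helper, not a library API)."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    man_max = (1 << man_bits) - 1

    if ieee_like and exp == exp_max:
        # E5M2: an all-ones exponent encodes infinity (mantissa 0) or NaN.
        return sign * math.inf if man == 0 else math.nan
    if not ieee_like and exp == exp_max and man == man_max:
        # E4M3: only the all-ones exponent with all-ones mantissa is NaN.
        return math.nan
    if exp == 0:
        # Subnormal: no implicit leading 1.
        return sign * man * 2.0 ** (1 - bias - man_bits)
    # Normal: implicit leading 1.
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def decode_e4m3(byte):
    return decode_fp8(byte, exp_bits=4, man_bits=3, bias=7, ieee_like=False)


def decode_e5m2(byte):
    return decode_fp8(byte, exp_bits=5, man_bits=2, bias=15, ieee_like=True)


if __name__ == "__main__":
    print(decode_e4m3(0x7E))  # largest finite E4M3 value
    print(decode_e5m2(0x7B))  # largest finite E5M2 value
```

The trade-off this illustrates is range against precision: E4M3 spends an extra bit on the mantissa and tops out at 448, while E5M2 reaches 57344 with coarser steps.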