An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with fp8

A curated list of projects in awesome lists tagged with fp8.

https://github.com/NVIDIA/TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.

cuda deep-learning fp4 fp8 gpu jax machine-learning python pytorch

Last synced: 16 Nov 2025

https://github.com/nvidia/transformerengine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.

cuda deep-learning fp8 gpu jax machine-learning python pytorch

Last synced: 24 Feb 2026

https://github.com/azure/ms-amp

Microsoft Automatic Mixed Precision Library

amp deep-learning fp8 gpu mixed-precision pytorch transformer

Last synced: 07 Apr 2025

https://github.com/intel/neural-speed

An innovative library for efficient LLM inference via low-bit quantization

cpu fp4 fp8 gaudi2 gpu int1 int2 int3 int4 int5 int6 int7 int8 llamacpp llm-fine-tuning llm-inference low-bit mxformat nf4 sparsity

Last synced: 25 Oct 2025

https://github.com/aredden/flux-fp8-api

Flux diffusion model implementation that uses quantized fp8 matmul, with the remaining layers using faster half-precision accumulate; ~2x faster on consumer devices.

diffusion fast-inference flux fp8 pytorch quantization

Last synced: 19 Sep 2025

https://github.com/graphcore-research/jax-scalify

JAX Scalify: end-to-end scaled arithmetics

fp8 jax llm low-precision

Last synced: 17 Jan 2026

https://github.com/zerfoo/zerfoo

Pure Go machine learning framework. Train, run, and serve ML models with go build. Zero CGo.

autodiff deep-learning distributed-training float16 float8 fp16 fp8 go golang graph-ml machine-learning ml-framework neural-network onnx transformer

Last synced: 13 Apr 2026

https://github.com/murrellgroup/microfloats.jl

Slow, low-precision floating point types

floating-point fp4 fp6 fp8 microfloat microscaling minifloat

Last synced: 12 Feb 2026

https://github.com/theogravity/dual-rtx-6000-blackwell-qwen3.6-27b-fp8

Optimized vLLM setup for Qwen3.6-27B-FP8 on dual RTX PRO 6000 Blackwell (192 GB GDDR7, no NVLink); config, benchmark sweep results, and custom chat template with thinking mode off by default.

benchmark blackwell fp8 llm-inference local-llm multi-token-prediction qwen3 rtx-pro-6000 speculative-decoding vllm

Last synced: 11 May 2026

https://github.com/pathcosmos/frankenstallm

Korean 3B LLM (pure Transformer) pretrained from scratch on 8× NVIDIA B200 GPUs with SFT + ORPO alignment

flash-attention fp8 gguf gqa korean-llm nvidia-b200 orpo pretraining sft transformer

Last synced: 29 May 2026

https://github.com/umangyadav/py_fp8

FP8 dtypes enumeration in python

fp8 fp8e4m3 fp8e4m3fnuz fp8e5m2 fp8e5m2fnuz

Last synced: 17 Jun 2025