https://github.com/coffeevampir3/hyper-amx
Repo for AMX + FAST
amx avx512 inference inference-engine matmul numa-aware quantization tensor tensor-parallelism
Last synced: 9 days ago
- Host: GitHub
- URL: https://github.com/coffeevampir3/hyper-amx
- Owner: CoffeeVampir3
- Created: 2025-10-24T17:43:15.000Z (8 months ago)
- Default Branch: master
- Last Pushed: 2025-11-01T06:53:47.000Z (8 months ago)
- Last Synced: 2025-11-01T08:31:23.855Z (8 months ago)
- Topics: amx, avx512, inference, inference-engine, matmul, numa-aware, quantization, tensor, tensor-parallelism
- Language: C++
- Homepage:
- Size: 674 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
### Actively in Development
This project is under active development and is not yet feature complete. The features listed below exist today, but a substantial amount of work remains before the engine can run inference.
### Main points:
- Modern C++ (C++23)
- Modules
- No external dependencies
- Megatron Tensor-Parallel Row/Col interleaving
- NUMA Awareness
- AVX512 + AMX only
- Pure AMX GEMM
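
To illustrate the Megatron-style row/column split named above, here is a minimal, portable sketch in plain C++. It is not code from this repository: `Matrix`, `matmul`, `slice_cols`, `slice_rows`, and `tensor_parallel_two_layer` are hypothetical names, the loops are scalar rather than AMX tiles, and the shards run one after another instead of on separate NUMA nodes. It only shows the algebra: the first weight matrix is split by columns, the second by rows, and the per-shard partial outputs are summed.

```cpp
#include <cstddef>
#include <vector>

// Row-major dense matrix; a hypothetical helper type for this sketch only.
struct Matrix {
    std::size_t rows = 0, cols = 0;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Plain reference GEMM: returns a * b (assumes a.cols == b.rows).
Matrix matmul(const Matrix& a, const Matrix& b) {
    Matrix c(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < a.cols; ++k)
            for (std::size_t j = 0; j < b.cols; ++j)
                c.at(i, j) += a.at(i, k) * b.at(k, j);
    return c;
}

// Copy columns [begin, end) of m into a new matrix.
Matrix slice_cols(const Matrix& m, std::size_t begin, std::size_t end) {
    Matrix out(m.rows, end - begin);
    for (std::size_t i = 0; i < m.rows; ++i)
        for (std::size_t j = begin; j < end; ++j)
            out.at(i, j - begin) = m.at(i, j);
    return out;
}

// Copy rows [begin, end) of m into a new matrix.
Matrix slice_rows(const Matrix& m, std::size_t begin, std::size_t end) {
    Matrix out(end - begin, m.cols);
    for (std::size_t i = begin; i < end; ++i)
        for (std::size_t j = 0; j < m.cols; ++j)
            out.at(i - begin, j) = m.at(i, j);
    return out;
}

// Computes (x * a) * b with `a` split by columns and `b` split by rows.
// Assumes a.cols is divisible by `shards` and a.cols == b.rows.
Matrix tensor_parallel_two_layer(const Matrix& x, const Matrix& a,
                                 const Matrix& b, std::size_t shards) {
    Matrix z(x.rows, b.cols);
    const std::size_t width = a.cols / shards;
    for (std::size_t s = 0; s < shards; ++s) {
        // Each shard owns one column block of `a` and the matching row block of `b`.
        Matrix a_s = slice_cols(a, s * width, (s + 1) * width);
        Matrix b_s = slice_rows(b, s * width, (s + 1) * width);
        Matrix partial = matmul(matmul(x, a_s), b_s);
        // Summing the partials stands in for the all-reduce between workers.
        for (std::size_t i = 0; i < z.data.size(); ++i)
            z.data[i] += partial.data[i];
    }
    return z;
}
```

Calling `tensor_parallel_two_layer(x, a, b, 2)` should agree with `matmul(matmul(x, a), b)` up to floating-point rounding. The appeal of the pairing is that each shard's intermediate `x * a_s` never has to be exchanged; only the final partial sums are combined.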
### AMXQ
- AMXQ (Grouped asymmetric mean-centered quantization)
- Fused AMXQ AMX GEMM (Reduces bandwidth pressure by shrinking the accumulator)
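
The README does not describe the AMXQ format, so the following is only one plausible reading of "grouped asymmetric mean-centered quantization", written as a hedged sketch rather than the repository's implementation. Everything here is an assumption: the names `QuantGroup`, `quantize_grouped`, and `dequantize_grouped`, the 8-bit code width, and the choice of a per-group mean as the offset with a per-group scale taken from the largest centered magnitude.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One quantized group: the offset (mean), the step size, and the int8 codes.
// Hypothetical layout for illustration; not the repository's storage format.
struct QuantGroup {
    float mean;
    float scale;
    std::vector<std::int8_t> q;
};

// Quantize `v` in groups of `group_size` values (assumes group_size > 0).
std::vector<QuantGroup> quantize_grouped(const std::vector<float>& v,
                                         std::size_t group_size) {
    std::vector<QuantGroup> out;
    for (std::size_t start = 0; start < v.size(); start += group_size) {
        const std::size_t end = std::min(start + group_size, v.size());

        // Center the group on its mean so the codes are spread around zero.
        float sum = 0.0f;
        for (std::size_t i = start; i < end; ++i) sum += v[i];
        const float mean = sum / static_cast<float>(end - start);

        // Scale so the largest centered magnitude maps to 127.
        float max_abs = 0.0f;
        for (std::size_t i = start; i < end; ++i)
            max_abs = std::max(max_abs, std::fabs(v[i] - mean));
        const float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;

        QuantGroup g{mean, scale, {}};
        for (std::size_t i = start; i < end; ++i) {
            const long r = std::lround((v[i] - mean) / scale);
            g.q.push_back(static_cast<std::int8_t>(std::clamp(r, -127L, 127L)));
        }
        out.push_back(std::move(g));
    }
    return out;
}

// Reconstruct approximate floats: value = code * scale + mean.
std::vector<float> dequantize_grouped(const std::vector<QuantGroup>& groups) {
    std::vector<float> out;
    for (const QuantGroup& g : groups)
        for (std::int8_t code : g.q)
            out.push_back(static_cast<float>(code) * g.scale + g.mean);
    return out;
}
```

Under these assumptions, a round trip through `quantize_grouped` and `dequantize_grouped` reproduces each value to within about half of its group's `scale`. Storing the mean per group lets a group whose values sit far from zero (for example, all near 100) still use the full int8 range for the spread around that mean.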