https://github.com/statusfailed/vibekernels
LLM-written matmul kernels
- Host: GitHub
- URL: https://github.com/statusfailed/vibekernels
- Owner: statusfailed
- Created: 2025-12-10T16:16:25.000Z (3 months ago)
- Default Branch: master
- Last Pushed: 2025-12-11T13:42:43.000Z (3 months ago)
- Last Synced: 2025-12-12T16:40:18.872Z (3 months ago)
- Language: C++
- Size: 25.4 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Vibe Kernels
Fast CPU matrix multiplication kernels built by LLMs.
See PROMPT.md for LLM instructions.
## Kernels
**Reference**: OpenBLAS single-threaded SGEMM ~147.07 GFLOPS
| Kernel | Technique | GFLOPS | vs Reference |
|--------|-----------|--------|--------------|
| blocked | Cache blocking | ~4.96 | 3.4% |
| blocked_avx2 | AVX2 SIMD | ~25.66 | 17.4% |
| blocked_avx2_microkernel4x4 | 4×8 microkernel + packing | ~83.40 | 56.7% |
| blocked_avx512_microkernel4x16 | AVX512 16-wide | ~162.31 | 110.4% |
| blocked_avx512_microkernel4x16_prefetch | Memory prefetching | ~154.52 | 105.1% |
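
To make the first row of the table concrete, here is a minimal sketch of cache blocking for a single-precision matmul. It is not the repository's `blocked` kernel: the function names, the row-major layout, and the default tile size of 64 are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Naive triple loop, used only as a correctness reference.
// All matrices are row-major: A is M x K, B is K x N, C is M x N.
void sgemm_naive(const float* A, const float* B, float* C,
                 std::size_t M, std::size_t N, std::size_t K) {
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
}

// Cache-blocked variant: the outer loops walk over tiles so that the parts
// of A, B and C touched by the inner loops are small enough to be reused
// from cache. `block` is a hypothetical tile size, not a tuned value.
void sgemm_blocked(const float* A, const float* B, float* C,
                   std::size_t M, std::size_t N, std::size_t K,
                   std::size_t block = 64) {
    assert(block > 0);
    std::fill(C, C + M * N, 0.0f);  // C is accumulated into, so clear it first
    for (std::size_t i0 = 0; i0 < M; i0 += block)
        for (std::size_t k0 = 0; k0 < K; k0 += block)
            for (std::size_t j0 = 0; j0 < N; j0 += block) {
                // Clamp tile ends so sizes need not be multiples of `block`.
                const std::size_t i1 = std::min(i0 + block, M);
                const std::size_t k1 = std::min(k0 + block, K);
                const std::size_t j1 = std::min(j0 + block, N);
                for (std::size_t i = i0; i < i1; ++i)
                    for (std::size_t k = k0; k < k1; ++k) {
                        const float a = A[i * K + k];
                        // Unit-stride inner loop over a row of B and of C.
                        for (std::size_t j = j0; j < j1; ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
            }
}
```

The later rows of the table (SIMD, register-tiled microkernels, packing, prefetching) would replace the innermost loops of a structure like this one; they are not shown here.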
## Usage
```bash
nix develop
make clean && make
./run list # List kernels
./run test # Test correctness
./run bench # Benchmark performance
./run compare # Compare kernels
```
## Architecture
- `kernels/` - Kernel implementations
- `*_harness.hpp` - Test/benchmark/tuning frameworks
- `main.cpp` - CLI interface
Each kernel adds one optimization technique to the previous one, so the performance effect of each technique can be measured separately.
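
As a rough illustration of what a test and benchmark harness for such kernels might do, the sketch below checks a kernel against a reference and converts a timing into GFLOPS. The `KernelFn` signature and the helper names `gflops`, `max_abs_diff`, and `best_seconds` are hypothetical and are not taken from the repository's `*_harness.hpp` files.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical kernel signature: row-major C (M x N) = A (M x K) * B (K x N).
using KernelFn = void (*)(const float* A, const float* B, float* C,
                          std::size_t M, std::size_t N, std::size_t K);

// A matmul does one multiply and one add per (i, j, k) triple,
// so the flop count is 2 * M * N * K.
double gflops(std::size_t M, std::size_t N, std::size_t K, double seconds) {
    return 2.0 * double(M) * double(N) * double(K) / seconds / 1e9;
}

// Largest absolute element-wise difference between a kernel and a reference
// on deterministic inputs. A real harness would compare this to a tolerance.
double max_abs_diff(KernelFn kernel, KernelFn reference,
                    std::size_t M, std::size_t N, std::size_t K) {
    std::vector<float> A(M * K), B(K * N), C(M * N), R(M * N);
    for (std::size_t i = 0; i < A.size(); ++i) A[i] = float(i % 7) * 0.25f;
    for (std::size_t i = 0; i < B.size(); ++i) B[i] = float(i % 5) * 0.5f;
    kernel(A.data(), B.data(), C.data(), M, N, K);
    reference(A.data(), B.data(), R.data(), M, N, K);
    double worst = 0.0;
    for (std::size_t i = 0; i < C.size(); ++i)
        worst = std::max(worst, double(std::fabs(C[i] - R[i])));
    return worst;
}

// Best wall-clock time over `reps` runs, in seconds. Taking the minimum
// reduces the influence of one-off noise such as cold caches.
double best_seconds(KernelFn kernel, std::size_t M, std::size_t N,
                    std::size_t K, int reps = 5) {
    std::vector<float> A(M * K, 1.0f), B(K * N, 1.0f), C(M * N);
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < reps; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        kernel(A.data(), B.data(), C.data(), M, N, K);
        const auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return best;
}
```

Under these assumptions, a figure comparable to the GFLOPS column would come from `gflops(M, N, K, best_seconds(kernel, M, N, K))`, and the "vs Reference" column is that value divided by the reference kernel's GFLOPS.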