https://github.com/farukalpay/fabe
High-accuracy SIMD sin/cos/sincos library in C with AVX2, AVX-512, and NEON support. Full-range reduction. Fast at scale. Portable by design.
https://github.com/farukalpay/fabe
aarch64 ai-acceleration avx2 avx512 c-library cpu-optimization high-performance-computing low-level math-library math-optimization neon numerical-computing physics-simulation portable-code scientific-computing signal-processing simd trigonometry vectorized-simd-optimizations x86-64
Last synced: 9 months ago
JSON representation
High-accuracy SIMD sin/cos/sincos library in C with AVX2, AVX-512, and NEON support. Full-range reduction. Fast at scale. Portable by design.
- Host: GitHub
- URL: https://github.com/farukalpay/fabe
- Owner: farukalpay
- Created: 2025-04-15T01:27:29.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-04-20T18:31:33.000Z (11 months ago)
- Last Synced: 2025-06-05T23:26:29.368Z (10 months ago)
- Topics: aarch64, ai-acceleration, avx2, avx512, c-library, cpu-optimization, high-performance-computing, low-level, math-library, math-optimization, neon, numerical-computing, physics-simulation, portable-code, scientific-computing, signal-processing, simd, trigonometry, vectorized-simd-optimizations, x86-64
- Language: C
- Homepage:
- Size: 901 KB
- Stars: 39
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# FABE13-HX: High-Performance SIMD Trigonometric Library for Scientific Computing
[](LICENSE)
[]()
[]()
[]()
**FABE13-HX** is a high-performance C math library that delivers ultra-fast trigonometric functions (`sin`, `cos`, `sincos`) using advanced SIMD vectorization. Powered by the innovative **Ξ¨-Hyperbasis** algorithm, it outperforms traditional math libraries by up to **8.4Γ** while maintaining high precision.
## π Why Choose FABE13-HX for Your Numerical Computing Needs
FABE13-HX revolutionizes trigonometric computation for:
- **Machine Learning & AI Acceleration** - Optimize neural network performance
- **Scientific Simulations & HPC** - Accelerate physics, engineering, and computational modeling
- **Real-time Signal Processing** - Enhance DSP, audio, and sensor data analysis
- **Graphics & Visualization Systems** - Improve rendering performance
- **Embedded Computing** - Efficient performance on resource-constrained systems
## π‘ Key Features & Performance Benefits
- β‘ **Up to 8.4Γ Faster Than Standard Math Libraries** across various platforms and input sizes
- π **Cross-Architecture Optimization** with support for AVX512F, AVX2+FMA (x86), NEON (ARM)
- π― **High Precision** with maximum error β€ 2e-11 compared to standard libm
- π§ **Novel Rational-Function Architecture** based on Ξ¨-Hyperbasis instead of traditional polynomials
- π’ **Extreme-Range Support** accurate up to |x| β 1e308 via advanced PayneβHanek reduction
- π§© **Unified API** for both scalar and vectorized operations
- π‘οΈ **Robust Error Handling** with proper NaN/Inf/0 behavior
Designed for **numerical computing**, **AI acceleration**, and **scientific simulation**, it replaces traditional polynomial approximations with a fused rational + correction model that's more efficient and vectorization-friendly.
---
## π Project Structure
```
fabe13/ # Core source
βββ fabe13.c # HX implementation
βββ fabe13.h # Public API
βββ benchmark_fabe13.c # Benchmark main
tests/
βββ test_fabe13.c # Optional unit tests
CMakeLists.txt # Cross-platform CMake
Makefile # Minimalist legacy build
build.sh # Recommended build script (cross-platform)
```
---
## βοΈ Build Instructions
### β
Recommended: `build.sh`
```bash
./build.sh
```
This script:
- Cleans and configures the build (Release mode)
- Enables both benchmarking and testing
- Compiles using aggressive `-Ofast`, `-ffast-math`, `-march=native` flags
- Runs all unit tests and benchmarks automatically
### π οΈ Manual CMake
```bash
mkdir -p build && cd build
cmake .. -DFABE13_ENABLE_BENCHMARK=ON -DFABE13_ENABLE_TEST=ON
make
./fabe13_test
./fabe13_benchmark
```
### π§± Makefile (Legacy)
```bash
make all
make run-benchmark
```
---
## π FABE13-HX vs libm β Performance Benchmarks
FABE13-HX delivers consistent speedups over standard `libm`, across platforms and input sizes. These benchmarks highlight its advantage for both cloud-based and local environments.
### π Performance Overview
- π¨ **FABE13-HX**: SIMD-accelerated (`AVX2+FMA`, Ξ¨-core)
- π΄ **libm**: Standard C math (`math.h`)
- π§ Input size: `N β [10 ... 1,000,000,000]` doubles
- βοΈ Timing: Full-array `sincos()` throughput
- π Aligned memory: 64 bytes
- π― Accuracy: β€ 2e-11 max diff (sin/cos)
---
### π Replit (Cloud / Linux, AVX2 Clang)
.png)
> β
**FABE13-HX is consistently faster than libm β up to 8.4Γ for large inputs.**
- Platform: Replit Linux
- SIMD: AVX2 + FMA
- Compiler: Clang 14 (nix)
- libm: GNU `math.h`
---
### π MacBook Pro (macOS AVX2, AppleClang)

> π¨ **FABE13-HX outperforms libm with up to 8.4Γ higher throughput on AppleClang (AVX2).**
- Platform: macOS 14.x (MacBook Pro 16")
- SIMD: AVX2 + FMA
- Compiler: AppleClang 16.0
- libm: macOS system `math.h`
---
### π Performance Overview
```
FABE13 Active Implementation: NEON (AArch64) (SIMD Width: 2)
Benchmark Alignment: 64 bytes
```
### π Scaling with Array Size
> **8.4Γ throughput improvement** for large array processing compared to standard libm
### ARM64/AArch64 Performance (NEON)
| Array Size | FABE13 (sec) | Libm (sec) | FABE13 (M ops/sec) | Libm (M ops/sec) | Speedup |
|------------|--------------|------------|-------------------|-----------------|---------|
| 10 | 0.0000 | 0.0000 | 50.00 | 50.00 | 1.00x |
| 100 | 0.0000 | 0.0000 | 166.67 | 71.43 | 2.33x |
| 1,000 | 0.0000 | 0.0000 | 185.19 | 72.46 | 2.56x |
| 10,000 | 0.0001 | 0.0001 | 173.01 | 71.02 | 2.44x |
| 100,000 | 0.0006 | 0.0009 | 177.12 | 115.82 | 1.53x |
| 1,000,000 | 0.0016 | 0.0072 | 614.85 | 138.34 | 4.44x |
| 10,000,000 | 0.0164 | 0.0720 | 611.30 | 138.95 | 4.40x |
| 100,000,000| 0.1673 | 0.7296 | 597.63 | 137.07 | 4.36x |
| 1,000,000,000| 1.8044 | 10.4989 | 554.19 | 95.25 | 5.82x |
### π Detailed Benchmark Snapshot (N = 1,000,000)
```
FABE13: 0.0016 sec | 614.85 M ops/sec
libm: 0.0072 sec | 138.34 M ops/sec
Speedup: 4.44x
Memory: Allocated 0.04 GB
Peak RSS: ~29 MB (FABE13), ~45 MB (Libm)
CPU: 100.0% utilization for both implementations
Max diff vs libm: sin=1.224e-11, cos=1.225e-11
```
### π¬ Precision Analysis
- All test cases maintain acceptable numerical accuracy compared to libm
- Maximum difference observed: ~10β»ΒΉΒΉ for both sin and cos operations
- Properly handles edge cases (0, inf, nan) with correct behavior
---
## π¬ Core Algorithm (Ξ¨-Hyperbasis)
```c
// Core rational transformation
Ξ¨(x) = x / (1 + (3/8)xΒ²)
// sin(x) approximation
sin(x) β Ξ¨ β
(1 - a1β
Ψ² + a2β
Ξ¨β΄ - a3β
Ξ¨βΆ)
// cos(x) approximation
cos(x) β 1 - b1β
Ψ² + b2β
Ξ¨β΄ - b3β
Ξ¨βΆ
```
This allows both functions to share a unified base, optimizing performance and memory access.
---
## π Public API
```c
#include "fabe13/fabe13.h"
// Scalar API
double fabe13_sin(double x);
double fabe13_cos(double x);
double fabe13_sinc(double x); // sin(x)/x
double fabe13_tan(double x);
double fabe13_cot(double x);
double fabe13_atan(double x);
double fabe13_asin(double x); // [-1, 1]
double fabe13_acos(double x); // [-1, 1]
// SIMD vector API
void fabe13_sincos(const double* in, double* sin_out, double* cos_out, int n);
```
---
## π§ Design Highlights
- β
**Branchless Quadrant Correction**
- β
**NaN/Inf/0-safe logic**
- β
**Prefetch-friendly & unrolled scalar fallback**
- β
**SIMD-ready backend design (NEON / AVX2 / AVX512)**
- β
**Precision-preserving range reduction**
---
## π Future Development Roadmap
- [ ] Extended SIMD Ξ¨-Hyperbasis implementation (AVX2 / NEON / AVX512)
- [ ] Additional functions: `cosm1`, `expm1`, `log1p` with Ξ¨-Hyperbasis optimization
- [ ] Single-precision `float32` support (`fabe13_sinf`, etc.)
- [ ] Ultra-fast LUT-based variants for performance-critical applications
- [ ] Language bindings for Python, Rust, and C++
- [ ] Documentation and examples for common use cases
---
## π License
MIT License Β© 2025 Faruk Alpay
See [LICENSE](fabe13-old/LICENSE)
---
## 𧬠Author
**Faruk Alpay**
https://Frontier2075.com
https://lightcap.ai
> FABE13-HX is part of the **Lightcap Initiative** β building the most precise and elegant math primitives in open source.