https://github.com/farukalpay/fabe

High-accuracy SIMD sin/cos/sincos library in C with AVX2, AVX-512, and NEON support. Full-range reduction. Fast at scale. Portable by design.

# FABE13-HX: High-Performance SIMD Trigonometric Library for Scientific Computing

[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Build](https://img.shields.io/badge/build-passing-brightgreen.svg)]()
[![Platform](https://img.shields.io/badge/platform-x86_64%20%7C%20AArch64-lightgrey.svg)]()
[![SIMD](https://img.shields.io/badge/SIMD-AVX2%2C%20AVX512%2C%20NEON-orange.svg)]()

**FABE13-HX** is a high-performance C math library that provides fast trigonometric functions (`sin`, `cos`, `sincos`) using SIMD vectorization. Built on the **Ψ-Hyperbasis** algorithm, it outperforms standard math libraries by up to **8.4×** while keeping the maximum difference from libm at or below 2e-11.

## 🚀 Why Choose FABE13-HX for Your Numerical Computing Needs

FABE13-HX speeds up trigonometric computation in:

- **Machine Learning & AI Acceleration** - Optimize neural network performance
- **Scientific Simulations & HPC** - Accelerate physics, engineering, and computational modeling
- **Real-time Signal Processing** - Enhance DSP, audio, and sensor data analysis
- **Graphics & Visualization Systems** - Improve rendering performance
- **Embedded Computing** - Efficient performance on resource-constrained systems

## 💡 Key Features & Performance Benefits

- ⚡ **Up to 8.4× Faster Than Standard Math Libraries** across various platforms and input sizes
- 🔄 **Cross-Architecture Optimization** with support for AVX512F, AVX2+FMA (x86), NEON (ARM)
- 🎯 **High Precision** with maximum error ≤ 2e-11 compared to standard libm
- 🧠 **Novel Rational-Function Architecture** based on Ψ-Hyperbasis instead of traditional polynomials
- 🔒 **Extreme-Range Support** accurate up to |x| ≈ 1e308 via advanced Payne–Hanek reduction
- 🧩 **Unified API** for both scalar and vectorized operations
- 🛡️ **Robust Error Handling** with proper NaN/Inf/0 behavior

Designed for **numerical computing**, **AI acceleration**, and **scientific simulation**, FABE13-HX replaces traditional polynomial approximations with a fused rational + correction model that is more efficient and vectorization-friendly.

---

## 📂 Project Structure

```
fabe13/ # Core source
├── fabe13.c # HX implementation
├── fabe13.h # Public API
├── benchmark_fabe13.c # Benchmark main

tests/
└── test_fabe13.c # Optional unit tests

CMakeLists.txt # Cross-platform CMake
Makefile # Minimalist legacy build
build.sh # Recommended build script (cross-platform)
```

---

## ⚙️ Build Instructions

### ✅ Recommended: `build.sh`

```bash
./build.sh
```

This script:
- Cleans and configures the build (Release mode)
- Enables both benchmarking and testing
- Compiles using aggressive `-Ofast`, `-ffast-math`, `-march=native` flags
- Runs all unit tests and benchmarks automatically

### 🛠️ Manual CMake

```bash
mkdir -p build && cd build
cmake .. -DFABE13_ENABLE_BENCHMARK=ON -DFABE13_ENABLE_TEST=ON
make
./fabe13_test
./fabe13_benchmark
```

### 🧱 Makefile (Legacy)

```bash
make all
make run-benchmark
```

---

## 🚀 FABE13-HX vs libm: Performance Benchmarks

FABE13-HX delivers consistent speedups over standard `libm`, across platforms and input sizes. These benchmarks highlight its advantage for both cloud-based and local environments.

### 📊 Performance Overview

- 🟨 **FABE13-HX**: SIMD-accelerated (`AVX2+FMA`, Ψ-core)
- 🔴 **libm**: Standard C math (`math.h`)
- 🧠 Input size: `N ∈ [10 ... 1,000,000,000]` doubles
- ⚙️ Timing: Full-array `sincos()` throughput
- 📐 Aligned memory: 64 bytes
- 🎯 Accuracy: ≤ 2e-11 max diff (sin/cos)

---

### 🌐 Replit (Cloud / Linux, AVX2 Clang)

![FABE13-HX vs libm: Replit](https://github.com/farukalpay/FABE/blob/main/img/Performance%20Comparison%3A%20FABE13-HX%20vs%20libm%20(Platform%3A%20Replit%2C%20AVX2%20Core%2C%20CMath%20backend).png)

> ✅ **FABE13-HX is consistently faster than libm, up to 8.4× for large inputs.**

- Platform: Replit Linux
- SIMD: AVX2 + FMA
- Compiler: Clang 14 (nix)
- libm: GNU `math.h`

---

### 🍎 MacBook Pro (macOS AVX2, AppleClang)

![FABE13-HX vs libm: macOS](https://github.com/farukalpay/FABE/blob/main/img/FABE13-HX%20vs%20libm%20%E2%80%94%20Performance%20Benchmark.png)

> 🟨 **FABE13-HX outperforms libm with up to 8.4× higher throughput on AppleClang (AVX2).**

- Platform: macOS 14.x (MacBook Pro 16")
- SIMD: AVX2 + FMA
- Compiler: AppleClang 16.0
- libm: macOS system `math.h`

---

### 📊 Performance Overview

```
FABE13 Active Implementation: NEON (AArch64) (SIMD Width: 2)
Benchmark Alignment: 64 bytes
```

### 📈 Scaling with Array Size

> **Up to 8.4× throughput improvement** over standard libm for large arrays in the AVX2 runs above; the NEON run below peaks at 5.82×.

### ARM64/AArch64 Performance (NEON)

| Array Size | FABE13 (sec) | Libm (sec) | FABE13 (M ops/sec) | Libm (M ops/sec) | Speedup |
|------------|--------------|------------|-------------------|-----------------|---------|
| 10 | 0.0000 | 0.0000 | 50.00 | 50.00 | 1.00x |
| 100 | 0.0000 | 0.0000 | 166.67 | 71.43 | 2.33x |
| 1,000 | 0.0000 | 0.0000 | 185.19 | 72.46 | 2.56x |
| 10,000 | 0.0001 | 0.0001 | 173.01 | 71.02 | 2.44x |
| 100,000 | 0.0006 | 0.0009 | 177.12 | 115.82 | 1.53x |
| 1,000,000 | 0.0016 | 0.0072 | 614.85 | 138.34 | 4.44x |
| 10,000,000 | 0.0164 | 0.0720 | 611.30 | 138.95 | 4.40x |
| 100,000,000| 0.1673 | 0.7296 | 597.63 | 137.07 | 4.36x |
| 1,000,000,000| 1.8044 | 10.4989 | 554.19 | 95.25 | 5.82x |
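
The throughput and speedup columns follow from the array size and the measured times. A minimal sketch of that arithmetic is below; the helper names `mops_per_sec` and `speedup` are assumptions made for illustration and are not part of the library or its benchmark program.

```c
#include <assert.h>

/* Hypothetical helper (not part of FABE13): millions of operations per
   second for n elements processed in `seconds`. */
static double mops_per_sec(double n, double seconds)
{
    assert(seconds > 0.0);
    return n / seconds / 1e6;
}

/* Hypothetical helper: how many times faster the candidate is than the
   baseline, given both wall-clock times. */
static double speedup(double baseline_seconds, double candidate_seconds)
{
    assert(candidate_seconds > 0.0);
    return baseline_seconds / candidate_seconds;
}
```

For the N = 1,000,000,000 row, `mops_per_sec(1e9, 1.8044)` gives about 554.2 and `speedup(10.4989, 1.8044)` about 5.82, which matches the table up to rounding of the printed times. Rows whose times print as 0.0000 cannot be recomputed this way because the printed precision is too low.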

### 🔍 Detailed Benchmark Snapshot (N = 1,000,000)

```
FABE13: 0.0016 sec | 614.85 M ops/sec
libm: 0.0072 sec | 138.34 M ops/sec
Speedup: 4.44x

Memory: Allocated 0.04 GB
Peak RSS: ~29 MB (FABE13), ~45 MB (Libm)
CPU: 100.0% utilization for both implementations

Max diff vs libm: sin=1.224e-11, cos=1.225e-11
```

### 🔬 Precision Analysis

- All test cases stay within 2e-11 of libm
- Maximum difference observed: ~10⁻¹¹ for both sin and cos (1.224e-11 and 1.225e-11 at N = 1,000,000)
- Properly handles edge cases (0, inf, nan)
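
A "max diff vs libm" figure like the one reported above can be computed with a single pass over the result and reference arrays. The sketch below is one plausible way to do it, not the benchmark's actual code; `max_abs_diff` is a hypothetical name, and the NaN policy (matching NaNs are accepted, one-sided NaNs are flagged) is an assumption.

```c
#include <assert.h>
#include <math.h>    /* HUGE_VAL */
#include <stddef.h>

/* Hypothetical helper (not part of FABE13): largest |got[i] - ref[i]|.
   Elements where both values are NaN count as matching; a NaN on only
   one side is reported as HUGE_VAL so it cannot be missed. Equal
   infinities give a NaN difference, which the comparisons below skip. */
static double max_abs_diff(const double *got, const double *ref, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        int got_nan = (got[i] != got[i]);  /* NaN is the only value unequal to itself */
        int ref_nan = (ref[i] != ref[i]);
        if (got_nan || ref_nan) {
            if (got_nan != ref_nan) return HUGE_VAL;
            continue;
        }
        double d = got[i] - ref[i];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

Calling it once for the sine outputs and once for the cosine outputs yields the two numbers shown in the snapshot format (`sin=…, cos=…`).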

---

## 🔬 Core Algorithm (Ψ-Hyperbasis)

```
// Core rational transformation
Ψ(x) = x / (1 + (3/8)x²)

// sin(x) approximation
sin(x) ≈ Ψ ⋅ (1 - a1⋅Ψ² + a2⋅Ψ⁴ - a3⋅Ψ⁶)

// cos(x) approximation
cos(x) ≈ 1 - b1⋅Ψ² + b2⋅Ψ⁴ - b3⋅Ψ⁶
```

This allows both functions to share a unified base, optimizing performance and memory access.
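
To make the shared-base idea concrete, here is a minimal scalar sketch of how the formulas above could be evaluated from a single Ψ and Ψ². The README does not list the coefficients `a1..a3` and `b1..b3`, so they are passed in as parameters; the `psi_coeffs` struct and both function names are assumptions made for illustration, not the library's internals.

```c
#include <assert.h>

/* Coefficient set for the sketch. The README does not give a1..a3 or
   b1..b3, so any values passed in are placeholders, not the library's
   constants. */
typedef struct {
    double a1, a2, a3;   /* sin correction terms */
    double b1, b2, b3;   /* cos correction terms */
} psi_coeffs;

/* Rational transform from the formulas above: Psi(x) = x / (1 + (3/8) x^2). */
static double psi(double x)
{
    return x / (1.0 + 0.375 * x * x);
}

/* Evaluate both approximations from one shared Psi and Psi^2, using
   Horner form in Psi^2. */
static void psi_sincos_sketch(double x, const psi_coeffs *k,
                              double *sin_out, double *cos_out)
{
    assert(k && sin_out && cos_out);
    double p  = psi(x);
    double p2 = p * p;
    /* 1 - c1*p2 + c2*p2^2 - c3*p2^3, once per function */
    *sin_out = p * (1.0 + p2 * (-k->a1 + p2 * (k->a2 - p2 * k->a3)));
    *cos_out =      1.0 + p2 * (-k->b1 + p2 * (k->b2 - p2 * k->b3));
}
```

Both outputs reuse the same division and the same `p2`, so the sine and cosine paths differ only in their coefficients and the final multiply by Ψ. That is the property that makes a fused `sincos` cheaper than two separate calls.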

---

## 📊 Public API

```c
#include "fabe13/fabe13.h"

// Scalar API
double fabe13_sin(double x);
double fabe13_cos(double x);
double fabe13_sinc(double x); // sin(x)/x
double fabe13_tan(double x);
double fabe13_cot(double x);
double fabe13_atan(double x);
double fabe13_asin(double x); // [-1, 1]
double fabe13_acos(double x); // [-1, 1]

// SIMD vector API
void fabe13_sincos(const double* in, double* sin_out, double* cos_out, int n);
```

---

## 🧠 Design Highlights

- ✅ **Branchless Quadrant Correction**
- ✅ **NaN/Inf/0-safe logic**
- ✅ **Prefetch-friendly & unrolled scalar fallback**
- ✅ **SIMD-ready backend design (NEON / AVX2 / AVX512)**
- ✅ **Precision-preserving range reduction**
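
Branchless quadrant correction can be sketched in scalar C as follows. This is one common approach and is not taken from the FABE13 sources. It uses a two-constant Cody–Waite reduction that is only suitable for moderate `|x|` (not Payne–Hanek), plain Taylor polynomials as stand-in kernels, and bit masks instead of `if` statements. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bitwise select: returns b where mask is all ones, a where it is zero. */
static double select_bits(double a, double b, uint64_t mask)
{
    uint64_t ua, ub, ur;
    double r;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    ur = (ua & ~mask) | (ub & mask);
    memcpy(&r, &ur, sizeof r);
    return r;
}

/* XOR a sign-bit mask (0 or 1 << 63) into v. */
static double flip_sign(double v, uint64_t sign_mask)
{
    uint64_t u;
    memcpy(&u, &v, sizeof u);
    u ^= sign_mask;
    memcpy(&v, &u, sizeof v);
    return v;
}

/* Stand-in kernels for |r| <= pi/4: truncated Taylor series, not the
   Psi-Hyperbasis approximation. */
static double core_sin(double r)
{
    double z = r * r;
    return r * (1.0 + z * (-1.0 / 6 + z * (1.0 / 120 + z * (-1.0 / 5040
             + z * (1.0 / 362880 + z * (-1.0 / 39916800
             + z * (1.0 / 6227020800.0)))))));
}

static double core_cos(double r)
{
    double z = r * r;
    return 1.0 + z * (-1.0 / 2 + z * (1.0 / 24 + z * (-1.0 / 720
         + z * (1.0 / 40320 + z * (-1.0 / 3628800
         + z * (1.0 / 479001600.0 + z * (-1.0 / 87178291200.0)))))));
}

/* Sketch for moderate |x| only. The magic-constant rounding assumes
   round-to-nearest and strict IEEE arithmetic; it is not valid under
   -ffast-math, which may reassociate the add and subtract. */
static void sincos_quadrant_sketch(double x, double *s_out, double *c_out)
{
    const double two_over_pi = 0.63661977236758134308;
    const double pio2_hi = 1.57079632673412561417;     /* leading bits of pi/2 */
    const double pio2_lo = 6.07710050650619224932e-11; /* pi/2 - pio2_hi */
    const double magic   = 6755399441055744.0;         /* 1.5 * 2^52 */

    double k = (x * two_over_pi + magic) - magic;  /* nearest integer to x/(pi/2) */
    double r = (x - k * pio2_hi) - k * pio2_lo;    /* r in about [-pi/4, pi/4] */
    uint64_t q = (uint64_t)(int64_t)k & 3u;        /* quadrant 0..3 */

    double s = core_sin(r);
    double c = core_cos(r);

    uint64_t swap   = (uint64_t)0 - (q & 1u);      /* all ones in odd quadrants */
    uint64_t s_sign = (q & 2u) << 62;              /* negate sin in quadrants 2, 3 */
    uint64_t c_sign = ((q + 1u) & 2u) << 62;       /* negate cos in quadrants 1, 2 */

    *s_out = flip_sign(select_bits(s, c, swap), s_sign);
    *c_out = flip_sign(select_bits(c, s, swap), c_sign);
}
```

Because every quadrant decision is a mask operation, the same logic maps directly onto SIMD blend and XOR instructions, with no per-lane branches.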

---

## 🔭 Future Development Roadmap

- [ ] Extended SIMD Ψ-Hyperbasis implementation (AVX2 / NEON / AVX512)
- [ ] Additional functions: `cosm1`, `expm1`, `log1p` with Ψ-Hyperbasis optimization
- [ ] Single-precision `float32` support (`fabe13_sinf`, etc.)
- [ ] Ultra-fast LUT-based variants for performance-critical applications
- [ ] Language bindings for Python, Rust, and C++
- [ ] Documentation and examples for common use cases

---

## 📜 License

MIT License © 2025 Faruk Alpay
See [LICENSE](fabe13-old/LICENSE)

---

## 🧬 Author

**Faruk Alpay**
https://Frontier2075.com
https://lightcap.ai

> FABE13-HX is part of the **Lightcap Initiative**: building the most precise and elegant math primitives in open source.