{"id":28375674,"url":"https://github.com/farukalpay/fabe","last_synced_at":"2025-06-26T05:31:03.047Z","repository":{"id":287992041,"uuid":"966473826","full_name":"farukalpay/FABE","owner":"farukalpay","description":"High-accuracy SIMD sin/cos/sincos library in C with AVX2, AVX-512, and NEON support. Full-range reduction. Fast at scale. Portable by design.","archived":false,"fork":false,"pushed_at":"2025-04-20T18:31:33.000Z","size":923,"stargazers_count":39,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-05T23:26:29.368Z","etag":null,"topics":["aarch64","ai-acceleration","avx2","avx512","c-library","cpu-optimization","high-performance-computing","low-level","math-library","math-optimization","neon","numerical-computing","physics-simulation","portable-code","scientific-computing","signal-processing","simd","trigonometry","vectorized-simd-optimizations","x86-64"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/farukalpay.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-15T01:27:29.000Z","updated_at":"2025-05-27T02:44:37.000Z","dependencies_parsed_at":"2025-04-16T04:33:05.409Z","dependency_job_id":null,"html_url":"https://github.com/farukalpay/FABE","commit_stats":null,"previous_names":["farukalpay/fabe"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/farukalpay/FABE","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farukalpay%2FFABE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farukalpay%2FFABE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farukalpay%2FFABE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farukalpay%2FFABE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/farukalpay","download_url":"https://codeload.github.com/farukalpay/FABE/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farukalpay%2FFABE/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262008789,"owners_count":23244256,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aarch64","ai-acceleration","avx2","avx512","c-library","cpu-optimization","high-performance-computing","low-level","math-library","math-optimization","neon","numerical-computing","physics-simulation","portable-code","scientific-computing","signal-processing","simd","trigonometry","vectorized-simd-optimizations","x86-64"],"created_at":"2025-05-29T23:06:27.701Z","updated_at":"2025-06-26T05:31:03.039Z","avatar_url":"https://github.com/farukalpay.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FABE13-HX: High-Performance SIMD Trigonometric Library for Scientific Computing\n\n[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)\n[![Build](https://img.shields.io/badge/build-passing-brightgreen.svg)]()\n[![Platform](https://img.shields.io/badge/platform-x86_64%20%7C%20AArch64-lightgrey.svg)]()\n[![SIMD](https://img.shields.io/badge/SIMD-AVX2%2C%20AVX512%2C%20NEON-orange.svg)]()\n\n**FABE13-HX** is a high-performance C math library that delivers ultra-fast trigonometric functions (`sin`, `cos`, `sincos`) using advanced SIMD vectorization. Powered by the innovative **Ψ-Hyperbasis** algorithm, it outperforms traditional math libraries by up to **8.4×** while maintaining high precision.\n\n## 🚀 Why Choose FABE13-HX for Your Numerical Computing Needs\n\nFABE13-HX revolutionizes trigonometric computation for:\n\n- **Machine Learning \u0026 AI Acceleration** - Optimize neural network performance\n- **Scientific Simulations \u0026 HPC** - Accelerate physics, engineering, and computational modeling\n- **Real-time Signal Processing** - Enhance DSP, audio, and sensor data analysis\n- **Graphics \u0026 Visualization Systems** - Improve rendering performance\n- **Embedded Computing** - Efficient performance on resource-constrained systems\n\n## 💡 Key Features \u0026 Performance Benefits\n\n- ⚡ **Up to 8.4× Faster Than Standard Math Libraries** across various platforms and input sizes\n- 🔄 **Cross-Architecture Optimization** with support for AVX512F, AVX2+FMA (x86), NEON (ARM)\n- 🎯 **High Precision** with maximum error ≤ 2e-11 compared to standard libm\n- 🧠 **Novel Rational-Function Architecture** based on Ψ-Hyperbasis instead of traditional polynomials\n- 🔢 **Extreme-Range Support** accurate up to |x| ≈ 1e308 via advanced Payne–Hanek reduction\n- 🧩 **Unified API** for both scalar and vectorized operations\n- 🛡️ **Robust Error Handling** with proper NaN/Inf/0 behavior\n\nDesigned for **numerical computing**, **AI acceleration**, and **scientific simulation**, it replaces traditional polynomial approximations with a fused rational + correction model that's more efficient and vectorization-friendly.\n\n---\n\n## 📂 Project Structure\n\n```\nfabe13/                 # Core source\n├── fabe13.c            # HX implementation\n├── fabe13.h            # Public API\n├── benchmark_fabe13.c  # Benchmark main\n\ntests/\n└── test_fabe13.c       # Optional unit tests\n\nCMakeLists.txt          # Cross-platform CMake\nMakefile                # Minimalist legacy build\nbuild.sh                # Recommended build script (cross-platform)\n```\n\n---\n\n## ⚙️ Build Instructions\n\n### ✅ Recommended: `build.sh`\n\n```bash\n./build.sh\n```\n\nThis script:\n- Cleans and configures the build (Release mode)\n- Enables both benchmarking and testing\n- Compiles using aggressive `-Ofast`, `-ffast-math`, `-march=native` flags\n- Runs all unit tests and benchmarks automatically\n\n### 🛠️ Manual CMake\n\n```bash\nmkdir -p build \u0026\u0026 cd build\ncmake .. -DFABE13_ENABLE_BENCHMARK=ON -DFABE13_ENABLE_TEST=ON\nmake\n./fabe13_test\n./fabe13_benchmark\n```\n\n### 🧱 Makefile (Legacy)\n\n```bash\nmake all\nmake run-benchmark\n```\n\n---\n\n## 🚀 FABE13-HX vs libm — Performance Benchmarks\n\nFABE13-HX delivers consistent speedups over standard `libm`, across platforms and input sizes. These benchmarks highlight its advantage for both cloud-based and local environments.\n\n### 📊 Performance Overview\n\n- 🟨 **FABE13-HX**: SIMD-accelerated (`AVX2+FMA`, Ψ-core)\n- 🔴 **libm**: Standard C math (`math.h`)\n- 🧠 Input size: `N ∈ [10 ... 1,000,000,000]` doubles\n- ⚙️ Timing: Full-array `sincos()` throughput\n- 📐 Aligned memory: 64 bytes\n- 🎯 Accuracy: ≤ 2e-11 max diff (sin/cos)\n\n---\n\n### 🌐 Replit (Cloud / Linux, AVX2 Clang)\n\n![FABE13-HX vs libm — Replit](https://github.com/farukalpay/FABE/blob/main/img/Performance%20Comparison%3A%20FABE13-HX%20vs%20libm%20(Platform%3A%20Replit%2C%20AVX2%20Core%2C%20CMath%20backend).png)\n\n\u003e ✅ **FABE13-HX is consistently faster than libm — up to 8.4× for large inputs.**\n\n- Platform: Replit Linux\n- SIMD: AVX2 + FMA\n- Compiler: Clang 14 (nix)\n- libm: GNU `math.h`\n\n---\n\n### 🍎 MacBook Pro (macOS AVX2, AppleClang)\n\n![FABE13-HX vs libm — macOS](https://github.com/farukalpay/FABE/blob/main/img/FABE13-HX%20vs%20libm%20%E2%80%94%20Performance%20Benchmark.png)\n\n\u003e 🟨 **FABE13-HX outperforms libm with up to 8.4× higher throughput on AppleClang (AVX2).**\n\n- Platform: macOS 14.x (MacBook Pro 16\")\n- SIMD: AVX2 + FMA\n- Compiler: AppleClang 16.0\n- libm: macOS system `math.h`\n\n---\n\n### 📊 Performance Overview\n\n```\nFABE13 Active Implementation: NEON (AArch64) (SIMD Width: 2)\nBenchmark Alignment: 64 bytes\n```\n\n### 📈 Scaling with Array Size\n\n\u003e **8.4× throughput improvement** for large array processing compared to standard libm\n\n### ARM64/AArch64 Performance (NEON)\n\n| Array Size | FABE13 (sec) | Libm (sec) | FABE13 (M ops/sec) | Libm (M ops/sec) | Speedup |\n|------------|--------------|------------|-------------------|-----------------|---------|\n| 10         | 0.0000       | 0.0000     | 50.00             | 50.00           | 1.00x   |\n| 100        | 0.0000       | 0.0000     | 166.67            | 71.43           | 2.33x   |\n| 1,000      | 0.0000       | 0.0000     | 185.19            | 72.46           | 2.56x   |\n| 10,000     | 0.0001       | 0.0001     | 173.01            | 71.02           | 2.44x   |\n| 100,000    | 0.0006       | 0.0009     | 177.12            | 115.82          | 1.53x   |\n| 1,000,000  | 0.0016       | 0.0072     | 614.85            | 138.34          | 4.44x   |\n| 10,000,000 | 0.0164       | 0.0720     | 611.30            | 138.95          | 4.40x   |\n| 100,000,000| 0.1673       | 0.7296     | 597.63            | 137.07          | 4.36x   |\n| 1,000,000,000| 1.8044     | 10.4989    | 554.19            | 95.25           | 5.82x   |\n\n### 🔍 Detailed Benchmark Snapshot (N = 1,000,000)\n\n```\nFABE13:  0.0016 sec  |  614.85 M ops/sec\nlibm:    0.0072 sec  |  138.34 M ops/sec\nSpeedup: 4.44x\n\nMemory: Allocated 0.04 GB\n        Peak RSS: ~29 MB (FABE13), ~45 MB (Libm)\nCPU:    100.0% utilization for both implementations\n\nMax diff vs libm: sin=1.224e-11, cos=1.225e-11\n```\n\n### 🔬 Precision Analysis\n\n- All test cases maintain acceptable numerical accuracy compared to libm\n- Maximum difference observed: ~10⁻¹¹ for both sin and cos operations \n- Properly handles edge cases (0, inf, nan) with correct behavior\n\n---\n\n## 🔬 Core Algorithm (Ψ-Hyperbasis)\n\n```c\n// Core rational transformation\nΨ(x) = x / (1 + (3/8)x²)\n\n// sin(x) approximation\nsin(x) ≈ Ψ ⋅ (1 - a1⋅Ψ² + a2⋅Ψ⁴ - a3⋅Ψ⁶)\n\n// cos(x) approximation\ncos(x) ≈ 1 - b1⋅Ψ² + b2⋅Ψ⁴ - b3⋅Ψ⁶\n```\n\nThis allows both functions to share a unified base, optimizing performance and memory access.\n\n---\n\n## 📊 Public API\n\n```c\n#include \"fabe13/fabe13.h\"\n\n// Scalar API\ndouble fabe13_sin(double x);\ndouble fabe13_cos(double x);\ndouble fabe13_sinc(double x);  // sin(x)/x\ndouble fabe13_tan(double x);\ndouble fabe13_cot(double x);\ndouble fabe13_atan(double x);\ndouble fabe13_asin(double x);  // [-1, 1]\ndouble fabe13_acos(double x);  // [-1, 1]\n\n// SIMD vector API\nvoid fabe13_sincos(const double* in, double* sin_out, double* cos_out, int n);\n```\n\n---\n\n## 🧠 Design Highlights\n\n- ✅ **Branchless Quadrant Correction**\n- ✅ **NaN/Inf/0-safe logic**\n- ✅ **Prefetch-friendly \u0026 unrolled scalar fallback**\n- ✅ **SIMD-ready backend design (NEON / AVX2 / AVX512)**\n- ✅ **Precision-preserving range reduction**\n\n---\n\n## 🔭 Future Development Roadmap\n\n- [ ] Extended SIMD Ψ-Hyperbasis implementation (AVX2 / NEON / AVX512)\n- [ ] Additional functions: `cosm1`, `expm1`, `log1p` with Ψ-Hyperbasis optimization\n- [ ] Single-precision `float32` support (`fabe13_sinf`, etc.)\n- [ ] Ultra-fast LUT-based variants for performance-critical applications\n- [ ] Language bindings for Python, Rust, and C++\n- [ ] Documentation and examples for common use cases\n\n---\n\n## 📜 License\n\nMIT License © 2025 Faruk Alpay  \nSee [LICENSE](fabe13-old/LICENSE)\n\n---\n\n## 🧬 Author\n\n**Faruk Alpay**  \nhttps://Frontier2075.com  \nhttps://lightcap.ai  \n\n\u003e FABE13-HX is part of the **Lightcap Initiative** — building the most precise and elegant math primitives in open source.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffarukalpay%2Ffabe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffarukalpay%2Ffabe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffarukalpay%2Ffabe/lists"}