{"id":16622574,"url":"https://github.com/shibatch/tlfloat","last_synced_at":"2025-07-14T03:09:28.608Z","repository":{"id":230901327,"uuid":"780412692","full_name":"shibatch/tlfloat","owner":"shibatch","description":"C++ template library for floating point operations","archived":false,"fork":false,"pushed_at":"2025-07-11T04:17:38.000Z","size":758,"stargazers_count":28,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-07-11T08:31:48.909Z","etag":null,"topics":["arbitrary-precision","bfloat16","constexpr","cplusplus","cpp20","cross-platform","cuda","elementary-functions","float128","float256","floating-point","half-precision","heapless","ieee754","library","math","octuple-precision","quadruple-precision","templates"],"latest_commit_sha":null,"homepage":"https://shibatch.github.io/tlfloat-doxygen/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shibatch.png","metadata":{"files":{"readme":"README.adoc","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-04-01T12:32:38.000Z","updated_at":"2025-07-11T04:17:41.000Z","dependencies_parsed_at":"2025-07-11T06:09:42.166Z","dependency_job_id":"3a664088-29e5-4c0a-a58f-46c8a3ace44c","html_url":"https://github.com/shibatch/tlfloat","commit_stats":null,"previous_names":["shibatch/tlfloat"],"tags_count":14,"template":false,"template_full_name":null,"purl":"pkg:github/shibatch/tlfloat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shibatch%2Ftlfloat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/shibatch%2Ftlfloat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shibatch%2Ftlfloat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shibatch%2Ftlfloat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shibatch","download_url":"https://codeload.github.com/shibatch/tlfloat/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shibatch%2Ftlfloat/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265236960,"owners_count":23732504,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arbitrary-precision","bfloat16","constexpr","cplusplus","cpp20","cross-platform","cuda","elementary-functions","float128","float256","floating-point","half-precision","heapless","ieee754","library","math","octuple-precision","quadruple-precision","templates"],"created_at":"2024-10-12T03:00:57.815Z","updated_at":"2025-07-14T03:09:28.588Z","avatar_url":"https://github.com/shibatch.png","language":"C++","funding_links":[],"categories":["C++"],"sub_categories":[],"readme":"== TLFloat - C++ template library for floating point operations\n\nOriginal distribution site : https://github.com/shibatch/tlfloat\n\nDoxygen-generated reference :\nhttps://shibatch.github.io/tlfloat-doxygen/\n\nSome more documentation is available at :\nhttps://github.com/shibatch/tlfloat/wiki\n\n=== Introduction\n\nThis library implements C{pp} classes with which half, single, double,\nquadruple and octuple precision IEEE 754 floating point 
numbers can be\noperated.\n\nInternally, these classes are implemented as class templates on top of\narbitrary-precision integer class templates so that the templates are\nexpanded as arbitrary-precision floating-point operations by just\nchanging the template parameters, rather than implementing each\nfloating-point operation for each precision. The arbitrary-precision\ninteger class templates are also included in this library.\n\n=== Features\n\n* Truly constexpr functions\n** Compilable with the C{pp}20 standard\n** Most of the functions are implemented as templates\n*** Completely inlinable functions\n*** The functions can be evaluated at compile time\n** No malloc required\n** Works without libstdc{pp}\n* IEEE 754 compliant\n** Supports subnormal numbers, NaN, infinity, and signed zero\n* Supports a wide range of precisions\n** Half, float, double, quad, and octuple precisions\n** Returns correctly rounded results for arithmetic operations, fma and\nsqrt\n** Returns 1-ulp accuracy results for other math.h functions\n*** All functions, including trigonometric functions, return\n1-ulp accuracy results over the entire input range\n* Portable\n** Compatible with Linux, Windows, microcontrollers, wasm, CUDA (version\n12 or later)\n** Constexpr functions can be called from CUDA devices with the --expt-relaxed-constexpr\ncompiler option\n* C/C{pp}11 API with libquadmath emulation\n** Most libquadmath functions can be used on x86_64 clang and MSVC\n** 128-bit integer types can be used on MSVC\n** C{pp}11 FP and int classes with overloaded operators are provided\n*** C{pp}11 functions in TLFloat are not constexpr\n* Moderately optimized\n** Optimized for each architecture using intrinsics, etc.\n** Library design allows compilers to fully inline operations\n** All functions are thread-safe and reentrant\n* Implements most of the math.h functions\n** Arithmetic operations, comparison, cast operations\n** fma, sqrt, hypot, cbrt, fmod, remainder, remquo\n** sin, cos, tan, sinpi, cospi, 
tanpi\n** asin, acos, atan, atan2\n** log, log2, log10, log1p\n** exp, exp2, exp10, expm1, pow\n** sinh, cosh, tanh, asinh, acosh, atanh\n** erf, erfc, tgamma\n** trunc, floor, ceil, round, rint\n** fabs, copysign, fmax, fmin, fdim\n** ldexp, frexp, modf, nextafter\n** isnan, isinf, finite, fpclassify, signbit\n* Implements I/O functions\n** Conversion to/from C strings\n** printf-family functions\n* Provides BigInt template classes in addition to the FP classes\n** They provide operations for integers of arbitrary length (2^N bits)\n** They can be used in a similar way to the ordinary int/uint types\n** Data formats are the same as ordinary int/uint\n** These classes are internally used to implement the FP classes in\nTLFloat\n\n=== How to build\n\n[arabic]\n. Check out the source code from our GitHub repository :\n`++git clone https://github.com/shibatch/tlfloat++`\n. Make a separate directory to create an out-of-source build :\n`++cd tlfloat \u0026\u0026 mkdir build \u0026\u0026 cd build++`\n. Run cmake to configure the project :\n`++cmake .. -DCMAKE_INSTALL_PREFIX=../../install++`\n. Run make to build and install the project : `make \u0026\u0026 make install`\n\n=== Compiling hello world example\n\nBelow is a simple C{pp} program that uses TLFloat.\n\n[source,c++]\n----\n#include \u003ciostream\u003e\n#include \u003ciomanip\u003e\n#include \u003ctlfloat/tlmath.hpp\u003e\n\nusing namespace tlfloat;\n\nOctuple machin() {\n  return 4 * (4 * atan(1 / Octuple(5)) - atan(1 / Octuple(239)));\n}\n\nint main(int argc, char **argv) {\n  std::cout \u003c\u003c std::setprecision(70) \u003c\u003c machin() \u003c\u003c std::endl;\n}\n----\n\nTo compile this source code, use the following command.\n\n[source,console]\n----\ng++ -std=c++20 -I./install/include hello.cpp\n----\n\nYou have to specify the C{pp}20 standard. Note that you do not need to link\nany library in this example. 
This program computes pi in octuple\nprecision and prints it.\n\n[source,console]\n----\n$ ./a.out\n3.141592653589793238462643383279502884197169399375105820974944592307816\n----\n\n=== Libquadmath emulation\n\nWith gcc/g{pp} on the x86_64 architecture, libquadmath provides math functions\nfor quadruple precision floating point numbers. However, libquadmath is\nnot available with clang or Visual Studio. By using the libquadmath\nemulation feature of the TLFloat library, it is possible to use most of the\nfeatures of libquadmath with clang and Visual Studio.\n\nBelow is a simple C program that uses this feature.\n\n[source,c]\n----\n#include \u003cstdio.h\u003e\n#include \u003cstdlib.h\u003e\n\n#define TLFLOAT_LIBQUADMATH_EMULATION\n#include \u003ctlfloat/tlfloat.h\u003e\n\nint main(int argc, char **argv) {\n  if (argc \u003c 3) exit(-1);\n\n  __float128 q1 = strtoflt128(argv[1], NULL);\n  __float128 q2 = strtoflt128(argv[2], NULL);\n\n  char str[256];\n  quadmath_snprintf(str, sizeof(str), \"%.30Qg\", powq(q1, q2));\n  puts(str);\n}\n----\n\nTo compile this source code, use the following command.\n\n[source,console]\n----\nclang quad.c -I./install/include -L./install/lib -ltlfloat -lm\n----\n\nBelow is an example of executing this program.\n\n[source,console]\n----\n$ ./a.out 1.234 2.345\n1.63732181977903314975233575019\n----\n\nIn order to use the libquadmath emulation feature, define the\nTLFLOAT_LIBQUADMATH_EMULATION macro, include tlfloat/tlfloat.h instead\nof quadmath.h, and link with -ltlfloat -lm. 
If you need portability,\nreplace __float128 with tlfloat_quad.\n\n=== C++11 API\n\nBesides the C{pp}20 API, TLFloat provides classes that can be used with\nthe C{pp}11 standard.\n\nBelow is a simple C{pp} program that uses this feature.\n\n[source,c++]\n----\n#include \u003ciostream\u003e\n#include \u003ctlfloat/tlfloat.h\u003e\n\ntlfloat_octuple AGM(int N) {\n  tlfloat_octuple y = tlfloat_sqrto(2) - 1;\n  tlfloat_octuple a = y * y * 2;\n\n  for(int k=0;k\u003cN;k++) {\n    y = 1.0 - tlfloat_powo(y, 4);\n    y = tlfloat_powo(y, 1.0/4);\n    y = (1 - y) / (1 + y);\n    a *= tlfloat_powo(1 + y, 4);\n    a -= tlfloat_ldexpo(((y + 1) * y + 1) * y, 2 * k + 3);\n  }\n\n  return 1 / a;\n}\n\nint main(int argc, char **argv) {\n  std::cout \u003c\u003c tlfloat::to_string(AGM(3), 70) \u003c\u003c std::endl;\n}\n----\n\nTo compile this source code, use the following command.\n\n[source,console]\n----\ng++ cpp11.cpp -std=c++11 -I./install/include -L./install/lib -ltlfloat\n----\n\nBelow is an example of executing this program.\n\n[source,console]\n----\n$ ./a.out\n3.141592653589793238462643383279502884197169399375105820974944592307818\n----\n\n=== Benchmark results\n\nThis software package includes a benchmark tool. It can be built by\nspecifying the `-DBUILD_BENCH=True` cmake option. 
Below are some results of\nthe benchmarks.\n\nCPU: AMD Ryzen 9 7950X (running at 4.5GHz)\n\nCompiler: gcc version 12.3.0 (Ubuntu 12.3.0-17ubuntu1)\n\nTLFloat Quad\n\n....\nTLFloat version      : 1.11.0\nConfig               : tlfloat quad\nMeasurement time     : 10 sec\nAddition             : 124.25 Mops/second\nMultiplication       : 102.486 Mops/second\nDivision             : 50.7983 Mops/second\nCast to/from double  : 168.078 Mops/second\nCompare              : 299.502 Mops/second\nFMA                  : 68.4656 Mops/second\nSquare root          : 15.9095 Mops/second\nRint                 : 191.323 Mops/second\nSin                  : 2.13261 Mops/second\nAtan                 : 1.30394 Mops/second\nExp                  : 1.49075 Mops/second\nLog                  : 1.58123 Mops/second\nPow                  : 0.88467 Mops/second\n....\n\nGNU libquadmath\n\n....\nTLFloat version      : 1.11.0\nConfig               : Libquadmath\nMeasurement time     : 10 sec\nAddition             : 87.7199 Mops/second\nMultiplication       : 78.2958 Mops/second\nDivision             : 72.6179 Mops/second\nCast to/from double  : 63.7482 Mops/second\nCompare              : 222.788 Mops/second\nFMA                  : 1.93365 Mops/second\nSquare root          : 9.50421 Mops/second\nRint                 : 37.6146 Mops/second\nSin                  : 1.77002 Mops/second\nAtan                 : 3.02089 Mops/second\nExp                  : 1.78973 Mops/second\nLog                  : 1.8479 Mops/second\nPow                  : 1.29873 Mops/second\n....\n\n=== Development plan\n\n* The following features will be added in future releases\n** Further documentation\n** Add C/C{pp}11 API for float16/bfloat16\n** Add support for conversion between string and float16/bfloat16\n** Remaining math functions in math.h\n*** Complex functions\n** More testing\n*** Add more testers for I/O functions\n** Further optimization\n\n=== License\n\nThe software is distributed under the Boost Software License, 
Version\n1.0. See accompanying file LICENSE.txt or copy at :\nhttp://www.boost.org/LICENSE_1_0.txt. Contributions to this project\nare accepted under the same license.\n\nThe fact that our software is released under an open source license\nonly means that you can use the current and older versions of the\nsoftware for free. If you want us to continue maintaining our\nsoftware, you need to financially support our project. Please see\nour https://github.com/shibatch/nofreelunch?tab=coc-ov-file[Code of\nConduct] or its https://youtu.be/35zFfdCuBII[introduction video].\n\nCopyright https://shibatch.github.io/[Naoki Shibata] and contributors\n2024-2025.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshibatch%2Ftlfloat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshibatch%2Ftlfloat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshibatch%2Ftlfloat/lists"}