{"id":16505558,"url":"https://github.com/davidbrochart/cpp-half","last_synced_at":"2026-05-25T16:01:44.398Z","repository":{"id":88093883,"uuid":"296044292","full_name":"davidbrochart/cpp-half","owner":"davidbrochart","description":"Half-precision floating-point library","archived":false,"fork":false,"pushed_at":"2020-09-16T15:22:58.000Z","size":47,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-02T00:47:43.596Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://half.sourceforge.net","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/davidbrochart.png","metadata":{"files":{"readme":"README.txt","changelog":"ChangeLog.txt","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-09-16T13:40:56.000Z","updated_at":"2020-09-16T15:22:45.000Z","dependencies_parsed_at":"2023-05-18T07:01:23.709Z","dependency_job_id":null,"html_url":"https://github.com/davidbrochart/cpp-half","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/davidbrochart/cpp-half","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidbrochart%2Fcpp-half","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidbrochart%2Fcpp-half/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidbrochart%2Fcpp-half/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidbrochart%2Fcpp-half/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/owners/davidbrochart","download_url":"https://codeload.github.com/davidbrochart/cpp-half/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidbrochart%2Fcpp-half/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33482411,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-25T14:31:05.219Z","status":"ssl_error","status_checked_at":"2026-05-25T14:31:02.878Z","response_time":57,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-11T15:12:24.818Z","updated_at":"2026-05-25T16:01:44.377Z","avatar_url":"https://github.com/davidbrochart.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"HALF-PRECISION FLOATING-POINT LIBRARY (Version 2.1.0)\r\n-----------------------------------------------------\r\n\r\nThis is a C++ header-only library to provide an IEEE 754 conformant 16-bit \r\nhalf-precision floating-point type along with corresponding arithmetic \r\noperators, type conversions and common mathematical functions. 
It aims for both \r\nefficiency and ease of use, trying to accurately mimic the behaviour of the \r\nbuilt-in floating-point types at the best performance possible.\r\n\r\n\r\nINSTALLATION AND REQUIREMENTS\r\n-----------------------------\r\n\r\nConveniently, the library consists of just a single header file containing all \r\nthe functionality, which can be directly included by your projects, without the \r\nnecessity to build anything or link to anything.\r\n\r\nWhereas this library is fully C++98-compatible, it can profit from certain \r\nC++11 features. Support for those features is checked automatically at compile \r\n(or rather preprocessing) time, but can be explicitly enabled or disabled by \r\npredefining the corresponding preprocessor symbols to either 1 or 0 yourself \r\nbefore including half.hpp. This is useful when the automatic detection fails \r\n(for more exotic implementations) or when a feature should be explicitly \r\ndisabled:\r\n\r\n  - 'long long' integer type for mathematical functions returning 'long long' \r\n    results (enabled for VC++ 2003 and icc 11.1 and newer, gcc and clang, \r\n    overridable with 'HALF_ENABLE_CPP11_LONG_LONG').\r\n\r\n  - Static assertions for extended compile-time checks (enabled for VC++ 2010, \r\n    gcc 4.3, clang 2.9, icc 11.1 and newer, overridable with \r\n    'HALF_ENABLE_CPP11_STATIC_ASSERT').\r\n\r\n  - Generalized constant expressions (enabled for VC++ 2015, gcc 4.6, clang 3.1, \r\n    icc 14.0 and newer, overridable with 'HALF_ENABLE_CPP11_CONSTEXPR').\r\n\r\n  - noexcept exception specifications (enabled for VC++ 2015, gcc 4.6, \r\n    clang 3.0, icc 14.0 and newer, overridable with 'HALF_ENABLE_CPP11_NOEXCEPT').\r\n\r\n  - User-defined literals for half-precision literals to work (enabled for \r\n    VC++ 2015, gcc 4.7, clang 3.1, icc 15.0 and newer, overridable with \r\n    'HALF_ENABLE_CPP11_USER_LITERALS').\r\n\r\n  - Thread-local storage for per-thread floating-point exception flags (enabled 
\r\n    for VC++ 2015, gcc 4.8, clang 3.3, icc 15.0 and newer, overridable with \r\n    'HALF_ENABLE_CPP11_THREAD_LOCAL').\r\n\r\n  - Type traits and template meta-programming features from \u003ctype_traits\u003e \r\n    (enabled for VC++ 2010, libstdc++ 4.3, libc++ and newer, overridable with \r\n    'HALF_ENABLE_CPP11_TYPE_TRAITS').\r\n\r\n  - Special integer types from \u003ccstdint\u003e (enabled for VC++ 2010, libstdc++ 4.3, \r\n    libc++ and newer, overridable with 'HALF_ENABLE_CPP11_CSTDINT').\r\n\r\n  - Certain C++11 single-precision mathematical functions from \u003ccmath\u003e for \r\n    floating-point classification during conversions from higher precision types \r\n    (enabled for VC++ 2013, libstdc++ 4.3, libc++ and newer, overridable with \r\n    'HALF_ENABLE_CPP11_CMATH').\r\n\r\n  - Floating-point environment control from \u003ccfenv\u003e for possible exception \r\n    propagation to the built-in floating-point platform (enabled for VC++ 2013, \r\n    libstdc++ 4.3, libc++ and newer, overridable with 'HALF_ENABLE_CPP11_CFENV').\r\n\r\n  - Hash functor 'std::hash' from \u003cfunctional\u003e (enabled for VC++ 2010, \r\n    libstdc++ 4.3, libc++ and newer, overridable with 'HALF_ENABLE_CPP11_HASH').\r\n\r\nThe library has been tested successfully with Visual C++ 2005-2015, gcc 4-8 \r\nand clang 3-8 on 32- and 64-bit x86 systems. Please contact me if you have any \r\nproblems, suggestions or even just success testing it on other platforms.\r\n\r\n\r\nDOCUMENTATION\r\n-------------\r\n\r\nWhat follows are some general words about the usage of the library and its \r\nimplementation. For a complete documentation of its interface consult the \r\ncorresponding website http://half.sourceforge.net. 
You may also generate the \r\ncomplete developer documentation from the library's only include file's doxygen \r\ncomments, but this is more relevant to developers than to mere users.\r\n\r\nBASIC USAGE\r\n\r\nTo make use of the library just include its only header file half.hpp, which \r\ndefines all half-precision functionality inside the 'half_float' namespace. The \r\nactual 16-bit half-precision data type is represented by the 'half' type, which \r\nuses the standard IEEE representation with 1 sign bit, 5 exponent bits and 11 \r\nmantissa bits (including the hidden bit) and supports all types of special \r\nvalues, like subnormal values, infinity and NaNs. This type behaves like the \r\nbuilt-in floating-point types as much as possible, supporting the usual \r\narithmetic, comparison and streaming operators, which makes its use pretty \r\nstraightforward:\r\n\r\n    using half_float::half;\r\n    half a(3.4), b(5);\r\n    half c = a * b;\r\n    c += 3;\r\n    if(c \u003e a)\r\n        std::cout \u003c\u003c c \u003c\u003c std::endl;\r\n\r\nAdditionally the 'half_float' namespace also defines half-precision versions \r\nfor all mathematical functions of the C++ standard library, which can be used \r\ndirectly through ADL:\r\n\r\n    half a(-3.14159);\r\n    half s = sin(abs(a));\r\n    long l = lround(s);\r\n\r\nYou may also specify explicit half-precision literals, since the library \r\nprovides a user-defined literal inside the 'half_float::literal' namespace, \r\nwhich you just need to import (assuming support for C++11 user-defined literals):\r\n\r\n    using namespace half_float::literal;\r\n    half x = 1.0_h;\r\n\r\nFurthermore the library provides proper specializations for \r\n'std::numeric_limits', defining various implementation properties, and \r\n'std::hash' for hashing half-precision numbers (assuming support for C++11 \r\n'std::hash'). 
Similar to the corresponding preprocessor symbols from \u003ccmath\u003e \r\nthe library also defines the 'HUGE_VALH' constant and maybe the 'FP_FAST_FMAH' \r\nsymbol.\r\n\r\nCONVERSIONS AND ROUNDING\r\n\r\nThe half is explicitly constructible/convertible from a single-precision float \r\nargument. Thus it is also explicitly constructible/convertible from any type \r\nimplicitly convertible to float, but constructing it from types like double or \r\nint will involve the usual warnings arising when implicitly converting those to \r\nfloat because of the lost precision. On the one hand those warnings are \r\nintentional, because converting those types to half necessarily also reduces \r\nprecision. But on the other hand they are raised for explicit conversions from \r\nthose types, when the user knows what they are doing. So if those warnings keep \r\nbugging you, then you won't get around first explicitly converting to float \r\nbefore converting to half, or using the 'half_cast' described below. In addition \r\nyou can also directly assign float values to halfs.\r\n\r\nIn contrast to the float-to-half conversion, which reduces precision, the \r\nconversion from half to float (and thus to any other type implicitly \r\nconvertible from float) is implicit, because all values representable with \r\nhalf-precision are also representable with single-precision. This way the \r\nhalf-to-float conversion behaves similarly to the built-in float-to-double \r\nconversion and all arithmetic expressions involving both half-precision and \r\nsingle-precision arguments will be of single-precision type. 
This way you can \r\nalso directly use the mathematical functions of the C++ standard library, \r\nthough in this case you will invoke the single-precision versions which will \r\nalso return single-precision values, which is (even if maybe performing the \r\nexact same computation, see below) not as conceptually clean when working in a \r\nhalf-precision environment.\r\n\r\nThe default rounding mode for conversions between half and more precise types \r\nas well as for rounding results of arithmetic operations and mathematical \r\nfunctions rounds to the nearest representable value. But by predefining the \r\n'HALF_ROUND_STYLE' preprocessor symbol this default can be overridden with one \r\nof the other standard rounding modes using their respective constants or the \r\nequivalent values of 'std::float_round_style' (it can even be synchronized with \r\nthe built-in single-precision implementation by defining it to \r\n'std::numeric_limits\u003cfloat\u003e::round_style'):\r\n\r\n  - 'std::round_indeterminate' (-1) for the fastest rounding.\r\n\r\n  - 'std::round_toward_zero' (0) for rounding toward zero.\r\n\r\n  - 'std::round_to_nearest' (1) for rounding to the nearest value (default).\r\n\r\n  - 'std::round_toward_infinity' (2) for rounding toward positive infinity.\r\n\r\n  - 'std::round_toward_neg_infinity' (3) for rounding toward negative infinity.\r\n\r\nIn addition to changing the overall default rounding mode one can also use the \r\n'half_cast'. This converts between half and any built-in arithmetic type using \r\na configurable rounding mode (or the default rounding mode if none is \r\nspecified). In addition to a configurable rounding mode, 'half_cast' has \r\nanother big difference to a mere 'static_cast': Any conversions are performed \r\ndirectly using the given rounding mode, without any intermediate conversion \r\nto/from 'float'. This is especially relevant for conversions to integer types, \r\nwhich don't necessarily truncate anymore. 
But also for conversions from \r\n'double' or 'long double' this may produce more precise results than a \r\npre-conversion to 'float' using the single-precision implementation's current \r\nrounding mode would.\r\n\r\n    half a = half_cast\u003chalf\u003e(4.2);\r\n    half b = half_cast\u003chalf,std::numeric_limits\u003cfloat\u003e::round_style\u003e(4.2f);\r\n    assert( half_cast\u003cint, std::round_to_nearest\u003e( 0.7_h )     == 1 );\r\n    assert( half_cast\u003chalf,std::round_toward_zero\u003e( 4097 )     == 4096.0_h );\r\n    assert( half_cast\u003chalf,std::round_toward_infinity\u003e( 4097 ) == 4100.0_h );\r\n    assert( half_cast\u003chalf,std::round_toward_infinity\u003e( std::numeric_limits\u003cdouble\u003e::min() ) \u003e 0.0_h );\r\n\r\nACCURACY AND PERFORMANCE\r\n\r\nFrom version 2.0 onward the library is implemented without employing the \r\nunderlying floating-point implementation of the system (except for conversions, \r\nof course), providing an entirely self-contained half-precision implementation \r\nwith results independent from the system's existing single- or double-precision \r\nimplementation and its rounding behaviour.\r\n\r\nAs to accuracy, many of the operators and functions provided by this library \r\nare exact to rounding for all rounding modes, i.e. the error to the exact \r\nresult is at most 0.5 ULP (unit in the last place) for rounding to nearest and \r\nless than 1 ULP for all other rounding modes. This holds for all the operations \r\nrequired by the IEEE 754 standard and many more. Specifically the following \r\nfunctions might exhibit a deviation from the correctly rounded exact result by \r\n1 ULP for a select few input values: 'expm1', 'log1p', 'pow', 'atan2', 'erf', \r\n'erfc', 'lgamma', 'tgamma' (for more details see the documentation of the \r\nindividual functions). 
All other functions and operators are always exact to \r\nrounding or independent of the rounding mode altogether.\r\n\r\nThe increased IEEE-conformance and cleanliness of this implementation comes \r\nwith a certain performance cost compared to doing computations and mathematical \r\nfunctions in hardware-accelerated single-precision. On average and depending on \r\nthe platform, the arithmetic operators are about 75% as fast and the \r\nmathematical functions about 33-50% as fast as performing the corresponding \r\noperations in single-precision and converting between the inputs and outputs. \r\nHowever, directly computing with half-precision values is a rather rare \r\nuse-case and usually using actual 'float' values for all computations and \r\ntemporaries and using 'half's only for storage is the recommended way. But \r\nnevertheless the goal of this library was to provide a complete and \r\nconceptually clean IEEE-conformant half-precision implementation and in the few \r\ncases when you do need to compute directly in half-precision you do so for a \r\nreason and want accurate results.\r\n\r\nIf necessary, this internal implementation can be overridden by predefining the \r\n'HALF_ARITHMETIC_TYPE' preprocessor symbol to one of the built-in \r\nfloating-point types ('float', 'double' or 'long double'), which will cause the \r\nlibrary to use this type for computing arithmetic operations and mathematical \r\nfunctions (if available). However, due to using the platform's floating-point \r\nimplementation (and its rounding behaviour) internally, this might cause \r\nresults to deviate from the specified half-precision rounding mode. It will of \r\ncourse also inhibit the automatic exception detection described below.\r\n\r\nThe conversion operations between half-precision and single-precision types can \r\nalso make use of the F16C extension for x86 processors by using the \r\ncorresponding compiler intrinsics from \u003cimmintrin.h\u003e. 
Support for this is \r\nchecked at compile-time by looking for the '__F16C__' macro which at least gcc \r\nand clang define based on the target platform. It can also be enabled manually \r\nby predefining the 'HALF_ENABLE_F16C_INTRINSICS' preprocessor symbol to 1, or 0 \r\nfor explicitly disabling it. However, this will directly use the corresponding \r\nintrinsics for conversion without checking if they are available at runtime \r\n(possibly crashing if they are not), so make sure they are supported on the \r\ntarget platform before enabling this.\r\n\r\nEXCEPTION HANDLING\r\n\r\nThe half-precision implementation supports all 5 required floating-point \r\nexceptions from the IEEE standard to indicate erroneous inputs or inexact \r\nresults during operations. These are represented by exception flags which \r\nactually use the same values as the corresponding 'FE_...' flags defined in \r\nC++11's \u003ccfenv\u003e header if supported, specifically:\r\n\r\n  - 'FE_INVALID' for invalid inputs to an operation.\r\n  - 'FE_DIVBYZERO' for finite inputs producing infinite results.\r\n  - 'FE_OVERFLOW' if a result is too large to represent finitely.\r\n  - 'FE_UNDERFLOW' for a subnormal or zero result after rounding.\r\n  - 'FE_INEXACT' if a result needed rounding to be representable.\r\n  - 'FE_ALL_EXCEPT' as a convenient OR of all possible exception flags.\r\n\r\nThe internal exception flag state will start with all flags cleared and is \r\nmaintained per thread if C++11 thread-local storage is supported, otherwise it \r\nwill be maintained globally and will theoretically NOT be thread-safe (while \r\npractically being as thread-safe as a simple integer variable can be). These \r\nflags can be managed explicitly using the library's error handling functions, \r\nwhich again try to mimic the built-in functions for handling floating-point \r\nexceptions from \u003ccfenv\u003e. 
You can clear them with 'feclearexcept' (which is the \r\nonly way a flag can be cleared), test them with 'fetestexcept', explicitly \r\nraise errors with 'feraiseexcept' and save and restore their state using \r\n'fegetexceptflag' and 'fesetexceptflag'. You can also throw corresponding C++ \r\nexceptions based on the current flag state using 'fethrowexcept'.\r\n\r\nHowever, any automatic exception detection and handling during half-precision \r\noperations and functions is DISABLED by default, since it comes with a minor \r\nperformance overhead due to runtime checks, and reacting to IEEE floating-point \r\nexceptions is rarely ever needed in application code. But the library fully \r\nsupports IEEE-conformant detection of floating-point exceptions and various \r\nways for handling them, which can be enabled by pre-defining the corresponding \r\npreprocessor symbols to 1. They can be enabled individually or all at once and \r\nthey will be processed in the order they are listed here:\r\n\r\n  - 'HALF_ERRHANDLING_FLAGS' sets the internal exception flags described above \r\n    whenever the corresponding exception occurs.\r\n  - 'HALF_ERRHANDLING_ERRNO' sets the value of 'errno' from \u003ccerrno\u003e similar to \r\n    the behaviour of the built-in floating-point types when 'MATH_ERRNO' is used.\r\n  - 'HALF_ERRHANDLING_FENV' will propagate exceptions to the built-in \r\n    floating-point implementation using 'std::feraiseexcept' if support for \r\n    C++11 floating-point control is enabled. However, this does not synchronize \r\n    exceptions: neither will clearing propagate nor will it work in reverse.\r\n  - 'HALF_ERRHANDLING_THROW_...' can be defined to a string literal which will \r\n    be used as description message for a C++ exception that is thrown whenever \r\n    a 'FE_...' 
exception occurs, similar to the behaviour of 'fethrowexcept'.\r\n\r\nIf any of the above error handling is activated, non-quiet operations on \r\nhalf-precision values will also raise a 'FE_INVALID' exception whenever \r\nthey encounter a signaling NaN value, in addition to transforming the value \r\ninto a quiet NaN. If error handling is disabled, signaling NaNs will be \r\ntreated like quiet NaNs (while still getting explicitly quieted if propagated \r\nto the result). There can also be additional treatment of overflow and \r\nunderflow errors after they have been processed as above, which is ENABLED by \r\ndefault (but of course only takes effect if any other exception handling is \r\nactivated) unless overridden by pre-defining the corresponding preprocessor \r\nsymbol to 0:\r\n\r\n  - 'HALF_ERRHANDLING_OVERFLOW_TO_INEXACT' will cause overflow errors to also \r\n    raise a 'FE_INEXACT' exception.\r\n  - 'HALF_ERRHANDLING_UNDERFLOW_TO_INEXACT' will cause underflow errors to also \r\n    raise a 'FE_INEXACT' exception. This will also slightly change the \r\n    behaviour of the underflow exception, which will ONLY be raised if the \r\n    result is actually inexact due to underflow. If this is disabled, underflow \r\n    exceptions will be raised for ANY (possibly exact) subnormal result.\r\n\r\n\r\nCREDITS AND CONTACT\r\n-------------------\r\n\r\nThis library is developed by CHRISTIAN RAU and released under the MIT License \r\n(see LICENSE.txt). 
If you have any questions or problems with it, feel free to \r\ncontact me at rauy@users.sourceforge.net.\r\n\r\nAdditional credit goes to JEROEN VAN DER ZIJP for his paper on \"Fast Half Float \r\nConversions\", whose algorithms have been used in the library for converting \r\nbetween half-precision and single-precision values.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidbrochart%2Fcpp-half","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavidbrochart%2Fcpp-half","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidbrochart%2Fcpp-half/lists"}