{"id":23508806,"url":"https://github.com/lovasko/aggstat","last_synced_at":"2025-05-13T15:35:09.906Z","repository":{"id":78820626,"uuid":"213492519","full_name":"lovasko/aggstat","owner":"lovasko","description":"Constant Time and Space Implementation of Aggregate Functions in C99","archived":false,"fork":false,"pushed_at":"2021-06-10T23:57:38.000Z","size":151,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-12-25T11:32:48.175Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lovasko.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-10-07T21:43:20.000Z","updated_at":"2022-06-21T17:53:26.000Z","dependencies_parsed_at":"2023-05-03T14:48:07.615Z","dependency_job_id":null,"html_url":"https://github.com/lovasko/aggstat","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lovasko%2Faggstat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lovasko%2Faggstat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lovasko%2Faggstat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lovasko%2Faggstat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lovasko","download_url":"https://codeload.github.com/lovasko/aggstat/tar.gz/refs/heads/master","h
ost":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239188697,"owners_count":19597032,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-25T11:32:14.322Z","updated_at":"2025-02-16T19:44:35.226Z","avatar_url":"https://github.com/lovasko.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# libagg\n`libagg` is a C99 library that implements a set of algorithms that provide statistical\naggregation of a stream of floating-point numerical values. All aggregate functions are provided in\ntwo separate implementations: _static_ and _streaming_.\n\nThe two key properties achieved by the streaming implementations relate to the fact that\nthe full set of values does not need to be known ahead of time, enabling:\n * large data sets to be processed in a constant amount of memory\n * analysis of data sets of unknown length (e.g. user input, non-deterministic behaviour of a\n   system)\n\nThese abilities come at a cost: the precision of the streaming implementation is reduced compared to\nthe static implementation. 
This trade-off is closely monitored by the test suite to document and\nempirically prove the magnitude of the error for each aggregate function.\n\n## Aggregate Functions\nThe module provides the following aggregate functions:\n * first value\n * last value\n * count\n * sum\n * minimum\n * maximum\n * average\n * variance\n * standard deviation\n * skewness\n * kurtosis\n * p-quantile\n * median\n\n## API\n### Functions\nThe streaming part of the library consists of the following three functions:\n * `agg_new` to initialize or reset the state\n * `agg_put` to update the statistical aggregate estimate\n * `agg_get` to obtain the statistical aggregate estimate\n\nThe static part of the library consists only of one function:\n * `agg_run` to calculate the statistical aggregate\n\n### Types\nThe streaming part of the library consists only of one type:\n  * `struct agg` which keeps track of state and should be treated as an opaque structure\n\nThe static part of the library does not use any custom types.\n\n### Constants\nThe following constants are used to identify the aggregate functions by both parts:\n * `AGG_FNC_FST` for first value\n * `AGG_FNC_LST` for last value\n * `AGG_FNC_CNT` for count\n * `AGG_FNC_SUM` for sum\n * `AGG_FNC_MIN` for minimum\n * `AGG_FNC_MAX` for maximum\n * `AGG_FNC_AVG` for average\n * `AGG_FNC_VAR` for variance\n * `AGG_FNC_DEV` for standard deviation\n * `AGG_FNC_SKW` for skewness\n * `AGG_FNC_KRT` for kurtosis\n * `AGG_FNC_QTL` for p-quantile\n * `AGG_FNC_MED` for median\n\n## Examples\nThe following snippet computes the 99th percentile of values in a stream whilst retrieving numbers\nfrom the hypothetical stream by calling the `get_number` function.\n```c\nstruct agg     agg;\ndouble         num;\ndouble         p99;\nbool           ret;\nuint8_t        idx;\n\nagg_new(\u0026agg, AGG_FNC_QTL, 0.99);\n\nfor (idx = 0; idx \u003c 32; idx += 1) {\n  num = get_number();\n  agg_put(\u0026agg, num);\n}\n\nret = agg_get(\u0026agg, 
\u0026p99);\n```\n\nA similar computation can be performed using the static algorithm that expects the full stream\nto be known ahead of time:\n```c\ndouble  num[32];\ndouble  p99;\nbool    ret;\nuint8_t idx;\n\nfor (idx = 0; idx \u003c 32; idx += 1) {\n  num[idx] = get_number();\n}\n\nret = agg_run(\u0026p99, num, 32, AGG_FNC_QTL, 0.99);\n```\n\n## Floating-Point Types\nThe library uses the `double` type by default, as it is the de-facto standard floating-point type\nin the C99 language. This is evidenced by the fact that all mathematical functions for `float` and\n`long double` are differentiated by a suffix, e.g. `sin`, `sinf`, and `sinl`. Moreover, the\nnumerical literals in the language are `double` by default too.\n\nIn order to use other floating-point types instead, the `AGG_BIT` macro with the appropriate bit width\nhas to be defined before the `agg.h` header file is included (and thus processed by the pre-processor).\nThis creates a trade-off that affects the precision of all functions. The switch is not dynamic and has to\nbe done during the compilation of the source code. The following table lists currently supported\nfloating-point types and their respective values recognised by `AGG_BIT`:\n\n| Type         | `AGG_BIT` |\n|--------------|-----------|\n| `float`      | 32        |\n| `double`     | 64        |\n| `__float128` | 128       |\n\n## Testing\nThe library has a particular trade-off at its heart: it sacrifices the precision of the\ncomputations in order to provide the streaming capabilities of the aggregate functions. With the\nexplicit goal of keeping this trade-off in check, a suite of tests was introduced that compares the\nstreaming variants of functions to their non-streaming static versions and computes the difference\nbetween the two. Furthermore, each aggregate function specifies its accepted magnitude of error\nfor a given number of incoming values. 
The magnitudes are always the upper bound, rounded to the\nclosest power of ten - both positive and negative.\n\nThe precise values can be found in the [ERROR.md](ERROR.md) file.\n\n## Memory Usage\nThe library does not dynamically allocate any memory and thus all aggregations are performed in a\nconstant amount of statically allocated memory on the stack. Based on the chosen floating-point\ntype - `double` or `float` -  the core type `struct agg` takes up 92 and 136 bytes, respectively.\n\n## Performance\nThe vast majority of the code is branchless and hand-optimized for performance. The test suite measures\nthe average execution time per single value, which tends to be in the order of nanoseconds.\n\nThe measurements show stable performance with almost no variance, which makes the library suitable\nfor use in low-latency scenarios.\n\n## Note on Optimizations\nAll major C99 compilers offer multiple optimization levels, some of which might sacrifice the\ncorrectness of the computation in order to achieve better performance. The `-ffast-math` option,\nwhich is part of the `-Ofast` optimization level, causes a number of changes to the behaviour of the\nfloating-point computations. This in turn causes slight divergence in the numerical precision of\nthe algorithms in question. The error testing takes this into account and monitors the skew\nappropriately.\n\nThe [ERROR.md](ERROR.md) file contains the columns `double fast` and `float fast` that represent\nthe `-Ofast` compilation option.\n\n## Note on Randomness\nAll tests use a very weak source of randomness: a simple linear congruential generator that is not\nto be used in a serious production setting where either cryptographic safety or perfect\ndistribution is of the essence.\n\nThe reason for depending on the weak generator is to avoid external dependencies: one of the\ndesign goals of the module is to be extremely light. 
The standard `rand` function was not used in\norder to silence static analysis warnings.\n\n## Standards\nBoth the library and test suite are written in standards-compliant C99. The provided source code\nought to compile on all standard compilers without any warnings. Reports of any compiler or static\nanalysis warnings are encouraged and will be addressed.\n\n## Future Work\nThe following areas of focus are not addressed by the library at this time:\n  * support for the `long double` type\n  * inter-quartile range aggregate function\n  * headless mode where the function type is not stored as part of `struct agg`\n  * ability to select an integer type size for the count variables\n\n## License\nThe module is licensed under the 2-clause BSD license (see LICENSE file for more information). In\ncase you need a different license, feel free to contact the author.\n\n## Author\nDaniel Lovasko \u003cdaniel.lovasko@gmail.com\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flovasko%2Faggstat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flovasko%2Faggstat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flovasko%2Faggstat/lists"}