{"id":13591034,"url":"https://github.com/HdrHistogram/HdrHistogram","last_synced_at":"2025-04-08T14:32:21.142Z","repository":{"id":4614066,"uuid":"5757735","full_name":"HdrHistogram/HdrHistogram","owner":"HdrHistogram","description":"A High Dynamic Range (HDR) Histogram","archived":false,"fork":false,"pushed_at":"2024-05-28T17:06:19.000Z","size":5107,"stargazers_count":2142,"open_issues_count":26,"forks_count":251,"subscribers_count":118,"default_branch":"master","last_synced_at":"2024-05-29T07:08:06.525Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://hdrhistogram.github.io/HdrHistogram/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HdrHistogram.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2012-09-11T00:14:02.000Z","updated_at":"2024-05-30T08:25:45.512Z","dependencies_parsed_at":"2024-01-05T20:45:54.220Z","dependency_job_id":"07cc3cd1-a97a-41a0-8e0d-726a6b32ed4b","html_url":"https://github.com/HdrHistogram/HdrHistogram","commit_stats":{"total_commits":678,"total_committers":46,"mean_commits":14.73913043478261,"dds":0.3761061946902655,"last_synced_commit":"7b0edce258c0847387e3ed532057556b1cc6bd9d"},"previous_names":[],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HdrHistogram%2FHdrHistogram","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HdrHistogram%2FHdrHistogram/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HdrHistogram%2FHdrHistogr
am/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HdrHistogram%2FHdrHistogram/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HdrHistogram","download_url":"https://codeload.github.com/HdrHistogram/HdrHistogram/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247796160,"owners_count":20997522,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T16:00:52.940Z","updated_at":"2025-04-08T14:32:21.122Z","avatar_url":"https://github.com/HdrHistogram.png","language":"Java","readme":"\u003ca href=\"https://foojay.io/works-with-openjdk\"\u003e\u003cimg align=\"right\" src=\"https://github.com/foojayio/badges/raw/main/works_with_openjdk/Works-with-OpenJDK.png\" width=\"100\"\u003e\u003c/a\u003e\n\nHdrHistogram\n----------------------------------------------\n[![Gitter](https://img.shields.io/gitter/room/gitterHQ/gitter.svg)](https://gitter.im/HdrHistogram/HdrHistogram?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n[![Java CI](https://github.com/hdrhistogram/hdrhistogram/workflows/Java%20CI/badge.svg)](https://github.com/hdrhistogram/hdrhistogram/actions)\n[![Javadocs](http://www.javadoc.io/badge/org.hdrhistogram/HdrHistogram.svg)](http://www.javadoc.io/doc/org.hdrhistogram/HdrHistogram)\n----------------------------------------------------------------------------\nHdrHistogram: A High Dynamic Range (HDR) Histogram\n\nThis repository currently includes a Java implementation 
of\nHdrHistogram. C, C#/.NET, Python, JavaScript, Rust, Erlang, and Go ports\ncan be found in other repositories, all of which share common concepts\nand data representation capabilities. Look at repositories under the\n[HdrHistogram organization](https://github.com/HdrHistogram) for various\nimplementations and useful tools.\n\nNote: The text below is an excerpt from a Histogram JavaDoc. While much\nof it generally applies to other language implementations as well,\nsome details may vary by implementation (e.g. iteration and\nsynchronization), so you should consult the documentation or header\ninformation of the specific API library you intend to use.\n\n----------------------------------------------\n\nHdrHistogram supports the recording and analyzing of sampled data value\ncounts across a configurable integer value range with configurable value\nprecision within the range. Value precision is expressed as the number of\nsignificant digits in the value recording, and provides control over value\nquantization behavior across the value range and the subsequent value\nresolution at any given level.\n\nFor example, a Histogram could be configured to track the counts of\nobserved integer values between 0 and 3,600,000,000 while maintaining a\nvalue precision of 3 significant digits across that range. Value\nquantization within the range will thus be no larger than 1/1,000th\n(or 0.1%) of any value. This example Histogram could be used to track and\nanalyze the counts of observed response times ranging between 1 microsecond\nand 1 hour in magnitude, while maintaining a value resolution of 1\nmicrosecond up to 1 millisecond, a resolution of 1 millisecond (or better)\nup to one second, and a resolution of 1 second (or better) up to 1,000\nseconds. 
At its maximum tracked value (1 hour), it would still maintain a\nresolution of 3.6 seconds (or better).\n\nThe HdrHistogram package includes the Histogram implementation, which tracks\nvalue counts in long fields, and is expected to be the commonly used\nHistogram form. IntHistogram and ShortHistogram, which track value counts in\nint and short fields respectively, are provided for use cases where smaller\ncount ranges are practical and smaller overall storage is beneficial.\n\nHdrHistogram is designed for recording histograms of value measurements in\nlatency and performance sensitive applications. Measurements show value\nrecording times as low as 3-6 nanoseconds on modern (circa 2012) Intel CPUs.\nAbstractHistogram maintains a fixed cost in both space and time. A\nHistogram's memory footprint is constant, with no allocation operations\ninvolved in recording data values or in iterating through them. The memory\nfootprint is fixed regardless of the number of data value samples recorded,\nand depends solely on the dynamic range and precision chosen. The amount of\nwork involved in recording a sample is constant, and directly computes\nstorage index locations such that no iteration or searching is ever involved\nin recording data values.\n\nA combination of high dynamic range and precision is useful for collection\nand accurate post-recording analysis of sampled value data distribution in\nvarious forms. 
Whether it's calculating or plotting arbitrary percentiles,\niterating through and summarizing values in various ways, or deriving mean\nand standard deviation values, the fact that the recorded data information\nis kept in high resolution allows for accurate post-recording analysis with\nlow [and ultimately configurable] loss in accuracy when compared to\nperforming the same analysis directly on the potentially infinite series of\nsourced data value samples.\n\nA common use example of HdrHistogram would be to record response times\nin units of microseconds across a dynamic range stretching from 1 usec to\nover an hour, with good enough resolution to support later\npost-recording analysis on the collected data. Analysis can include\ncomputing, examining, and reporting of distribution by percentiles, linear\nor logarithmic value buckets, mean and standard deviation, or by any other\nmeans that can be easily added by using the various iteration techniques\nsupported by the Histogram.\nIn order to facilitate the accuracy needed for various post-recording\nanalysis techniques, this example can maintain a resolution of ~1 usec\nor better for times ranging to ~2 msec in magnitude, while at the same time\nmaintaining a resolution of ~1 msec or better for times ranging to ~2 sec,\nand a resolution of ~1 second or better for values up to 2,000 seconds.\nThis sort of example resolution can be thought of as \"always accurate to 3\nsignificant digits.\" Such an example Histogram would simply be created with a\nhighestTrackableValue of 3,600,000,000, and a numberOfSignificantValueDigits\nof 3, and would occupy a fixed, unchanging memory footprint of around 185KB\n(see \"Footprint estimation\" below).\n\n\nHistogram variants and internal representation\n----------------------------------------------\n\nThe HdrHistogram package includes multiple implementations of the\n`AbstractHistogram` class:\n- `Histogram`, which is the commonly used Histogram form and tracks\n  
value counts in long fields.\n- `IntHistogram` and `ShortHistogram`, which track value counts in int\n  and short fields respectively, are provided for use cases where\n  smaller count ranges are practical and smaller overall storage\n  is beneficial (e.g. systems where tens of thousands of in-memory\n  histograms are being tracked).\n- `AtomicHistogram` and `SynchronizedHistogram` (see 'Synchronization\n  and concurrent access' below).\n\nInternally, data in HdrHistogram variants is maintained using a concept\nsomewhat similar to that of floating point number representation: using an\nexponent and a (non-normalized) mantissa to support a wide dynamic range at\na high but varying (by exponent value) resolution. AbstractHistogram uses\nexponentially increasing bucket value ranges (the parallel of the exponent\nportion of a floating point number) with each bucket containing a fixed\nnumber (per bucket) of linear sub-buckets (the parallel of a non-normalized\nmantissa portion of a floating point number). Both dynamic range and resolution\nare configurable, with highestTrackableValue controlling dynamic range, and\nnumberOfSignificantValueDigits controlling resolution.\n\nSynchronization and concurrent access\n----------------------------------------------\n\nIn the interest of keeping value recording cost to a minimum, the commonly\nused Histogram class and its IntHistogram and ShortHistogram variants are\nNOT internally synchronized, and do NOT use atomic variables. Callers\nwishing to make potentially concurrent, multi-threaded updates or queries\nagainst Histogram objects should either take care to externally synchronize\nand/or order their access, or use the ConcurrentHistogram, AtomicHistogram,\nor SynchronizedHistogram variants.\n\nA common pattern seen in histogram value recording involves recording values in\na critical path (multi-threaded or not), coupled with a non-critical path\nreading the recorded data for summary/reporting purposes. 
When such continuous \nnon-blocking recording operation (concurrent or not) is desired even when\nsampling, analyzing, or reporting operations are needed, consider using\nthe Recorder and SingleWriterRecorder recorder variants that were specifically\ndesigned for that purpose. Recorders provide a recording API similar to\nHistogram, and internally maintain and coordinate active/inactive histograms\nsuch that recording remains wait-free in the presence of accurate and stable\ninterval sampling.\n\nIt is worth mentioning that since Histogram objects are additive, it is\ncommon practice to use per-thread non-synchronized histograms or\nSingleWriterRecorders, and use a summary/reporting thread to perform\nhistogram aggregation math across time and/or threads.  \n\n\nIteration\n----------------------------------------------\n\nHistograms support multiple convenient forms of iterating through the\nhistogram data set, including linear, logarithmic, and percentile iteration\nmechanisms, as well as means for iterating through each recorded value or\neach possible value level. 
The iteration mechanisms are accessible through\nthe HistogramData available through `getHistogramData()`.\nIteration mechanisms all provide HistogramIterationValue data points along\nthe histogram's iterated data set, and are available for the default\n(corrected) histogram data set via the following HistogramData methods:\n\n - `percentiles`: An `Iterable\u003cHistogramIterationValue\u003e` through the histogram\n                using a PercentileIterator\n - `linearBucketValues`: An `Iterable\u003cHistogramIterationValue\u003e` through the\n                histogram using a LinearIterator\n - `logarithmicBucketValues`: An `Iterable\u003cHistogramIterationValue\u003e` through\n                the histogram using a LogarithmicIterator\n - `recordedValues`: An `Iterable\u003cHistogramIterationValue\u003e` through the\n                histogram using a RecordedValuesIterator\n - `allValues`: An `Iterable\u003cHistogramIterationValue\u003e` through the histogram\n                using an AllValuesIterator\n\nIteration is typically done with a for-each loop statement. E.g.:\n\n``` java\n for (HistogramIterationValue v :\n      histogram.getHistogramData().percentiles(ticksPerHalfDistance)) {\n     ...\n }\n```\n\n or\n\n``` java\n for (HistogramIterationValue v :\n      histogram.getRawHistogramData().linearBucketValues(unitsPerBucket)) {\n     ...\n }\n```\n\nThe iterators associated with each iteration method are resettable, such\nthat a caller that would like to avoid allocating a new iterator object for\neach iteration loop can re-use an iterator to repeatedly iterate through\nthe histogram. 
This iterator re-use usually takes the form of an explicit\nloop using the Iterator's `hasNext()` and `next()` methods.\n\nSo to avoid allocating a new iterator object for each iteration loop:\n\n``` java\n PercentileIterator iter =\n    histogram.getHistogramData().percentiles().iterator(ticksPerHalfDistance);\n ...\n // Reset the existing iterator instead of allocating a new one\n iter.reset(percentileTicksPerHalfDistance);\n while (iter.hasNext()) {\n     HistogramIterationValue v = iter.next();\n     ...\n }\n```\n\nEquivalent Values and value ranges\n----------------------------------------------\n\nDue to the finite (and configurable) resolution of the histogram, multiple\nadjacent integer data values can be \"equivalent\". Two values are considered\n\"equivalent\" if samples recorded for both are always counted in a common\ntotal count due to the histogram's resolution level. HdrHistogram provides\nmethods for determining the lowest and highest equivalent values for any\ngiven value, as well as determining whether two values are equivalent, and\nfor finding the next non-equivalent value for a given value (useful when\nlooping through values, in order to avoid double-counting).\n\nCorrected vs. Raw value recording calls\n----------------------------------------------\n\nIn order to support a common use case needed when histogram values are used\nto track response time distribution, Histogram supports the recording\nof corrected histogram values through a `recordValueWithExpectedInterval()`\nvariant. 
This value recording form is useful in [common latency\nmeasurement] scenarios where response times may exceed the expected interval\nbetween issuing requests, leading to \"dropped\" response time measurements\nthat would typically correlate with \"bad\" results.\n\nWhen a value recorded in the histogram exceeds the\nexpectedIntervalBetweenValueSamples parameter, recorded histogram data will\nreflect an appropriate number of additional values, linearly decreasing in\nsteps of expectedIntervalBetweenValueSamples, down to the last value that\nwould still be higher than expectedIntervalBetweenValueSamples.\n\nTo illustrate why this corrective behavior is critically needed in order\nto accurately represent value distribution when large value measurements\nmay lead to missed samples, imagine a system for which response time\nsamples are taken once every 10 msec to characterize response time\ndistribution. The hypothetical system behaves \"perfectly\" for 100 seconds\n(10,000 recorded samples), with each sample showing a 1 msec response time\nvalue. The hypothetical system then encounters a 100 sec pause during which\nonly a single sample is recorded (with a 100 second value).\nThe raw data histogram collected for such a hypothetical system (over the\n200 second scenario above) would show ~99.99% of results at 1 msec or below,\nwhich is obviously \"not right\". The same histogram, corrected with the\nknowledge of an expectedIntervalBetweenValueSamples of 10 msec, will correctly\nrepresent the response time distribution. 
Only ~50% of results will be at\n1 msec or below, with the remaining 50% coming from the auto-generated value\nrecords covering the missing increments spread between 10msec and 100 sec.\n\nData sets recorded with and without an expectedIntervalBetweenValueSamples\nparameter will differ only if at least one value recorded with the recordValue\nmethod was greater than its associated expectedIntervalBetweenValueSamples\nparameter.\nData sets recorded with an expectedIntervalBetweenValueSamples parameter will\nbe identical to ones recorded without it if all values recorded via the\nrecordValue calls were smaller than their associated (and optional)\nexpectedIntervalBetweenValueSamples parameters.\n\nWhen used for response time characterization, the recording with the optional\nexpectedIntervalBetweenValueSamples parameter will tend to produce data sets\nthat would much more accurately reflect the response time distribution that a\nrandom, uncoordinated request would have experienced.\n\nFootprint estimation\n----------------------------------------------\n\nDue to its dynamic range representation, Histogram is relatively efficient\nin memory space requirements given the accuracy and dynamic range it covers.\nStill, it is useful to be able to estimate the memory footprint involved\nfor a given highestTrackableValue and numberOfSignificantValueDigits\ncombination. Beyond a relatively small fixed-size footprint used for internal\nfields and stats (which can be estimated as \"fixed at well less than 1KB\"),\nthe bulk of a Histogram's storage is taken up by its data value recording\ncounts array. 
The total footprint can be conservatively estimated by:\n\n``` java\n largestValueWithSingleUnitResolution =\n        2 * (10 ^ numberOfSignificantValueDigits);\n subBucketSize =\n        roundedUpToNearestPowerOf2(largestValueWithSingleUnitResolution);\n\n expectedHistogramFootprintInBytes = 512 +\n      ({primitive type size} / 2) *\n      (log2RoundedUp((highestTrackableValue) / subBucketSize) + 2) *\n      subBucketSize\n```\n\nA conservative (high) estimate of a Histogram's footprint in bytes is\navailable via the `getEstimatedFootprintInBytes()` method.\n","funding_links":[],"categories":["Java","II. Databases, search engines, big data and machine learning","Utility","Performance Testing"],"sub_categories":["6. Working with messy data","Other","Results Analysis \u0026 Reporting"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHdrHistogram%2FHdrHistogram","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FHdrHistogram%2FHdrHistogram","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHdrHistogram%2FHdrHistogram/lists"}