https://github.com/isarn/isarn-sketches
Sketching data structures for scala, including t-digest
- Host: GitHub
- URL: https://github.com/isarn/isarn-sketches
- Owner: isarn
- License: apache-2.0
- Created: 2016-11-30T23:10:02.000Z (about 8 years ago)
- Default Branch: develop
- Last Pushed: 2021-09-07T21:07:09.000Z (over 3 years ago)
- Last Synced: 2024-06-11T20:20:40.718Z (8 months ago)
- Topics: algebird, data-sketching, numeric, probability-distribution, scala, sketching, t-digest
- Language: Scala
- Size: 1.32 MB
- Stars: 14
- Watchers: 4
- Forks: 5
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# isarn-sketches
Sketching data structures

### API documentation
- https://isarn.github.io/isarn-sketches/scala/api/
- https://isarn.github.io/isarn-sketches/java/api/

### Compatibility
isarn-sketches can operate with [Algebird](https://twitter.github.io/algebird/) via the
[isarn-sketches-algebird-api](https://github.com/isarn/isarn-sketches-algebird-api) library.

isarn-sketches can also operate with [Apache Spark](https://github.com/apache/spark) via the [isarn-sketches-spark](https://github.com/isarn/isarn-sketches-spark) library.

### How to use in your project
``` scala
// isarn-sketches
libraryDependencies += "org.isarnproject" %% "isarn-sketches" % "0.3.0"

// isarn-sketches-java
libraryDependencies += "org.isarnproject" % "isarn-sketches-java" % "0.3.0"
```

### t-digest
``` scala
scala> import org.isarnproject.sketches.TDigest
import org.isarnproject.sketches.TDigest

scala> val data = Vector.fill(10000) { scala.util.Random.nextGaussian() }
data: scala.collection.immutable.Vector[Double] = Vector(1.6046163970051968, 0.44151418924289004, ...

scala> val sketch = TDigest.sketch(data)
sketch: org.isarnproject.sketches.TDigest = TDigest(0.5,0,74,TDigestMap(-3.819069044174932 -> (1.0, 1.0), ...

scala> sketch.cdf(0)
res0: Double = 0.4984362744530557

scala> sketch.cdfInverse(0.5)
res1: Double = 0.0038481195948969205
```

#### t-digest resources
* Original paper: [Computing Extremely Accurate Quantiles Using t-Digests](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf)
* Video Talk: [Sketching Data with T-Digest In Apache Spark](https://youtu.be/ETUYhEZRtWE)
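
The REPL session above queries a single CDF value and a single quantile. As a rough illustration of how the same calls might be combined into a small quantile summary, here is a minimal sketch. It assumes the `isarn-sketches` 0.3.0 dependency shown earlier is on the classpath and uses only `TDigest.sketch`, `cdf`, and `cdfInverse` as they appear in that session. The object name `QuantileSummary`, the helper `quartiles`, and the random seed are illustrative choices, not part of the library.

``` scala
import org.isarnproject.sketches.TDigest
import scala.util.Random

// Hypothetical helper object for illustration; not part of isarn-sketches.
object QuantileSummary {
  // Approximate (q1, median, q3) read from the sketch's inverse CDF.
  def quartiles(sketch: TDigest): (Double, Double, Double) =
    (sketch.cdfInverse(0.25), sketch.cdfInverse(0.5), sketch.cdfInverse(0.75))

  def main(args: Array[String]): Unit = {
    // Seeded generator so repeated runs see the same data (the seed is arbitrary).
    val rng = new Random(42L)
    val data = Vector.fill(10000) { rng.nextGaussian() }

    // Same construction call as in the REPL session.
    val sketch = TDigest.sketch(data)

    val (q1, median, q3) = quartiles(sketch)
    println(f"q1=$q1%.3f median=$median%.3f q3=$q3%.3f iqr=${q3 - q1}%.3f")

    // Approximate fraction of the data lying between -1 and 1.
    val within = sketch.cdf(1.0) - sketch.cdf(-1.0)
    println(f"fraction in [-1, 1] ~ $within%.3f")
  }
}
```

Because the sketch is an approximation, the printed values should be close to, but not exactly equal to, the quantiles computed from the full sorted data.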