{"id":18012074,"url":"https://github.com/burmanm/gorilla-tsc","last_synced_at":"2025-05-16T18:09:50.045Z","repository":{"id":47341926,"uuid":"64661642","full_name":"burmanm/gorilla-tsc","owner":"burmanm","description":"Implementation of time series compression method from the Facebook's Gorilla paper","archived":false,"fork":false,"pushed_at":"2021-09-03T06:39:48.000Z","size":128,"stargazers_count":211,"open_issues_count":11,"forks_count":38,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-04-12T17:46:06.665Z","etag":null,"topics":["compression","java","time-series","timeseries"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/burmanm.png","metadata":{"files":{"readme":"README.adoc","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-08-01T11:31:20.000Z","updated_at":"2025-01-26T11:25:30.000Z","dependencies_parsed_at":"2022-08-27T23:10:12.175Z","dependency_job_id":null,"html_url":"https://github.com/burmanm/gorilla-tsc","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/burmanm%2Fgorilla-tsc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/burmanm%2Fgorilla-tsc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/burmanm%2Fgorilla-tsc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/burmanm%2Fgorilla-tsc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/burmanm","download_url":"https://codeload.github.com/burmanm/gorilla-tsc/tar.gz/refs/heads/mast
er","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254582907,"owners_count":22095518,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","java","time-series","timeseries"],"created_at":"2024-10-30T03:14:16.633Z","updated_at":"2025-05-16T18:09:50.022Z","avatar_url":"https://github.com/burmanm.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"= Time series compression library, based on Facebook's Gorilla paper\n:source-language: java\n\nifdef::env-github[]\n[link=https://travis-ci.org/burmanm/gorilla-tsc]\nimage::https://travis-ci.org/burmanm/gorilla-tsc.svg?branch=master[Build Status,70,18]\n[link=https://maven-badges.herokuapp.com/maven-central/fi.iki.yak/compression-gorilla]\nimage::https://img.shields.io/maven-central/v/fi.iki.yak/compression-gorilla.svg[Maven central]\nendif::[]\n\n== Introduction\n\nThis is a Java-based implementation of the compression methods described in the paper link:http://www.vldb.org/pvldb/vol8/p1816-teller.pdf[\"Gorilla: A Fast, Scalable, In-Memory Time Series Database\"]. For an explanation of how the compression methods work, read the excellent paper.\n\nUnlike the original paper, this implementation supports both integer values (`long`) and\nfloating-point values (`double`), both 64 bits in length.\n\nVersions 1.x and 2.x are not compatible with each other due to small differences in the stored array. 
Versions 2.x\nalso support reading and storing the older format; see Usage for details.\n\n== Usage\n\nThe included tests are a good source for examples.\n\n=== Maven\n\n[source, xml]\n----\n    \u003cdependency\u003e\n        \u003cgroupId\u003efi.iki.yak\u003c/groupId\u003e\n        \u003cartifactId\u003ecompression-gorilla\u003c/artifactId\u003e\n    \u003c/dependency\u003e\n----\n\nYou can find the latest version through the Maven Central badge link above.\n\n=== Compressing\n\nTo compress in the older 1.x format, use class ``Compressor``. For 2.x, use ``GorillaCompressor`` (recommended).\n``LongArrayOutput`` is also recommended over ``ByteBufferBitOutput`` for performance reasons. One can supply\nan alternative predictor to the ``GorillaCompressor`` if required. One such implementation is included,\n``DifferentialFCM``, which provides a better compression ratio for some data patterns.\n\n[source, java]\n----\nlong now = LocalDateTime.now(ZoneOffset.UTC).truncatedTo(ChronoUnit.HOURS)\n        .toInstant(ZoneOffset.UTC).toEpochMilli();\n\nLongArrayOutput output = new LongArrayOutput();\nGorillaCompressor c = new GorillaCompressor(now, output);\n----\n\nThe compressor requires a block timestamp and an implementation of the `BitOutput` interface.\n\n[source, java]\n----\nc.addValue(long, double);\n----\n\nAdds a new floating-point value to the time series. If you wish to store only long values, use\n`c.addValue(long, long)`, but do *not* mix these in the same series.\n\nAfter the block is ready, remember to call:\n\n[source, java]\n----\nc.close();\n----\n\nwhich flushes the remaining data to the stream and writes closing information.\n\n=== Decompressing\n\nTo decompress from the older 1.x format, use class ``Decompressor``. For 2.x, use ``GorillaDecompressor`` (recommended).\n``LongArrayInput`` is also recommended over ``ByteBufferBitInput`` for performance reasons if the 2.x\nformat was used to compress the time series. 
If the original compressor used a predictor other than\n``LastValuePredictor``, it must be specified in the constructor.\n\n[source, java]\n----\nLongArrayInput input = new LongArrayInput(byteBuffer);\nGorillaDecompressor d = new GorillaDecompressor(input);\n----\n\nTo decompress a stream of bytes, supply `GorillaDecompressor` with a suitable implementation of the `BitInput` interface.\n`LongArrayInput` can decompress a long array or an existing `ByteBuffer` representation with an 8-byte word\nlength.\n\n[source, java]\n----\nPair pair = d.readPair();\n----\n\nRequesting the next pair with `readPair()` returns the next value in the series, or `null` once the series is completely\nread. The pair is a simple holder object with `getTimestamp()` and either `getDoubleValue()` or `getLongValue()`.\n\n== Performance\n\nThe following performance was measured in a Linux VM running on VMware Player on a Windows 8.1 host with an i7 2600K at 4GHz.\nThe benchmark used is the ``EncodingBenchmark``. These results should not be directly compared to other\nimplementations unless a similar dataset is used.\n\nResults are in millions of datapoints (timestamp + value pairs) per second. The values in this benchmark are\ndoubles (performance with longs is slightly higher, by roughly 2-3M/s).\n\n.Compression\n|===\n|GorillaCompressor (2.0.0) |Compressor (1.1.0)\n\n|83.5M/s (~1.34GB/s)\n|31.2M/s (~499MB/s)\n|===\n\n\n.Decompression\n|===\n|GorillaDecompressor (2.0.0) |Decompressor (1.1.0)\n\n|77.9M/s (~1.25GB/s)\n|51.4M/s (~822MB/s)\n|===\n\nMost of the differences in decompression / compression speed between versions come from implementation changes and\nnot from the small changes to the output format.\n\n== Roadmap\n\nThere were a few things I wanted to get into 2.0.0 but had to leave out due to lack of time. 
I will implement these\n later, potentially with some breaking API changes:\n\n * Support timestamp-only compression (2.2.x)\n * Include ByteBufferLongOutput/ByteBufferLongInput in the package (2.2.x)\n * Move bit operations inside the GorillaCompressor/GorillaDecompressor to allow easier usage with\n other allocators (2.2.x)\n\n== Internals\n\n=== Differences to the original paper\n\n* The number of leading zeros is stored in 6 bits, allowing up to 63 leading zeros, which is necessary when\nstoring long values. (\u003e= 2.0.0)\n* Timestamp delta-of-delta values are stored by first converting them to positive integers with ZigZag encoding and then\nreducing them by one to fit in the necessary bits. In the decoding phase, all values are incremented by one to recover the\n original value. (\u003e= 2.0.0)\n* The compressed blocks are created with a 27-bit delta header (unlike in the original paper, which uses a 14-bit delta\n  header). This allows block sizes of up to one day at millisecond precision. 
(\u003e= 1.0.0)\n\n=== Data structure\n\nValues must be inserted in increasing time order; out-of-order insertions are not supported.\n\nThe included ByteBufferBitInput and ByteBufferBitOutput classes use big-endian order for the data.\n\n== Contributing\n\nFile an issue and/or send a pull request.\n\n=== License\n\n....\n   Copyright 2016-2018 Michael Burman and/or other contributors.\n\n   Licensed under the Apache License, Version 2.0 (the \"License\");\n   you may not use this file except in compliance with the License.\n   You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n   Unless required by applicable law or agreed to in writing, software\n   distributed under the License is distributed on an \"AS IS\" BASIS,\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n   See the License for the specific language governing permissions and\n   limitations under the License.\n....\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fburmanm%2Fgorilla-tsc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fburmanm%2Fgorilla-tsc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fburmanm%2Fgorilla-tsc/lists"}