https://github.com/cldellow/manu
Mostly archived, not updated.
https://github.com/cldellow/manu
Last synced: 10 months ago
JSON representation
Mostly archived, not updated.
- Host: GitHub
- URL: https://github.com/cldellow/manu
- Owner: cldellow
- License: epl-1.0
- Created: 2018-01-08T21:28:21.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-08-12T15:41:57.000Z (almost 8 years ago)
- Last Synced: 2025-08-23T11:05:14.083Z (10 months ago)
- Language: Java
- Size: 1.02 MB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
# Manu: "Mostly archived, not updated"
[](https://travis-ci.org/cldellow/manu)
[](https://codecov.io/gh/cldellow/manu)
[](https://mvnrepository.com/artifact/com.cldellow/manu)
A time series storage format for integers and floats, using efficient delta encodings from [FastPFOR](https://github.com/lemire/JavaFastPFOR).
Examples: pageviews by article in Wikipedia, stock open/close/high/low prices, weather temperatures.
## Components
- [manu-format](format), a library for maintaining the data on disk
- [manu-cli](cli), a command-line tool for ingesting data into the format
- [manu-serve](serve), a web server to expose the data over REST
## Design criteria
### Priorities
- Cheap
- I'm doing this to drive a hobby project; my dream would be to host a variety of datasets for $10/month.
- A Fermi estimate suggests Wikipedia pageviews has 100B datapoints over the last 10 years. This implies that storage costs will dominate.
- Doesn’t need to be always-on
- This sort of follows from cheap -- the ability to load subsets of data, or to run on spot instances will be a useful tool to cut costs.
### Non-priorities
- Concurrent / fast writes
- These can happen offline.
- Fast reads
- The pareto principle will likely apply to queries - 1% of keys will get 99% of reads. We can use Varnish or similar to cache at the application level.
### Assumptions
- Dense datasets
- Keys: if we see a key once, we expect to see it again.
- Values: if key X has a datapoint at T1, we expect most other keys will as well.
- Correlated values
- Value for key X at T1 is likely related to value at T2.
- Some datasets can be lossy
- Wikipedia pageviews, e.g., are likely insensitive to precision so long as the trend is generally correct.
## Obligatory

Credit: [Our Greatest Asset](https://www.smbc-comics.com/comic/our-greatest-asset), Saturday Morning Breakfast Cereal