{"id":13504384,"url":"https://github.com/nakabonne/tstorage","last_synced_at":"2025-05-14T12:08:00.438Z","repository":{"id":37360422,"uuid":"367250415","full_name":"nakabonne/tstorage","owner":"nakabonne","description":"An embedded time-series database","archived":false,"fork":false,"pushed_at":"2024-10-20T12:42:36.000Z","size":537,"stargazers_count":1186,"open_issues_count":15,"forks_count":84,"subscribers_count":21,"default_branch":"main","last_synced_at":"2025-04-22T21:12:14.181Z","etag":null,"topics":["database","golang","golang-library","metrics","time-series","time-series-database"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nakabonne.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-05-14T04:40:29.000Z","updated_at":"2025-04-22T10:12:56.000Z","dependencies_parsed_at":"2024-02-18T07:32:36.195Z","dependency_job_id":"c80984af-cd7d-488c-a6e7-97c72001c02a","html_url":"https://github.com/nakabonne/tstorage","commit_stats":{"total_commits":237,"total_committers":6,"mean_commits":39.5,"dds":"0.025316455696202556","last_synced_commit":"7e4b396c9216554d0f3067cc11f1d050603a80ec"},"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nakabonne%2Ftstorage","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nakabonne%2Ftstorage/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nakabonne%2Ftstorage/releases","manifests_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nakabonne%2Ftstorage/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nakabonne","download_url":"https://codeload.github.com/nakabonne/tstorage/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251296593,"owners_count":21566637,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","golang","golang-library","metrics","time-series","time-series-database"],"created_at":"2024-08-01T00:00:36.055Z","updated_at":"2025-04-28T10:38:48.484Z","avatar_url":"https://github.com/nakabonne.png","language":"Go","funding_links":[],"categories":["开源类库","Databases","Go","Open source library"],"sub_categories":["数据库","Time Series","Database"],"readme":"# tstorage [![Go Reference](https://pkg.go.dev/badge/mod/github.com/nakabonne/tstorage.svg)](https://pkg.go.dev/mod/github.com/nakabonne/tstorage)\n\n`tstorage` is a lightweight local on-disk storage engine for time-series data with a straightforward API.\nEspecially ingestion is massively optimized as it provides goroutine safe capabilities of write into and read from TSDB that partitions data points by time.\n\n## Motivation\nI'm working on a couple of tools that handle a tremendous amount of time-series data, such as [Ali](https://github.com/nakabonne/ali) and [Gosivy](https://github.com/nakabonne/gosivy).\nEspecially Ali, I had been facing a problem of increasing heap consumption over time as it's a load testing tool that aims to perform real-time analysis.\nI little poked around a fast 
TSDB library that offers simple APIs but eventually nothing works as well as I'd like, that's why I settled on writing this package myself.\n\nTo see how much `tstorage` has helped improve Ali's performance, see the release notes [here](https://github.com/nakabonne/ali/releases/tag/v0.7.0).\n\n## Usage\nCurrently, `tstorage` requires Go version 1.16 or greater\n\nBy default, `tstorage.Storage` works as an in-memory database.\nThe below example illustrates how to insert a row into the memory and immediately select it.\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\n\t\"github.com/nakabonne/tstorage\"\n)\n\nfunc main() {\n\tstorage, _ := tstorage.NewStorage(\n\t\ttstorage.WithTimestampPrecision(tstorage.Seconds),\n\t)\n\tdefer storage.Close()\n\n\t_ = storage.InsertRows([]tstorage.Row{\n\t\t{\n\t\t\tMetric: \"metric1\",\n\t\t\tDataPoint: tstorage.DataPoint{Timestamp: 1600000000, Value: 0.1},\n\t\t},\n\t})\n\tpoints, _ := storage.Select(\"metric1\", nil, 1600000000, 1600000001)\n\tfor _, p := range points {\n\t\tfmt.Printf(\"timestamp: %v, value: %v\\n\", p.Timestamp, p.Value)\n\t\t// =\u003e timestamp: 1600000000, value: 0.1\n\t}\n}\n```\n\n### Using disk\nTo make time-series data persistent on disk, specify the path to directory that stores time-series data through [WithDataPath](https://pkg.go.dev/github.com/nakabonne/tstorage#WithDataPath) option.\n\n```go\nstorage, _ := tstorage.NewStorage(\n\ttstorage.WithDataPath(\"./data\"),\n)\ndefer storage.Close()\n```\n\n### Labeled metrics\nIn tstorage, you can identify a metric with combination of metric name and optional labels.\nHere is an example of insertion a labeled metric to the disk.\n\n```go\nmetric := \"mem_alloc_bytes\"\nlabels := []tstorage.Label{\n\t{Name: \"host\", Value: \"host-1\"},\n}\n\n_ = storage.InsertRows([]tstorage.Row{\n\t{\n\t\tMetric:    metric,\n\t\tLabels:    labels,\n\t\tDataPoint: tstorage.DataPoint{Timestamp: 1600000000, Value: 0.1},\n\t},\n})\npoints, _ := storage.Select(metric, labels, 
1600000000, 1600000001)\n```\n\nFor more examples see [the documentation](https://pkg.go.dev/github.com/nakabonne/tstorage#pkg-examples).\n\n## Benchmarks\nBenchmark tests were made using Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz with 16GB of RAM on macOS 10.15.7\n\n```\n$ go version\ngo version go1.16.2 darwin/amd64\n\n$ go test -benchtime=4s -benchmem -bench=. .\ngoos: darwin\ngoarch: amd64\npkg: github.com/nakabonne/tstorage\ncpu: Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz\nBenchmarkStorage_InsertRows-8                  \t14135685\t       305.9 ns/op\t     174 B/op\t       2 allocs/op\nBenchmarkStorage_SelectAmongThousandPoints-8   \t20548806\t       222.4 ns/op\t      56 B/op\t       2 allocs/op\nBenchmarkStorage_SelectAmongMillionPoints-8    \t16185709\t       292.2 ns/op\t      56 B/op\t       1 allocs/op\nPASS\nok  \tgithub.com/nakabonne/tstorage\t16.501s\n```\n\n## Internal\nTime-series database has specific characteristics in its workload.\nIn terms of write operations, a time-series database has to ingest a tremendous amount of data points ordered by time.\nTime-series data is immutable, mostly an append-only workload with delete operations performed in batches on less recent data.\nIn terms of read operations, in most cases, we want to retrieve multiple data points by specifying its time range, also, most recent first: query the recent data in real-time.\nBesides, time-series data is already indexed in time order.\n\nBased on these characteristics, `tstorage` adopts a linear data model structure that partitions data points by time, totally different from the B-trees or LSM trees based storage engines.\nEach partition acts as a fully independent database containing all data points for its time range.\n\n\n```\n  │                 │\nRead              Write\n  │                 │\n  │                 V\n  │      ┌───────────────────┐ max: 1600010800\n  ├─────\u003e   Memory Partition\n  │      └───────────────────┘ min: 1600007201\n  │\n  │      
┌───────────────────┐ max: 1600007200\n  ├─────\u003e   Memory Partition\n  │      └───────────────────┘ min: 1600003601\n  │\n  │      ┌───────────────────┐ max: 1600003600\n  └─────\u003e   Disk Partition\n         └───────────────────┘ min: 1600000000\n```\n\nKey benefits:\n- We can easily ignore all data outside of the partition time range when querying data points.\n- Most read operations are fast because recent data is cached on the heap.\n- When a partition gets full, we can persist the data from our in-memory database by sequentially writing just a handful of larger files. We avoid any write amplification and serve SSDs and HDDs equally well.\n\n### Memory partition\nThe memory partition is writable and stores data points on the heap. The head partition is always a memory partition, and so is the one right after it, so that it can accept out-of-order data points.\nIt stores data points in an ordered slice, which offers an excellent cache hit ratio compared to linked lists unless it is updated too often (for example, by deleting or adding elements at random locations).\n\nAll incoming data is written to a write-ahead log (WAL) right before being inserted into a memory partition to prevent data loss.\n\n### Disk partition\nOld memory partitions get compacted and persisted to a directory prefixed with `p-`, under the directory specified with the [WithDataPath](https://pkg.go.dev/github.com/nakabonne/tstorage#WithDataPath) option.\nHere is the macro layout of disk partitions:\n\n```\n$ tree ./data\n./data\n├── p-1600000001-1600003600\n│   ├── data\n│   └── meta.json\n├── p-1600003601-1600007200\n│   ├── data\n│   └── meta.json\n└── p-1600007201-1600010800\n    ├── data\n    └── meta.json\n```\n\nAs you can see, each partition holds two files: `meta.json` and `data`.\nThe `data` file is compressed and read-only, and is memory-mapped with [mmap(2)](https://en.wikipedia.org/wiki/Mmap), which maps the file into the process's address space.\nTherefore, all it has to keep on the heap is the partition's 
metadata. Just looking at `meta.json` gives us a good picture of what it stores:\n\n```json\n$ cat ./data/p-1600000001-1600003600/meta.json\n{\n  \"minTimestamp\": 1600000001,\n  \"maxTimestamp\": 1600003600,\n  \"numDataPoints\": 7200,\n  \"metrics\": {\n    \"metric-1\": {\n      \"name\": \"metric-1\",\n      \"offset\": 0,\n      \"minTimestamp\": 1600000001,\n      \"maxTimestamp\": 1600003600,\n      \"numDataPoints\": 3600\n    },\n    \"metric-2\": {\n      \"name\": \"metric-2\",\n      \"offset\": 36014,\n      \"minTimestamp\": 1600000001,\n      \"maxTimestamp\": 1600003600,\n      \"numDataPoints\": 3600\n    }\n  }\n}\n```\n\nEach metric records the file offset at which its data begins.\nThe data point slice for each metric is compressed separately, so all we have to do when reading is seek to that offset and read the points off.\n\n### Out-of-order data points\nIt is not uncommon for data points to arrive out of order in real-world applications because of network latency or clock synchronization issues; `tstorage` basically doesn't discard them.\nIf out-of-order data points are within the range of the head memory partition, they get temporarily buffered and merged at flush time.\nSometimes we have to handle data points that cross a partition boundary. That is why `tstorage` keeps more than one partition writable.\n\n## More\nWant to know more about tstorage internals? If so, see the blog post: [Write a time-series database engine from scratch](https://nakabonne.dev/posts/write-tsdb-from-scratch).\n\n## Acknowledgements\nThis package is implemented based on tons of existing ideas. 
The ones that especially inspired me are:\n- https://misfra.me/state-of-the-state-part-iii\n- https://fabxc.org/tsdb\n- https://questdb.io/blog/2020/11/26/why-timeseries-data\n- https://akumuli.org/akumuli/2017/04/29/nbplustree\n- https://github.com/VictoriaMetrics/VictoriaMetrics\n\nA big \"thank you!\" goes out to all of them.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnakabonne%2Ftstorage","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnakabonne%2Ftstorage","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnakabonne%2Ftstorage/lists"}