{"id":21915597,"url":"https://github.com/clustercockpit/cc-metric-store","last_synced_at":"2026-03-06T14:13:50.670Z","repository":{"id":43323525,"uuid":"375072242","full_name":"ClusterCockpit/cc-metric-store","owner":"ClusterCockpit","description":"A simple in-memory metric store","archived":false,"fork":false,"pushed_at":"2024-11-18T14:45:41.000Z","size":229,"stargazers_count":2,"open_issues_count":9,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-11-18T16:05:53.211Z","etag":null,"topics":["golang","in-memory-caching","metrics-databases"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ClusterCockpit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-08T16:15:24.000Z","updated_at":"2024-11-18T14:45:53.000Z","dependencies_parsed_at":"2024-06-28T06:36:10.682Z","dependency_job_id":"176a6359-9620-46ca-b64d-0ef040a7e603","html_url":"https://github.com/ClusterCockpit/cc-metric-store","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ClusterCockpit%2Fcc-metric-store","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ClusterCockpit%2Fcc-metric-store/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ClusterCockpit%2Fcc-metric-store/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ClusterCockpit%2Fcc-metric-store/manifests","owner_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ClusterCockpit","download_url":"https://codeload.github.com/ClusterCockpit/cc-metric-store/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226971327,"owners_count":17711413,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["golang","in-memory-caching","metrics-databases"],"created_at":"2024-11-28T19:13:02.389Z","updated_at":"2026-03-06T14:13:50.661Z","avatar_url":"https://github.com/ClusterCockpit.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ClusterCockpit Metric Store\n\n[![Build \u0026 Test](https://github.com/ClusterCockpit/cc-metric-store/actions/workflows/test.yml/badge.svg)](https://github.com/ClusterCockpit/cc-metric-store/actions/workflows/test.yml)\n\nThe cc-metric-store provides a simple in-memory time series database for storing\nmetrics of cluster nodes at preconfigured intervals. It is meant to be used as\npart of the [ClusterCockpit suite](https://github.com/ClusterCockpit). As all\ndata is kept in-memory, accessing it is very fast. It also provides topology aware\naggregations over time _and_ nodes/sockets/cpus.\n\nThe storage engine is provided by the\n[cc-backend](https://github.com/ClusterCockpit/cc-backend) package\n(`cc-backend/pkg/metricstore`). 
This repository provides the HTTP API wrapper.\n\nThe [NATS.io](https://nats.io/) based writing endpoint and the HTTP write\nendpoint both consume messages in [this format of the InfluxDB line\nprotocol](https://github.com/ClusterCockpit/cc-specifications/blob/master/metrics/lineprotocol_alternative.md).\n\n## Building\n\n`cc-metric-store` can be built using the provided `Makefile`.\nIt supports the following targets:\n\n- `make`: Build the application, copy an example configuration file and generate\n  checkpoint folders if required.\n- `make clean`: Clean the golang build cache and application binary\n- `make distclean`: In addition to the clean target also remove the `./var`\n  folder and `config.json`\n- `make swagger`: Regenerate the Swagger files from the source comments.\n- `make test`: Run tests and basic checks (`go build`, `go vet`, `go test`).\n\n## Running\n\n```sh\n./cc-metric-store                              # Uses ./config.json\n./cc-metric-store -config /path/to/config.json\n./cc-metric-store -dev                         # Enable Swagger UI at /swagger/\n./cc-metric-store -loglevel debug              # debug|info|warn (default)|err|crit\n./cc-metric-store -logdate                     # Add date and time to log messages\n./cc-metric-store -version                     # Show version information and exit\n./cc-metric-store -gops                        # Enable gops agent for debugging\n```\n\n## REST API Endpoints\n\nThe REST API is documented in [swagger.json](./api/swagger.json). 
You can\nexplore and try the REST API using the integrated [SwaggerUI web\ninterface](http://localhost:8082/swagger/) (requires the `-dev` flag).\n\nFor more information on the `cc-metric-store` REST API have a look at the\nClusterCockpit documentation [website](https://clustercockpit.org/docs/reference/cc-metric-store/ccms-rest-api/).\n\nAll endpoints support both trailing-slash and non-trailing-slash variants:\n\n| Method | Path                | Description                            |\n| ------ | ------------------- | -------------------------------------- |\n| `GET`  | `/api/query/`       | Query metrics with selectors           |\n| `POST` | `/api/write/`       | Write metrics (InfluxDB line protocol) |\n| `POST` | `/api/free/`        | Free buffers up to a timestamp         |\n| `GET`  | `/api/debug/`       | Dump internal state                    |\n| `GET`  | `/api/healthcheck/` | Check node health status               |\n\nIf `jwt-public-key` is set in `config.json`, all endpoints require JWT\nauthentication using an Ed25519 key (`Authorization: Bearer \u003ctoken\u003e`).\n\n## Run tests\n\nSome benchmarks concurrently access the `MemoryStore`, so enabling the\n[Race Detector](https://golang.org/doc/articles/race_detector) might be useful.\nThe benchmarks also work as tests as they do check if the returned values are as\nexpected.\n\n```sh\n# Tests only\ngo test -v ./...\n\n# Benchmarks as well\ngo test -bench=. -race -v ./...\n```\n\n## What are these selectors mentioned in the code?\n\nThe cc-metric-store works as a time-series database and uses the InfluxDB line\nprotocol as input format. Unlike InfluxDB, the data is indexed by one single\nstrictly hierarchical tree structure. A selector is built out of the tags in the\nInfluxDB line protocol, and can be used to select a node (not in the sense of a\ncompute node, can also be a socket, cpu, ...) in that tree. The implementation\ncalls those nodes `level` to avoid confusion. 
It is impossible to access data\nonly by knowing the _socket_ or _cpu_ tag — all higher up levels have to be\nspecified as well.\n\nThis is what the hierarchy currently looks like:\n\n- cluster1\n  - host1\n    - socket0\n    - socket1\n    - ...\n    - cpu1\n    - cpu2\n    - cpu3\n    - cpu4\n    - ...\n    - gpu1\n    - gpu2\n  - host2\n  - ...\n- cluster2\n- ...\n\nExample selectors:\n\n1. `[\"cluster1\", \"host1\", \"cpu0\"]`: Select only the cpu0 of host1 in cluster1\n2. `[\"cluster1\", \"host1\", [\"cpu4\", \"cpu5\", \"cpu6\", \"cpu7\"]]`: Select only CPUs 4-7 of host1 in cluster1\n3. `[\"cluster1\", \"host1\"]`: Select the complete node. If querying for a CPU-specific metric such as flops, all CPUs are implied\n\n## Config file\n\nThe config file is a JSON document with four top-level sections.\n\n### `main`\n\n```json\n\"main\": {\n  \"addr\": \"0.0.0.0:8082\",\n  \"https-cert-file\": \"\",\n  \"https-key-file\": \"\",\n  \"jwt-public-key\": \"\u003cbase64-encoded Ed25519 public key\u003e\",\n  \"user\": \"\",\n  \"group\": \"\",\n  \"backend-url\": \"\"\n}\n```\n\n- `addr`: Address and port to listen on (default: `0.0.0.0:8082`)\n- `https-cert-file` / `https-key-file`: Paths to TLS certificate/key for HTTPS\n- `jwt-public-key`: Base64-encoded Ed25519 public key for JWT authentication. If empty, no auth is required.\n- `user` / `group`: Drop privileges to this user/group after startup\n- `backend-url`: Optional URL of a cc-backend instance used as node provider\n\n### `metrics`\n\nPer-metric configuration. 
Each key is the metric name:\n\n```json\n\"metrics\": {\n  \"cpu_load\": { \"frequency\": 60, \"aggregation\": null },\n  \"flops_any\": { \"frequency\": 60, \"aggregation\": \"sum\" },\n  \"cpu_user\":  { \"frequency\": 60, \"aggregation\": \"avg\" }\n}\n```\n\n- `frequency`: Sampling interval in seconds\n- `aggregation`: How to aggregate sub-level data: `\"sum\"`, `\"avg\"`, or `null` (no aggregation)\n\n### `metric-store`\n\n```json\n\"metric-store\": {\n  \"checkpoints\": {\n    \"file-format\": \"wal\",\n    \"directory\": \"./var/checkpoints\"\n  },\n  \"memory-cap\": 100,\n  \"retention-in-memory\": \"24h\",\n  \"num-workers\": 0,\n  \"cleanup\": {\n    \"mode\": \"archive\",\n    \"directory\": \"./var/archive\"\n  },\n  \"nats-subscriptions\": [\n    { \"subscribe-to\": \"hpc-nats\", \"cluster-tag\": \"fritz\" }\n  ]\n}\n```\n\n- `checkpoints.file-format`: Checkpoint format: `\"json\"` (default, human-readable) or `\"wal\"` (binary WAL, crash-safe). See [Checkpoint formats](#checkpoint-formats) below.\n- `checkpoints.directory`: Root directory for checkpoint files (organized as `\u003cdir\u003e/\u003ccluster\u003e/\u003chost\u003e/`)\n- `memory-cap`: Approximate memory cap in MB for metric buffers\n- `retention-in-memory`: How long to keep data in memory (e.g. `\"48h\"`)\n- `num-workers`: Number of parallel workers for checkpoint/archive I/O (0 = auto, capped at 10)\n- `cleanup.mode`: What to do with data older than `retention-in-memory`: `\"archive\"` (write Parquet) or `\"delete\"`\n- `cleanup.directory`: Root directory for Parquet archive files (required when `mode` is `\"archive\"`)\n- `nats-subscriptions`: List of NATS subjects to subscribe to, with associated cluster tag\n\n### Checkpoint formats\n\nThe `checkpoints.file-format` field controls how in-memory data is persisted to disk.\n\n**`\"json\"` (default)** — human-readable JSON snapshots written periodically. 
Each\nsnapshot is stored as `\u003cdir\u003e/\u003ccluster\u003e/\u003chost\u003e/\u003ctimestamp\u003e.json` and contains the\nfull metric hierarchy. Easy to inspect and recover manually, but larger on disk\nand slower to write.\n\n**`\"wal\"`** — binary Write-Ahead Log format designed for crash safety. Two file\ntypes are used per host:\n\n- `current.wal` — append-only binary log. Every incoming data point is appended\n  immediately (magic `0xCC1DA7A1`, 4-byte CRC32 per record). Truncated trailing\n  records from unclean shutdowns are silently skipped on restart.\n- `\u003ctimestamp\u003e.bin` — binary snapshot written at each checkpoint interval\n  (magic `0xCC5B0001`). Contains the complete hierarchical metric state\n  column-by-column. Written atomically via a `.tmp` rename.\n\nOn startup the most recent `.bin` snapshot is loaded, then any remaining WAL\nentries are replayed on top. The WAL is rotated (old file deleted, new one\nstarted) after each successful snapshot.\n\nThe `\"wal\"` option is the default and will be the only supported option in the\nfuture. The `\"json\"` checkpoint format is still provided to support migration from\nprevious cc-metric-store versions.\n\n### Parquet archive\n\nWhen `cleanup.mode` is `\"archive\"`, data that ages out of the in-memory\nretention window is written to [Apache Parquet](https://parquet.apache.org/)\nfiles before being freed. Files are organized as:\n\n```\n\u003ccleanup.directory\u003e/\n  \u003ccluster\u003e/\n    \u003ctimestamp\u003e.parquet\n```\n\nOne Parquet file is produced per cluster per cleanup run, consolidating all\nhosts. 
Rows use a long (tidy) schema:\n\n| Column      | Type    | Description                                                             |\n| ----------- | ------- | ----------------------------------------------------------------------- |\n| `cluster`   | string  | Cluster name                                                            |\n| `hostname`  | string  | Host name                                                               |\n| `metric`    | string  | Metric name                                                             |\n| `scope`     | string  | Hardware scope (`node`, `socket`, `core`, `hwthread`, `accelerator`, …) |\n| `scope_id`  | string  | Numeric ID within the scope (e.g. `\"0\"`)                                |\n| `timestamp` | int64   | Unix timestamp (seconds)                                                |\n| `frequency` | int64   | Sampling interval in seconds                                            |\n| `value`     | float32 | Metric value                                                            |\n\nFiles are compressed with Zstandard and sorted by `(cluster, hostname, metric,\ntimestamp)` for efficient columnar reads. The `cpu` prefix in the tree is\ntreated as an alias for `hwthread` scope.\n\n### `nats`\n\n```json\n\"nats\": {\n  \"address\": \"nats://0.0.0.0:4222\",\n  \"username\": \"root\",\n  \"password\": \"root\"\n}\n```\n\nNATS connection is optional. 
If not configured, only the HTTP write endpoint is available.\n\nFor more information see the ClusterCockpit documentation [website](https://clustercockpit.org/docs/reference/cc-metric-store/ccms-configuration/).\n\n## Test the complete setup (excluding cc-backend itself)\n\nThere are two ways for sending data to the cc-metric-store, both of which are\nsupported by the\n[cc-metric-collector](https://github.com/ClusterCockpit/cc-metric-collector).\nThis example uses NATS; the alternative is to use HTTP.\n\n```sh\n# Only needed once, downloads the docker image\ndocker pull nats:latest\n\n# Start the NATS server\ndocker run -p 4222:4222 -ti nats:latest\n```\n\nSecond, build and start the\n[cc-metric-collector](https://github.com/ClusterCockpit/cc-metric-collector)\nusing the following as Sink-Config:\n\n```json\n{\n  \"type\": \"nats\",\n  \"host\": \"localhost\",\n  \"port\": \"4222\",\n  \"database\": \"updates\"\n}\n```\n\nThird, build and start the metric store. For this example here, the\n`config.json` file already in the repository should work just fine.\n\n```sh\n# Assuming you have a clone of this repo in ./cc-metric-store:\ncd cc-metric-store\nmake\n./cc-metric-store\n```\n\nAnd finally, use the API to fetch some data. The API is protected by JWT based\nauthentication if `jwt-public-key` is set in `config.json`. 
You can use this JWT\nfor testing:\n`eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw`\n\n```sh\nJWT=\"eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw\"\n\n# If the collector and store and nats-server have been running for at least 60 seconds on the same host:\ncurl -H \"Authorization: Bearer $JWT\" \\\n     \"http://localhost:8082/api/query/\" \\\n     -d '{\n       \"cluster\": \"testcluster\",\n       \"from\": '\"$(expr $(date +%s) - 60)\"',\n       \"to\": '\"$(date +%s)\"',\n       \"queries\": [{ \"metric\": \"cpu_load\", \"host\": \"'\"$(hostname)\"'\" }]\n     }'\n```\n\nFor debugging, the debug endpoint dumps the current content to stdout:\n\n```sh\nJWT=\"eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw\"\n\n# Dump everything\ncurl -H \"Authorization: Bearer $JWT\" \"http://localhost:8082/api/debug/\"\n\n# Dump a specific selector (colon-separated path)\ncurl -H \"Authorization: Bearer $JWT\" \"http://localhost:8082/api/debug/?selector=testcluster:host1\"\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclustercockpit%2Fcc-metric-store","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclustercockpit%2Fcc-metric-store","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclustercockpit%2Fcc-metric-store/lists"}