{"id":31701776,"url":"https://github.com/getlantern/zenodb","last_synced_at":"2025-10-08T21:08:33.721Z","repository":{"id":47660968,"uuid":"62095250","full_name":"getlantern/zenodb","owner":"getlantern","description":"Time-based database","archived":false,"fork":false,"pushed_at":"2023-03-07T01:22:19.000Z","size":20460,"stargazers_count":106,"open_issues_count":16,"forks_count":11,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-10-29T09:02:48.243Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/getlantern.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-06-27T23:32:33.000Z","updated_at":"2024-10-25T15:48:33.000Z","dependencies_parsed_at":"2024-06-18T22:38:56.158Z","dependency_job_id":"5890d427-c884-44a5-abe7-ea5001f504bc","html_url":"https://github.com/getlantern/zenodb","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/getlantern/zenodb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getlantern%2Fzenodb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getlantern%2Fzenodb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getlantern%2Fzenodb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getlantern%2Fzenodb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/getlantern","download_url":"https://codeload.gith
ub.com/getlantern/zenodb/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getlantern%2Fzenodb/sbom","scorecard":{"id":424422,"data":{"date":"2025-08-11","repo":{"name":"github.com/getlantern/zenodb","commit":"df46cc148a8c461fae7358d2c62bff1b94b7f3f9"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2,"checks":[{"name":"Code-Review","score":2,"reason":"Found 5/20 approved changesets -- score normalized to 2","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, 
install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically 
signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 16 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Vulnerabilities","score":0,"reason":"19 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GO-2021-0227 / GHSA-3vm4-22fp-5rfm","Warn: Project is vulnerable to: GO-2022-0968 / GHSA-gwc9-m7rh-j2ww","Warn: Project is vulnerable to: GO-2021-0356 / GHSA-8c26-wmh5-6g9v","Warn: Project is vulnerable to: GO-2024-2961","Warn: Project is vulnerable to: GO-2023-2402 / GHSA-45x7-px36-x8w8","Warn: Project is vulnerable to: GO-2024-3321 / GHSA-v778-237x-gjrc","Warn: Project is vulnerable to: GO-2025-3487 / GHSA-hcg3-q754-cr77","Warn: Project is vulnerable to: GO-2022-0288","Warn: 
Project is vulnerable to: GO-2022-0969 / GHSA-69cg-p879-7622","Warn: Project is vulnerable to: GO-2022-1144 / GHSA-xrjj-mj9h-534m","Warn: Project is vulnerable to: GO-2023-1571 / GHSA-vvpx-j8f3-3w6h","Warn: Project is vulnerable to: GO-2023-1988 / GHSA-2wrh-6pvc-2jm9","Warn: Project is vulnerable to: GO-2023-2102 / GHSA-4374-p667-p6c8","Warn: Project is vulnerable to: GO-2023-2153 / GHSA-m425-mq94-257g / GHSA-qppj-fm5r-hxr3","Warn: Project is vulnerable to: GO-2024-2687 / GHSA-4v7x-pqxf-cx7m","Warn: Project is vulnerable to: GO-2024-3333","Warn: Project is vulnerable to: GO-2025-3503 / GHSA-qxp5-gwg8-xv66","Warn: Project is vulnerable to: GO-2025-3595 / GHSA-vvgc-356p-c3xw","Warn: Project is vulnerable to: GO-2022-0603 / GHSA-hp87-p4gw-j4gq"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-19T01:56:50.426Z","repository_id":47660968,"created_at":"2025-08-19T01:56:50.426Z","updated_at":"2025-08-19T01:56:50.426Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279000716,"owners_count":26082837,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-08T02:00:06.501Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-08T21:08:32.686Z","u
pdated_at":"2025-10-08T21:08:33.709Z","avatar_url":"https://github.com/getlantern.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"ZenoDB [![Travis CI Status](https://travis-ci.org/getlantern/zenodb.svg?branch=master)](https://travis-ci.org/getlantern/zenodb)\u0026nbsp;[![Coverage Status](https://coveralls.io/repos/getlantern/zenodb/badge.png)](https://coveralls.io/r/getlantern/zenodb)\u0026nbsp;[![GoDoc](https://godoc.org/github.com/getlantern/zenodb?status.png)](http://godoc.org/github.com/getlantern/zenodb)\u0026nbsp;[![Sourcegraph](https://sourcegraph.com/github.com/getlantern/zenodb/-/badge.svg)](https://sourcegraph.com/github.com/getlantern/zenodb?badge)\n==========\n\nZenoDB is a Go-based embeddable time series database optimized for performing\naggregated analytical SQL queries on dimensional data.  It was developed to\nreplace [influxdb](https://github.com/influxdata/influxdb/) as a repository for\nclient and server metrics at [Lantern](https://www.getlantern.org).\n\n## Dependencies\nThis project uses Go modules to manage dependencies. 
If running a Go version\nprior to 1.13, you can enable Go modules using the environment variable\n`GO111MODULE=on`, like:\n\n```\nGO111MODULE=on go install github.com/getlantern/zenodb/cmd/zeno\n```\n\n## Current Features\n\n * No limits on the number of dimensions\n * SQL-based query language including GROUP BY and HAVING support\n * Auto-correlation\n * Reasonably efficient storage model\n * (Mostly) parallel query processing\n * Crosstab queries\n * FROM subqueries\n * Write-ahead Log\n * Seems pretty fast\n * Materialized views (with historical data from write-ahead log)\n * Some unit tests\n * Limit query memory consumption to avoid OOM killer\n * Multi-leader, multi-follower architecture\n \n## Future Stuff\n\n * Auto cleanup of Write-ahead Log\n * Harmonize field vs column language\n * More unit tests and general code cleanup\n * Byte array buffers to avoid allocations for sequences and ByteMaps\n * Smart sorting - e.g. only sort data files if a substantial number of new keys have been added\n * More validations/error checking\n * TLS in HTTP\n * Stored statistics (database-level, table-level, size, throughput, dimensions, etc.)\n * Optimized queries using expression references (avoid recomputing same expression when referenced multiple times in same row)\n * Completely parallel query processing\n * Interruptible queries using Context\n * User-level authentication/authorization\n * Multi-dimensional crosstab queries\n * Read-only query server replication using rsync?\n\n## Standalone Quick Start\n\nIn this tutorial, you will:\n\n* Run Zeno as a standalone database\n* Insert data into zeno using the RESTful HTTP API\n* Query zeno using the zeno-cli\n\nYou will learn how to use zeno to:\n\n* Aggregate multi-dimensional data\n* Correlate different kinds of data by inserting into a single table\n* Correlate data using the `IF` function\n* Derive data by performing calculations on existing data\n* Sort data\n* Filter data based on the results of your 
aggregations\n\nInstall [Go](https://golang.org/doc/install) if you haven't already.\n\n```bash\nGO111MODULE=on go install github.com/getlantern/zenodb/cmd/zeno\nGO111MODULE=on go install github.com/getlantern/zenodb/cmd/zeno-cli\n```\n\nMake a working directory (e.g. '~/zeno-quickstart').  In here, create a\n`schema.yaml` like the one below to configure your database:\n\n```yaml\ncombined:\n  retentionperiod: 1h\n  sql: \u003e\n    SELECT\n      requests,\n      AVG(load_avg) AS load_avg\n    FROM inbound\n    GROUP BY *, period(5m)\n```\n\nThis schema creates a *table* called `combined` which is filled by selecting\ndata from the *stream* `inbound`. `combined` keeps track of the `requests` and\n`load_avg` values and includes all dimensions from the `inbound` stream.  It\ngroups data into 5 minute buckets. The `requests` column is aggregated using the\n`SUM` aggregation operator, which is the default if no operator is specified.\n`load_avg` on the other hand is aggregated as an average.\n\n**Core Concept** - Zeno does not store individual points; everything is stored\nin aggregated form.\n\nOpen three terminals.\n\nTerminal 1\n\n```bash\n\u003e # Start the database\n\u003e cd ~/zeno-quickstart\n\u003e zeno\nDEBUG zenodb: schema.go:77 Creating table 'combined' as\nSELECT\n  requests,\n  AVG(load_avg) AS load_avg\nFROM inbound GROUP BY *, period(5m)\nDEBUG zenodb: schema.go:78 MaxMemStoreBytes: 1 B    MaxFlushLatency: 0s    MinFlushLatency: 0s\nDEBUG zenodb: table.go:118 MinFlushLatency disabled\nDEBUG zenodb: table.go:122 MaxFlushLatency disabled\nDEBUG zenodb: schema.go:83 Created table combined\nDEBUG zenodb: zenodb.go:63 Enabling geolocation functions\nDEBUG zenodb.combined: row_store.go:111 Will flush after 2562047h47m16.854775807s\nDEBUG zenodb: zenodb.go:75 Dir: /Users/ox.to.a.cart//zeno-quickstart    SchemaFile: /Users/ox.to.a.cart//zeno-quickstart/schema.yaml\nOpened database at /Users/ox.to.a.cart//zeno-quickstart\nListening for gRPC connections at 
127.0.0.1:17712\nListening for HTTP connections at 127.0.0.1:17713\n```\n\nTerminal 2\n\n```bash\n\u003e # Submit some data via the REST API. Omit the ts parameter to use current time.\n\u003e curl -i -H \"Content-Type: application/json\" -X POST -d '{\"dims\": {\"server\": \"56.234.163.23\", \"path\": \"/index.html\", \"status\": 200}, \"vals\": {\"requests\": 56}}\n{\"dims\": {\"server\": \"56.234.163.23\", \"path\": \"/login\", \"status\": 200}, \"vals\": {\"requests\": 34}}\n{\"dims\": {\"server\": \"56.234.163.23\", \"path\": \"/login\", \"status\": 500}, \"vals\": {\"requests\": 12}}\n{\"dims\": {\"server\": \"56.234.163.23\"}, \"vals\": {\"load_avg\": 1.7}}\n{\"dims\": {\"server\": \"56.234.163.24\", \"path\": \"/index.html\", \"status\": 200}, \"vals\": {\"requests\": 523}}\n{\"dims\": {\"server\": \"56.234.163.24\", \"path\": \"/login\", \"status\": 200}, \"vals\": {\"requests\": 411}}\n{\"dims\": {\"server\": \"56.234.163.24\", \"path\": \"/login\", \"status\": 500}, \"vals\": {\"requests\": 28}}\n{\"dims\": {\"server\": \"56.234.163.24\"}, \"vals\": {\"load_avg\": 0.3}}' -k https://localhost:17713/insert/inbound\nHTTP/1.1 201 Created\nDate: Mon, 29 Aug 2016 03:00:38 GMT\nContent-Length: 0\nContent-Type: text/plain; charset=utf-8\n```\n\nNotice that:\n\n* You're inserting into the stream `inbound`, not the table `combined`\n* You can batch insert multiple points in a single HTTP request\n* You can insert heterogeneous data like HTTP response statuses and load averages\n  into a single stream, thereby automatically correlating the data on any shared\n  dimensions (bye bye JOINs!).\n\nTerminal 3\n\n*SQL*\n\n```sql\nSELECT\n  _points,\n  requests,\n  load_avg\nFROM combined\nGROUP BY *\nORDER BY requests DESC\n```\n\n*zeno-cli*\n\n```sql\n\u003e # Query the data\n\u003e zeno-cli -insecure -fresh\nWill save history to /Users/ox.to.a.cart/Library/Application Support/zeno-cli/history\nzeno-cli \u003e SELECT _points, requests, load_avg FROM combined GROUP BY * 
ORDER BY requests DESC;\n# time                             path           server           status        _points    requests    load_avg\nMon, 29 Aug 2016 03:00:00 UTC      /index.html    56.234.163.24    200            1.0000    523.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /login         56.234.163.24    200            1.0000    411.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /index.html    56.234.163.23    200            1.0000     56.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /login         56.234.163.23    200            1.0000     34.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /login         56.234.163.24    500            1.0000     28.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /login         56.234.163.23    500            1.0000     12.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      \u003cnil\u003e          56.234.163.23    \u003cnil\u003e          1.0000      0.0000      1.7000\nMon, 29 Aug 2016 03:00:00 UTC      \u003cnil\u003e          56.234.163.24    \u003cnil\u003e          1.0000      0.0000      0.3000\n```\n\nNotice that:\n\n* Dimensions are included in the result based on the GROUP BY, you don't include\n  them in the SELECT expression.\n* There's a built-in field `_points` that gives a count of the number of points\n  that were inserted.\n\nNow run the same insert again.\n\nThen run the same query again.\n\n**Pro tip** - zeno-cli has a history, so try the up-arrow or `Ctrl+R`.\n\n*zeno-cli*\n\n```sql\nzeno-cli \u003e SELECT _points, requests, load_avg FROM combined GROUP BY * ORDER BY requests DESC;\n# time                             path           server           status        _points     requests    load_avg\nMon, 29 Aug 2016 03:00:00 UTC      /index.html    56.234.163.24    200            2.0000    1046.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /login         56.234.163.24    200            2.0000     822.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /index.html    
56.234.163.23    200            2.0000     112.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /login         56.234.163.23    200            2.0000      68.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /login         56.234.163.24    500            2.0000      56.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /login         56.234.163.23    500            2.0000      24.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      \u003cnil\u003e          56.234.163.23    \u003cnil\u003e          2.0000       0.0000      1.7000\nMon, 29 Aug 2016 03:00:00 UTC      \u003cnil\u003e          56.234.163.24    \u003cnil\u003e          2.0000       0.0000      0.3000\n```\n\nAs long as you submitted the 2nd batch of data soon after the first, you\nshould see that the number of rows hasn't changed; Zeno just aggregated the data\non the existing timestamps.  The requests figures all doubled, since these are\naggregated as a `SUM`. The load_avg figures remained unchanged since they're\nbeing aggregated as an `AVG`.\n\nNote - if enough time has elapsed that we have a new timestamp, you will see\nadditional rows like this:\n\n```\n# time                             path           server           status        _points    requests    load_avg\nMon, 29 Aug 2016 03:00:00 UTC      /index.html    56.234.163.24    200            1.0000    523.0000      0.0000\nMon, 29 Aug 2016 03:05:00 UTC      /index.html    56.234.163.24    200            1.0000    523.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /login         56.234.163.24    200            1.0000    411.0000      0.0000\nMon, 29 Aug 2016 03:05:00 UTC      /login         56.234.163.24    200            1.0000    411.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /index.html    56.234.163.23    200            1.0000     56.0000      0.0000\nMon, 29 Aug 2016 03:05:00 UTC      /index.html    56.234.163.23    200            1.0000     56.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /login         
56.234.163.23    200            1.0000     34.0000      0.0000\nMon, 29 Aug 2016 03:05:00 UTC      /login         56.234.163.23    200            1.0000     34.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /login         56.234.163.24    500            1.0000     28.0000      0.0000\nMon, 29 Aug 2016 03:05:00 UTC      /login         56.234.163.24    500            1.0000     28.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /login         56.234.163.23    500            1.0000     12.0000      0.0000\nMon, 29 Aug 2016 03:05:00 UTC      /login         56.234.163.23    500            1.0000     12.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      \u003cnil\u003e          56.234.163.23    \u003cnil\u003e          1.0000      0.0000      1.7000\nMon, 29 Aug 2016 03:05:00 UTC      \u003cnil\u003e          56.234.163.23    \u003cnil\u003e          1.0000      0.0000      1.7000\nMon, 29 Aug 2016 03:00:00 UTC      \u003cnil\u003e          56.234.163.24    \u003cnil\u003e          1.0000      0.0000      0.3000\nMon, 29 Aug 2016 03:05:00 UTC      \u003cnil\u003e          56.234.163.24    \u003cnil\u003e          1.0000      0.0000      0.3000\n```\n\n**Core Concept** - Zeno knows how to aggregate fields based on the schema, so\nyou don't need to include aggregation operators in your query.  What happens if\nwe try to query for `SUM(load_avg)`?\n\n*zeno-cli*\n\n```sql\nzeno-cli \u003e SELECT _points, requests, SUM(load_avg) AS load_avg FROM combined GROUP BY * ORDER BY requests DESC;\nrpc error: code = 2 desc = No column found for load_avg (SUM(load_avg))\n```\n\nThe underlying column is an `AVG(load_avg)`, so taking a SUM is not possible!\n\nSometimes, it's useful to show a dimension in columns rather than rows. 
You can\ndo this using the `CROSSTAB` function.\n\n```sql\nSELECT\n  requests,\n  load_avg\nFROM combined\nGROUP BY server, CROSSTAB(path)\nORDER BY requests;\n```\n\n```sql\nzeno-cli \u003e SELECT requests, load_avg FROM combined GROUP BY server, CROSSTAB(path) ORDER BY requests;\n# time                             server                  requests    requests     requests    load_avg    load_avg\n#                                                       /index.html      /login      *total*       \u003cnil\u003e     *total*\nMon, 29 Aug 2016 03:05:00 UTC      56.234.163.23           112.0000     92.0000     204.0000      1.7000      1.7000\nMon, 29 Aug 2016 03:05:00 UTC      56.234.163.24          1046.0000    878.0000    1924.0000      0.3000      0.3000\n```\n\nNotice how there's a second header row now that shows the different values of\npath. Notice also how paths that don't have any data are not shown, and notice\nthat a *total* column is automatically included for each field.\n\nNow let's do some correlation using the `IF` function.  
`IF` takes two\nparameters, a conditional expression that determines whether or not to include a\nvalue based on its associated dimensions, and the value expression that selects\nwhich column to include.\n\nLet's say that we want to get the error rate, defined as the number of non-200\nstatuses versus total requests:\n\n*sql*\n\n```sql\nSELECT\n  IF(status \u003c\u003e 200, requests) AS errors,\n  requests,\n  errors / requests AS error_rate,\n  load_avg\nFROM combined\nGROUP BY *\nORDER BY error_rate DESC\n```\n\n*zeno-cli*\n\n```sql\nzeno-cli \u003e SELECT IF(status \u003c\u003e 200, requests) AS errors, requests, errors / requests AS error_rate, load_avg FROM combined GROUP BY * ORDER BY error_rate DESC;\n# time                             path           server           status         errors     requests    error_rate    load_avg\nMon, 29 Aug 2016 03:00:00 UTC      /login         56.234.163.24    500           56.0000      56.0000        1.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /login         56.234.163.23    500           24.0000      24.0000        1.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /login         56.234.163.23    200            0.0000      68.0000        0.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      \u003cnil\u003e          56.234.163.23    \u003cnil\u003e          0.0000       0.0000        0.0000      1.7000\nMon, 29 Aug 2016 03:00:00 UTC      \u003cnil\u003e          56.234.163.24    \u003cnil\u003e          0.0000       0.0000        0.0000      0.3000\nMon, 29 Aug 2016 03:00:00 UTC      /index.html    56.234.163.23    200            0.0000     112.0000        0.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /login         56.234.163.24    200            0.0000     822.0000        0.0000      0.0000\nMon, 29 Aug 2016 03:00:00 UTC      /index.html    56.234.163.24    200            0.0000    1046.0000        0.0000      0.0000\n```\n\nOkay, this distinguishes between errors and other requests, but errors 
and other\nrequests aren't being correlated yet so the error_rate isn't useful. Notice also\nthat load_avg is separate from the requests measurements.  That's because we're\nstill implicitly grouping on status and path, so error rows are separate from\nsuccess rows and load_avg rows (which are only associated with servers, not\npaths) are separate from everything else.\n\nSo instead, let's group only by server:\n\n*sql*\n\n```sql\nSELECT\n  IF(status \u003c\u003e 200, requests) AS errors,\n  requests,\n  errors / requests AS error_rate,\n  load_avg\nFROM combined\nGROUP BY server\nORDER BY error_rate DESC\n```\n\n*zeno-cli*\n\n```sql\nzeno-cli \u003e SELECT IF(status \u003c\u003e 200, requests) AS errors, requests, errors / requests AS error_rate, load_avg FROM combined GROUP BY server ORDER BY error_rate DESC;\n# time                             server                errors     requests    error_rate    load_avg\nMon, 29 Aug 2016 03:00:00 UTC      56.234.163.23        24.0000     204.0000        0.1176      1.7000\nMon, 29 Aug 2016 03:00:00 UTC      56.234.163.24        56.0000    1924.0000        0.0291      0.3000\n```\n\nThat looks better!  We're getting a meaningful error rate, and we can even see\nthat there's a correlation between the error_rate and the load_avg.\n\n**Challenge** - What calculation would yield a meaningful understanding of the\nrelationship between error_rate and load_avg?\n\nNow, if we had a ton of servers, we would really only be interested in the ones\nwith the top error rates.  
We could handle that either with a `LIMIT` clause:\n\n*sql*\n\n```sql\nSELECT\n  IF(status \u003c\u003e 200, requests) AS errors,\n  requests,\n  errors / requests AS error_rate,\n  load_avg\nFROM combined\nGROUP BY server\nORDER BY error_rate DESC\nLIMIT 1\n```\n\n*zeno-cli*\n\n```sql\nzeno-cli \u003e SELECT IF(status \u003c\u003e 200, requests) AS errors, requests, errors / requests AS error_rate, load_avg FROM combined GROUP BY server ORDER BY error_rate DESC LIMIT 1;\n# time                             server                errors    requests    error_rate    load_avg\nMon, 29 Aug 2016 03:00:00 UTC      56.234.163.23        24.0000    204.0000        0.1176      1.7000\n```\n\nOr we can use the `HAVING` clause to filter based on the actual data:\n\n*sql*\n\n```sql\nSELECT\n  IF(status \u003c\u003e 200, requests) AS errors,\n  requests,\n  errors / requests AS error_rate,\n  load_avg\nFROM combined\nGROUP BY server\nHAVING error_rate \u003e 0.1\nORDER BY error_rate DESC\n```\n\n*zeno-cli*\n\n```sql\nzeno-cli \u003e SELECT IF(status \u003c\u003e 200, requests) AS errors, requests, errors / requests AS error_rate, load_avg FROM combined GROUP BY server HAVING error_rate \u003e 0.1 ORDER BY error_rate DESC;\n# time                             server                errors    requests    error_rate    load_avg\nMon, 29 Aug 2016 03:05:00 UTC      56.234.163.23        24.0000    204.0000        0.1176      1.7000\n```\n\n\n\nThere!  You've just aggregated, correlated and gained valuable insights into\nyour server infrastructure.  At [Lantern](https://www.getlantern.org) we do this\nsort of stuff with data from thousands of servers and millions of clients!\n\n## Schema\n\nZenoDB relies on a schema file (by default `schema.yaml`).\n\n### Example: How to add a view\n\nA view is really a language construct for creating a table whose properties are derived from an existing table.  
The view and the table it inherits from share the same data stream; however, the view's data is stored separately. Consequently, a view can have different (and finer) granularity than its parent table.\n\nThis is an example of a view defined in the YAML Schema:\n\n```\nemojis_fetched:\n  view:             true\n  retentionperiod:  168h\n  backfill:         6h\n  minflushlatency:  1m\n  maxflushlatency:  1h\n  partitionby:      [client_ip]\n  sql: \u003e\n    SELECT success_count, error_count, error_rate, emojis_fetched\n      FROM core\n      WHERE client_ip LIKE '192.-%'\n      GROUP BY client_ip, period(1h)\n```\n\nWe start with the name of the view:\n\n`emojis_fetched`\n\nThis means that the data for this table comes from another table or view, rather than an input stream:\n\n`view: true`\n\nThis means that we keep 1 week's worth of history:\n\n`retentionperiod: 168h`\n\nThis means that we won’t flush the memstore more frequently than every 1 minute:\n\n`minflushlatency: 1m`\n\nAnd flush at least every hour:\n\n`maxflushlatency:  1h`\n\nFlushing means taking the data held in memory so far and saving it to permanent on-disk storage for later retrieval.\n\nThe physical storage happens on the follower nodes, so Zenodb needs to know how to distribute that data across nodes:\n\n`partitionby: [client_ip]`\n\nThe next part is the definition of the contents of the table/view:\n\n```\n  sql: \u003e\n    SELECT success_count, error_count, error_rate, emojis_fetched\n      FROM core\n      WHERE client_ip LIKE '192.-%'\n      GROUP BY client_ip, period(1h)\n```\n\nThe `SELECT` row lists the fields, in this case success_count, error_count, error_rate and emojis_fetched. The rest of the statement consists of the grouping and filter operations, which are optional and can be adapted to your needs. 
This selects only measurements related to a group of IPs:\n\n`WHERE client_ip LIKE '192.-%'`\n\nFor instance, to filter on emojis instead:\n\n`WHERE emojis_fetched LIKE 'smile-%'`\n\nThis selects which dimensions to keep, and what resolution to use. This is usually the hardest part of creating a view, because you need to anticipate what questions will be asked:\n\n`GROUP BY client_ip, period(1h)`\n\n## Functions\n\nTODO - fill out function reference\n\n## Subqueries\n\nTODO - explain how subqueries work\n\n## Embedding\n\nCheck out the [zenodbdemo](zenodbdemo/zenodbdemo.go) for an example of how to\nembed zenodb.\n\n## Implementation Notes\n\n### Sequences\n\nThese are the central units of data storage in Zenodb: https://github.com/getlantern/zenodb/blob/master/encoding/seq.go\n\nIn short, we group by dimension, and then we summarize/roll-up fields. The summarization is done using an aggregation function like SUM, AVG, MIN, MAX, etc. That is a key aspect of how Zenodb manages to discard data and achieve compression.\nFor example, for a new view, let’s say we’re not grouping by anything (that would be `GROUP BY _, PERIOD(1h)`), and let’s look just at the success_count field. In storage, we would have a single row with no dimensions and an array of aggregated success_counts by time period:\n\n`success_count: [4000, 5000]`\n\nThat would be the result of a series of points that add up to 4000 in the first period, and to 5000 in the second period (of 1h).\n\nNow let’s say that we did `GROUP BY geo_country, PERIOD(1h)`.\n\nAnd let’s say that success_counts are evenly split between US and AU. Then we would have two rows:\n\n```\ngeo_country: US   success_count: [2000, 2500]\ngeo_country: AU   success_count: [2000, 2500]\n```\n\nOne important aspect is that each field is not just an array; the array is preceded by the timestamp high water mark, so it would actually be something like this:\n\n`geo_country: US   success_count: 20180118T05:00Z[1000, 
1500]`\n\n### Views\n\nSince everything in zenodb comes in through input streams, views cannot actually be constructed from the underlying tables, so they need to be stored independently. This allows for views to have different granularities than the tables/views they refer to. In other words, at runtime, views are actually just like tables that pull from that same input stream; the only difference is that when you define a view, it can take into account knowledge from the definition of the underlying table.\n\n## Clustering\n\n### Performance timestamps\n\n* Partition on high-cardinality fields/combinations that you frequently query\n* Don't partition on low-cardinality fields as these will tend to hotspot one\n  or another partition and slow down synchronization from the leader.\n* Don't partition on too many different fields/combinations as this will\n  increase the amount of data that each follower has to synchronize.\n\n## Acknowledgements\n\n * [sqlparser](https://github.com/xwb1989/sqlparser) - Go SQL parser\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetlantern%2Fzenodb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgetlantern%2Fzenodb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetlantern%2Fzenodb/lists"}