{"id":15449889,"url":"https://github.com/iand/starbow","last_synced_at":"2025-04-19T22:50:10.479Z","repository":{"id":57616224,"uuid":"42999668","full_name":"iand/starbow","owner":"iand","description":"Starbow is a server for calculating statistics from streaming data.","archived":false,"fork":false,"pushed_at":"2020-06-30T10:03:41.000Z","size":257,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-13T19:23:00.275Z","etag":null,"topics":["bigdata","golang","server","statistics"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iand.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-09-23T12:32:24.000Z","updated_at":"2020-10-12T09:26:50.000Z","dependencies_parsed_at":"2022-08-27T07:31:31.249Z","dependency_job_id":null,"html_url":"https://github.com/iand/starbow","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iand%2Fstarbow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iand%2Fstarbow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iand%2Fstarbow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iand%2Fstarbow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iand","download_url":"https://codeload.github.com/iand/starbow/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories
_count":249824099,"owners_count":21330265,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigdata","golang","server","statistics"],"created_at":"2024-10-01T21:02:23.910Z","updated_at":"2025-04-19T22:50:10.432Z","avatar_url":"https://github.com/iand.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Starbow\n\nStarbow is a server for calculating statistics from streaming data.\n\n![starbow](https://github.com/iand/starbow/blob/master/doc/starbow.png)\n\n\nStarbow aggregates observations from streams and collates statistical data from them according to pre-defined rules, holding them in memory. \nIt is not a general purpose database: only the statistics derived from the observation data are stored; the observations are discarded after processing.\nEach observation may be routed to zero or more *collations* which are analogous to buckets or tables in a database. \nCollations define a series of statistical measures that are updated for each observation received.\nMeasures can be precise (e.g. max/min/count/mean) or approximate (e.g. set cardinality). \n\nThe following measures are supported:\n\n* Precise\n    - Count \n    - Sum\n    - Mean\n    - Variance\n    - Max\n    - Min\n* Approximate\n    - Cardinality (implemented using hyperloglog)\n\n# Collations\n\nStarbow aggregates statistics in \"collations\". A collation definition\ncomprises a filter and a set of keys used for routing and grouping\nobservations and a set of measures used for calculating statistics. 
Each\nunique set of key values encountered results in a new collation instance being\ncreated.\n\nWhen an observation is received it is routed to the appropriate collation\ninstance based on the values of the observation's fields corresponding to the\ncollation's keys. The collation then uses the values of the observation's\nmeasure fields to update its statistics.\n\nCollations can be defined via the Starbow API on an ad-hoc basis and participate\nin routing as soon as they are created. The API also provides a service for\nmeasuring the memory requirements for a collation given the expected\ncardinality of the keys. The memory footprint of a collation will vary heavily\nbased on the types of measures it employs.\n\n# Status\n\nStarbow is \"demoware\" and not all features are complete. \nThe statistical core of the server works well but the following areas are mostly stubbed out to support demoing and testing:\n\n * Query Language - supports a very limited SQL dialect of the form `select x,y where z`. 
This is parsed with regex for simplicity but needs a fuller grammar parser.\n * Query Results - returns a human-readable dump of results; a useful format should be defined\n * Server - very basic daemon facilities\n * Collation Creation - currently hardcoded to the demo (see demo.go) but needs to be possible via HTTP API\n * Collation Size Estimation - not supported by API, needed to estimate memory requirements for server\n * Persistence - Starbow is intended to run from memory but it should checkpoint and persist data\n\nIn the future the standard measures could be extended by:\n\n* Lookback - limit the measure to a lookback window of time from the present\n* Windowed - limit the measure to a fixed time window\n* Bucketed - limit the measure to a series of equally sized time windows\n\nSeveral additional approximate measures are also planned:\n\n* Containment - allow testing whether specific values have been seen (via bloom filter) \n* Frequency - number of occurrences of a particular value (count-min et al.)\n* TopN - most frequent items\n* Histogram - counts in various quantiles\n\n# Demo\n\nCompiling and running starbow will automatically start the demo server on port 2525. 
Two collations are defined for an airport dataset:\n\n * country\n     - counts number of records containing a country\n     - height field: supports mean, sum, variance, max and min of airport heights per country\n     - iata field: count of unique IATA codes per country\n * tz\n     - counts number of records containing a timezone\n     - country field: count of unique countries per timezone\n     - height field: count of unique heights per timezone\n\nThe misc folder contains `airports.sh` that posts the airport data observations to the server and `queries.sh` that posts some sample queries to the server.\n\n## Example Queries\n\n(See Status section above for limitations on the query language implementation)\n\nThe country collation contains a precise count of all records that contain a country. This can be queried using `select count(*)`:\n\n    select count(*) where country='United States'\n\n    count(*)=1315\n\nThe height field in this collation allows min and max to be queried. These values are precisely calculated from the streaming data received:\n\n    select min(height), max(height) where country='United States'\n\n    min(height)=-54\n    max(height)=9078\n\nThe iata field is a statistical estimate of the number of unique IATA codes in each country, using hyperloglog. \nThis estimate will not be precise. More precise results can be obtained by increasing the precision configured for the field in the collation definition. \nHigher precision estimates require more memory.\n\n    select uniquecount(iata) where country='United States'\n\n    uniquecount(iata)=38\n\n\n# Observation Format\n\nObservations (stream items) can be sent to the server with an HTTP POST to `/obs` using a custom line based format.\n\nEach line starts with a timestamp in milliseconds. The first byte after the timestamp defines the delimiter for the rest of the line which is assumed to be key/value pairs in the format `key=value`. 
\nAn example:\n\n\n    1478566442000000|name=Toulouse|country=France|iata=LFBF|height=535|tz=Europe/Paris\n\n\n# HTTP API\n\nStarbow supports a very simple API:\n\n * `POST /obs` - send one or more observations to the server\n * `GET+POST /collation/{name}?q={query}` - query a collation\n\n\n# Versioning\n\nThis project uses [Semantic Versioning](https://semver.org). Version information is held in the internal/version package. Each release is tagged in the git repository by prefixing the version with `v`, e.g. `v0.1.1-devel`\n\nRelease steps:\n\n 1. Update Major, Minor and Patch as appropriate in internal/version/version.go\n 2. Ensure that PreRelease is empty.\n 3. Ensure CHANGELOG.md is up to date.\n 4. Commit changes.\n 5. Tag the commit with the version number prefixed by `v`.\n 6. Change PreRelease to `devel`.\n 7. Commit change.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiand%2Fstarbow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiand%2Fstarbow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiand%2Fstarbow/lists"}