{"id":13450009,"url":"https://github.com/SnellerInc/sneller","last_synced_at":"2025-03-23T16:31:00.269Z","repository":{"id":37594803,"uuid":"474070793","full_name":"SnellerInc/sneller","owner":"SnellerInc","description":"World's fastest log analysis: λ + SQL + JSON + S3","archived":false,"fork":false,"pushed_at":"2024-01-07T16:28:34.000Z","size":25079,"stargazers_count":1014,"open_issues_count":4,"forks_count":41,"subscribers_count":23,"default_branch":"master","last_synced_at":"2024-10-28T17:39:03.487Z","etag":null,"topics":["avx512","go","high-performance","indexless","json","log","query-engine","s3","schemaless","serverless","simd","sql","vectorized"],"latest_commit_sha":null,"homepage":"https://sneller.ai","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SnellerInc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"code_of_conduct.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-03-25T15:46:05.000Z","updated_at":"2024-10-27T02:27:10.000Z","dependencies_parsed_at":"2024-01-07T17:56:10.711Z","dependency_job_id":null,"html_url":"https://github.com/SnellerInc/sneller","commit_stats":{"total_commits":997,"total_committers":18,"mean_commits":"55.388888888888886","dds":0.5757271815446339,"last_synced_commit":"4c473a555abc4fb3870a1f5467293584e2c4aa5e"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SnellerInc%2Fsneller","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SnellerInc%2Fsneller/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SnellerInc%2F
sneller/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SnellerInc%2Fsneller/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SnellerInc","download_url":"https://codeload.github.com/SnellerInc/sneller/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245130695,"owners_count":20565695,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["avx512","go","high-performance","indexless","json","log","query-engine","s3","schemaless","serverless","simd","sql","vectorized"],"created_at":"2024-07-31T07:00:26.763Z","updated_at":"2025-03-23T16:30:55.247Z","avatar_url":"https://github.com/SnellerInc.png","language":"Go","funding_links":[],"categories":["Go","Database / Structures","\u003ca name=\"Go\"\u003e\u003c/a\u003eGo"],"sub_categories":[],"readme":"## Become a test partner\n\nPlease reach out to frank@sneller.io if you are interested in becoming a test partner of our serverless cloud offering.\n\n# SQL for JSON at scale: fast, simple, schemaless\n\nSneller is a high-performance SQL engine built to analyze\npetabyte-scale un-structured logs and other event data.\n\nHere are a couple major differentiators between Sneller and other SQL solutions:\n\n \u003c!-- TODO: Add link to \"explaining\" blog post for next topic as well --\u003e\n - Sneller is designed to use cloud object storage as its **only** backing store.\n - Sneller's SQL VM is [implemented in AVX-512 assembly](https://sneller.io/blog/sql-vm-in-avx-512/).\n   Medium-sized compute clusters provide 
throughput in excess of **terabytes per second**.\n - Sneller is [completely schemaless](https://sneller.io/blog/why-schemaless/).\n   No more ETL-ing your data! Heterogeneous JSON data can be ingested directly.\n - Sneller uses a [hybrid approach between columnar and row-oriented data layouts](https://sneller.io/blog/zion-format/)\n   to provide lightweight ingest, low storage footprint, and super fast scanning speeds.\n\n[Sneller Cloud](https://console.sneller.io/register) gives you access to a hosted version of the Sneller SQL engine\nthat runs directly on data stored entirely in **your S3 buckets**.\nOur cloud platform offers excellent performance and is priced at an extremely competitive \\$150 **per petabyte** of data scanned.\n\n\u003c!-- TODO: Grafana Demo --\u003e\n\n## Browser Demo\n\nYou can run queries **for free** against Sneller Cloud from your browser through our [playground](https://sneller.io/playground).\nWe've created [a public table containing about 1 billion rows](https://sneller.io/playground) from the [GitHub archive](https://www.gharchive.org) data set.\nAdditionally, you can create new ephemeral tables by uploading your own JSON data (but please don't upload anything sensitive!)\n\nThe Sneller playground is also usable directly with a local http client like `curl`:\n\n[![asciicast](https://asciinema.org/a/580308.svg)](https://asciinema.org/a/580308)\n\n## Local Demo\n\n[![asciicast](https://asciinema.org/a/eOjVUwlA7ZYXTGtC6PpsupR2O.svg)](https://asciinema.org/a/eOjVUwlA7ZYXTGtC6PpsupR2O)\n\nIf you have `go` installed on a machine with AVX512, you can build tables\nfrom JSON files and run the query engine locally:\n\n```console\n$ grep -q avx512 /proc/cpuinfo \u0026\u0026 echo \"yes, I have AVX512\"\nyes, I have AVX512\n$ # install the sdb tool (make sure $GOBIN is in your $PATH)\n$ go install github.com/SnellerInc/sneller/cmd/sdb@latest\n$ # pack a JSON object into a table that can be queried;\n$ # here we're using some github archive 
JSON:\n$ wget https://data.gharchive.org/2015-01-01-15.json.gz\n$ sdb pack -o github.zion 2015-01-01-15.json.gz\n$ # run a query, using JSON as the output format:\n$ sdb query -v -fmt=json \"select count(*), type from read_file('github.zion') group by type\"\n{\"type\": \"CreateEvent\", \"count\": 1471}\n{\"type\": \"PushEvent\", \"count\": 5815}\n{\"type\": \"WatchEvent\", \"count\": 1230}\n{\"type\": \"ReleaseEvent\", \"count\": 60}\n{\"type\": \"PullRequestEvent\", \"count\": 474}\n{\"type\": \"IssuesEvent\", \"count\": 545}\n{\"type\": \"ForkEvent\", \"count\": 355}\n{\"type\": \"GollumEvent\", \"count\": 61}\n{\"type\": \"IssueCommentEvent\", \"count\": 844}\n{\"type\": \"DeleteEvent\", \"count\": 260}\n{\"type\": \"PullRequestReviewCommentEvent\", \"count\": 136}\n{\"type\": \"CommitCommentEvent\", \"count\": 73}\n{\"type\": \"MemberEvent\", \"count\": 25}\n{\"type\": \"PublicEvent\", \"count\": 2}\n18874368 bytes (18.000 MiB) scanned in 1.475857ms 12.5GiB/s\n```\n\nSee our [SQL reference](https://sneller.io/docs/sql-reference) for more information\non the Sneller SQL dialect.\n\nIf you don't have access to a physical machine with AVX512 support,\nwe recommend renting a VM from one of the major cloud providers with\none of these instance families:\n\n - AWS: c6i, m6i, r6i\n - GCP: N2, M2, C2, C3\n - Azure: Dv4, Ev4\n\n## Sneller Cloud\n\nOur [cloud platform](https://console.sneller.ai/) simplifies the Sneller SQL\nuser experience by giving you instant access to thousands of CPU cores to run your queries.\nSneller Cloud also provides automatic synchronization between your source data and your\nSQL tables, so you don't have any batch processes to manage in order to keep your tables\nup-to-date. Our cloud solution has a simple usage-based pricing model that depends entirely\non the amount of data your queries scan. 
(Since Sneller Cloud doesn't store any of your\ndata, there are no additional storage charges.)\n\n## Performance\n\nSneller is generally able to provide end-to-end scanning performance in excess of 1 GB/s/core\non high-core-count machines. The core SQL engine is typically able to saturate the memory\nbandwidth of the machine; generally about half of the query execution time is spent\ndecompressing the source data, and the other half is spent in the SQL engine itself.\nScanning performance scales linearly with the number of CPU cores available,\nso, for example, a 1000-core cluster would generally provide scanning performance\nin excess of 1 TB/s.\n\nThe `zion` [compression format that the SQL engine consumes is \"bucketized\"](https://sneller.io/blog/zion-format/) so that\nqueries that don't touch all of the fields in the source data consume fewer cycles\nduring decompression. Concretely, each top-level field in a record is hashed\ninto one of 16 buckets, and each of these buckets is compressed separately.\nThe query planner determines which fields are referenced by each query, and at\nexecution time only the buckets that contain fields necessary to compute the final\nquery result are actually decompressed. (Strictly columnar formats like Parquet\nstripe data into one bucket per column, with the restriction that the columns\nand their types are known in advance. Since Sneller operates on un-structured\ndata, our solution needed to be completely agnostic to the structure of the data itself.)\n\n## License\n\nSneller is released under the Apache 2.0 license. See the LICENSE file for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSnellerInc%2Fsneller","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSnellerInc%2Fsneller","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSnellerInc%2Fsneller/lists"}