{"id":14155077,"url":"https://github.com/kamu-data/kamu-cli","last_synced_at":"2026-02-17T12:19:01.508Z","repository":{"id":37774889,"uuid":"187523334","full_name":"kamu-data/kamu-cli","owner":"kamu-data","description":"Next-generation decentralized data lakehouse and a multi-party stream processing network","archived":false,"fork":false,"pushed_at":"2025-05-13T10:49:40.000Z","size":38773,"stargazers_count":318,"open_issues_count":141,"forks_count":15,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-05-13T11:39:50.585Z","etag":null,"topics":["blockchain","data-as-code","data-management","data-science","datafusion","flink","jupyter","kamu","open-data","open-data-fabric","spark","sql"],"latest_commit_sha":null,"homepage":"https://kamu.dev","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kamu-data.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-05-19T20:12:35.000Z","updated_at":"2025-05-12T18:58:23.000Z","dependencies_parsed_at":"2024-03-11T04:22:33.575Z","dependency_job_id":"365cc813-c2ee-475d-b178-afceea6ffb64","html_url":"https://github.com/kamu-data/kamu-cli","commit_stats":{"total_commits":1122,"total_committers":10,"mean_commits":112.2,"dds":"0.14349376114082002","last_synced_commit":"011dd4e91aa37cc7f4736c38f451c7290dee5e91"},"previous_names":[],"tags_count":390,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kamu-data%2Fkamu-cli","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/kamu-data%2Fkamu-cli/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kamu-data%2Fkamu-cli/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kamu-data%2Fkamu-cli/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kamu-data","download_url":"https://codeload.github.com/kamu-data/kamu-cli/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253938907,"owners_count":21987405,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["blockchain","data-as-code","data-management","data-science","datafusion","flink","jupyter","kamu","open-data","open-data-fabric","spark","sql"],"created_at":"2024-08-17T08:01:47.973Z","updated_at":"2026-02-17T12:19:01.478Z","avatar_url":"https://github.com/kamu-data.png","language":"Rust","funding_links":[],"categories":["blockchain"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003cimg alt=\"Kamu: Planet-scale data pipeline\" src=\"docs/readme_files/kamu_logo.png\" width=300/\u003e\n\n[Website] | [Docs] | [Demo] | [Tutorials] | [Examples] | [FAQ] | 
[Chat]\n\n[![Release](https://img.shields.io/github/v/release/kamu-data/kamu-cli?include_prereleases\u0026logo=rust\u0026logoColor=orange\u0026style=for-the-badge)](https://github.com/kamu-data/kamu-cli/releases/latest)\n[![Docs](https://img.shields.io/static/v1?logo=gitbook\u0026logoColor=white\u0026label=\u0026message=Docs\u0026color=gray\u0026style=for-the-badge)](https://docs.kamu.dev/cli/)\n[![CI](https://img.shields.io/github/actions/workflow/status/kamu-data/kamu-cli/build.yaml?logo=githubactions\u0026label=CI\u0026logoColor=white\u0026style=for-the-badge\u0026branch=master)](https://github.com/kamu-data/kamu-cli/actions)\n[![Chat](https://shields.io/discord/898726370199359498?style=for-the-badge\u0026logo=discord\u0026label=Discord)](https://discord.gg/nU6TXRQNXC)\n\n\n\u003c/p\u003e\n\u003c/div\u003e\n\n## About\n`kamu` *(pronounced [kæmˈuː](https://en.wikipedia.org/wiki/Albert_Camus))* is a command-line tool for management and verifiable processing of structured data.\n\nIt's a green-field project that aims to enable **global collaboration on data** on the same scale as seen today in software.\n\nYou can think of `kamu` as:\n- ***Local-first data lakehouse*** - a free alternative to Databricks / Snowflake / Microsoft Fabric that can run on your laptop without any accounts, and scale to a large on-prem cluster\n- ***Kubernetes for data pipelines*** - an *infrastructure-as-code* framework for building ETL pipelines using [wide range of open-source SQL engines](https://docs.kamu.dev/cli/supported-engines/)\n- ***Git for data*** - a tamper-proof ledger that handles data ownership and preserves full history of changes to source data\n- ***Blockchain for data*** - a verifiable computing system for transforming data and recording fine-grain provenance and lineage\n- ***Peer-to-peer data network*** - a set of [open data formats and protocols](https://github.com/open-data-fabric/open-data-fabric/) for:\n  - Non-custodial data sharing\n  - Federated querying of 
global data as if one giant database\n  - Processing pipelines that can span across multiple organizations.\n\n\n### Featured Video\n\u003cdiv align=\"center\"\u003e\n\u003ca href=\"https://www.youtube.com/watch?v=c9UCjJdvJAU\"\u003e\u003cimg alt=\"Kamu: Unified On/Off-Chain Analytics Tutorial\" src=\"https://img.youtube.com/vi/c9UCjJdvJAU/0.jpg\" width=\"50%\"/\u003e\u003c/a\u003e\n\u003c/div\u003e\n\n\n## Quick Start\nUse the installer script _(Linux / MacOSX / WSL2)_:\n```sh\ncurl -s \"https://get.kamu.dev\" | sh\n```\n\n* Watch [introductory videos](https://www.youtube.com/watch?v=oUTiWW6W78A\u0026list=PLV91cS45lwVG20Hicztbv7hsjN6x69MJk) to see `kamu` in action\n* Follow the [\"Getting Started\"](https://docs.kamu.dev/welcome/) guide through an online demo and installation instructions.\n\n\n## How it Works\n\n### Ingest from any source\n`kamu` works well with popular data extractors like Debezium and provides [many built-in sources](https://docs.kamu.dev/cli/ingest/) ranging from polling data on the web to MQTT broker and blockchain logs.\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg alt=\"Ingesting data\" src=\"docs/readme_files/pull-multi.gif\" width=\"65%\"/\u003e\n\u003c/div\u003e\n\n\n### Track tamper-proof history\nData is stored in [Open Data Fabric](https://github.com/open-data-fabric/open-data-fabric/) (ODF) format - an open **Web3-native format** inspired by Apache Iceberg and Delta.\n\nIn addition to \"table\" abstraction on top of Parquet files, ODF provides:\n- Cryptographic integrity and commitments\n- Stable references over real-time data\n- Decentralized identity, ownership, attribution, and permissions (based on [W3C DIDs](https://www.w3.org/TR/did-core/))\n- Rich extensible metadata (e.g. licenses, attachments, semantics)\n- Compatibility with decentralized storages like [IPFS](https://ipfs.tech)\n\nUnlike Iceberg and Delta that encourage continuous loss of history through Change-Data-Capture, ODF format is **history-preserving**. 
It encourages working with data in the [event form](https://www.kamu.dev/blog/a-brief-history-of-time-in-data-modelling-olap-systems/), and dealing with inaccuracies through [explicit retractions and corrections](https://docs.kamu.dev/cli/transform/retractions-corrections/).\n\n\n### Explore, query, document\n`kamu` offers a wide range of [integrations](https://docs.kamu.dev/cli/integrations/), including:\n- Embedded SQL shell for quick EDA\n- Integrated Jupyter notebooks for ML/AI\n- Embedded Web UI with SQL editor and metadata explorer\n- Apache Superset and many other BI solutions\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg alt=\"SQL Shell\" src=\"docs/readme_files/sql.gif\" width=\"65%\"/\u003e\n\n\u003cimg alt=\"Integrated Jupyter notebook\" src=\"docs/readme_files/notebook-005.png\" width=\"65%\"/\u003e\n\u003c/div\u003e\n\n\n### Build enterprise-grade ETL pipelines\nData in `kamu` can only be [transformed through code](https://docs.kamu.dev/cli/transform/). An SQL query that cleans one dataset or combines two via JOIN can be used to create a **derivative dataset**.\n\n`kamu` doesn't implement data processing itself - it integrates [many popular data engines](https://docs.kamu.dev/cli/supported-engines/) *(Flink, Spark, DataFusion...)* as plugins, so you can build an ETL flow that uses the strengths of different engines at different steps of the pipeline:\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg alt=\"Complex ETL pipeline in Kamu Web UI\" src=\"docs/readme_files/web-ui.png\" width=\"65%\"/\u003e\n\u003c/div\u003e\n\n\n### Get near real-time consistent results\nAll derivative datasets use **stream processing** that results in some [revolutionary qualities](https://www.kamu.dev/blog/end-of-batch-era/):\n- Input data is only read once, minimizing traffic\n- Configurable balance between low latency and high consistency\n- High autonomy - once a pipeline is written, it can run and deliver fresh data forever with little to no maintenance. 
\n\n\n### Share datasets with others\nODF datasets can be shared via any [conventional](https://docs.kamu.dev/cli/collab/repositories/) (S3, GCS, Azure) and [decentralized](https://docs.kamu.dev/cli/collab/ipfs/) (IPFS) storage and easily replicated. Sharing a large dataset is as simple as:\n\n```sh\nkamu push covid19.case-details \"s3://datasets.example.com/covid19.case-details/\"\n```\n\nBecause dataset **identity is an inseparable part of the metadata**, a dataset can be copied, but everyone on the network will know who the owner is.\n\n\n### Reuse verifiable data\n`kamu` will store the transformation code in the dataset metadata and ensure that it's **deterministic and reproducible**. This is a form of **verifiable computing**.\n\nYou can send a dataset to someone else and they can confirm that the data they see in fact corresponds to the inputs and code:\n\n```sh\n# Download the dataset\nkamu pull \"s3://datasets.example.com/covid19.case-details/\"\n\n# Attempt to verify the transformations\nkamu verify --recursive covid19.case-details\n```\n\nVerifiability allows you to [establish trust](https://docs.kamu.dev/cli/collab/validity/) in data processed by someone you don't even know and detect if they act maliciously.\n\nVerifiable trust allows people to **reuse and collaborate** on data on a global scale, similarly to open-source software.\n\n\n### Query world's data as one big database\nThrough federation, data in different locations can be queried as if it were in one big data lakehouse - `kamu` will work out how to compute results optimally, potentially delegating parts of the processing to other nodes.\n\nEvery query result is accompanied by a **cryptographic commitment** that you can use to reproduce the same query days or even months later.\n\n\n### Start small and scale progressively\n`kamu` offers unparalleled flexibility of deployment options:\n- You can build, test, and debug your data projects and pipelines on a laptop\n- Incorporate online 
storage for larger volumes, but keep processing it locally\n- When you need real-time processing and 24/7 querying, you can run the same pipelines with [`kamu-node`](https://github.com/kamu-data/kamu-node/) as a small server\n- A node can be deployed in Kubernetes and scale to a large cluster.\n\n\n### Get data to and from blockchains\nUsing `kamu` you can easily [read on-chain data](https://docs.kamu.dev/cli/ingest/blockchain-source/) to run analytics on smart contracts, and provide data to blockchains via the novel [Open Data Fabric oracle](https://docs.kamu.dev/node/protocols/oracle/).\n\n\n## Community\nIf you like what we're doing, support us by starring the repo. This helps us a lot!\n\nSubscribe to our [YouTube channel](https://www.youtube.com/channel/UCWciDIWI_HsJ6Md_DdyJPIQ) to get fresh tech talks and deep dives.\n\nStop by and say \"hi\" in our [Discord Server](https://discord.gg/nU6TXRQNXC) - we're always happy to chat about data.\n\nIf you'd like to contribute, [start here](https://docs.kamu.dev/contrib/).\n\n---\n\n\u003cdiv align=\"center\"\u003e\n  \n[Website] | [Docs] | [Tutorials] | [Examples] | [FAQ] | [Chat] | [Contributing] | [Developer Guide] | [License]\n\n[![dependency status](https://deps.rs/repo/github/kamu-data/kamu-cli/status.svg?\u0026style=for-the-badge)](https://deps.rs/repo/github/kamu-data/kamu-cli)\n\n\n\u003c/div\u003e\n\n[Tutorials]: https://docs.kamu.dev/cli/learn/learning-materials/\n[Examples]: https://docs.kamu.dev/cli/learn/examples/\n[Docs]: https://docs.kamu.dev/welcome/\n[Demo]: https://demo.kamu.dev/\n[FAQ]: https://docs.kamu.dev/cli/get-started/faq/\n[Chat]: https://discord.gg/nU6TXRQNXC\n[Contributing]: https://docs.kamu.dev/contrib/\n[Developer Guide]: ./DEVELOPER.md\n[License]: https://docs.kamu.dev/contrib/license/\n[Website]: 
https://kamu.dev\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkamu-data%2Fkamu-cli","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkamu-data%2Fkamu-cli","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkamu-data%2Fkamu-cli/lists"}