{"id":13615958,"url":"https://github.com/moby/datakit","last_synced_at":"2025-05-16T05:06:40.632Z","repository":{"id":37677902,"uuid":"51462605","full_name":"moby/datakit","owner":"moby","description":"Connect processes into powerful data pipelines with a simple git-like filesystem interface","archived":false,"fork":false,"pushed_at":"2023-08-21T22:34:00.000Z","size":3910,"stargazers_count":1099,"open_issues_count":33,"forks_count":155,"subscribers_count":43,"default_branch":"master","last_synced_at":"2025-04-08T15:08:19.960Z","etag":null,"topics":["data-flow","database","datakit","docker","filesystem-api","pipeline"],"latest_commit_sha":null,"homepage":"","language":"OCaml","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/moby.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-02-10T18:38:37.000Z","updated_at":"2025-03-28T01:19:10.000Z","dependencies_parsed_at":"2024-05-23T04:44:11.358Z","dependency_job_id":"a09ba758-c983-45b9-9d39-7e6172c44324","html_url":"https://github.com/moby/datakit","commit_stats":null,"previous_names":["docker/datakit"],"tags_count":21,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moby%2Fdatakit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moby%2Fdatakit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moby%2Fdatakit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moby%2Fdatakit/manifests","owner_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/moby","download_url":"https://codeload.github.com/moby/datakit/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254471061,"owners_count":22076585,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-flow","database","datakit","docker","filesystem-api","pipeline"],"created_at":"2024-08-01T20:01:21.249Z","updated_at":"2025-05-16T05:06:38.667Z","avatar_url":"https://github.com/moby.png","language":"OCaml","readme":"## DataKit -- Orchestrate applications using a Git-like dataflow\n\n*DataKit* is a tool to orchestrate applications using a Git-like dataflow. It\nrevisits the UNIX pipeline concept, with a modern twist: streams of\ntree-structured data instead of raw text. 
DataKit allows you to define\ncomplex build pipelines over version-controlled data.\n\nDataKit is currently used as the coordination\nlayer for [HyperKit](http://github.com/docker/hyperkit), the\nhypervisor component of\n[Docker for Mac and Windows](https://blog.docker.com/2016/03/docker-for-mac-windows-beta/), and\nfor the [DataKitCI][] continuous integration system.\n\n---\n\n[![Build Status (OSX, Linux)](https://travis-ci.org/moby/datakit.svg)](https://travis-ci.org/moby/datakit)\n[![Build status (Windows)](https://ci.appveyor.com/api/projects/status/6qrdgiqbhi4sehmy/branch/master?svg=true)](https://ci.appveyor.com/project/moby/datakit/branch/master)\n[![docs](https://img.shields.io/badge/doc-online-blue.svg)](https://docker.github.io/datakit/)\n\nThere are several components in this repository:\n\n- `src` contains the main DataKit service. This is a Git-like database to which other services can connect.\n- `ci` contains [DataKitCI][], a continuous integration system that uses DataKit to monitor repositories and store build results.\n- `ci/self-ci` is the CI configuration for DataKitCI that tests DataKit itself.\n- `bridge/github` is a service that monitors repositories on GitHub and syncs their metadata with a DataKit database.\n  e.g. when a pull request is opened or updated, it will commit that information to DataKit. If you commit a status message to DataKit, the bridge will push it to GitHub.\n- `bridge/local` is a drop-in replacement for `bridge/github` that just monitors a local Git repository. This is useful for local testing.\n\n### Quick Start\n\nThe easiest way to use DataKit is to start both the server and the client in containers.\n\nTo expose a Git repository as a 9p endpoint on port 5640 on a private network, run:\n\n```shell\n$ docker network create datakit-net # create a private network\n$ docker run -it --net datakit-net --name datakit -v \u003cpath/to/git/repo\u003e:/data datakit/db\n```\n\n*Note*: The `--name datakit` option is mandatory.  
It will allow the client\nto connect to a known name on the private network.\n\nYou can then start a DataKit client, which will mount the 9p endpoint and\nexpose the database as a filesystem API:\n\n```shell\n# In another terminal\n$ docker run -it --privileged --net datakit-net datakit/client\n$ ls /db\nbranch     remotes    snapshots  trees\n```\n\n*Note*: the `--privileged` option is needed because the container will have\nto mount the 9p endpoint into its local filesystem.\n\nNow you can explore, edit and script `/db`. See the\n[Filesystem API][]\nfor more details.\n\n### Building\n\nThe easiest way to build the DataKit project is to use [Docker](https://docker.com)\n(which is what the\n[start-datakit.sh](https://github.com/moby/datakit/blob/master/scripts/start-datakit.sh) script\ndoes under the hood):\n\n```shell\ndocker build -t datakit/db -f Dockerfile .\ndocker run -p 5640:5640 -it --rm datakit/db --listen-9p=tcp://0.0.0.0:5640\n```\nThese commands will expose the database's 9p endpoint on port 5640.\n\nIf you want to build the project from source without Docker, you will need to install\n[OCaml](http://ocaml.org/) and [opam](http://opam.ocaml.org/). Then run:\n\n```shell\n$ make depends\n$ make \u0026\u0026 make test\n```\n\nFor information about command-line options:\n\n```shell\n$ datakit --help\n```\n\n## Prometheus metric reporting\n\nRun with `--listen-prometheus 9090` to expose metrics at `http://*:9090/metrics`.\n\nNote: there is no encryption and no access control. You are expected to run the\ndatabase in a container and not to expose this port to the outside world. You\ncan either collect the metrics by running a Prometheus service in a container\non the same Docker network, or front the service with nginx or similar if you\nwant to collect metrics remotely.\n\n## Language bindings\n\n* **Go** bindings are in the `api/go` directory.\n* **OCaml** bindings are in the `api/ocaml` directory. 
See `examples/ocaml-client` for an example.\n\n## Licensing\n\nDataKit is licensed under the Apache License, Version 2.0. See\n[LICENSE](https://github.com/moby/datakit/blob/master/LICENSE.md) for the full\nlicense text.\n\nContributions are welcome under the terms of this license. You may wish to browse\nthe [weekly reports](reports) to read about overall activity in the repository.\n\n[DataKitCI]: https://github.com/moby/datakit/tree/master/ci\n[Filesystem API]: https://github.com/moby/datakit/tree/master/9p.md\n","funding_links":[],"categories":["OCaml","pipeline"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoby%2Fdatakit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmoby%2Fdatakit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoby%2Fdatakit/lists"}