{"id":13671721,"url":"https://github.com/iesahin/xvc","last_synced_at":"2025-06-28T13:02:50.834Z","repository":{"id":61644401,"uuid":"552048819","full_name":"iesahin/xvc","owner":"iesahin","description":"A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)","archived":false,"fork":false,"pushed_at":"2025-04-22T13:29:37.000Z","size":7045,"stargazers_count":51,"open_issues_count":2,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-22T13:52:41.542Z","etag":null,"topics":["command-line-tool","data","data-engineering","data-pipelines","data-science","devops","machine-learning","machine-learning-engineering","mlops","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iesahin.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-10-15T17:44:29.000Z","updated_at":"2025-04-09T05:45:14.000Z","dependencies_parsed_at":"2023-11-13T17:26:04.171Z","dependency_job_id":"6021f9f2-9b0c-4ba4-b95f-16181674ca09","html_url":"https://github.com/iesahin/xvc","commit_stats":{"total_commits":303,"total_committers":3,"mean_commits":101.0,"dds":0.04950495049504955,"last_synced_commit":"49a56e450b73d012998b4627501144cf3937eba5"},"previous_names":[],"tags_count":57,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iesahin%2Fxvc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iesahin%2Fxvc/tags","releases_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iesahin%2Fxvc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iesahin%2Fxvc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iesahin","download_url":"https://codeload.github.com/iesahin/xvc/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251187280,"owners_count":21549613,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["command-line-tool","data","data-engineering","data-pipelines","data-science","devops","machine-learning","machine-learning-engineering","mlops","rust"],"created_at":"2024-08-02T09:01:17.106Z","updated_at":"2025-06-28T13:02:50.825Z","avatar_url":"https://github.com/iesahin.png","language":"Rust","funding_links":[],"categories":["Rust","rust","data"],"sub_categories":[],"readme":"# xvc\n\n[![codecov](https://codecov.io/gh/iesahin/xvc/branch/main/graph/badge.svg?token=xa3ru5KhRq)](https://codecov.io/gh/iesahin/xvc)\n[![build](https://img.shields.io/github/actions/workflow/status/iesahin/xvc/rust.yml?branch=main)](https://github.com/iesahin/xvc/actions/workflows/rust.yml)\n[![crates.io](https://img.shields.io/crates/v/xvc)](https://crates.io/crates/xvc)\n[![docs.rs](https://img.shields.io/docsrs/xvc)](https://docs.rs/xvc/)\n[![unsafe forbidden](https://img.shields.io/badge/unsafe-forbidden-success.svg)](https://github.com/rust-secure-code/safety-dance/)\n\nManage your data next to code in Git repositories and run commands when they change. 
\n\n## ⌛ Why Xvc?\n\n- You have image, audio, media, document or asset files to\n[track/version/backup][xvc-file-track] along with the code, but [don't want to\ncopy][xvc-file-recheck] that huge data to all Git clones.\n\n- You want to [manage][xvc-file-list] files in multiple locations with\n[different subsets][xvc-file-copy], some (e.g. training data) being read-only\nand some (e.g. models, executables) changing frequently, all versioned along with\nthe code. \n\n- You want to [store][xvc-s-n] this data in [S3-compatible cloud\nstorages][xvc-s-n-s3] or [local][xvc-s-n-local] directories, or your\npreconfigured [Rsync][xvc-s-n-rsync] and [Rclone][xvc-s-n-rclone] remotes to\nshare with the repository. \n\n- You want to [specify commands][xvc-p-s-n] that [run][xvc-p-r] only when\ninput data changes, and define [pipelines][xvc-p-n] with steps that\nrun only when their [dependencies][xvc-p-s-d] change.\n\n- You want to define these dependencies with\n[files][xvc-p-s-d-file],\n[globs][xvc-p-s-d-glob] spanning multiple files, text file\nlines defined by [ranges][xvc-p-s-d-line] or\n[regexes][xvc-p-s-d-regex],\n[URLs][xvc-p-s-d-url],\n[parameters][xvc-p-s-d-params] in YAML or JSON files,\n[SQLite queries][xvc-p-s-d-sqlite] or [any\ncommand][xvc-p-s-d-generic] that produces output. \n\n\u003cdetails\u003e\n  \u003csummary\u003e \u003cstrong\u003e 🔽 Installation\u003c/strong\u003e\u003c/summary\u003e\n\nYou can get the binary files for Linux, macOS, and Windows from the [releases]\npage. 
Extract and copy the file to your `$PATH`.\n\n\nAlternatively, if you have Rust [installed], you can build xvc:\n\n```shell\n$ cargo install xvc\n```\n\n\nIf you want to use Xvc with Python console and Jupyter notebooks, you can also\ninstall it with `pip`:\n\n```shell\n$ pip install xvc\n```\n\nNote that pip installation doesn't make `xvc` available as a shell command.\nPlease see [xvc.py] for details.\n\n\n### Completions\n\nXvc supports dynamic completions for bash, zsh, elvish, fish and powershell. For example, run the following to add completions for bash:\n\n```bash\necho \"source \u003c(COMPLETE=bash xvc)\" \u003e\u003e ~/.bashrc\n```\n\nSee [completions] section in the docs for others.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e🚀\n    \u003cstrong\u003e Initialize a directory for Xvc\u003c/strong\u003e\n  \u003c/summary\u003e\n\n```bash\n$ xvc init\n```\n\n[This command][xvc-init] initializes the `.xvc/` directory and adds a\n`.xvcignore` file for specifying paths you wish to hide from Xvc.\n\n  \u003e 💡\n  \u003e Git is **not required** to run Xvc. However running Xvc with Git is usually\n  \u003e a good idea. Xvc can stage/commit metadata files (under `.xvc/`) used to\n  \u003e track binary files and you can use branches for versioning as well. By\n  \u003e default, you won't have to deal with Git commands to commit these metadata\n  \u003e files. Xvc can manage the files it updates and hides your binary files from\n  \u003e Git by default. \n  \u003e \n  \u003e If you don't want to use Xvc with Git, use `--no-git` option when\n  \u003e initializing.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e\n    👣\n    \u003cstrong\u003eTrack binary files\u003c/strong\u003e\n  \u003c/summary\u003e\n\nAdd your data files and directories for tracking:\n\n```shell\n$ xvc file track my-data/\n```\n\n[This command][xvc-file-track] calculates content\nhashes for data (using BLAKE-3, by default) and records them. 
Files are moved\nto content-addressed directories under `.xvc/b3`. Then they are copied to the\nworkspace. \n\n  \u003e 💡**Tip**:\n  \u003e You can specify different [recheck (checkout)\n  \u003e methods][xvc-file-recheck] for files and\n  \u003e directories depending on your use case. Symlinks and hardlinks to the\n  \u003e files under Xvc cache don't consume additional space but they are read-only.\n  \u003e You can also use (copy-on-write) reflinks if your file system supports it\n  \u003e and Xvc is built with the `reflink` feature. \n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e🫧 \n    \u003cstrong\u003eCheckout a subset of files as symlinks\u003c/strong\u003e\n\u003c/summary\u003e\n\nYou can [copy][xvc-file-copy] and [recheck][xvc-file-recheck] (checkout)\nsubsets of files from Xvc cache as symlinks to create multiple _views_. This is\nuseful when you need read-only access that won't consume additional space.\n\n```bash\n$ xvc file copy my-data/ another-view-to-my-data/\n$ xvc file recheck another-view-to-my-data/ --as symlink\n```\n\n  \u003e 💡\n  \u003e [`xvc file copy`][xvc-file-copy] and [`xvc file move`][xvc-file-move]\n  \u003e don't require file contents to be available. Xvc works only with their\n  \u003e metadata and you can organize files without their content copied to\n  \u003e workspace or cache. 
\n  \n  \u003e 💡 If you installed [completions] to your shell, Xvc completes file names\n  \u003e even if they are not available in your local paths.\n\n\u003c/details\u003e\n\n\n\n\u003cdetails\u003e\n  \u003csummary\u003e 🌁 \n    \u003cstrong\u003eSend files to the cloud services\u003c/strong\u003e\n  \u003c/summary\u003e\n\nConfigure a cloud storage to share the files you track with Xvc.\n\n```shell\n$ xvc storage new s3 --name my-storage --region us-east-1 --bucket-name xvc\n```\n\nYou can send the files to this storage.\n\n```shell\n$ xvc file send --to my-storage\n```\n\nYou can also send a subset of the files.\n\n```shell\n$ xvc file send 'my-data/training/*' --to my-storage\n```\n\nXvc [supports][xvc-s-n] [external directories][xvc-s-n-local], [rclone\nremotes][xvc-s-n-rclone], [Rsync][xvc-s-n-rsync], [AWS S3][xvc-s-n-s3], [Google\nCloud Storage][xvc-s-n-gcs], [MinIO][xvc-s-n-minio], [Cloudflare\nR2][xvc-s-n-r2], [Wasabi][xvc-s-n-wasabi], [Digital Ocean Spaces][xvc-s-n-do].\nPlease [create an issue] if you want Xvc to support another cloud storage\nservice.\n\n\u003e 💡 Xvc also supports any command to upload/download files. If your favorite\n\u003e service is not listed or you want to use another tool (s5cmd, rclone, etc.),\n\u003e you can specify a [generic][xvc-s-n-generic] storage by supplying shell\n\u003e commands to upload and download. \n\n\u003e 📌 **Important**:\n\u003e Xvc never stores credentials to your connections and expects them to be\n\u003e available in the environment. It _never_ makes network requests (for\n\u003e tracking, statistics, etc.) without your knowledge. 
You can [compile] without\n\u003e cloud connection support in case you want to make sure that it makes no\n\u003e connections to outside services.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e 🪣 \n    \u003cstrong\u003eGet files from cloud services\u003c/strong\u003e\n  \u003c/summary\u003e\n\nWhen you (or someone else) want to access these files later, you can clone the\nGit repository and [get the files][xvc-file-bring] from the storage.\n\n```shell\n$ git clone https://example.com/my-machine-learning-project\nCloning into 'my-machine-learning-project'...\n\n$ cd my-machine-learning-project\n$ xvc file bring my-data/ --from my-storage\n\n```\n\nThis approach ensures convenient access to files from the shared storage when\nneeded.\n\n  \u003e 💡**Tip**:\n  \u003e You don't have to reconfigure the storage after cloning, but you need to\n  \u003e have valid credentials as environment variables to access the storage. Xvc\n  \u003e never stores any credentials.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e 🫖\n    \u003cstrong\u003eShare files from cloud storages for a limited time\u003c/strong\u003e \n  \u003c/summary\u003e\n  \n  You can share Xvc tracked files from S3 compatible storages for a specified period.\n\n```shell\n$ xvc file share --storage my-storage dir-0001/file-0001.bin --duration 1h\nhttps://my-storage.s3.eu-central-1.amazonaws.com/xvc....\n```\n\nYou can share the link with others and they will be able to access the file\nfor the specified period. The default period is 24 hours.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e 🥤\u003cstrong\u003eCreate a data pipeline\u003c/strong\u003e\u003c/summary\u003e\n\nSuppose you have a script to preprocess files in a directory and you want to\nrun this when the files in the `my-data/train` directory change. 
We first define a\nstep in the pipeline that will run the script.\n\n```bash\n$ xvc pipeline step new --step-name preprocess --command 'python3 src/preprocess.py'\n```\n\nEach command is associated with a step and each step has a command.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e 🔗 \u003cstrong\u003eAdd a dependency to a pipeline step\u003c/strong\u003e\u003c/summary\u003e\n\nWhen we want to create a dependency for a command, we use the [`xvc pipeline step\ndependency`][xvc-p-s-d] command with various parameters. \n\nWe want to define two dependencies for the `preprocess` step we created previously. \nWe'll make the `preprocess` step depend on:\n\n- The `src/preprocess.py` source file itself, so when we change the script, we'll run the step again\n\n```bash\n$ xvc pipeline step dependency --step-name preprocess --file src/preprocess.py\n```\n\n- `data/raw/*.jpg` files that the script works on.\n\n```bash\n$ xvc pipeline step dependency -s preprocess --glob 'data/raw/*.jpg'\n```\n\n\u003e ⚠️ Most shells expand globs before running the command, so you need to\n\u003e quote globs to pass them as strings without expansion. Xvc expands these\n\u003e globs itself. \n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e 🛝 \u003cstrong\u003eRun pipeline\u003c/strong\u003e\u003c/summary\u003e\n\nAfter you define the pipeline, you can run it with:\n\n```bash\n$ xvc pipeline run\n[DONE] preprocess (python3 src/preprocess.py)\n[OUT] [preprocess] \n...\n\n[DONE] preprocess (python3 src/preprocess.py)\n\n```\n\n\u003e 💡 Xvc runs pipeline steps in parallel if they are not interdependent. 
You\n\u003e can specify the maximum number of parallel processes.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e 🪡 \n    \u003cstrong\u003eAdd fine-grained dependencies to steps\u003c/strong\u003e\n  \u003c/summary\u003e\n\nXvc allows many kinds of dependencies: \n\n- Steps can explicitly depend on [other steps][xvc-p-s-d-step] when they are required to run serially. \n\n- Steps can depend on [single files][xvc-p-s-d-file] or groups of files defined\nby [globs][xvc-p-s-d-glob]. For globs, you can also get which files are added,\ndeleted or updated with [glob-items][xvc-p-s-d-glob-items].\n\n  \u003e 💡 Similar to Git, Xvc doesn't track directories per se. You can define\n  \u003e glob dependencies that describe files in a directory, like `dir/*`, when you\n  \u003e want to track all files in it. \n\n\n- You can specify steps to depend only on a subset of lines in a file with\n[line ranges][xvc-p-s-d-line] or [regular expressions][xvc-p-s-d-regex]. You\ncan also get which lines are added, deleted or updated with more granular\n[line-items][xvc-p-s-d-line-items] or [regex-items][xvc-p-s-d-regex-items]\ndependencies. \n\n- If you track (hyper)parameters for building/model training process in JSON or\nYAML files, you can specify steps to [depend on these parameters][xvc-p-s-d-params]. \n\n- If you want your steps to run when an HTTP(S) URL's content changes, you can\nspecify this with [URL dependencies][xvc-p-s-d-url].\n\n\n- If you want your step to run when the output from an SQLite query changes, you can specify it with [SQLite dependencies][xvc-p-s-d-sqlite].\n\n- If none of the dependency types fit your needs, you can also specify a [command][xvc-p-s-d-generic] that will be run to check if a step is invalidated. 
\n\n  \u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e 🖇️ \u003cstrong\u003eExample to add a dependency when only certain lines in a file change\u003c/strong\u003e\u003c/summary\u003e\n\nSuppose you have a list of IQ scores in a file. \n\n```csv\nAda Harris,128\nAlan Thompson,125\nBrian Shaffer,122\nBrian Wilson,94\nDr. Brittany Chang,103\nBrittany Smith,104\nDavid Brown,113\nEmily Davis,97\nGrace White,130\nJames Taylor,101\nDr. Jane Doe,105\nJessica Lee,102\nJohn Smith,110\nLaura Martinez,110\nDr. Linus Martin,118\nMallory Johnson,105\nMallory Payne MD,99\nMargaret Clark,122\nMichael Johnson,92\nRobert Anderson,105\nSarah Wilson,104\nSherry Brown,115\nSherry Leonard,117\nSusan Davis,107\nDr. Susan Swanson,132\n```\n\n\nWe're only interested in the IQ scores of those with _Dr._ in front of\ntheir names. Let's create a regex search dependency to run a command when only\na line with a _Dr._ title is added to the file. \n\n\nOur command will be collecting all lines with an initial _Dr._ to another file. \n\n```bash\n$ xvc pipeline step new --step-name dr-iq --command 'echo \"${XVC_ADDED_REGEX_ITEMS}\" \u003e\u003e dr-iq-scores.csv '\n$ xvc pipeline step dependency --step-name dr-iq --regex-items 'iq-scores.csv:/^Dr\\..*'\n```\n\nThe first line specifies a command, when run writes `${XVC_ADDED_REGEX_ITEMS}`\nenvironment variable to `dr-iq-scores.csv` file.\n\nThe second line specifies the dependency which will also populate the\n`${XVC_ADDED_REGEX_ITEMS}` environment variable in the command.\n\nSome dependency types like [regex items][xvc-p-s-d-regex-items], [line\nitems][xvc-p-s-d-line-items] and [glob items][xvc-p-s-d-glob-items] inject\nenvironment variables to the shells running the step commands. If you have\nthousands of files specified by a glob, but want to run a script only on the\nadded files after the last run, you can use these environment variables.\n\nWhen you run the pipeline, a file named `dr-iq-scores.csv` will be created. 
\n\n```bash\n$ xvc pipeline run\n[DONE] dr-iq (echo \"${XVC_ADDED_REGEX_ITEMS}\" \u003e\u003e dr-iq-scores.csv )\n\n$ cat dr-iq-scores.csv\nDr. Brittany Chang,103\nDr. Jane Doe,105\nDr. Linus Martin,118\nDr. Susan Swanson,132\n\n```\n\nWhen the file changes, e.g. another line matching the dependency regex is added\nto the `iq-scores.csv` file, the command will append it to the\n`dr-iq-scores.csv` file.\n\n```bash\n$ zsh -cl 'echo \"Dr. John Doe,123\" \u003e\u003e iq-scores.csv'\n\n$ xvc pipeline run\n[DONE] dr-iq (echo \"${XVC_ADDED_REGEX_ITEMS}\" \u003e\u003e dr-iq-scores.csv )\n\n$ cat dr-iq-scores.csv\nDr. Brian Shaffer,122\nDr. Brittany Chang,82\nDr. Mallory Payne MD,70\nDr. Sherry Leonard,93\nDr. Susan Swanson,81\nDr. John Doe,123\n\n```\n\nNote that `${XVC_ADDED_REGEX_ITEMS}` has only the added lines, not all of the\nlines the regex matches. So, we can just work on the added elements, without\nrerunning the commands for all matching elements. \n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e 🛃 \n      \u003cstrong\u003eExport, edit and import a pipeline with YAML or JSON files\u003c/strong\u003e\n    \u003c/summary\u003e\n\nUnlike some other tools, Xvc doesn't require (or allow) you to specify pipelines in\nYAML files. Nevertheless, you can [export][xvc-p-e] and [import][xvc-p-i] the pipeline to JSON or\nYAML to edit in your editor. You can fix typos in commands, remove steps\ncompletely, or duplicate the pipeline with a new name this way. 
\n\n\n```bash\n$ xvc pipeline export --file my-pipeline.json\n\n$ cat my-pipeline.json\n{\n  \"name\": \"default\",\n  \"steps\": [\n    {\n      \"command\": \"python3 -m pip install --quiet --user -r requirements.txt\",\n      \"dependencies\": [\n        {\n          \"File\": {\n            \"content_digest\": {\n              \"algorithm\": \"Blake3\",\n              \"digest\": [\n                43,\n                86,\n                244,\n                111,\n                13,\n                243,\n                28,\n                110,\n                140,\n                213,\n                105,\n                20,\n                239,\n                62,\n                73,\n                75,\n                13,\n                146,\n                82,\n                17,\n                148,\n                152,\n                66,\n                86,\n                154,\n                230,\n                154,\n                246,\n                213,\n                214,\n                40,\n                119\n              ]\n            },\n            \"path\": \"requirements.txt\",\n            \"xvc_metadata\": {\n              \"file_type\": \"File\",\n              \"modified\": {\n                \"nanos_since_epoch\": [..],\n                \"secs_since_epoch\": [..]\n              },\n              \"size\": 14\n            }\n          }\n        }\n      ],\n      \"invalidate\": \"ByDependencies\",\n      \"name\": \"install-deps\",\n      \"outputs\": []\n    },\n    {\n      \"command\": \"python3 generate_data.py\",\n      \"dependencies\": [\n        {\n          \"Step\": {\n            \"name\": \"install-deps\"\n          }\n        }\n      ],\n      \"invalidate\": \"ByDependencies\",\n      \"name\": \"generate-data\",\n      \"outputs\": []\n    },\n    {\n      \"command\": \"echo /\"${XVC_ADDED_REGEX_ITEMS}/\" \u003e\u003e dr-iq-scores.csv \",\n      \"dependencies\": [\n        {\n      
    \"RegexItems\": {\n            \"lines\": [\n              \"Dr. Brian Shaffer,122\",\n              \"Dr. Susan Swanson,81\",\n              \"Dr. Brittany Chang,82\",\n              \"Dr. Mallory Payne MD,70\",\n              \"Dr. Sherry Leonard,93\",\n              \"Dr. Albert Einstein,144\"\n            ],\n            \"path\": \"iq-scores.csv\",\n            \"regex\": \"^Dr//..*\",\n            \"xvc_metadata\": {\n              \"file_type\": \"File\",\n              \"modified\": {\n                \"nanos_since_epoch\": [..],\n                \"secs_since_epoch\": [..]\n              },\n              \"size\": 19021\n            }\n          }\n        }\n      ],\n      \"invalidate\": \"ByDependencies\",\n      \"name\": \"dr-iq\",\n      \"outputs\": [\n        {\n          \"File\": {\n            \"path\": \"dr-iq-scores.csv\"\n          }\n        }\n      ]\n    },\n    {\n      \"command\": \"python3 visualize.py\",\n      \"dependencies\": [\n        {\n          \"File\": {\n            \"content_digest\": null,\n            \"path\": \"dr-iq-scores.csv\",\n            \"xvc_metadata\": null\n          }\n        }\n      ],\n      \"invalidate\": \"ByDependencies\",\n      \"name\": \"visualize\",\n      \"outputs\": []\n    }\n  ],\n  \"version\": 1,\n  \"workdir\": \"\"\n}\n```\n\nAfter you edit the file with changes, you can import the file to check its\nconsistency and update the pipeline definition. 
\n\n```bash\n$ xvc pipeline import --file my-pipeline.json --overwrite\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e 🎋 \n      \u003cstrong\u003eVisualize a pipeline in Graphviz or Mermaid\u003c/strong\u003e\n  \u003c/summary\u003e\n\nYou can get the pipeline in Graphviz DOT format to convert to an image.\n\n```bash\n$ zsh -cl 'xvc pipeline dag --format graphviz | dot -opipeline.png'\n\n```\n\nYou can also ask for a [mermaid] diagram;\n\n\n```bash\nxvc pipeline dag --format mermaid\nflowchart TD\n    n0[\"preprocess\"]\n    n1[\"data/*\"] --\u003e n0\n    n2[\"train\"]\n    n0[\"preprocess\"] --\u003e n2\n\n```\n\nYou can embed this output in Markdown files, Github PRs or Jupyter notebooks.\n\n```mermaid\nflowchart TD\n    n0[\"preprocess\"]\n    n1[\"data/*\"] --\u003e n0\n    n2[\"train\"]\n    n0[\"preprocess\"] --\u003e n2\n```\n\n\u003c/details\u003e\n\nPlease check [`docs.xvc.dev`][docs] for documentation.\n\n## 🤟 Big Thanks\n\nxvc stands on the following crates:\n\n- Xvc has a deep CLI that has subcommands of subcommands (e.g. `xvc storage new\ns3`), and all these work with minimum bugs thanks to [clap]. With its dynamic\ncompletion support through [clap_complete], Xvc can complete almost anything in\nyour shell. \n\n- [serde] allows all data structures to be stored in text files. Special thanks\nfrom [`xvc-ecs`] for serializing components in an ECS with a single line of\ncode.\n\n- Xvc processes files in parallel with pipelines and parallel iterators thanks\nto [crossbeam] and [rayon].\n\n- Thanks to [strum], Xvc uses enums extensively and converts almost everything\nto typed values from strings.\n\n- Xvc uses [rust-s3] to connect to S3 and compatible storage services. It\nemploys excellent [tokio] for fast async Rust. 
These cloud storage features can\nbe turned off thanks to Rust conditional compilation.\n\n- Without implementations of [BLAKE3], BLAKE2, SHA-2 and SHA-3 from Rust\n[crypto] crate, Xvc couldn't detect file changes so fast.\n\n- Xvc handles Git operations through calling the Git binary and (more and more)\nwith [gix].\n\n- [trycmd] is used to run all example commands in this file, [reference, and\nhow-to documentation][docs] at every PR. It makes sure that the documentation\nis always up-to-date and shown commands work as described. We start development\nby writing documentation and implementing them thanks to [trycmd].\n\n- Many thanks to small and well built crates, [reflink], [relative-path],\n[path-absolutize], [fast-glob] for file system and glob handling.\n\n- Thanks to [sad_machine] for providing a State Machine implementation that I\nused in `xvc pipeline run`. A DAG composed of State Machines made running\npipeline steps in parallel with a clean separation of process states.\n\n- Thanks to [thiserror] and [anyhow] for making error handling a breeze. 
These\ntwo crates make me feel I'm doing something good for the humanity when handling\nerrors.\n\n- Xvc is split into many crates and owes this organization to [cargo workspaces].\n\n[Rust]: https://rust-lang.org\n[`xvc-ecs`]: https://docs.rs/xvc-ecs/\n[anyhow]: https://docs.rs/anyhow/\n[blake3]: https://docs.rs/blake3/\n[cargo workspaces]: https://crates.io/crates/cargo-workspaces\n[clap]: https://docs.rs/clap/\n[clap_complete]: https://docs.rs/clap_complete/\n[crossbeam]: https://docs.rs/crossbeam/\n[crypto]: https://docs.rs/rust-crypto/\n[fast-glob]: https://docs.rs/fast-glob/\n[gix]: https://docs.rs/gix/\n[path-absolutize]: https://docs.rs/path-absolutize/\n[rayon]: https://docs.rs/rayon/\n[reflink]: https://docs.rs/reflink/\n[relative-path]: https://docs.rs/relative-path/\n[rust-s3]: https://docs.rs/rust-s3/\n[sad_machine]: https://docs.rs/sad_machine/\n[serde]: https://serde.rs\n[strum]: https://docs.rs/strum/\n[thiserror]: https://docs.rs/thiserror/\n[tokio]: https://tokio.rs\n[trycmd]: https://docs.rs/trycmd/\nAnd, biggest thanks to Rust designers, developers and contributors. It's a\nfabulous language and environment to work with.\n\n## 🚁 Support\n\n- If you found a bug, please [create an issue]. \n\n- You can use [discussions] to ask questions. I'll answer as much as possible.\nThank you.\n\n- I don't follow any other sites regularly. You can also reach me at\n[emre@xvc.dev](mailto:emre@xvc.dev)\n\n## 👐 Contributing\n\n- Star this repo. I feel very happy for every star and send my best wishes to\nyou. That's a certain win to spend your two seconds for me. Thanks. \n\n- Use xvc. Tell me how it works for you, read the [documentation][docs],\n[report bugs][create an issue], [discuss features][discussions].\n\n- Please note that I don't accept large code PRs. Please open an issue to\ndiscuss your idea and write/modify documentation before sending a PR. I'm\nhappy to discuss and help you to implement your idea. 
\n\n## 📜 License\n\nXvc is licensed under the [GNU GPL 3.0\nLicense](https://github.com/iesahin/xvc/blob/main/LICENSE). \n\nIn the future, some crates can be licensed with other licenses for easier\nintegration. If you want to use some of the crates in your project with other\nlicenses, please contact me at `emre@xvc.dev`.\n\nContributors to Xvc are assumed to be aware that licenses can be changed.\n\n## 🌦️ Future and Maintenance\n\nI'm using Xvc daily for repositories up to 2TB and I'm happy with it. Tracking\nall my files with Git via arbitrary servers and cloud providers is something I\nalways need. I'm happy to improve and maintain it as long as I use it.\n\nGiven that I've been working on this for the last three years for pure technical bliss,\nyou can expect me to work on it more.\n\n## ⚠️ Disclaimer\n\nThis software is fresh and ambitious. Although I use it and test it close to\nreal-world conditions, it hasn't stood the test of time yet. Please back up your data.\n\n[discussions]: https://github.com/iesahin/xvc/discussions\n[compile]: https://docs.xvc.dev/intro/compile-without-default-features\n[completions]: https://docs.xvc.dev/intro/completions\n[create an issue]: https://github.com/iesahin/xvc/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen\n[docs]: https://docs.xvc.dev\n[installed]: https://www.rust-lang.org/tools/install\n[mermaid]: https://mermaid.js.org\n[releases]: https://github.com/iesahin/xvc/releases/latest\n\n[xvc-file-bring]: https://docs.xvc.dev/ref/xvc-file-bring\n[xvc-file-copy]: https://docs.xvc.dev/ref/xvc-file-copy\n[xvc-file-list]: https://docs.xvc.dev/ref/xvc-file-list\n[xvc-file-move]: https://docs.xvc.dev/ref/xvc-file-move\n[xvc-file-recheck]: https://docs.xvc.dev/ref/xvc-file-recheck\n[xvc-file-send]: https://docs.xvc.dev/ref/xvc-file-send\n[xvc-file-track]: https://docs.xvc.dev/ref/xvc-file-track\n[xvc-init]: https://docs.xvc.dev/ref/xvc-init\n\n[xvc-p-e]:  https://docs.xvc.dev/ref/xvc-pipeline-export\n[xvc-p-i]:  
https://docs.xvc.dev/ref/xvc-pipeline-import\n[xvc-p-n]: https://docs.xvc.dev/ref/xvc-pipeline-new\n[xvc-p-r]: https://docs.xvc.dev/ref/xvc-pipeline-run\n[xvc-p-s-d-file]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#file\n[xvc-p-s-d-generic]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#generic\n[xvc-p-s-d-glob-items]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#glob-items\n[xvc-p-s-d-glob]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#glob\n[xvc-p-s-d-line-items]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#line-items\n[xvc-p-s-d-line]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#line\n[xvc-p-s-d-params]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#hyper-parameter\n[xvc-p-s-d-regex-items]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#regex-items\n[xvc-p-s-d-regex]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#regex\n[xvc-p-s-d-sqlite]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#sqlite-query-dependency\n[xvc-p-s-d-step]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#step\n[xvc-p-s-d-url]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency#url-dependencies\n[xvc-p-s-d]: https://docs.xvc.dev/ref/xvc-pipeline-step-dependency\n[xvc-p-s-n]: https://docs.xvc.dev/ref/xvc-pipeline-step-new\n\n[xvc-s-n-do]: https://docs.xvc.dev/ref/xvc-storage-new-digital-ocean\n[xvc-s-n-gcs]: https://docs.xvc.dev/ref/xvc-storage-new-gcs\n[xvc-s-n-generic]: https://docs.xvc.dev/ref/xvc-storage-new-generic\n[xvc-s-n-local]: https://docs.xvc.dev/ref/xvc-storage-new-local\n[xvc-s-n-minio]: https://docs.xvc.dev/ref/xvc-storage-new-minio\n[xvc-s-n-r2]: https://docs.xvc.dev/ref/xvc-storage-new-r2\n[xvc-s-n-rsync]: https://docs.xvc.dev/ref/xvc-storage-new-rsync\n[xvc-s-n-rclone]: https://docs.xvc.dev/ref/xvc-storage-new-rclone\n[xvc-s-n-s3]: https://docs.xvc.dev/ref/xvc-storage-new-s3\n[xvc-s-n-wasabi]: https://docs.xvc.dev/ref/xvc-storage-new-wasabi \n[xvc-s-n]: 
https://docs.xvc.dev/ref/xvc-storage-new \n[xvc.py]: https://github.com/iesahin/xvc.py\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiesahin%2Fxvc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiesahin%2Fxvc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiesahin%2Fxvc/lists"}