{"id":27730704,"url":"https://github.com/huggingface/xet-core","last_synced_at":"2026-06-09T00:01:34.609Z","repository":{"id":271929681,"uuid":"855423236","full_name":"huggingface/xet-core","owner":"huggingface","description":"xet client tech, used in huggingface_hub","archived":false,"fork":false,"pushed_at":"2026-06-01T23:43:35.000Z","size":7566,"stargazers_count":498,"open_issues_count":28,"forks_count":76,"subscribers_count":10,"default_branch":"main","last_synced_at":"2026-06-02T01:15:50.968Z","etag":null,"topics":["huggingface-hub","rust","storage","xet"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/huggingface.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-09-10T20:57:29.000Z","updated_at":"2026-06-01T23:43:38.000Z","dependencies_parsed_at":"2026-03-20T19:09:56.985Z","dependency_job_id":null,"html_url":"https://github.com/huggingface/xet-core","commit_stats":null,"previous_names":["huggingface/xet-core"],"tags_count":91,"template":false,"template_full_name":null,"purl":"pkg:github/huggingface/xet-core","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fxet-core","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fxet-core/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fxet-core/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fxet-core/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/huggingface","download_url":"https://codeload.github.com/huggingface/xet-core/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huggingface%2Fxet-core/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34085321,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-08T02:00:07.615Z","response_time":111,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["huggingface-hub","rust","storage","xet"],"created_at":"2025-04-28T06:02:40.618Z","updated_at":"2026-06-09T00:01:34.591Z","avatar_url":"https://github.com/huggingface.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"\u003c!---\nCopyright 2024 The HuggingFace Team. 
All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n--\u003e\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/huggingface/xet-core/blob/main/LICENSE\"\u003e\u003cimg alt=\"License\" src=\"https://img.shields.io/github/license/huggingface/xet-core.svg?color=blue\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/huggingface/xet-core/releases\"\u003e\u003cimg alt=\"GitHub release\" src=\"https://img.shields.io/github/release/huggingface/xet-core.svg\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://github.com/huggingface/xet-core/blob/main/CODE_OF_CONDUCT.md\"\u003e\u003cimg alt=\"Contributor Covenant\" src=\"https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003ch3 align=\"center\"\u003e\n  \u003cp\u003e🤗 xet-core - xet client tech, used in \u003ca target=\"_blank\" href=\"https://github.com/huggingface/huggingface_hub/\"\u003ehuggingface_hub\u003c/a\u003e\u003c/p\u003e\n\u003c/h3\u003e\n\n## Welcome\n\nxet-core enables huggingface_hub to use Xet storage for uploading files to and downloading files from the HF Hub. Xet storage provides chunk-based deduplication, efficient storage/retrieval with local disk caching, and backwards compatibility with Git LFS. 
This library is not meant to be used directly; it is intended to be used through [huggingface_hub](https://pypi.org/project/huggingface-hub).\n\n## Key features\n\n♻ **chunk-based deduplication implementation**: avoid transferring and storing chunks that are shared across binary files (models, datasets, etc.).\n\n🤗 **Python bindings**: bindings for the [huggingface_hub](https://github.com/huggingface/huggingface_hub/) package.\n\n↔ **network communications**: concurrent communication with HF Hub Xet backend services (CAS).\n\n🔖 **local disk caching**: chunk-based cache that sits alongside the existing [huggingface_hub disk cache](https://huggingface.co/docs/huggingface_hub/guides/manage-cache).\n\n## Packages\n\nThis repository produces the following packages:\n\n### Rust Crates (crates.io)\n\n| Crate | Description |\n|-------|-------------|\n| [`hf-xet`](https://crates.io/crates/hf-xet) | High-level client library for uploading and downloading files with chunk-based deduplication |\n| [`xet-client`](https://crates.io/crates/xet-client) | HTTP client for communicating with Hugging Face Xet storage servers |\n| [`xet-data`](https://crates.io/crates/xet-data) | Data processing pipeline for chunking, deduplication, and file reconstruction |\n| [`xet-core-structures`](https://crates.io/crates/xet-core-structures) | Core data structures including MerkleHash, metadata shards, and Xorb objects |\n| [`xet-runtime`](https://crates.io/crates/xet-runtime) | Async runtime, configuration, logging, and utility infrastructure |\n\n### Python Package (PyPI)\n\n| Package | Description |\n|---------|-------------|\n| [`hf-xet`](https://pypi.org/project/hf-xet/) | Python bindings for the Xet storage system, used by [huggingface_hub](https://github.com/huggingface/huggingface_hub) |\n\nBuilt from the [`hf_xet/`](./hf_xet) directory using [maturin](https://github.com/PyO3/maturin).\n\n### CLI Binary\n\n| Binary | Description |\n|--------|-------------|\n| `git-xet` | Git LFS compatible 
command-line tool for Xet storage |\n\nBuilt from the [`git_xet/`](./git_xet) directory. Distributed via [GitHub releases](https://github.com/huggingface/xet-core/releases).\n\n## Contributions (feature requests, bugs, etc.) are encouraged \u0026 appreciated 💙💚💛💜🧡❤️\n\nPlease join us in making xet-core better. We value everyone's contributions. Code is not the only way to help: answering questions, helping each other, improving documentation, and filing issues all help immensely. If you are interested in contributing (please do!), check out the [contribution guide](https://github.com/huggingface/xet-core/blob/main/CONTRIBUTING.md) for this repository.\n\n## Issues, Diagnostics \u0026 Debugging\n\nIf you encounter an issue with `hf-xet`, please collect diagnostic information\nand attach it when creating a [new Issue](https://github.com/huggingface/xet-core/issues/new/choose).\n\nThe [`scripts/diag/`](scripts/diag/) directory contains platform-specific scripts\nthat download debug symbols, configure logging, and capture periodic stack traces\nand core dumps:\n\n| OS | Script |\n|----|--------|\n| Linux | [`scripts/diag/hf-xet-diag-linux.sh`](scripts/diag/hf-xet-diag-linux.sh) |\n| macOS | [`scripts/diag/hf-xet-diag-macos.sh`](scripts/diag/hf-xet-diag-macos.sh) |\n| Windows (Git Bash) | [`scripts/diag/hf-xet-diag-windows.sh`](scripts/diag/hf-xet-diag-windows.sh) |\n\n```bash\n# prefix your failing command with the script for your OS, e.g.:\n./scripts/diag/hf-xet-diag-macos.sh -- python my-script.py\n```\n\nSee [**scripts/diag/README.md**](scripts/diag/README.md) for full usage, output layout, dump analysis instructions, and how to install debug symbols manually.\n\nQuick debugging environment variables (export them in the shell that runs your failing command):\n\n```bash\nexport RUST_BACKTRACE=full          # full Rust backtraces on panic\nexport RUST_LOG=info                # enable hf-xet logging\nexport HF_XET_LOG_FILE=/tmp/xet.log # write logs to a file (defaults to stdout)\n```\n\n## Local Development\n\n### Repo Organization\n\n* 
[`xet_pkg/`](./xet_pkg) (`hf-xet`): High-level session API for uploading and downloading files with deduplication.\n* [`xet_client/`](./xet_client) (`xet-client`): HTTP client for CAS and Hub backend services.\n* [`xet_data/`](./xet_data) (`xet-data`): Chunking, deduplication, and file reconstruction pipeline.\n* [`xet_core_structures/`](./xet_core_structures) (`xet-core-structures`): MerkleHash, metadata shards, Xorb objects, and shared data structures.\n* [`xet_runtime/`](./xet_runtime) (`xet-runtime`): Async runtime, configuration, logging, and utilities.\n* [`hf_xet/`](./hf_xet): Python bindings (maturin/PyO3) that produce the `hf-xet` PyPI package.\n* [`git_xet/`](./git_xet): Git LFS compatible CLI tool (`git-xet`).\n* [`wasm/`](./wasm): WebAssembly builds (`hf_xet_wasm`, `hf_xet_thin_wasm`).\n* [`simulation/`](./simulation): Simulation and benchmarking infrastructure.\n\n### Build, Test \u0026 Benchmark\n\nTo build xet-core, check the [GitHub Actions CI workflow](.github/workflows/ci.yml) for the required Rust toolchain version, then follow the Rust documentation to install rustup and that toolchain. Use the following commands to build, test, and benchmark.\n\nMany of us on the team use [VSCode](https://code.visualstudio.com/), so we have checked in some settings in the `.vscode` directory. If you use VSCode, install the rust-analyzer extension.\n\nBuild:\n\n```bash\ncargo build\n```\n\nTest:\n\n```bash\ncargo test\n```\n\nBenchmark:\n\n```bash\ncargo bench\n```\n\nLinting:\n\n```bash\ncargo clippy -r --verbose -- -D warnings\n```\n\nFormatting (requires the nightly toolchain):\n\n```bash\ncargo +nightly fmt --manifest-path ./Cargo.toml --all\n```\n\n### Building the Python package and running locally (on *nix systems):\n\n1. Create a Python 3 virtualenv: `python3 -m venv ~/venv`\n2. Activate the virtualenv: `source ~/venv/bin/activate`\n3. Install maturin and IPython: `pip3 install maturin ipython`\n4. Go to the hf_xet crate: `cd hf_xet`\n5. Build and install into the virtualenv: `maturin develop`\n6. 
Test by starting `ipython` and calling into the bindings (arguments omitted; see the [`hf_xet/`](./hf_xet) sources for the function signatures):\n```python\nimport hf_xet as hfxet\n# both calls take arguments that are omitted here\nhfxet.upload_files(...)\nhfxet.download_files(...)\n```\n\n#### Developing with tokio console\n\n\u003e Prerequisite: install tokio-console (`cargo install tokio-console`). See [https://github.com/tokio-rs/console](https://github.com/tokio-rs/console).\n\nTo use tokio-console with hf-xet, compile hf_xet with the following command:\n```sh\nRUSTFLAGS=\"--cfg tokio_unstable\" maturin develop -r --features tokio-console\n```\n\nThen, while hf_xet is running (via an `hf` CLI command or `huggingface_hub` Python code), `tokio-console` can connect to it.\n\nExample:\n\n```bash\n# In one terminal (from the hf_xet directory):\npip install huggingface_hub\nRUSTFLAGS=\"--cfg tokio_unstable\" maturin develop -r --features tokio-console\nhf download openai/gpt-oss-20b\n\n# In another terminal:\ncargo install tokio-console\ntokio-console\n```\n\n#### Building a universal wheel for macOS:\n\nFrom the hf_xet directory:\n```bash\nMACOSX_DEPLOYMENT_TARGET=10.9 maturin build --release --target universal2-apple-darwin --features openssl_vendored\n```\n\nNote: a universal2 build needs both the x86_64 and aarch64 macOS targets, so you may need to add the x86_64 one: `rustup target add x86_64-apple-darwin`\n\n### Testing\n\nUnit tests are run with `cargo test`, and benchmarks with `cargo bench`. Some crates have a `main.rs` that can be run for manual testing.\n\n## References \u0026 History\n\n* [Technical blog posts](https://xethub.com/)\n* [Git is for Data (CIDR 2023 paper)](https://xethub.com/blog/git-is-for-data-published-in-cidr-2023)\n* History: xet-core is adapted from the original [xet-core](https://github.com/xetdata/xet-core), which has deep git integration and a very different backend services implementation.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuggingface%2Fxet-core","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhuggingface%2Fxet-core","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuggingface%2Fxet-core/lists"}