{"id":15035522,"url":"https://github.com/nvidia/aistore","last_synced_at":"2025-05-13T20:08:47.721Z","repository":{"id":37276192,"uuid":"114185437","full_name":"NVIDIA/aistore","owner":"NVIDIA","description":"AIStore: scalable storage for AI applications","archived":false,"fork":false,"pushed_at":"2025-05-13T17:13:53.000Z","size":92940,"stargazers_count":1497,"open_issues_count":0,"forks_count":204,"subscribers_count":46,"default_branch":"main","last_synced_at":"2025-05-13T17:50:03.731Z","etag":null,"topics":["batch-jobs","distributed-shuffle","erasure-coding","etl-offload","kubernetes","linear-scalability","multiple-backends","network-of-clusters","object-storage","sds","software-defined"],"latest_commit_sha":null,"homepage":"https://aistore.nvidia.com","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NVIDIA.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-12-14T01:07:30.000Z","updated_at":"2025-05-13T17:13:57.000Z","dependencies_parsed_at":"2023-12-01T02:29:09.125Z","dependency_job_id":"7fdf1e95-f7f8-4dfa-ba61-3b6d7cd476e8","html_url":"https://github.com/NVIDIA/aistore","commit_stats":{"total_commits":7665,"total_committers":55,"mean_commits":"139.36363636363637","dds":0.5424657534246575,"last_synced_commit":"69041a17abca954c6d418666b996c5a59fb11b56"},"previous_names":["nvidia/dfcpub"],"tags_count":34,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA%2Faistore","tags_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA%2Faistore/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA%2Faistore/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA%2Faistore/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NVIDIA","download_url":"https://codeload.github.com/NVIDIA/aistore/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254020605,"owners_count":22000752,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["batch-jobs","distributed-shuffle","erasure-coding","etl-offload","kubernetes","linear-scalability","multiple-backends","network-of-clusters","object-storage","sds","software-defined"],"created_at":"2024-09-24T20:28:51.385Z","updated_at":"2025-05-13T20:08:47.715Z","avatar_url":"https://github.com/NVIDIA.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"**AIStore: High-Performance, Scalable Storage for AI Workloads**\n\n![License](https://img.shields.io/badge/license-MIT-blue.svg)\n![Version](https://img.shields.io/badge/version-v3.28-green.svg)\n![Go Report Card](https://goreportcard.com/badge/github.com/NVIDIA/aistore)\n\nAIStore (AIS) is a lightweight distributed storage stack tailored for AI applications. It's an elastic cluster that can grow and shrink at runtime and can be ad-hoc deployed, with or without Kubernetes, anywhere from a single Linux machine to a bare-metal cluster of any size. 
Built from scratch, AIS provides linear scale-out, consistent performance, and a flexible deployment model.\n\nAIS consistently shows balanced I/O distribution and linear scalability across an arbitrary number of clustered nodes. The system supports fast data access, reliability, and rich customization for data transformation workloads.\n\n## Features\n\n* ✅ **Multi-Cloud Access:** Seamlessly access and manage content across multiple [cloud backends](/docs/overview.md#at-a-glance) (including AWS S3, GCS, Azure, OCI), with an additional benefit of fast-tier performance and configurable data redundancy.\n* ✅ **Deploy Anywhere:** AIS runs on any Linux machine, virtual or physical. Deployment options range from a single [Docker container](https://github.com/NVIDIA/aistore/blob/main/deploy/prod/docker/single/README.md) and [Google Colab](https://aistore.nvidia.com/blog/2024/09/18/google-colab-aistore) to petascale [Kubernetes clusters](https://github.com/NVIDIA/ais-k8s). There are [no built-in limitations](https://github.com/NVIDIA/aistore/blob/main/docs/overview.md#no-limitations-principle) on deployment size or functionality.\n* ✅ **High Availability:** Redundant control and data planes. Self-healing, end-to-end protection, n-way mirroring, and erasure coding. 
Arbitrary number of lightweight access points.\n* ✅ **HTTP-based API:** A feature-rich, native API (with user-friendly SDKs for Go and Python), and compliant [Amazon S3 API](/docs/s3compat.md) for running unmodified S3 clients.\n* ✅ **Unified Namespace:** Attach AIS clusters together to provide fast, unified access to the entirety of hosted datasets, allowing users to reference shared buckets with cluster-specific identifiers.\n* ✅ **Turn-key Cache:** In addition to robust data protection features, AIS offers a per-bucket configurable LRU-based cache with eviction thresholds and storage capacity watermarks.\n* ✅ **ETL Offload:** Execute I/O intensive data transformations [close to the data](/docs/etl.md), either inline (on-the-fly as part of each read request) or offline (batch processing, with the destination bucket populated with transformed results).\n* ✅ **Existing File Datasets:** Ingest file datasets from any local or remote source, either on-demand (ad-hoc) or through asynchronous [batch](/docs/overview.md#promote-local-or-shared-files).\n* ✅ **Data Consistency:** Guaranteed [consistency](/docs/overview.md#read-after-write-consistency) across all gateways, with [write-through](/docs/overview.md#write-through) semantics in presence of [remote backends](/docs/overview.md#backend-provider).\n* ✅ **Small File Optimization:** AIS supports TAR, ZIP, TAR.GZ, and TAR.LZ4 serialization for batching and processing small files. Features include [initial sharding](https://aistore.nvidia.com/blog/2024/08/16/ishard), distributed shuffle (re-sharding), appending to existing shards, listing contained files, and [more](/docs/overview.md#shard).\n* ✅ **Kubernetes:** For production deployments, we developed the [AIS/K8s Operator](https://github.com/NVIDIA/ais-k8s/tree/main/operator). 
A dedicated GitHub [repository](https://github.com/NVIDIA/ais-k8s) contains Ansible scripts, Helm charts, and deployment guidance.\n* ✅ **Authentication and Access Control:** OAuth 2.0-compatible [authentication server (AuthN)](/docs/authn.md).\n* ✅ **Batch Jobs:** Start, monitor, and control cluster-wide [batch operations](/docs/batch.md).\n\nThe feature set is actively growing and also includes: [adding/removing nodes at runtime](/docs/lifecycle_node.md), managing [TLS certificates](/docs/cli/x509.md) at runtime, listing, copying, prefetching, and transforming [virtual directories](/docs/howto_virt_dirs.md), executing [presigned S3 requests](/docs/s3compat.md#presigned-s3-requests), adaptive [rate limiting](/docs/rate_limit.md), and more.\n\n\u003e For the original **white paper** and design philosophy, please see [AIStore Overview](/docs/overview.md), which also includes high-level block diagram, terminology, APIs, CLI, and more.\n\u003e For our 2024 KubeCon presentation, please see [AIStore: Enhancing petascale Deep Learning across Cloud backends](https://www.youtube.com/watch?v=N-d9cbROndg).\n\n## CLI\n\nAIS includes an integrated, scriptable [CLI](/docs/cli.md) for managing clusters, buckets, and objects, running and monitoring batch jobs, viewing and downloading logs, generating performance reports, and more:\n\n```console\n$ ais \u003cTAB-TAB\u003e\n\nadvanced         config           get              prefetch         show\nalias            cp               help             put              space-cleanup\narchive          create           job              remote-cluster   start\nauth             download         log              rmb              stop\nblob-download    dsort            ls               rmo              storage\nbucket           etl              object           scrub            tls\ncluster          evict            performance      search           wait\n```\n\n## Developer Tools\n\nAIS runs natively on Kubernetes and features open format - 
thus, the freedom to copy or move your data from AIS at any time using the familiar Linux `tar(1)`, `scp(1)`, `rsync(1)` and similar tools.\n\nFor developers and data scientists, there's also:\n\n* [Go API](https://github.com/NVIDIA/aistore/tree/main/api) used in [CLI](/docs/cli.md) and [benchmarking tools](/docs/aisloader.md)\n* [Python SDK](https://github.com/NVIDIA/aistore/tree/main/python/aistore/sdk) + [Reference Guide](/docs/python_sdk.md)\n* [PyTorch integration](https://github.com/NVIDIA/aistore/tree/main/python/aistore/pytorch) and usage examples\n* [Boto3 support](https://github.com/NVIDIA/aistore/tree/main/python/aistore/botocore_patch)\n\n## Quick Start\n\n1. Read the [Getting Started Guide](/docs/getting_started.md) for a 5-minute local install, or\n2. Run a [minimal](https://github.com/NVIDIA/aistore/tree/main/deploy/prod/docker/single) AIS cluster consisting of a single gateway and a single storage node, or\n3. Clone the repo and run `make kill cli aisloader deploy` followed by `ais show cluster`\n\n---------------------\n\n## Deployment options\n\nAIS deployment options, as well as intended (development vs. production vs. first-time) usages, are all [summarized here](https://github.com/NVIDIA/aistore/blob/main/deploy/README.md).\n\nSince the prerequisites essentially boil down to having Linux with a disk, the deployment options range from an [all-in-one container](https://github.com/NVIDIA/aistore/tree/main/deploy/prod/docker/single) to a petascale bare-metal cluster of any size, and from a single VM to multiple racks of high-end servers. Practical use cases require, of course, further consideration.\n\nSome of the most popular deployment options include:\n\n| Option | Use Case |\n| --- | --- |\n| [Local playground](https://github.com/NVIDIA/aistore/blob/main/docs/getting_started.md#local-playground) | AIS developers or first-time users, Linux or macOS. 
Run `make kill cli aisloader deploy \u003c\u003c\u003c $'N\\nM'`, where `N` is the number of [targets](/docs/overview.md#target) and `M` is the number of [gateways](/docs/overview.md#proxy) |\n| Minimal production-ready deployment | This option uses a preinstalled Docker image and targets first-time users or researchers (who could immediately start training their models on smaller datasets) |\n| [Docker container](https://github.com/NVIDIA/aistore/tree/main/deploy/prod/docker/single) | Quick testing and evaluation; single-node setup |\n| [GCP/GKE automated install](https://github.com/NVIDIA/aistore/blob/main/docs/getting_started.md#kubernetes-deployments) | Developers, first-time users, AI researchers |\n| [Large-scale production deployment](https://github.com/NVIDIA/ais-k8s) | Requires Kubernetes; provided via [ais-k8s](https://github.com/NVIDIA/ais-k8s) |\n\n\u003e For performance tuning, see [performance](/docs/performance.md) and [AIS K8s Playbooks](https://github.com/NVIDIA/ais-k8s/tree/main/playbooks/host-config).\n\n## Existing Datasets\n\nAIS supports multiple ingestion modes:\n\n* ✅ **On Demand:** Transparent cloud access during workloads.\n* ✅ **PUT:** Locally accessible files and directories.\n* ✅ **Promote:** Import local target directories and/or NFS/SMB shares mounted on AIS targets.\n* ✅ **Copy:** Full buckets, virtual subdirectories (recursively or non-recursively), lists or ranges (via Bash expansion).\n* ✅ **Download:** HTTP(S)-accessible datasets and objects.\n* ✅ **Prefetch:** Remote buckets or selected objects (from remote buckets), including subdirectories, lists, and/or ranges.\n* ✅ **Archive:** [Group and store](https://aistore.nvidia.com/blog/2024/08/16/ishard) related small files from an original dataset.\n\n## Install from Release Binaries\n\nYou can install the CLI and benchmarking tools using:\n\n```console\n./scripts/install_from_binaries.sh --help\n```\n\nThe script installs [aisloader](/docs/aisloader.md) and [CLI](/docs/cli.md) from the latest 
or previous GitHub [release](https://github.com/NVIDIA/aistore/releases) and enables CLI auto-completions.\n\n## PyTorch integration\n\nPyTorch integration is a growing set of datasets (both iterable and map-style), samplers, and dataloaders:\n\n* [Taxonomy of abstractions and API reference](/docs/pytorch.md)\n* [AIS plugin for PyTorch: usage examples](https://github.com/NVIDIA/aistore/tree/main/python/aistore/pytorch/README.md)\n* [Jupyter notebook examples](https://github.com/NVIDIA/aistore/tree/main/python/examples/aisio-pytorch/)\n\n## AIStore Badge\n\nLet others know your project is powered by high-performance AI storage:\n\n[![aistore](https://img.shields.io/badge/powered%20by-AIStore-76B900?style=flat\u0026labelColor=000000)](https://github.com/NVIDIA/aistore)\n\n```markdown\n[![aistore](https://img.shields.io/badge/powered%20by-AIStore-76B900?style=flat\u0026labelColor=000000)](https://github.com/NVIDIA/aistore)\n```\n\n## More Docs \u0026 Guides\n\n* [Overview and Design](/docs/overview.md)\n* [Getting Started](/docs/getting_started.md)\n* [Buckets and Bucket Management](/docs/bucket.md)\n* [Technical Blog](https://aistore.nvidia.com/blog)\n* [S3 Compatibility](/docs/s3compat.md)\n* [Batch Jobs](/docs/batch.md)\n* [Performance](/docs/performance.md) and [CLI: performance](/docs/cli/performance.md)\n* [CLI Reference](/docs/cli.md)\n* [Authentication](/docs/authn.md)\n* [Prometheus \u0026 Metrics](/docs/metrics.md)\n* [Production Deployment: Kubernetes Operator, Ansible Playbooks, Helm Charts, Monitoring](https://github.com/NVIDIA/ais-k8s)\n\n### How to find information\n\n* See [Extended Index](/docs/docs.md)\n* Use CLI `search` command, e.g.: `ais search copy`\n* Clone the repository and run `git grep`, e.g.: `git grep -n out-of-band -- \"*.md\"`\n\n## License\n\nMIT\n\n## Author\n\nAlex Aizman 
(NVIDIA)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvidia%2Faistore","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnvidia%2Faistore","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvidia%2Faistore/lists"}