{"id":13461659,"url":"https://github.com/skypilot-org/skypilot","last_synced_at":"2026-04-02T11:47:08.066Z","repository":{"id":50266290,"uuid":"395140743","full_name":"skypilot-org/skypilot","owner":"skypilot-org","description":"SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 16+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.","archived":false,"fork":false,"pushed_at":"2025-05-12T02:36:47.000Z","size":155941,"stargazers_count":8069,"open_issues_count":470,"forks_count":645,"subscribers_count":72,"default_branch":"master","last_synced_at":"2025-05-12T02:43:20.669Z","etag":null,"topics":["cloud-computing","cloud-management","cost-management","cost-optimization","data-science","deep-learning","distributed-training","finops","gpu","hyperparameter-tuning","job-queue","job-scheduler","llm-serving","llm-training","machine-learning","ml-infrastructure","ml-platform","multicloud","spot-instances","tpu"],"latest_commit_sha":null,"homepage":"https://docs.skypilot.co/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/skypilot-org.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-08-11T23:32:15.000Z","updated_at":"2025-05-12T02:37:57.000Z","dependencies_parsed_at":"2023-10-16T14:52:18.771Z","dependency_job_id":"49d458a1-2105-48d5-8157-87e2a7d35d7c","html_url":"https://github.com/skypilot-org/skypilot","commit_stats":{"total_commits":1513,"total_committers":73,"mean_commits":"20.726027397260275","dds":0.6563119629874421,"last_sync
ed_commit":"70a7435581c35fb76dbe3c1317136ec141937d1e"},"previous_names":[],"tags_count":27,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skypilot-org%2Fskypilot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skypilot-org%2Fskypilot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skypilot-org%2Fskypilot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skypilot-org%2Fskypilot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/skypilot-org","download_url":"https://codeload.github.com/skypilot-org/skypilot/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253672696,"owners_count":21945480,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cloud-computing","cloud-management","cost-management","cost-optimization","data-science","deep-learning","distributed-training","finops","gpu","hyperparameter-tuning","job-queue","job-scheduler","llm-serving","llm-training","machine-learning","ml-infrastructure","ml-platform","multicloud","spot-instances","tpu"],"created_at":"2024-07-31T11:00:51.119Z","updated_at":"2026-04-02T11:47:08.060Z","avatar_url":"https://github.com/skypilot-org.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" 
srcset=\"https://raw.githubusercontent.com/skypilot-org/skypilot/master/docs/source/images/skypilot-wide-dark-1k.png\"\u003e\n    \u003cimg alt=\"SkyPilot\" src=\"https://raw.githubusercontent.com/skypilot-org/skypilot/master/docs/source/images/skypilot-wide-light-1k.png\" width=55%\u003e\n  \u003c/picture\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://docs.skypilot.co/\"\u003e\n    \u003cimg alt=\"Documentation\" src=\"https://img.shields.io/badge/docs-gray?logo=readthedocs\u0026logoColor=f5f5f5\"\u003e\n  \u003c/a\u003e\n\n  \u003ca href=\"https://github.com/skypilot-org/skypilot/releases\"\u003e\n    \u003cimg alt=\"GitHub Release\" src=\"https://img.shields.io/github/release/skypilot-org/skypilot.svg\"\u003e\n  \u003c/a\u003e\n\n  \u003ca href=\"http://slack.skypilot.co\"\u003e\n    \u003cimg alt=\"Join Slack\" src=\"https://img.shields.io/badge/SkyPilot-Join%20Slack-blue?logo=slack\"\u003e\n  \u003c/a\u003e\n\n  \u003ca href=\"https://github.com/skypilot-org/skypilot/releases\"\u003e\n    \u003cimg alt=\"Downloads\" src=\"https://img.shields.io/pypi/dm/skypilot\"\u003e\n  \u003c/a\u003e\n\n\u003c/p\u003e\n\n\u003ch3 align=\"center\"\u003e\n    Run AI on Any Infrastructure\n\u003c/h3\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n#### [🌟 **SkyPilot Demo** 🌟: Click to see a 1-minute tour](https://demo.skypilot.co/dashboard/)\n\n\u003c/div\u003e\n\n\nSkyPilot is a system to run, manage, and scale AI workloads on any AI infrastructure.\n\nSkyPilot gives **AI teams** a simple interface to run jobs on any infra.\n**Infra teams** get a unified control plane to manage any AI compute — with advanced scheduling, scaling, and orchestration.\n\n\u003cpicture\u003e\n  \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"./docs/source/images/skypilot-abstractions-long-2-dark.png\"\u003e\n  \u003cimg src=\"./docs/source/images/skypilot-abstractions-long-2.png\" alt=\"SkyPilot 
Abstractions\"\u003e\n\u003c/picture\u003e\n\n-----\n\n:fire: *News* :fire:\n- [Mar 2026] **Scaling Karpathy's Autoresearch**: Autoresearch runs 1 experiment at a time. We gave it 16 GPUs and let it run in parallel: [**blog**](https://blog.skypilot.co/scaling-autoresearch/), [**HackerNews**](https://news.ycombinator.com/item?id=47442435)\n- [Mar 2026] **SkyPilot Agent Skills**: GPU access and job management for AI agents: [**docs**](https://docs.skypilot.co/en/latest/getting-started/skill.html)\n- [Jan 2026] **Shopify case study**: Shopify runs all AI training workloads on SkyPilot: [**case study**](https://shopify.engineering/skypilot)\n- [Dec 2025] **SkyPilot v0.11** released: Multi-Cloud Pools, Fast Managed Jobs, Enterprise-Readiness at Large Scale, Programmability. [**Release notes**](https://github.com/skypilot-org/skypilot/releases/tag/v0.11.0)\n- [Dec 2025] Train **an agent to use Google Search** as a tool with RL on your Kubernetes or clouds: [**blog**](https://blog.skypilot.co/verl-tool-calling/), [**example**](./llm/verl/)\n- [Oct 2025] Run **RL training for LLMs** with SkyRL on your Kubernetes or clouds: [**example**](./llm/skyrl/)\n\n## Overview\n\nSkyPilot **is easy to use for AI teams**:\n- Quickly spin up compute on your own infra\n- Environment and job as code — simple and portable\n- Easy job management: queue, run, and auto-recover many jobs\n\nSkyPilot **makes Kubernetes easy for AI \u0026 Infra teams**:\n\n- Slurm-like ease of use, cloud-native robustness\n- Local dev experience on K8s: SSH into pods, sync code, or connect IDE\n- Turbocharge your clusters: gang scheduling, multi-cluster, and scaling\n\nSkyPilot **unifies multiple clusters, clouds, and hardware**:\n- One interface to use reserved GPUs, Kubernetes clusters, Slurm clusters, or 20+ clouds\n- [Flexible provisioning](https://docs.skypilot.co/en/latest/examples/auto-failover.html) of GPUs, TPUs, CPUs, with auto-retry\n- [Team 
deployment](https://docs.skypilot.co/en/latest/reference/api-server/api-server.html) and resource sharing\n\nSkyPilot **cuts your cloud costs \u0026 maximizes GPU availability**:\n* Autostop: automatic cleanup of idle resources\n* [Spot instance support](https://docs.skypilot.co/en/latest/examples/managed-jobs.html#running-on-spot-instances): 3-6x cost savings, with preemption auto-recovery\n* Intelligent scheduling: automatically run on the cheapest \u0026 most available infra\n\nSkyPilot supports your existing GPU, TPU, and CPU workloads, with no code changes.\n\nInstall with pip:\n```bash\n# Choose your clouds:\npip install -U \"skypilot[kubernetes,aws,gcp,azure,oci,nebius,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp,seeweb,shadeform,verda]\"\n```\nTo get the latest features and fixes, use the nightly build or [install from source](https://docs.skypilot.co/en/latest/getting-started/installation.html):\n```bash\n# Choose your clouds:\npip install \"skypilot-nightly[kubernetes,aws,gcp,azure,oci,nebius,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp,seeweb,shadeform,verda]\"\n```\n\nTo use SkyPilot directly with your agent (Claude Code, Codex, etc.), install the [SkyPilot Skill](https://docs.skypilot.co/en/latest/getting-started/skill.html). 
Tell your agent:\n```\nFetch and follow https://github.com/skypilot-org/skypilot/blob/HEAD/agent/INSTALL.md to install the skypilot skill\n```\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/source/_static/intro.gif\" alt=\"SkyPilot\"\u003e\n\u003c/p\u003e\n\nCurrent supported infra: Kubernetes, Slurm, AWS, GCP, Azure, OCI, CoreWeave, Nebius, Lambda Cloud, RunPod, Fluidstack,\nCudo, Digital Ocean, Paperspace, Cloudflare, Samsung, IBM, Vast.ai, VMware vSphere, Seeweb, Prime Intellect, Shadeform, Verda Cloud, VastData, Crusoe.\n\u003cp align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://raw.githubusercontent.com/skypilot-org/skypilot/master/docs/source/images/cloud-logos-dark.png\"\u003e\n    \u003cimg alt=\"SkyPilot\" src=\"https://raw.githubusercontent.com/skypilot-org/skypilot/master/docs/source/images/cloud-logos-light.png\" width=85%\u003e\n  \u003c/picture\u003e\n\u003c/p\u003e\n\u003c!-- source xcf file: https://drive.google.com/drive/folders/1S_acjRsAD3T14qMeEnf6FFrIwHu_Gs_f?usp=drive_link --\u003e\n\n\n## Getting started\nYou can find our documentation [here](https://docs.skypilot.co/).\n- [Installation](https://docs.skypilot.co/en/latest/getting-started/installation.html)\n- [Quickstart](https://docs.skypilot.co/en/latest/getting-started/quickstart.html)\n- [CLI reference](https://docs.skypilot.co/en/latest/reference/cli.html)\n\n## SkyPilot in 1 minute\n\nA SkyPilot task specifies: resource requirements, data to be synced, setup commands, and the task commands.\n\nOnce written in this [**unified interface**](https://docs.skypilot.co/en/latest/reference/yaml-spec.html) (YAML or Python API), the task can be launched on any available infra (Kubernetes, Slurm, cloud, etc.).  
This avoids vendor lock-in and makes it easy to move jobs to a different provider.\n\nPaste the following into a file `my_task.yaml`:\n\n```yaml\nresources:\n  accelerators: A100:8  # 8x NVIDIA A100 GPU\n\nnum_nodes: 1  # Number of VMs to launch\n\n# Working directory (optional) containing the project codebase.\n# Its contents are synced to ~/sky_workdir/ on the cluster.\nworkdir: ~/torch_examples\n\n# Commands to be run before executing the job.\n# Typical use: pip install -r requirements.txt, git clone, etc.\nsetup: |\n  cd mnist\n  pip install -r requirements.txt\n\n# Commands to run as a job.\n# Typical use: launch the main program.\nrun: |\n  cd mnist\n  python main.py --epochs 1\n```\n\nPrepare the workdir by cloning:\n```bash\ngit clone https://github.com/pytorch/examples.git ~/torch_examples\n```\n\nLaunch with `sky launch` (note: [access to GPU instances](https://docs.skypilot.co/en/latest/cloud-setup/quota.html) is needed for this example):\n```bash\nsky launch my_task.yaml\n```\n\nSkyPilot then does the heavy lifting for you, including:\n1. Find the cheapest available infra across your clusters or clouds\n2. Provision the GPUs (pods or VMs), with auto-failover if the infra returns capacity errors\n3. Sync your local `workdir` to the provisioned cluster\n4. Auto-install dependencies by running the task's `setup` commands\n5. 
Run the task's `run` commands, and stream logs\n\nSee [Quickstart](https://docs.skypilot.co/en/latest/getting-started/quickstart.html) to get started with SkyPilot.\n\n## Runnable examples\n\nSee [**SkyPilot examples**](https://docs.skypilot.co/en/docs-examples/examples/index.html) that cover: development, training, serving, LLM models, AI apps, and common frameworks.\n\nLatest featured examples:\n\n| Task | Examples |\n|----------|----------|\n| Training | [Verl](https://docs.skypilot.co/en/latest/examples/training/verl.html), [Finetune Llama 4](https://docs.skypilot.co/en/latest/examples/training/llama-4-finetuning.html), [TorchTitan](https://docs.skypilot.co/en/latest/examples/training/torchtitan.html), [PyTorch](https://docs.skypilot.co/en/latest/getting-started/tutorial.html), [DeepSpeed](https://docs.skypilot.co/en/latest/examples/training/deepspeed.html), [NeMo](https://docs.skypilot.co/en/latest/examples/training/nemo.html), [Ray](https://docs.skypilot.co/en/latest/examples/training/ray.html), [Unsloth](https://docs.skypilot.co/en/latest/examples/training/unsloth.html), [Jax/TPU](https://docs.skypilot.co/en/latest/examples/training/tpu.html), [OpenRLHF](https://docs.skypilot.co/en/latest/examples/training/openrlhf.html) |\n| Serving | [vLLM](https://docs.skypilot.co/en/latest/examples/serving/vllm.html), [SGLang](https://docs.skypilot.co/en/latest/examples/serving/sglang.html), [Ollama](https://docs.skypilot.co/en/latest/examples/serving/ollama.html) |\n| Models | [DeepSeek-R1](https://docs.skypilot.co/en/latest/examples/models/deepseek-r1.html), [Llama 4](https://docs.skypilot.co/en/latest/examples/models/llama-4.html), [Llama 3](https://docs.skypilot.co/en/latest/examples/models/llama-3.html), [CodeLlama](https://docs.skypilot.co/en/latest/examples/models/codellama.html), [Qwen](https://docs.skypilot.co/en/latest/examples/models/qwen.html), [Kimi-K2](https://docs.skypilot.co/en/latest/examples/models/kimi-k2.html), 
[Kimi-K2-Thinking](https://docs.skypilot.co/en/latest/examples/models/kimi-k2-thinking.html), [Mixtral](https://docs.skypilot.co/en/latest/examples/models/mixtral.html) |\n| AI apps | [RAG](https://docs.skypilot.co/en/latest/examples/applications/rag.html), [vector databases](https://docs.skypilot.co/en/latest/examples/applications/vector_database.html) (ChromaDB, CLIP) |\n| Common frameworks | [Airflow](https://docs.skypilot.co/en/latest/examples/frameworks/airflow.html), [Jupyter](https://docs.skypilot.co/en/latest/examples/frameworks/jupyter.html), [marimo](https://docs.skypilot.co/en/latest/examples/frameworks/marimo.html)  |\n\nSource files can be found in [`llm/`](https://github.com/skypilot-org/skypilot/tree/master/llm) and [`examples/`](https://github.com/skypilot-org/skypilot/tree/master/examples).\n\n## More information\nTo learn more, see [SkyPilot Overview](https://docs.skypilot.co/en/latest/overview.html), [SkyPilot docs](https://docs.skypilot.co/en/latest/), and [SkyPilot blog](https://blog.skypilot.co/).\n\nSkyPilot adopters: [Testimonials and Case Studies](https://blog.skypilot.co/case-studies/)\n\nPartners and integrations: [Community Spotlights](https://blog.skypilot.co/community/)\n\nFollow updates:\n- [Slack](http://slack.skypilot.co)\n- [X / Twitter](https://twitter.com/skypilot_org)\n- [LinkedIn](https://www.linkedin.com/company/skypilot-oss/)\n- [SkyPilot Blog](https://blog.skypilot.co/) ([Introductory blog post](https://blog.skypilot.co/introducing-skypilot/))\n\nRead the research:\n- [SkyPilot paper](https://www.usenix.org/system/files/nsdi23-yang-zongheng.pdf) and [talk](https://www.usenix.org/conference/nsdi23/presentation/yang-zongheng) (NSDI 2023)\n- [Sky Computing whitepaper](https://arxiv.org/abs/2205.07147)\n- [Sky Computing vision paper](https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s02-stoica.pdf) (HotOS 2021)\n- [SkyServe: AI serving across regions and clouds](https://arxiv.org/pdf/2411.01438) (EuroSys 2025)\n- 
[Managed jobs spot instance policy](https://www.usenix.org/conference/nsdi24/presentation/wu-zhanghao)  (NSDI 2024)\n\nSkyPilot was initially started at the [Sky Computing Lab](https://sky.cs.berkeley.edu) at UC Berkeley and has since gained many industry contributors. To read about the project's origin and vision, see [Concept: Sky Computing](https://docs.skypilot.co/en/latest/sky-computing.html).\n\n## Questions and feedback\nWe are excited to hear your feedback:\n* For issues and feature requests, please [open a GitHub issue](https://github.com/skypilot-org/skypilot/issues/new).\n* For questions, please use [GitHub Discussions](https://github.com/skypilot-org/skypilot/discussions).\n\nFor general discussions, join us on the [SkyPilot Slack](http://slack.skypilot.co).\n\n## Contributing\nWe welcome all contributions to the project! See [CONTRIBUTING](CONTRIBUTING.md) for how to get involved.\n","funding_links":[],"categories":["Python","Tools for deploying LLM","Deployment and Serving","LLM Deployment","Inference \u0026 Deployment","其他_机器学习与深度学习","推理 Inference","Repos","LLM Inference","What's New","Models and Projects"],"sub_categories":["Cloud \u0026 Container Deployment","🆕 Recently Added (January 2026)","Ray-Project"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fskypilot-org%2Fskypilot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fskypilot-org%2Fskypilot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fskypilot-org%2Fskypilot/lists"}