{"id":51076943,"url":"https://github.com/zozo123/airflow-provider-sandbox","last_synced_at":"2026-06-23T15:01:57.750Z","repository":{"id":366621669,"uuid":"1277071502","full_name":"zozo123/airflow-provider-sandbox","owner":"zozo123","description":"Run each Apache Airflow task in an ephemeral cloud sandbox (local/Daytona/E2B/Modal/islo) — pluggable SandboxExecutor","archived":false,"fork":false,"pushed_at":"2026-06-22T15:37:42.000Z","size":56,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-22T17:07:29.492Z","etag":null,"topics":["airflow","airflow-provider","daytona","e2b","executor","modal","sandbox"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zozo123.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-22T14:55:12.000Z","updated_at":"2026-06-22T15:37:46.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/zozo123/airflow-provider-sandbox","commit_stats":null,"previous_names":["zozo123/airflow-provider-sandbox"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/zozo123/airflow-provider-sandbox","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zozo123%2Fairflow-provider-sandbox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zozo123%2Fairflow-provider-sandbox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zozo123%2Fairflow-provider-sandbox/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zozo123%2Fairflow-provider-sandbox/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zozo123","download_url":"https://codeload.github.com/zozo123/airflow-provider-sandbox/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zozo123%2Fairflow-provider-sandbox/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34694786,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-23T02:00:07.161Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","airflow-provider","daytona","e2b","executor","modal","sandbox"],"created_at":"2026-06-23T15:01:56.861Z","updated_at":"2026-06-23T15:01:57.745Z","avatar_url":"https://github.com/zozo123.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# airflow-provider-sandbox\n\n\u003e Run each Apache Airflow task instance in an **ephemeral cloud sandbox** —\n\u003e behind a pluggable provider layer (local · Daytona · E2B · Modal · islo).\n\n`SandboxExecutor` does for ephemeral sandboxes what `KubernetesExecutor` does\nfor pods: every queued task leases a fresh, isolated sandbox, runs the task\ninside it, reports the exit state, and tears the sandbox down. The provider\nabstraction is modelled on [crabbox](https://github.com/openclaw/crabbox)'s\n\"provision → sync → run → cleanup\" contract, so one executor targets many\nbackends via a single config key — like `crabbox --provider`.\n\n```\nScheduler ──workload──▶ SandboxExecutor ──▶ SandboxProvider ──▶ ephemeral sandbox\n                              ▲                                   (task runs here)\n                              └────── polling watcher ◀── exit code / state\n```\n\n## Why\n\n- **Isolation by default** — untrusted code, per-task dependency stacks, ML jobs\n  that shouldn't share a worker.\n- **Vendor-neutral** — the same DAG runs on a local subprocess (zero setup) or\n  on Daytona / E2B / Modal / islo by changing one line of config.\n- **Airflow-3 native** — the in-sandbox Task SDK supervisor heartbeats and ships\n  logs straight to the api-server; the executor only reconciles exit state. The\n  api-server stays the single source of truth for task state.\n\n## Install\n\n```bash\npip install airflow-provider-sandbox            # local backend only\npip install 'airflow-provider-sandbox[daytona]' # + Daytona SDK\npip install 'airflow-provider-sandbox[e2b]'     # + E2B, [modal], [islo]\n```\n\n## Quickstart — 20-second demo, no credentials\n\n```bash\npython examples/demo.py          # runs a task in a local subprocess \"sandbox\"\nSANDBOX_PROVIDER=daytona python examples/demo.py   # needs DAYTONA_API_KEY\n```\n\nIt drives the exact lifecycle the executor uses\n(`create → upload → run → poll → logs → destroy`) and prints each step.\n\n## Two ways to use it\n\n### 1. `SandboxOperator` / `@task.sandbox` — run *one* task in a sandbox (recommended start)\n\nNo special executor needed — a normal task runs a command in a sandbox. This is\nthe incremental, low-risk way to adopt sandboxes (the KubernetesPodOperator\npattern). Verified end-to-end against Airflow 3 via `dag.test()`.\n\n```python\nfrom airflow.decorators import task\nfrom airflow_provider_sandbox.operators.sandbox import SandboxOperator\n\n# Operator form\nSandboxOperator(\n    task_id=\"build\",\n    provider=\"daytona\",\n    image=\"python:3.12-slim\",\n    command=\"pip install -r req.txt \u0026\u0026 pytest\",\n)\n\n# TaskFlow form (mirrors @task.bash: return the command to run in the sandbox)\n@task.sandbox(provider=\"e2b\", env={\"ANTHROPIC_API_KEY\": \"{{ var.value.anthropic_api_key }}\"})\ndef run_agent() -\u003e str:\n    return \"python /opt/agent.py\"   # runs in an isolated sandbox; creds injected only there\n```\n\n#### Killer use case: run an LLM agent in an isolated sandbox\n\nLLM agents execute *model-generated* code — exactly what you don't want on a\nshared worker. Inject the model key into the sandbox only (resolved from an\nAirflow Variable/Connection), give the agent a fresh disposable environment, and\nkeep the blast radius at one task. See [`examples/agent_in_sandbox_dag.py`](examples/agent_in_sandbox_dag.py).\nThis composes with an agent toolset — a future `@task.agent` / `AgentOperator`\ncan hand the sandbox an `AgentSkillsToolset` plus the injected credentials.\n\n### 2. `SandboxExecutor` — route *every* task through a sandbox\n\n## Configure\n\n```ini\n[core]\n# Run everything in sandboxes, or alias it for per-task routing (recommended):\nexecutor = LocalExecutor,sandbox:airflow_provider_sandbox.executors.sandbox_executor.SandboxExecutor\n\n[logging]\n# REQUIRED: the executor refuses to start without it — logs live in ephemeral\n# sandboxes and must be shipped to remote storage by the in-sandbox supervisor.\nremote_logging = True\nremote_base_log_folder = s3://my-airflow-logs/\n\n[sandbox]\nprovider = daytona        # local | daytona | e2b | modal | islo | module:Class\npoll_interval = 5\ncreation_batch_size = 8\ndefault_timeout = 600\n```\n\nPer-task sizing/image mirrors `KubernetesExecutor`'s `pod_override`:\n\n```python\nBashOperator(\n    task_id=\"train\",\n    bash_command=\"python train.py\",\n    executor=\"sandbox\",\n    executor_config={\"sandbox_override\": {\"image\": \"daytona-medium\", \"cpu\": 4, \"memory_mb\": 8192}},\n)\n```\n\n## Providers \u0026 capabilities\n\n| Provider | `kind` | file upload | async exec | kill | reattach (adopt) | SDK |\n|----------|--------|:-----------:|:----------:|:----:|:----------------:|-----|\n| `local`  | delegated-run | ✅ | ✅ | ✅ | ❌ | none (subprocess) |\n| `daytona`| delegated-run | ✅ | ✅ | ✅ | ✅ (labelled) | `daytona` |\n| `e2b`    | delegated-run | ✅ | ✅ | ✅ | ❌ (opaque handle) | `e2b` |\n| `modal`  | delegated-run | ❌ (image-baked) | ✅ | ✅ | ❌ | `modal` |\n| `islo`   | delegated-run | ❌ (image/git) | ✅ | ❌ | ✅ (named) | `islo` |\n\nEvery SaaS backend's call sites are checked against the real SDKs by\n[`scripts/verify_sdk_conformance.py`](scripts/verify_sdk_conformance.py)\n(49/49 passing for daytona/e2b/modal/islo). islo additionally supports\npause/resume. Add your own backend by subclassing `SandboxProvider` and pointing\n`[sandbox] provider` at `your_module:YourProvider`.\n\n## Design notes \u0026 honest limitations\n\n- **Cold start**: one fresh sandbox per task try means seconds of startup\n  (`create` + bundle transfer + `import airflow` + supervisor boot) on every\n  backend. Best for long-running, heavyweight, isolation-sensitive tasks — not a\n  drop-in replacement for a warm Celery pool on high-volume short tasks.\n- **Cost**: one billable sandbox per task instance; retries multiply it.\n- **Logs**: `remote_logging` is mandatory. `get_task_log` is a best-effort\n  fallback only (it usually runs in the api-server process, where the executor's\n  in-memory handle map is empty).\n- **Adoption**: clean for named/labelled providers (Daytona, islo); best-effort\n  for opaque-handle providers (E2B, Modal) — a scheduler crash can strand a\n  sandbox.\n\n## Status\n\nAlpha, but **proven end-to-end and reproducible**: `SandboxOperator` and\n`@task.sandbox` both run real Airflow 3 tasks inside a sandbox, including\ncredential injection. Reproduce it from a clean checkout with\n`examples/sandbox_e2e_proof_dag.py`:\n\n```bash\nexport AIRFLOW_HOME=/tmp/sbx AIRFLOW__CORE__DAGS_FOLDER=$(pwd)/examples AIRFLOW__CORE__LOAD_EXAMPLES=False\nairflow db migrate \u0026\u0026 airflow dags test sandbox_e2e_proof   # -\u003e state=success, AGENT_RESULT=ok\n```\n\nSaaS backend call sites are verified against the real SDKs by\n`scripts/verify_sdk_conformance.py` (49/49 — **requires** `pip install daytona\ne2b modal islo`; it reports \"nothing verified\" if they are absent). 17 unit\ntests pass.\n\nThis is a standalone third-party provider — the sanctioned first step for a new\nAirflow executor (see [`docs/UPSTREAMING.md`](docs/UPSTREAMING.md)). It is\n**not** an `apache/airflow` monorepo package; the `apache-airflow-providers-*`\nname is ASF-reserved. Tracking issue: apache/airflow#68845.\n\n## License\n\nApache-2.0.\n\n---\n\nThis project was developed with the assistance of Claude (Anthropic); all code\nwas reviewed by a human maintainer who takes full responsibility for it.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzozo123%2Fairflow-provider-sandbox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzozo123%2Fairflow-provider-sandbox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzozo123%2Fairflow-provider-sandbox/lists"}