{"id":43673862,"url":"https://github.com/tensorzero/durable","last_synced_at":"2026-02-05T00:40:23.301Z","repository":{"id":328042073,"uuid":"1107923268","full_name":"tensorzero/durable","owner":"tensorzero","description":"Durable execution in Postgres","archived":false,"fork":false,"pushed_at":"2026-01-30T22:53:07.000Z","size":583,"stargazers_count":3,"open_issues_count":12,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-31T12:31:16.993Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tensorzero.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-01T19:38:30.000Z","updated_at":"2026-01-30T22:22:20.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/tensorzero/durable","commit_stats":null,"previous_names":["tensorzero/durable"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tensorzero/durable","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorzero%2Fdurable","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorzero%2Fdurable/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorzero%2Fdurable/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorzero%2Fdurable/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/tensorzero","download_url":"https://codeload.github.com/tensorzero/durable/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tensorzero%2Fdurable/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29103441,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-04T22:44:52.815Z","status":"ssl_error","status_checked_at":"2026-02-04T22:44:16.428Z","response_time":62,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-05T00:40:19.003Z","updated_at":"2026-02-05T00:40:23.294Z","avatar_url":"https://github.com/tensorzero.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# durable\n\nA Rust SDK for building durable, fault-tolerant workflows using PostgreSQL.\nThis project is derived from [absurd](https://github.com/earendil-works/absurd).\nIt is experimental software to be used in TensorZero.\nUse at your own risk.\n\n## Overview\n\n`durable` enables you to write long-running tasks that can:\n\n- **Checkpoint progress** - Steps are persisted, so tasks resume where they left off after crashes\n- **Sleep and wait** - Suspend execution for durations or until specific times\n- **Await events** - Pause until external events arrive (with optional timeouts)\n- **Retry on failure** - Configurable retry strategies with 
exponential backoff\n- **Scale horizontally** - Multiple workers can process tasks concurrently\n\nUnlike exception-based durable execution SDKs (such as those for Python and TypeScript), this SDK uses Rust's `Result` type to propagate suspension, making it idiomatic and type-safe.\n\n## Why Durable Execution?\n\nTraditional background job systems execute tasks once and hope for the best. Durable execution is different - it provides **guaranteed progress** even when things go wrong:\n\n- **Crash recovery** - If your process dies mid-workflow, tasks resume from their last completed step, so finished work is neither lost nor repeated.\n- **Long-running workflows** - Execute workflows that span hours or days. Sleep for a week waiting for a subscription to renew, then continue.\n- **External event coordination** - Wait for webhooks, human approvals, or signals from other services. The task suspends until the event arrives.\n- **Reliable retries** - Transient failures (network issues, rate limits) are automatically retried with configurable backoff.\n- **Exactly-once semantics** - Checkpointed steps don't re-execute on retry. 
Combined with idempotency keys, they let you achieve exactly-once side effects.\n\nUse durable execution when your workflow is too important to fail silently, too long to hold in memory, or too complex for simple retries.\n\n## Installation\n\nAdd to your `Cargo.toml`:\n\n```toml\n[dependencies]\ndurable = \"0.1\"\n# Also required by the Quick Start example below\nserde = { version = \"1\", features = [\"derive\"] }\ntokio = { version = \"1\", features = [\"full\"] }\nanyhow = \"1\"\n```\n\n## Quick Start\n\n```rust\nuse durable::{Durable, MIGRATOR, Task, TaskContext, TaskResult, WorkerOptions, async_trait};\nuse serde::{Deserialize, Serialize};\nuse std::borrow::Cow;\n\n// Define your task parameters and output\n#[derive(Serialize, Deserialize)]\nstruct ResearchParams {\n    query: String,\n}\n\n#[derive(Serialize, Deserialize)]\nstruct ResearchResult {\n    summary: String,\n    sources: Vec\u003cString\u003e,\n}\n\n// Implement the Task trait\nstruct ResearchTask;\n\n#[async_trait]\nimpl Task for ResearchTask {\n    fn name() -\u003e Cow\u003c'static, str\u003e { Cow::Borrowed(\"research\") }\n    type Params = ResearchParams;\n    type Output = ResearchResult;\n\n    async fn run(params: Self::Params, mut ctx: TaskContext) -\u003e TaskResult\u003cSelf::Output\u003e {\n        // Phase 1: Find relevant sources (checkpointed)\n        // If the task crashes after this step completes, the step won't re-run on retry\n        let sources: Vec\u003cString\u003e = ctx.step(\"find-sources\", (), |_, _| async {\n            // Search logic here...\n            Ok(vec![\n                \"https://example.com/article1\".into(),\n                \"https://example.com/article2\".into(),\n            ])\n        }).await?;\n\n        // Phase 2: Analyze the sources (checkpointed)\n        let analysis: String = ctx.step(\"analyze\", (), |_, _| async {\n            // Analysis logic here...\n            Ok(\"Key findings from sources...\".into())\n        }).await?;\n\n        // Phase 3: Generate summary (checkpointed)\n        // `async move` lets the future take ownership of `params` and `analysis`\n        let summary: String = ctx.step(\"summarize\", params, |params, _| async move {\n            // Summarization logic here...\n            Ok(format!(\"Research summary for 
'{}': {}\", params.query, analysis))\n        }).await?;\n\n        Ok(ResearchResult { summary, sources })\n    }\n}\n\n#[tokio::main]\nasync fn main() -\u003e anyhow::Result\u003c()\u003e {\n    // Create the client\n    let client = Durable::builder()\n        .database_url(\"postgres://localhost/myapp\")\n        .queue_name(\"research\")\n        .build()\n        .await?;\n\n    // Run migrations (idempotent - safe to call on every startup)\n    MIGRATOR.run(client.pool()).await?;\n\n    // Create the queue (idempotent - safe to call on every startup)\n    client.create_queue(None).await?;\n\n    // Register your task\n    client.register::\u003cResearchTask\u003e().await?;\n\n    // Spawn a task\n    let result = client.spawn::\u003cResearchTask\u003e(ResearchParams {\n        query: \"distributed systems consensus algorithms\".into(),\n    }).await?;\n\n    println!(\"Spawned task: {}\", result.task_id);\n\n    // Start a worker to process tasks\n    let worker = client.start_worker(WorkerOptions::default()).await;\n\n    // Wait for shutdown signal\n    tokio::signal::ctrl_c().await?;\n    worker.shutdown().await;\n\n    Ok(())\n}\n```\n\n## Core Concepts\n\n### Tasks\n\nTasks are defined by implementing the [`Task`] trait:\n\n```rust\n#[async_trait]\nimpl Task for MyTask {\n    fn name() -\u003e Cow\u003c'static, str\u003e { Cow::Borrowed(\"my-task\") }  // Unique identifier\n    type Params = MyParams;                                       // Input (JSON-serializable)\n    type Output = MyOutput;                                       // Output (JSON-serializable)\n\n    async fn run(params: Self::Params, mut ctx: TaskContext) -\u003e TaskResult\u003cSelf::Output\u003e {\n        // Your task logic here\n    }\n}\n```\n\n### User Errors\n\nReturn user errors with structured data using `TaskError::user()`:\n\n```rust\n// With structured data (message extracted from \"message\" field if present)\nErr(TaskError::user(json!({\"message\": \"Not found\", 
\"code\": 404})))\n\n// With any serializable type\nErr(TaskError::user(MyError { code: 404, details: \"...\" }))\n\n// Simple string message\nErr(TaskError::user_message(\"Something went wrong\"))\n```\n\nThe error data is serialized to JSON and stored in the database for debugging and analysis.\n\n### TaskContext\n\nThe [`TaskContext`] provides methods for durable execution:\n\n- **`step(name, params, closure)`** - Execute a checkpointed operation. The closure receives `(params, state)`. If a step with the same name and params completed in a previous run, its cached result is returned and the closure is not called again.\n- **`spawn::\u003cT\u003e(name, params, options)`** - Spawn a subtask and return a handle.\n- **`spawn_by_name(name, task_name, params, options)`** - Spawn a subtask by task name (dynamic version).\n- **`join(handle)`** - Wait for a subtask to complete and get its result.\n- **`sleep_for(name, duration)`** - Suspend the task for a duration.\n- **`await_event(name, timeout)`** - Wait for an event, with an optional timeout.\n- **`emit_event(name, payload)`** - Emit an event to wake waiting tasks.\n- **`heartbeat(duration)`** - Extend the task lease during long-running operations.\n- **`rand()`** - Generate a durable random value in [0, 1). Checkpointed.\n- **`now()`** - Get the current time. Checkpointed.\n- **`uuid7()`** - Generate a durable UUIDv7. Checkpointed.\n\n### Checkpointing\n\nSteps provide \"at-least-once\" execution: if a worker crashes after a step's side effect has happened but before its result is checkpointed, the step runs again on retry. 
To achieve \"exactly-once\" semantics for side effects, derive an idempotency key from the task's `task_id` and pass it to the external service:\n\n```rust\nctx.step(\"charge-payment\", ctx.task_id, |task_id, _state| async move {\n    // The task_id is stable across retries, so the key is too\n    let idempotency_key = format!(\"{}:charge\", task_id);\n    // `stripe::charge` and `amount` are placeholders for your payment client and order total\n    stripe::charge(amount, \u0026idempotency_key).await\n}).await?;\n```\n\n### Events\n\nTasks can wait for and emit events:\n\n```rust\n// In one task: wait for an event\nlet shipment: ShipmentEvent = ctx.await_event(\n    \u0026format!(\"packed:{}\", order_id),\n    Some(Duration::from_secs(7 * 24 * 3600)), // 7-day timeout\n).await?;\n\n// From another task or service: emit the event\nclient.emit_event(\n    \u0026format!(\"packed:{}\", order_id),\n    \u0026ShipmentEvent { tracking: \"1Z999\".into() },\n    None,\n).await?;\n```\n\n### Subtasks (Spawn/Join)\n\nTasks can spawn subtasks and wait for their results using `spawn()` and `join()`:\n\n```rust\nasync fn run(params: Self::Params, mut ctx: TaskContext) -\u003e TaskResult\u003cSelf::Output\u003e {\n    // Spawn subtasks (they run on the same queue)\n    let h1 = ctx.spawn::\u003cProcessItem\u003e(\"item-1\", Item { id: 1 }, Default::default()).await?;\n    let h2 = ctx.spawn::\u003cProcessItem\u003e(\"item-2\", Item { id: 2 }, SpawnOptions {\n        max_attempts: Some(3),\n        ..Default::default()\n    }).await?;\n\n    // Do local work while subtasks run...\n    let local = ctx.step(\"local-work\", (), |_params, _state| async { Ok(compute()) }).await?;\n\n    // Wait for subtask results\n    let r1: ItemResult = ctx.join(h1).await?;\n    let r2: ItemResult = ctx.join(h2).await?;\n\n    Ok(Output { local, children: vec![r1, r2] })\n}\n```\n\n**Key behaviors:**\n\n- **Checkpointed** - Spawns and joins are cached. 
If the parent retries, it gets the same subtask handles and results.\n- **Cascade cancellation** - When a parent fails or is cancelled, all its subtasks are automatically cancelled.\n- **Error propagation** - If a subtask fails, `join()` returns an error that the parent can handle.\n- **Same queue** - Subtasks run on the same queue as their parent.\n\n### Event-Based Coordination\n\nFor coordination between independent tasks (not parent-child), use events:\n\n```rust\n// Task A: Waits for a signal from Task B\nlet approval: ApprovalPayload = ctx.await_event(\n    \u0026format!(\"approved:{}\", request_id),\n    Some(Duration::from_secs(24 * 3600)), // 24-hour timeout\n).await?;\n\n// Task B (or external service): Sends the signal\nclient.emit_event(\n    \u0026format!(\"approved:{}\", request_id),\n    \u0026ApprovalPayload { approved_by: \"admin\".into() },\n    None,\n).await?;\n```\n\n### Transactional Spawning\n\nYou can atomically enqueue a task as part of a larger database transaction. This ensures that either both your write and the task spawn succeed, or neither does:\n\n```rust\nlet mut tx = client.pool().begin().await?;\n\n// Your application write\nsqlx::query(\"INSERT INTO orders (id, status) VALUES ($1, $2)\")\n    .bind(order_id)\n    .bind(\"pending\")\n    .execute(\u0026mut *tx)\n    .await?;\n\n// Enqueue task in the same transaction\nclient.spawn_with::\u003cProcessOrder, _\u003e(\u0026mut *tx, ProcessOrderParams { order_id }).await?;\n\ntx.commit().await?;\n// Atomic: both succeed or both fail\n```\n\nThis is useful when you need to guarantee that a task is only enqueued if related data was successfully persisted. 
The `_with` variants accept any SQLx executor, such as a transaction or a connection:\n\n- `spawn_with(executor, params)` - Spawn with default options\n- `spawn_with_options_with(executor, params, options)` - Spawn with custom options\n- `spawn_by_name_with(executor, task_name, params, options)` - Dynamic spawn by name\n\n## API Overview\n\n### Client\n\n| Type | Description |\n|------|-------------|\n| [`Durable`] | Main client for spawning tasks and managing queues |\n| [`DurableBuilder`] | Builder for configuring the client |\n| [`Worker`] | Background worker that processes tasks |\n\n### Task Definition\n\n| Type | Description |\n|------|-------------|\n| [`Task`] | Trait for defining task types |\n| [`TaskContext`] | Context passed to task execution |\n| [`TaskResult\u003cT\u003e`] | Result type alias for task returns |\n| [`TaskError`] | Error type with control flow signals and user errors |\n| [`TaskError::user()`] | Helper to create user errors with JSON data |\n| [`TaskError::user_message()`] | Helper to create user errors from a plain string message |\n| [`TaskHandle\u003cT\u003e`] | Handle to a spawned subtask (returned by `ctx.spawn()`) |\n\n### Configuration\n\n| Type | Description |\n|------|-------------|\n| [`SpawnOptions`] | Options for spawning tasks (retries, headers, queue) |\n| [`WorkerOptions`] | Options for worker configuration (concurrency, timeouts) |\n| [`RetryStrategy`] | Retry behavior: `None`, `Fixed`, or `Exponential` |\n| [`CancellationPolicy`] | Auto-cancel tasks based on delay or duration |\n\n### Results\n\n| Type | Description |\n|------|-------------|\n| [`SpawnResult`] | Returned when spawning a task (`task_id`, `run_id`, `attempt`) |\n| [`ControlFlow`] | Signals for suspension and cancellation |\n\n## Environment Variables\n\n- `DURABLE_DATABASE_URL` - PostgreSQL connection string used when none is passed to the builder\n\n## Benchmarks\n\nPerformance benchmarks run automatically on every push to `main` using [Criterion](https://github.com/bheisler/criterion.rs). 
Results are published to GitHub Pages:\n\n**[View Benchmark Results](https://tensorzero.github.io/durable/dev/bench/)**\n\nTo run benchmarks locally:\n\n```bash\ncargo bench\n```\n\n## License\n\nLicensed under Apache-2.0. See the LICENSE file.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftensorzero%2Fdurable","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftensorzero%2Fdurable","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftensorzero%2Fdurable/lists"}