{"id":49379097,"url":"https://github.com/extra-chill/data-machine","last_synced_at":"2026-05-10T03:04:26.281Z","repository":{"id":304781630,"uuid":"960815192","full_name":"Extra-Chill/data-machine","owner":"Extra-Chill","description":"Agentic infrastructure for WordPress. ","archived":false,"fork":false,"pushed_at":"2026-04-24T01:06:26.000Z","size":15980,"stargazers_count":25,"open_issues_count":47,"forks_count":5,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-24T01:33:58.627Z","etag":null,"topics":["abilities-api","action-scheduler","ai-agents","wordpress-plugin","wp-cli"],"latest_commit_sha":null,"homepage":"https://chubes.net/docs/data-machine","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Extra-Chill.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-04-05T05:57:23.000Z","updated_at":"2026-04-23T23:29:47.000Z","dependencies_parsed_at":"2025-12-06T10:12:10.630Z","dependency_job_id":null,"html_url":"https://github.com/Extra-Chill/data-machine","commit_stats":null,"previous_names":["chubes4/data-machine","extra-chill/data-machine"],"tags_count":154,"template":false,"template_full_name":null,"purl":"pkg:github/Extra-Chill/data-machine","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Extra-Chill%2Fdata-machine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Extra-Chill%2Fdata-machine/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/Extra-Chill%2Fdata-machine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Extra-Chill%2Fdata-machine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Extra-Chill","download_url":"https://codeload.github.com/Extra-Chill/data-machine/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Extra-Chill%2Fdata-machine/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32365519,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-27T20:07:02.737Z","status":"online","status_checked_at":"2026-04-28T02:00:07.250Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["abilities-api","action-scheduler","ai-agents","wordpress-plugin","wp-cli"],"created_at":"2026-04-28T04:00:43.814Z","updated_at":"2026-05-10T03:04:26.249Z","avatar_url":"https://github.com/Extra-Chill.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Machine\n\nAgentic workflow automation for WordPress.\n\n## What It Does\n\nData Machine turns a WordPress site into an agent runtime — persistent identity, memory, pipelines, abilities, and tools that AI agents use to operate autonomously.\n\n- **Pipelines** — Multi-step workflows: fetch content, process with AI, publish anywhere\n- **Abilities API** — Typed, permissioned functions that 
agents and extensions call (`datamachine/upload-media`, `datamachine/validate-media`, etc.)\n- **Agent memory** — Layered markdown files (SOUL.md + MEMORY.md in agent layer, USER.md in user layer) injected into every AI context\n- **Multi-agent** — Multiple agents with scoped pipelines, flows, jobs, and filesystem directories\n- **Self-scheduling** — Agents schedule their own recurring tasks using flows, prompt queues, and Agent Pings\n\nData Machine builds on [Agents API](https://github.com/Automattic/agents-api) for generic agent runtime contracts and durable agent primitives. Data Machine owns the WordPress automation product layer: pipelines, flows, jobs, handlers, tools, abilities, memory files, system tasks, and admin/CLI surfaces.\n\n## Architecture\n\n### Pipelines\n\n```\n┌─────────────┐     ┌─────────────┐     ┌─────────────┐\n│    FETCH    │ ──▶ │     AI      │ ──▶ │   PUBLISH   │\n│  RSS, API,  │     │  Enhance,   │     │  WordPress, │\n│  WordPress  │     │  Transform  │     │   Social,   │\n└─────────────┘     └─────────────┘     └─────────────┘\n```\n\n**Pipelines** define the workflow template. **Flows** schedule when they run. **Jobs** track each execution with full undo support.\n\n### Agent Modes\n\nOne agent, three operational modes — same identity and memory, different guidance and tools:\n\n| Mode | Purpose | Tools |\n|---------|---------|-------|\n| **Pipeline** | Automated workflow execution | Handler-specific tools scoped to the current step |\n| **Chat** | Conversational interface in wp-admin | 30+ management tools (flows, pipelines, jobs, logs, memory, content) |\n| **System** | Background infrastructure tasks | Alt text, daily memory, image generation, internal linking, meta descriptions (GitHub issues in data-machine-code extension) |\n\nBuilt-in mode guidance is injected by `AgentModeDirective` at runtime and extensions can register more modes through `AgentModeRegistry`. Configure AI provider and model per mode in Settings. 
Each mode falls back to the global default if no override is set.\n\n### Agent Memory\n\nPersistent markdown files injected into every AI context:\n\n```\nshared/\n  SITE.md                  — Site-wide context\nagents/{slug}/\n  SOUL.md                  — Identity, voice, rules\n  MEMORY.md                — Accumulated knowledge\n  daily/YYYY/MM/DD.md      — Automatic daily journals\nusers/{id}/\n  USER.md                  — Information about the human\n```\n\nDiscovery: `wp datamachine memory paths --allow-root`\n\n### Abilities API\n\nTyped, permissioned functions registered via WordPress's Abilities API. Extensions and agents consume them instead of reaching into internals:\n\n| Ability | Description |\n|---------|-------------|\n| `datamachine/query-posts` | Query WordPress posts for pipeline/content operations |\n| `datamachine/publish-wordpress` | Publish canonical content to WordPress |\n| `datamachine/update-wordpress` | Update existing WordPress content |\n| `datamachine/generate-alt-text` | Generate alt text for media |\n| `datamachine/generate-meta-description` | Generate SEO meta descriptions |\n| `datamachine/run-flow` | Execute a flow programmatically |\n| ... | Additional core abilities across pipelines, flows, jobs, memory, media, SEO, email, and infrastructure |\n\nSocial publishing, workspace, and GitHub abilities live in extension plugins such as data-machine-socials and data-machine-code.\n\n### Content Formats\n\nContent and publish abilities accept `content_format` (`markdown`, `html`, or `blocks`) as the caller's source format. Data Machine stores content in the post type's canonical format from `datamachine_post_content_format`, converting through its bundled Block Format Bridge substrate.\n\n### Multi-Agent\n\nAgents are scoped by user. 
Each agent gets its own:\n\n- Filesystem directory (`agents/{slug}/`)\n- Memory files (SOUL.md, MEMORY.md)\n- Pipelines, flows, and jobs (scoped by `user_id`)\n\nSingle-agent mode (`user_id=0`) works out of the box. Multi-agent adds scoping without breaking existing setups.\n\n## Step Types \u0026 Handlers\n\nPipelines are built from **step types**. Some use pluggable **handlers** — interchangeable implementations that define *how* the step operates.\n\n### Steps with handlers\n\n| Step Type | Core Handlers | Extension Handlers |\n|-----------|---------------|-------------------|\n| **Fetch** | RSS, WordPress (local posts), WordPress API (remote), WordPress Media, Files | GitHub, Google Sheets, Reddit, social platforms (in extensions) |\n| **Publish** | WordPress | Workspace (data-machine-code), Twitter, Instagram, Facebook, Threads, Bluesky, Pinterest, Google Sheets, Slack, Discord (in extensions) |\n| **Update** | WordPress posts with AI enhancement | — |\n\n### Self-contained steps\n\n| Step Type | Description |\n|-----------|-------------|\n| **AI** | Process content with the configured AI provider |\n| **Agent Ping** | Outbound webhook to trigger external agents |\n| **Webhook Gate** | Pause pipeline until an external webhook callback fires |\n| **System Task** | Background tasks (alt text, image generation, daily memory, etc.) 
|\n\n## Media Primitives\n\nCore provides platform-agnostic media handling that extensions consume:\n\n```\nPipeline flow:\n\n  Fetch step → video_file_path / image_file_path in engine data\n    → PublishHandler.resolveMediaUrls(engine)\n      → MediaValidator (ImageValidator or VideoValidator)\n      → FileStorage.get_public_url()\n    → Platform API (Instagram, Twitter, etc.)\n```\n\n- **MediaValidator** — Abstract base with ImageValidator and VideoValidator subclasses\n- **VideoMetadata** — ffprobe extraction with graceful degradation\n- **EngineData** — `getImagePath()` and `getVideoPath()` for pipeline media flow\n- **PublishHandler** — `resolveMediaUrls()`, `validateImage()`, `validateVideo()` on the base class\n\n## Theming\n\nData Machine exposes two aligned theming surfaces: CSS custom properties for browser-rendered UI and `BrandTokens` for PHP/GD-rendered image templates. See [`docs/theming.md`](docs/theming.md) for the decision matrix and token catalogs.\n\n## System Tasks\n\nBackground AI tasks that run on hooks or schedules:\n\n| Task | Description |\n|------|-------------|\n| **Alt Text** | Generate alt text for images missing it |\n| **Image Generation** | AI image creation with content-gap placement |\n| **Daily Memory** | Consolidate MEMORY.md, archive to daily files |\n| **Internal Linking** | AI-powered internal link suggestions |\n| **Meta Descriptions** | Generate SEO meta descriptions |\n| **GitHub Issues** | Create issues from pipeline findings (in data-machine-code extension) |\n\nTasks support undo via the Job Undo system (revision-based rollback for post content, meta, attachments, featured images).\n\n## Self-Scheduling\n\n```\nAgent queues task → Flow runs → Agent Ping fires →\nAgent executes → Agent queues next task → Loop continues\n```\n\n- **Flows** run on schedules — daily, hourly, or cron expressions\n- **Prompt queues** — AI and Agent Ping steps pop tasks from persistent queues\n- **Webhook triggers** — `POST 
/datamachine/v1/trigger/{flow_id}` with Bearer token auth\n- **Agent Ping** — Outbound webhook with context for receiving agents\n\n## WP-CLI\n\n```bash\nwp datamachine agents           # Agent management and path discovery\nwp datamachine pipelines        # Pipeline CRUD\nwp datamachine flows            # Flow CRUD and queue management\nwp datamachine jobs             # Job management, monitoring, undo\nwp datamachine settings         # Plugin settings\nwp datamachine posts            # Query Data Machine-created posts\nwp datamachine logs             # Log operations\nwp datamachine memory           # Agent memory read/write\nwp datamachine handlers         # List registered handlers\nwp datamachine step-types       # List registered step types\nwp datamachine chat             # Chat agent interface\nwp datamachine alt-text         # AI alt text generation\nwp datamachine links            # Internal linking\nwp datamachine blocks           # Gutenberg block operations\nwp datamachine image            # Image generation\nwp datamachine meta-description # SEO meta descriptions\nwp datamachine auth             # OAuth provider management\nwp datamachine taxonomy         # Taxonomy operations\nwp datamachine batch            # Batch operations\nwp datamachine system           # System task management\nwp datamachine analytics        # Analytics and tracking\n```\n\n## REST API\n\nFull REST API under `datamachine/v1`:\n\n- `POST /execute` — Execute a flow\n- `POST /trigger/{flow_id}` — Webhook trigger with Bearer token auth\n- `POST /chat` — Chat agent interface\n- `GET|POST /pipelines` — Pipeline CRUD\n- `GET|POST /flows` — Flow CRUD with queue management\n- `GET|POST /jobs` — Job management\n- `POST /jobs/{id}/undo` — Job undo\n- `GET /agent/paths` — Agent file path discovery\n\n## Extensions\n\n| Plugin | Description |\n|--------|-------------|\n| [data-machine-code](https://github.com/Extra-Chill/data-machine-code) | Workspace management, GitHub integration, git 
operations |\n| [data-machine-socials](https://github.com/Extra-Chill/data-machine-socials) | Publish to Instagram (images, carousels, Reels, Stories), Twitter (text + media + video), Facebook, Threads, Bluesky, Pinterest (image + video pins). Reddit fetch. |\n| [data-machine-business](https://github.com/Extra-Chill/data-machine-business) | Google Sheets (fetch + publish), Slack, Discord integrations |\n| [data-machine-editor](https://github.com/Extra-Chill/data-machine-editor) | Gutenberg inline diff visualization, accept/reject review, editor sidebar |\n| [data-machine-frontend-chat](https://github.com/Extra-Chill/data-machine-frontend-chat) | Floating agent chat widget for any WordPress site |\n| [data-machine-chat-bridge](https://github.com/Extra-Chill/data-machine-chat-bridge) | Message queue, webhook delivery, and REST API for external chat clients |\n| [data-machine-events](https://github.com/Extra-Chill/data-machine-events) | Event calendar automation with AI + Gutenberg blocks |\n| [datamachine-recipes](https://github.com/Sarai-Chinwag/datamachine-recipes) | Recipe content extraction and schema processing |\n| [data-machine-quiz](https://github.com/Sarai-Chinwag/data-machine-quiz) | Quiz creation and management tools |\n\n### Skills\n\n| Package | Description |\n|---------|-------------|\n| [data-machine-skills](https://github.com/Extra-Chill/data-machine-skills) | Agent skills — discoverable instruction sets that coding agents load on demand |\n\n### Integrations\n\n| Project | Description |\n|---------|-------------|\n| [mautrix-data-machine](https://github.com/Extra-Chill/mautrix-data-machine) | Matrix/Beeper bridge — chat with your WordPress AI agent via any Matrix client |\n\n## AI Providers\n\nOpenAI, Anthropic, Google, Grok, OpenRouter — configure a global default per-site, with per-mode overrides for pipeline, chat, and system.\n\n## Runtime Adapters\n\nData Machine's runtime seams use Agents API vocabulary. 
The conversation loop is swappable through `agents_api_conversation_runner`, letting another durable agent runtime take over while Data Machine still provides pipelines, flows, jobs, tool resolution, abilities, and memory integration.\n\n```php\nadd_filter(\n    'agents_api_conversation_runner',\n    function ( $result, $messages, $tools, $provider, $model, $context, $payload, $max_turns, $single_turn ) {\n        // Return an array matching AIConversationLoop::execute()'s shape to\n        // replace the built-in loop, or null to let Data Machine run it.\n        return my_runtime_run( ... );\n    },\n    10,\n    9\n);\n```\n\nThis mirrors the provider pattern used by the bundled AI HTTP Client: providers swap how the LLM is called; runtime adapters swap how the conversation is run. Data Machine makes no assumptions about the host runtime — the filter is the entire contract.\n\nSee [`docs/core-system/ai-conversation-loop.md`](docs/core-system/ai-conversation-loop.md#runtime-adapters) for the full adapter contract and return-shape reference.\n\n## Memory Storage Adapters\n\nAgent memory files (MEMORY.md, SOUL.md, USER.md, NETWORK.md, AGENTS.md, plus any custom files registered through `MemoryFileRegistry`) persist on the local filesystem by default. The persistence layer is swappable through a single Agents API-shaped filter (`agents_api_memory_store`), enabling DB-backed implementations on managed hosts that don't expose a writable filesystem.\n\n```php\nadd_filter(\n    'agents_api_memory_store',\n    function ( $store, $scope ) {\n        // Return a WP_Agent_Memory_Store to replace the disk default\n        // for this scope, or null to let Data Machine read/write through\n        // the filesystem.\n        return new My_DB_Agent_Memory_Store();\n    },\n    10,\n    2\n);\n```\n\nSection parsing, scaffolding, and editability gating stay in Data Machine; the store is just the bytes layer underneath. 
All consumer paths — section reads/writes (`AgentMemory`), the React Agent UI (`AgentFileAbilities`), and AI context injection (`CoreMemoryFilesDirective`) — flow through the same store, so a single swap makes the entire memory surface backend-agnostic.\n\nSee [`docs/development/hooks/core-filters.md`](docs/development/hooks/core-filters.md#agentmemorystoreinterface-inccorefilesrepositoryagentmemorystoreinterfacephp) for the full interface contract.\n\n## Requirements\n\n- WordPress 6.9+ (Abilities API)\n- PHP 8.2+\n- Action Scheduler (bundled)\n\n## Development\n\n```bash\nhomeboy test data-machine    # PHPUnit tests\nhomeboy audit data-machine   # Architecture and convention audits\nhomeboy build data-machine   # Test, lint, build, package\nhomeboy lint data-machine    # PHPCS with WordPress standards\n```\n\n## Documentation\n\n- [docs/](docs/) — User documentation\n- [docs/architecture/pipeline-execution-axes.md](docs/architecture/pipeline-execution-axes.md) — Four orthogonal axes of work expansion in a pipeline\n- Data Machine skill and agent instruction files are generated into consumer environments rather than stored in this plugin tree\n- [docs/CHANGELOG.md](docs/CHANGELOG.md) — Version history\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=Extra-Chill/data-machine\u0026type=date\u0026legend=top-left)](https://www.star-history.com/#Extra-Chill/data-machine\u0026type=date\u0026legend=top-left)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fextra-chill%2Fdata-machine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fextra-chill%2Fdata-machine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fextra-chill%2Fdata-machine/lists"}