{"id":51152193,"url":"https://github.com/stefen-taime/mako-main","last_synced_at":"2026-06-26T07:01:54.974Z","repository":{"id":344022140,"uuid":"1180139032","full_name":"Stefen-Taime/mako-main","owner":"Stefen-Taime","description":"Declarative real-time data pipelines Framework. YAML in, events out.","archived":false,"fork":false,"pushed_at":"2026-03-12T18:41:52.000Z","size":1635,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-13T00:44:49.735Z","etag":null,"topics":["data","datapipeline","declarative-config","declarative-pipeline","declarative-programming","declarative-workflows","framework","open-source"],"latest_commit_sha":null,"homepage":"https://mcsedition.org/fr","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Stefen-Taime.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-12T18:33:33.000Z","updated_at":"2026-03-12T18:44:45.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Stefen-Taime/mako-main","commit_stats":null,"previous_names":["stefen-taime/mako-main"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Stefen-Taime/mako-main","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stefen-Taime%2Fmako-main","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stefen-Taime%2Fmako-main/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stefen-Taime%2Fmako-main/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stefen-Taime%2Fmako-main/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Stefen-Taime","download_url":"https://codeload.github.com/Stefen-Taime/mako-main/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stefen-Taime%2Fmako-main/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34806448,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-26T02:00:06.560Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","datapipeline","declarative-config","declarative-pipeline","declarative-programming","declarative-workflows","framework","open-source"],"created_at":"2026-06-26T07:01:54.069Z","updated_at":"2026-06-26T07:01:54.960Z","avatar_url":"https://github.com/Stefen-Taime.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"img/logo.png\" alt=\"Mako\" width=\"280\" /\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003eMako\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eDeclarative real-time data pipelines. YAML in, events out.\u003c/strong\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cem\u003eNamed after the shortfin mako -- fastest shark in the ocean. Your data deserves the same speed.\u003c/em\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#connector-catalog\"\u003eCatalog\u003c/a\u003e \u0026middot;\n  \u003ca href=\"examples/sources/\"\u003eSources\u003c/a\u003e \u0026middot;\n  \u003ca href=\"examples/sinks/\"\u003eSinks\u003c/a\u003e \u0026middot;\n  \u003ca href=\"examples/transforms/\"\u003eTransforms\u003c/a\u003e \u0026middot;\n  \u003ca href=\"examples/workflows/\"\u003eWorkflows\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#observability\"\u003eObservability\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#grafana-dashboard\"\u003eGrafana\u003c/a\u003e \u0026middot;\n  \u003ca href=\"CONTRIBUTING.md\"\u003eContributing\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n```yaml\npipeline:\n  name: order-events\n  source:\n    type: kafka\n    topic: events.orders\n  transforms:\n    - name: pii_mask\n      type: hash_fields\n      fields: [email, phone, ssn]\n    - name: filter_prod\n      type: filter\n      condition: \"environment = production\"\n  sink:\n    type: snowflake\n    database: ANALYTICS\n    schema: RAW\n    table: ORDER_EVENTS\n  monitoring:\n    freshnessSLA: 5m\n    metrics:\n      enabled: true\n      port: 9090\n```\n\n```bash\nmako init                            # Starter template (stdout, zero deps)\nmako init --full pipeline-full.yaml  # Full reference with all connectors\nmako validate pipeline.yaml\nmako dry-run pipeline.yaml \u003c events.jsonl\nmako run pipeline.yaml\nmako workflow workflow.yaml          # DAG orchestration\n```\n\n---\n\n## Connector Catalog\n\n\u003ch3 align=\"center\"\u003eSources\u003c/h3\u003e\n\n\u003cp align=\"center\"\u003e\u003cem\u003eWhere your data comes from\u003c/em\u003e\u003c/p\u003e\n\n\u003ctable align=\"center\"\u003e\n\u003cthead\u003e\n\u003ctr\u003e\u003cth\u003eConnector\u003c/th\u003e\u003cth\u003eType\u003c/th\u003e\u003cth\u003eHighlights\u003c/th\u003e\u003cth\u003eExamples\u003c/th\u003e\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eHTTP / REST API\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003ehttp\u003c/code\u003e\u003c/td\u003e\u003ctd\u003ePagination, OAuth2, Bearer, Basic, API Key, rate limiting\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sources/http/\"\u003e8 pipelines\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eFile\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003efile\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eJSON, CSV, Parquet, gzip -- local or remote URL\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sources/file/\"\u003e5 pipelines\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eApache Kafka\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003ekafka\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eConsumer groups, earliest/latest offset (franz-go)\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sources/kafka/\"\u003e2 pipelines\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003ePostgreSQL CDC\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003epostgres_cdc\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eSnapshot, CDC, snapshot+CDC (pglogrepl)\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sources/postgres-cdc/\"\u003e1 pipeline\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eDuckDB\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003educkdb\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eEmbedded SQL, native Parquet/CSV, S3/GCS/Azure\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sources/duckdb/\"\u003e3 pipelines\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\n\u003cp align=\"center\"\u003e\u003ca href=\"docs/sources.md\"\u003eFull source documentation \u0026rarr;\u003c/a\u003e\u003c/p\u003e\n\n---\n\n\u003ch3 align=\"center\"\u003eSinks\u003c/h3\u003e\n\n\u003cp align=\"center\"\u003e\u003cem\u003eWhere your data goes\u003c/em\u003e\u003c/p\u003e\n\n\u003ctable align=\"center\"\u003e\n\u003cthead\u003e\n\u003ctr\u003e\u003cth\u003eConnector\u003c/th\u003e\u003cth\u003eType\u003c/th\u003e\u003cth\u003eHighlights\u003c/th\u003e\u003cth\u003eExamples\u003c/th\u003e\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003ePostgreSQL\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003epostgres\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eAuto-flatten, COPY protocol, Vault secrets\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sinks/postgres/\"\u003e2 pipelines\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eSnowflake\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003esnowflake\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eAuto-DDL, flatten mode, batch loading\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sinks/snowflake/\"\u003e2 pipelines\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eDuckDB\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003educkdb\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eAuto-table, schema evolution, Parquet/CSV export\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sinks/duckdb/\"\u003e2 pipelines\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eGoogle Cloud Storage\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003egcs\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eParquet + CSV, Snappy compression\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sinks/gcs/\"\u003e3 pipelines\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eApache Kafka\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003ekafka\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eSchema Registry validation, DLQ\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sinks/kafka/\"\u003e1 pipeline\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eBigQuery\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003ebigquery\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eStreaming inserter\u003c/td\u003e\u003ctd\u003e--\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eClickHouse\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003eclickhouse\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eclickhouse-go v2, flatten mode\u003c/td\u003e\u003ctd\u003e--\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eS3\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003es3\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eParquet + CSV, AWS SDK v2\u003c/td\u003e\u003ctd\u003e--\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eStdout\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003estdout\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eDebug output to console\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sinks/stdout/\"\u003e1 pipeline\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\n\u003cp align=\"center\"\u003e\u003ca href=\"docs/sinks.md\"\u003eFull sink documentation \u0026rarr;\u003c/a\u003e\u003c/p\u003e\n\n---\n\n\u003ch3 align=\"center\"\u003eTransforms\u003c/h3\u003e\n\n\u003cp align=\"center\"\u003e\u003cem\u003eHow your data is processed\u003c/em\u003e\u003c/p\u003e\n\n\u003ctable align=\"center\"\u003e\n\u003cthead\u003e\n\u003ctr\u003e\u003cth\u003eTransform\u003c/th\u003e\u003cth\u003eType\u003c/th\u003e\u003cth\u003eDescription\u003c/th\u003e\u003cth\u003eExamples\u003c/th\u003e\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eSQL Enrichment\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003esql\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eCASE WHEN, computed fields, DuckDB functions\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/transforms/sql/\"\u003e2 pipelines\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eWASM Plugins\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003eplugin\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eCustom logic in Go (TinyGo) or Rust\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/transforms/wasm/\"\u003e2 pipelines + source\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eSchema Validation\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003eschema\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eConfluent Schema Registry (log / reject / DLQ)\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/transforms/schema/\"\u003e3 pipelines\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eData Quality\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003edq_check\u003c/code\u003e\u003c/td\u003e\u003ctd\u003enot_null, range, in_set, regex, type checks\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/transforms/dq-check/\"\u003e2 pipelines\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003ePII Masking\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003ehash_fields\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eSHA-256 hash for emails, phones, cards, SSNs\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/transforms/pii/\"\u003e2 pipelines\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eFilter\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003efilter\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eKeep/discard events by condition\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/transforms/filter/\"\u003e1 pipeline\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eRename Fields\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003erename_fields\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eRename columns for target convention\u003c/td\u003e\u003ctd\u003eused across examples\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eDrop Fields\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003edrop_fields\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eRemove unnecessary columns\u003c/td\u003e\u003ctd\u003eused across examples\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eCast Fields\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003ecast_fields\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eType conversion (string, int, float, bool)\u003c/td\u003e\u003ctd\u003eused across examples\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eFlatten\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003eflatten\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eFlatten nested JSON objects\u003c/td\u003e\u003ctd\u003eused across examples\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eDefault Values\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003edefault_values\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eSet defaults for missing fields\u003c/td\u003e\u003ctd\u003eused across examples\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003cstrong\u003eDeduplicate\u003c/strong\u003e\u003c/td\u003e\u003ctd\u003e\u003ccode\u003ededuplicate\u003c/code\u003e\u003c/td\u003e\u003ctd\u003eRemove duplicates by key\u003c/td\u003e\u003ctd\u003eused across examples\u003c/td\u003e\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\n\u003cp align=\"center\"\u003e\u003ca href=\"docs/transforms.md\"\u003eFull transform documentation \u0026rarr;\u003c/a\u003e\u003c/p\u003e\n\n---\n\n\u003ch3 align=\"center\"\u003eWorkflows\u003c/h3\u003e\n\n\u003cp align=\"center\"\u003e\u003cem\u003eDAG orchestration for multi-pipeline jobs\u003c/em\u003e\u003c/p\u003e\n\n\u003ctable align=\"center\"\u003e\n\u003cthead\u003e\n\u003ctr\u003e\u003cth\u003eWorkflow\u003c/th\u003e\u003cth\u003eSteps\u003c/th\u003e\u003cth\u003eHighlights\u003c/th\u003e\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\u003ctd\u003e\u003ca href=\"examples/workflows/nyc-tlc-star-schema/\"\u003e\u003cstrong\u003eNYC TLC Star Schema\u003c/strong\u003e\u003c/a\u003e\u003c/td\u003e\u003ctd\u003e9 + quality gate\u003c/td\u003e\u003ctd\u003eStar schema from 700K+ taxi trips, 6 dimensions, fact table, daily aggregation, SQL assertions\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003ca href=\"examples/workflows/multi-source-demo/\"\u003e\u003cstrong\u003eMulti-Source Demo\u003c/strong\u003e\u003c/a\u003e\u003c/td\u003e\u003ctd\u003e3 (parallel)\u003c/td\u003e\u003ctd\u003eHTTP + CSV sources, DuckDB + PostgreSQL sinks, parallel execution\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\u003ca href=\"examples/workflows/etl-demo/\"\u003e\u003cstrong\u003eETL Demo\u003c/strong\u003e\u003c/a\u003e\u003c/td\u003e\u003ctd\u003e3 (sequential)\u003c/td\u003e\u003ctd\u003eSimple ingest \u0026rarr; transform \u0026rarr; load chain\u003c/td\u003e\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\n\u003cp align=\"center\"\u003e\u003ca href=\"docs/workflows.md\"\u003eFull workflow documentation \u0026rarr;\u003c/a\u003e\u003c/p\u003e\n\n---\n\n\u003ch3 align=\"center\"\u003eCross-Reference: Connectors as Source \u0026 Sink\u003c/h3\u003e\n\n\u003ctable align=\"center\"\u003e\n\u003cthead\u003e\n\u003ctr\u003e\u003cth\u003eConnector\u003c/th\u003e\u003cth\u003eAs Source\u003c/th\u003e\u003cth\u003eAs Sink\u003c/th\u003e\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\u003ctd\u003ePostgreSQL\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sources/postgres-cdc/\"\u003eCDC source\u003c/a\u003e\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sinks/postgres/\"\u003eSink\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003eDuckDB\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sources/duckdb/\"\u003eQuery source\u003c/a\u003e\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sinks/duckdb/\"\u003eSink + export\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003eKafka\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sources/kafka/\"\u003eConsumer\u003c/a\u003e\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sinks/kafka/\"\u003eProducer\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003eGCS\u003c/td\u003e\u003ctd\u003evia DuckDB httpfs\u003c/td\u003e\u003ctd\u003e\u003ca href=\"examples/sinks/gcs/\"\u003eSink\u003c/a\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\n---\n\n## Quick Start\n\n```bash\n# Clone and build\ngit clone https://github.com/Stefen-Taime/mako.git\ncd mako\ngo build -o bin/mako .\n\n# Create your first pipeline (HTTP source → stdout, zero dependencies)\n./bin/mako init\n\n# Validate\n./bin/mako validate pipeline.yaml\n\n# Run pipeline — fetches 100 commerce records, applies transforms, prints to stdout\n./bin/mako run pipeline.yaml\n\n# Run a workflow (DAG of multiple pipelines)\n./bin/mako workflow workflow.yaml\n```\n\n**Output of `mako run`:**\n\n```\n🔌 Preflight checks...\n   ✅ source — ready\n   ✅ stdout — connected\n🚀 Pipeline \"commerce-ingest\" started\n📥 Source:    http (https://raw.githubusercontent.com/.../json_bank_20240116_1.json)\n🔄 Transforms:\n   └─ pii_mask (hash_fields)\n   └─ cleanup (drop_fields)\n   └─ filter_price (filter)\n📤 Sinks:\n   └─ stdout\n{\"_pii_processed\":true,\"color\":\"yellow\",\"department\":\"Kitchen\",\"id\":3592,...}\n...\n📊 Final stats: 100 in → stdout → 58 out, 0 errors\n```\n\nThe starter pipeline fetches commerce data from [open-source-data](https://github.com/Stefen-Taime/open-source-data), hashes `user_id` (PII compliance), drops unnecessary fields, and filters items with `price \u003e 50`. Zero infrastructure needed.\n\n### Full template\n\nTo see **all** available sources, sinks, transforms, and monitoring options:\n\n```bash\n./bin/mako init --full pipeline-full.yaml\n```\n\nThis generates a reference YAML with every connector and option as commented blocks. Uncomment the sections you need.\n\n---\n\n## Observability\n\nEvery pipeline exposes real-time Prometheus metrics, health probes, and a status API on a single HTTP port (default `:9090`).\n\n### Prometheus Metrics\n\n```text\nmako_events_in_total{pipeline=\"order-events\"}        15234\nmako_events_out_total{pipeline=\"order-events\"}        15230\nmako_errors_total{pipeline=\"order-events\"}            4\nmako_dlq_total{pipeline=\"order-events\"}               2\nmako_schema_failures_total{pipeline=\"order-events\"}   1\nmako_sink_latency_microseconds{pipeline=\"order-events\"} 4523\nmako_throughput_events_per_second{pipeline=\"order-events\"} 1523.40\nmako_uptime_seconds{pipeline=\"order-events\"}          3600.0\nmako_pipeline_ready{pipeline=\"order-events\"}          1\n```\n\nMetrics are synced from the pipeline engine every **500ms** for live visibility during execution, with a final sync after graceful shutdown.\n\n**Workflow mode:** All pipelines in a workflow share a single Prometheus endpoint (`:9090`) via a shared [MetricsRegistry](pkg/observability/registry.go) -- no port-per-pipeline overhead.\n\n### Grafana Dashboard\n\nA pre-built Grafana dashboard is included at [`grafana/mako-dashboard.json`](grafana/mako-dashboard.json) with 4 sections:\n\n| Section | Panels |\n|---------|--------|\n| **Overview** | Events In, Events Out, Errors, DLQ Events, Schema Failures, Uptime |\n| **Throughput** | Events/sec rate graph, Instantaneous throughput |\n| **Errors \u0026 DLQ** | Error rate per minute (bar chart), Error rate % (gauge), Pipeline Ready (UP/DOWN) |\n| **Sink Performance** | Sink write latency, Events In vs Out (cumulative) |\n\nThe dashboard auto-discovers pipelines via a `$pipeline` template variable and supports multi-select.\n\n### Local Setup (Prometheus + Grafana)\n\nThe `docker/` stack includes Prometheus and Grafana pre-configured to scrape Mako pipelines:\n\n```bash\ncd docker/\ndocker compose up -d prometheus grafana\n\n# Prometheus  → http://localhost:9091\n# Grafana     → http://localhost:3000  (admin / mako)\n```\n\nGrafana is auto-provisioned with the Prometheus datasource and the Mako dashboard -- no manual import needed. Just run a pipeline with `mako run` or `mako workflow` and open Grafana.\n\n### HTTP Endpoints\n\n| Endpoint | Description | Use |\n|----------|-------------|-----|\n| `GET /metrics` | Prometheus text format | Scraping by Prometheus/Grafana |\n| `GET /health` | Liveness probe (always 200) | Kubernetes `livenessProbe` |\n| `GET /ready` | Readiness probe (200 when running) | Kubernetes `readinessProbe` |\n| `GET /status` | Pipeline status JSON | Monitoring dashboards |\n\n### Slack Alerting\n\nSend alerts to Slack on errors, SLA breaches, and pipeline completion:\n\n```yaml\nmonitoring:\n  freshnessSLA: 5m\n  alertChannel: \"#data-alerts\"\n  slackWebhookURL: ${SLACK_WEBHOOK_URL}\n  alerts:\n    - name: high_error_rate\n      type: error_rate\n      threshold: \"0.5%\"\n      severity: critical\n    - name: volume_drop\n      type: volume\n      threshold: \"-50%\"\n      severity: warning\n```\n\nAlert rule types: **latency** (stale data), **error_rate** (% threshold), **volume** (throughput change). Each rule has a 5-minute cooldown. See [docs/observability.md](docs/observability.md) for full details.\n\n---\n\n## Architecture\n\n```text\npipeline.yaml\n       |\n       v\n  +----------+\n  |   mako   |  CLI: validate, dry-run, run, workflow\n  |   (Go)   |\n  +----+-----+\n       |\n       v\n  +------------------------------------------+\n  |  mako-runner (per-pipeline container)    |\n  |                                          |\n  |  Source --\u003e Transform Chain --\u003e Sink(s)   |\n  |  (Kafka)    hash_fields       Postgres   |\n  |  (File)     mask_fields       Snowflake  |\n  |  (PG CDC)   filter            BigQuery   |\n  |  (HTTP)     rename            ClickHouse |\n  |  (DuckDB)   sql / dq_check   DuckDB     |\n  |             wasm_plugin       S3 / GCS   |\n  |             deduplicate       Kafka      |\n  |             cast_fields       Stdout     |\n  |                                          |\n  |  Schema Registry --\u003e Validate            |\n  |  Prometheus    --\u003e /metrics              |\n  |  Health        --\u003e /health, /ready       |\n  |  DLQ + Retries + Backoff                 |\n  +------------------------------------------+\n```\n\n---\n\n## Project Structure\n\n```text\nmako/\n├── main.go                         # CLI entry point (init, run, workflow, validate, generate)\n├── api/v1/types.go                 # Pipeline + Workflow spec (the YAML DSL model)\n├── pkg/\n│   ├── config/config.go            # YAML parser + validator\n│   ├── pipeline/engine.go          # Runtime: Source -\u003e Transforms -\u003e Sink\n│   ├── source/\n│   │   ├── file.go                 # File source (JSONL, CSV, JSON, Parquet + gzip)\n│   │   ├── postgres_cdc.go         # PostgreSQL CDC (pgx + pglogrepl)\n│   │   ├── http.go                 # HTTP/API source (pagination, OAuth2)\n│   │   ├── duckdb.go               # DuckDB source (SQL, Parquet/CSV/JSON + S3/GCS)\n│   │   └── multi.go                # Multi-source with join support\n│   ├── sink/\n│   │   ├── sink.go                 # Stdout, File sinks + BuildFromSpec\n│   │   ├── postgres.go             # PostgreSQL (pgx + COPY)\n│   │   ├── snowflake.go            # Snowflake (gosnowflake)\n│   │   ├── bigquery.go             # BigQuery (streaming inserter)\n│   │   ├── clickhouse.go           # ClickHouse (clickhouse-go v2)\n│   │   ├── s3.go                   # S3 (AWS SDK v2)\n│   │   ├── gcs.go                  # GCS (cloud.google.com/go/storage)\n│   │   ├── duckdb.go               # DuckDB (embedded, Parquet/CSV export)\n│   │   ├── encode.go               # Shared Parquet + CSV encoders\n│   │   └── resolve.go              # Secret resolution (config -\u003e env -\u003e Vault)\n│   ├── transform/\n│   │   ├── transform.go            # All built-in transforms\n│   │   └── wasm.go                 # WASM plugin runtime (wazero)\n│   ├── workflow/\n│   │   ├── engine.go               # DAG engine (parallel steps, failure policies)\n│   │   └── quality_gate.go         # SQL assertions against DuckDB\n│   ├── observability/\n│   │   ├── server.go               # Prometheus metrics + health + status HTTP\n│   │   └── registry.go             # Shared metrics registry (workflow mode)\n│   ├── kafka/kafka.go              # Kafka source + sink (franz-go)\n│   ├── schema/registry.go          # Schema Registry client + validator\n│   ├── join/join.go                # Multi-source join engine\n│   ├── duckdbext/cloud.go          # DuckDB httpfs + cloud credentials\n│   ├── alerting/                   # Slack alert rules + notifications\n│   └── vault/vault.go              # HashiCorp Vault client\n├── examples/                       # Pipeline catalog (see below)\n│   ├── sources/                    # HTTP, File, Kafka, PostgreSQL CDC, DuckDB\n│   ├── sinks/                      # PostgreSQL, Snowflake, DuckDB, GCS, Kafka, Stdout\n│   ├── transforms/                 # SQL, WASM, Schema, DQ Check, PII, Filter\n│   └── workflows/                  # NYC TLC Star Schema, ETL Demo, Multi-Source\n├── docs/                           # Detailed documentation\n├── docker/                         # Local infra (Kafka, PostgreSQL, Prometheus, Grafana)\n│   ├── prometheus/prometheus.yml   # Pre-configured to scrape Mako on :9090\n│   └── grafana/provisioning/      # Auto-provision datasource + dashboard\n├── grafana/mako-dashboard.json    # Grafana dashboard (Overview, Throughput, Errors, Sink)\n├── .github/workflows/ci.yml        # CI: unit + integration tests\n└── Dockerfile                      # Production image\n```\n\n---\n\n## CI / Testing\n\nGitHub Actions runs on every push/PR:\n\n**Unit tests** (fast, no Docker):\n\n- 70+ tests covering config, validation, transforms, WASM plugins, sources, sinks\n- Benchmarks for transform chain performance\n- Example validation + dry-run\n\n**Integration tests** (Docker services):\n\n- Kafka (KRaft) + PostgreSQL + Schema Registry\n- Full pipeline: produce messages -\u003e consume -\u003e transform -\u003e write to PG\n- HTTP endpoint verification (/metrics, /health, /ready, /status)\n- File source validation\n\n```bash\n# Run locally\ngo test -v -count=1 ./...\ngo test -bench=. -benchmem ./...\n```\n\n---\n\n## Roadmap\n\n- [x] Kafka consumer/producer (franz-go)\n- [x] PostgreSQL sink (pgx + COPY)\n- [x] Snowflake sink (gosnowflake) + flatten mode\n- [x] BigQuery sink (streaming inserter)\n- [x] Schema Registry validation (JSON Schema)\n- [x] File source (JSONL, CSV, JSON + transparent gzip)\n- [x] Prometheus metrics (/metrics)\n- [x] Health/readiness probes (/health, /ready)\n- [x] Pipeline status API (/status)\n- [x] CI with integration tests (Kafka + PG + Schema Registry)\n- [x] S3/GCS object storage sinks\n- [x] Grafana dashboard templates\n- [x] ClickHouse sink (clickhouse-go v2)\n- [x] WASM plugin transforms (wazero)\n- [x] Parquet + CSV output formats for S3/GCS\n- [x] HashiCorp Vault integration (secret resolution chain)\n- [x] PostgreSQL CDC source (snapshot, cdc, snapshot+cdc)\n- [x] HTTP/API source (pagination, OAuth2, rate limiting, retries)\n- [x] Real-time observability metrics (500ms sync, sink latency)\n- [x] Rust WASM plugin example\n- [x] DuckDB embedded source + sink\n- [x] Parquet file source (native reading via parquet-go)\n- [x] DuckDB cloud storage (S3/GCS/Azure via httpfs)\n- [x] Workflow engine (DAG orchestration, parallel steps, failure policies)\n- [x] Data quality: inline `dq_check` transform\n- [x] Data quality: `quality_gate` workflow step (SQL assertions)\n- [x] Shared Prometheus metrics registry (single port for workflows)\n- [ ] Helm chart for Kubernetes deployment\n- [ ] Codegen: `mako generate --k8s` + `--tf` (Kubernetes manifests, Terraform HCL)\n\n---\n\n## License\n\nMIT\n\n---\n\n*Built by [mcsEdition](https://mcsedition.org/fr)*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstefen-taime%2Fmako-main","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstefen-taime%2Fmako-main","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstefen-taime%2Fmako-main/lists"}