{"id":31652212,"url":"https://github.com/paulobenicpv/airflow","last_synced_at":"2026-05-04T15:32:00.292Z","repository":{"id":316864816,"uuid":"1065139989","full_name":"Paulobenicpv/Airflow","owner":"Paulobenicpv","description":"Complete data engineering project with Airflow, dbt, and DuckDB, orchestrating pipelines from raw to BI with automation and quality.","archived":false,"fork":false,"pushed_at":"2025-09-27T07:01:02.000Z","size":28,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-27T08:50:57.478Z","etag":null,"topics":["airflow","automation","azure","ci-cd","data-engineering","data-pipeline","data-quality","dataops","dbt","duckdb","etl","kubernetes","orchestration","powerbi"],"latest_commit_sha":null,"homepage":"https://www.linkedin.com/in/paulobenicpv/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Paulobenicpv.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-27T06:07:00.000Z","updated_at":"2025-09-27T07:01:05.000Z","dependencies_parsed_at":"2025-09-28T14:46:10.806Z","dependency_job_id":null,"html_url":"https://github.com/Paulobenicpv/Airflow","commit_stats":null,"previous_names":["paulobenicpv/airflow"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Paulobenicpv/Airflow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/Paulobenicpv%2FAirflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Paulobenicpv%2FAirflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Paulobenicpv%2FAirflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Paulobenicpv%2FAirflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Paulobenicpv","download_url":"https://codeload.github.com/Paulobenicpv/Airflow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Paulobenicpv%2FAirflow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278755165,"owners_count":26040034,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","automation","azure","ci-cd","data-engineering","data-pipeline","data-quality","dataops","dbt","duckdb","etl","kubernetes","orchestration","powerbi"],"created_at":"2025-10-07T10:00:06.366Z","updated_at":"2025-10-07T10:00:08.524Z","avatar_url":"https://github.com/Paulobenicpv.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n## 🏷️ 
Badges\n![CI](https://img.shields.io/github/actions/workflow/status/USER/REPO/github-actions.yml?branch=main)\n![Trivy](https://img.shields.io/github/actions/workflow/status/USER/REPO/trivy.yml?label=trivy)\n![License](https://img.shields.io/badge/license-MIT-green)\n![Airflow](https://img.shields.io/badge/Airflow-2.9.3-blue)\n\n\n## ✨ Overview\n\n**Production-ready Airflow** project with:\n- Modular structure (ingestion/transform/serving) plus **plugins** and **include**\n- **CI/CD** (lint, tests, DagBag, build \u0026 push to GHCR)\n- **Kubernetes** via Helm (KubernetesExecutor) plus a `values` renderer\n- **Data quality** (checks + GE-like validation), **Slack alerts**, **dbt** (DuckDB)\n- **Power BI refresh** (API + polling) and **Secrets Backend** (Azure Key Vault)\n\n\n## 🧱 Architecture (Mermaid)\n```mermaid\nflowchart LR\n  A[PTAX API] --\u003e|JSON| B[b3_ptax_ingest]\n  C[SAP sim] --\u003e|CSV| D[sap_orders_ingest]\n  B --\u003e E[Raw Layer]\n  D --\u003e E\n  E --\u003e F[ptax_transform_curated]\n  F --\u003e G[(Parquet Curated)]\n  G --\u003e H{Quality Checks + GE}\n  H --\u003e I[dbt run/test]\n  I --\u003e J[Serving CSV]\n  J --\u003e K[Power BI Refresh + Poll]\n```\n\n\n## 📁 Structure\n```\ndags/               # DAGs by domain, plus utilities\nplugins/            # Custom Operators/Hooks/Sensors/Macros\ninclude/            # Raw/Curated/Serving/Quality assets\nconfigs/            # variables.json / pools.json / connections.sample.json\nenv/                # sample .env files and templates\nops/                # docker, helm, compose, and tooling\nci/                 # GitHub Actions pipelines\ntests/              # unit and integration\ndbt/                # dbt project (DuckDB)\ndocs/               # guides and tutorials\n```\n\n\n## 🚀 Quickstart (local dev)\n```bash\ndocker compose -f ops/compose/docker-compose.yml up -d airflow-init\ndocker compose -f ops/compose/docker-compose.yml up -d\n# http://localhost:8080 (admin/admin)\n```\n\n# Airflow Project – Production 
Skeleton\n\nBase structure for teams, with CI/CD and deployment to Kubernetes.\n\n## Main folders\n- `dags/`: DAGs by domain (ingestion/transform/serving).\n- `include/`: SQL, Jinja templates, schemas.\n- `plugins/`: reusable Operators, Hooks, Sensors, and Macros.\n- `configs/`: variables/pools/connections (samples only, no secrets).\n- `env/`: sample `.env` files (do NOT commit real secrets).\n- `ops/`: Docker, Helm (K8s), and Compose (local dev).\n- `ci/`: pipelines and quality checks.\n- `tests/`: unit and integration tests (DagBag, operators, e2e).\n- `dbt/` (optional): dbt project coupled to and orchestrated by Airflow.\n\n## Quick dev setup (docker-compose)\n```bash\ncp env/airflow.env.sample .env\ndocker compose -f ops/compose/docker-compose.yml up -d\n```\n\n## Importing variables/pools\n```bash\nairflow variables import configs/variables.json\nairflow pools import configs/pools.json\n```\n\n\u003e Connections must be created via the UI, CLI, or Secrets Backend (see `configs/connections.sample.json`).\n\n## 🚀 Quick start (local dev with Docker)\n```bash\n# from the project root\ncp env/airflow.env.sample env/airflow.env.backup  # optional\n# .env is already in place at the project root; adjust it if you like.\n\n# start the services (initializes the database, runs migrations, and creates the admin/admin user)\ndocker compose -f ops/compose/docker-compose.yml up -d airflow-init\ndocker compose -f ops/compose/docker-compose.yml up -d\n\n# open the Airflow UI\n# http://localhost:8080 (user: admin / pass: admin)\n```\n\n## ✅ GitHub Actions\nThe pipeline checks formatting, runs the tests, and validates the DagBag on every push/PR to `main`.\n\n## 🔧 Customization\n- Buckets: edit `RAW_BUCKET` and `CURATED_BUCKET` in `.env`.\n- Executor: set `AIRFLOW__CORE__EXECUTOR` in `.env` (default: `LocalExecutor`).\n- Slack: fill in `SLACK_WEBHOOK_URL` (if you use alerts).\n\n\n## 🧩 dbt (DuckDB)\nRun inside the container:\n```bash\ndocker compose -f ops/compose/docker-compose.yml exec webserver   bash -lc \"cd /opt/airflow/dbt \u0026\u0026 dbt run 
--profiles-dir profiles \u0026\u0026 dbt test --profiles-dir profiles\"\n```\n\n## 🐳 Image Build \u0026 Push (GHCR)\nThe `ci` workflow already builds and pushes to `ghcr.io/\u003cUSER\u003e/\u003cREPO\u003e:latest` on pushes to `main`.\nMake sure the repository has **Actions enabled** and **packages: write** permission.\n\n## ☸️ K8s Deploy (Helm)\nEdit `ops/helm/values-prod.yaml`, replacing `Paulobenicpv/Airflow` with `your_user/your_repo`, then apply:\n```bash\nhelm upgrade --install airflow apache-airflow/airflow -f ops/helm/values-prod.yaml -n data-platform\n```\n\n\n## 🔐 Secrets Backend (templates)\nExamples for enabling a secrets backend per environment (set in Helm or `.env`):\n\n### AWS SSM / Secrets Manager\n```\nAIRFLOW__SECRETS__BACKEND=airflow.providers.amazon.aws.secrets.systems_manager.SystemsManagerParameterStoreBackend\nAIRFLOW__SECRETS__BACKEND_KWARGS={\"connections_prefix\":\"/airflow/connections\",\"variables_prefix\":\"/airflow/variables\",\"region_name\":\"us-east-1\"}\n```\n\n### GCP Secret Manager\n```\nAIRFLOW__SECRETS__BACKEND=airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend\nAIRFLOW__SECRETS__BACKEND_KWARGS={\"connections_prefix\":\"airflow-connections\",\"variables_prefix\":\"airflow-variables\",\"project_id\":\"SEU_PROJECT_ID\"}\n```\n\n### Azure Key Vault\n```\nAIRFLOW__SECRETS__BACKEND=airflow.providers.microsoft.azure.secrets.key_vault.AzureKeyVaultBackend\nAIRFLOW__SECRETS__BACKEND_KWARGS={\"vault_url\":\"https://SEU_VAULT.vault.azure.net/\"}\n```\n\n\n## 🔐 Azure Key Vault – Secrets Backend (prod)\n1) Create a **Key Vault** and a **Service Principal** with `secrets/get,set,list` permissions.\n2) Provide the credentials to the Airflow runtime (K8s Secret/Workload Identity):\n   - `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_SECRET`\n3) Adjust `ops/helm/values-prod.yaml`:\n```yaml\nextraEnv:\n  - name: AIRFLOW__SECRETS__BACKEND\n    value: 
airflow.providers.microsoft.azure.secrets.key_vault.AzureKeyVaultBackend\n  - name: AIRFLOW__SECRETS__BACKEND_KWARGS\n    value: '{\"connections_prefix\":\"airflow-connections\",\"variables_prefix\":\"airflow-variables\",\"vault_url\":\"https://SEU_VAULT.vault.azure.net/\"}'\n```\n4) Naming convention:\n   - **Connections:** `airflow-connections-\u003cCONN_ID\u003e` (connection as JSON)\n   - **Variables:** `airflow-variables-\u003cVAR_KEY\u003e` (plain-text value)\n5) Test from the pod:\n```bash\nairflow connections get \u003cconn_id\u003e\nairflow variables get \u003cvar_key\u003e\n```\n\n\n## ☸️ Quick Deploy (Helm + Renderer)\n1) Log in to GHCR and make sure the image was published by CI (`:latest` and `:\u003cshort_sha\u003e`).\n2) Generate `values-prod.yaml` from the template:\n```bash\nexport NAMESPACE=airflow-prod\nexport IMAGE_REPO=ghcr.io/paulobenicpv/airflow\nexport IMAGE_TAG=latest\nexport VAULT_URL=https://SEU_VAULT.vault.azure.net/\npython ops/tools/render_values.py\n```\n3) Create the Secret with the Azure credentials (example):\n```bash\nkubectl -n $NAMESPACE create secret generic airflow-azure   --from-literal=AZURE_CLIENT_ID='xxxx'   --from-literal=AZURE_TENANT_ID='xxxx'   --from-literal=AZURE_CLIENT_SECRET='xxxx'\n```\n4) Install/upgrade via Helm:\n```bash\nhelm repo add apache-airflow https://airflow.apache.org\nhelm repo update\nhelm upgrade --install airflow apache-airflow/airflow -n $NAMESPACE -f ops/helm/values-prod.yaml\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaulobenicpv%2Fairflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpaulobenicpv%2Fairflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaulobenicpv%2Fairflow/lists"}