{"id":45034636,"url":"https://github.com/arthurmgraf/streamflow-analytics","last_synced_at":"2026-02-19T06:12:47.883Z","repository":{"id":337654327,"uuid":"1151895011","full_name":"arthurmgraf/streamflow-analytics","owner":"arthurmgraf","description":"Real-time Streaming Data Platform for E-commerce Fraud Detection — Kafka (Strimzi), Flink (PyFlink), Airflow, PostgreSQL (CloudNativePG), K3s, Terraform/Terragrunt. Medallion Architecture, 5 fraud rules, 85 tests, full CI/CD.","archived":false,"fork":false,"pushed_at":"2026-02-19T04:26:39.000Z","size":815,"stargazers_count":0,"open_issues_count":4,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-19T04:27:04.122Z","etag":null,"topics":["airflow","apache-flink","data-engineering","fraud-detection","kafka","kubernetes","machine-learning","python","real-time-analytics","terraform"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arthurmgraf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-07T03:36:18.000Z","updated_at":"2026-02-19T04:26:43.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/arthurmgraf/streamflow-analytics","commit_stats":null,"previous_names":["arthurmgraf/streamflow-analytics"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/arthurmgraf/streamflow-analytics","repository
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arthurmgraf%2Fstreamflow-analytics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arthurmgraf%2Fstreamflow-analytics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arthurmgraf%2Fstreamflow-analytics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arthurmgraf%2Fstreamflow-analytics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/arthurmgraf","download_url":"https://codeload.github.com/arthurmgraf/streamflow-analytics/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arthurmgraf%2Fstreamflow-analytics/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29604552,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-19T05:11:50.834Z","status":"ssl_error","status_checked_at":"2026-02-19T05:11:38.921Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","apache-flink","data-engineering","fraud-detection","kafka","kubernetes","machine-learning","python","real-time-analytics","terraform"],"created_at":"2026-02-19T06:12:47.300Z","updated_at":"2026-02-19T06:12:47.868Z","avatar_url":"https://github.com/arthurmgraf.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# StreamFlow Analytics\n\n**Real-time Streaming Fraud Detection Platform for E-commerce**\n\nProduction-grade event-driven architecture built with Apache Kafka, Apache Flink, Apache Airflow, and PostgreSQL on Kubernetes. 
Features ML-augmented fraud detection, a Dead Letter Queue, SLO-based monitoring, ArgoCD GitOps, and a full observability stack.\n\n[![CI](https://github.com/arthurmgraf/streamflow-analytics/actions/workflows/ci.yaml/badge.svg)](https://github.com/arthurmgraf/streamflow-analytics/actions)\n[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/)\n[![Coverage \u003e80%](https://img.shields.io/badge/coverage-%3E80%25-brightgreen.svg)]()\n[![Ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://docs.astral.sh/ruff/)\n[![mypy strict](https://img.shields.io/badge/mypy-strict-blue.svg)](https://mypy-lang.org/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)\n\n---\n\n## Why This Project Exists\n\nA Brazilian e-commerce company with 200+ stores (physical + online) needs to:\n\n- **Detect fraud** in credit card transactions within 30 seconds\n- **Score anomalies** using ML (Isolation Forest) combined with rule-based detection\n- **Handle failures gracefully** with a Dead Letter Queue and schema evolution\n- **Serve dashboards** with fresh data (\u003c 1 minute latency)\n- **Run on a $0 budget** using on-premise K3s\n\nStreamFlow solves this with an event-driven streaming pipeline: transactions flow through Kafka, are processed by Flink with fault-tolerant keyed state (KeyedProcessFunction + RocksDB), scored by ML + 5 business rules, persisted in PostgreSQL using a Medallion architecture, orchestrated by Airflow, deployed via ArgoCD GitOps, and monitored with SLO-based Prometheus alerting.\n\n---\n\n## Key Metrics\n\n| Metric | Value |\n|--------|-------|\n| End-to-End Latency | \u003c 30 seconds |\n| Throughput | 10,000+ events/second |\n| Infrastructure Cost | $0 (on-premise K3s) |\n| Test Coverage | \u003e 80% (254 tests) |\n| Fraud Detection | 5 rules + ML (Isolation Forest) |\n| Deployment | ArgoCD GitOps with auto-sync |\n| Checkpointing | EXACTLY_ONCE with RocksDB |\n| SLO Target 
| 99.5% availability |\n\n---\n\n## Architecture\n\n```\n                                    ┌──────────────────────────────────────┐\n                                    │        K3s CLUSTER (Single-Node)     │\n                                    │    4 CPU | 9.7GB RAM | 598GB Disk   │\n                                    │    Managed by: Terraform + Terragrunt│\n┌─────────────────┐                 ├──────────────────────────────────────┤\n│  Event Sources   │                │                                      │\n│ ─────────────── │  Kafka Producer │  streamflow-kafka (Strimzi/KRaft)   │\n│ POS Terminals   │ ──────────────▶ │  ┌──────────────────────────────┐   │\n│ Website         │                 │  │ transactions.raw (3 partitions)│   │\n│ Mobile App      │                 │  │ fraud.alerts                  │   │\n│ Synthetic Gen   │                 │  │ metrics.realtime              │   │\n│                 │                 │  │ streamflow.dlq (Dead Letter)  │   │\n└─────────────────┘                 │  └──────────────┬───────────────┘   │\n                                    │                 │                    │\n                                    │  streamflow-processing (Flink K8s)  │\n                                    │  ┌──────────────▼───────────────┐   │\n                                    │  │ transaction-processor        │   │\n                                    │  │   └─▶ Parse + Validate      │   │\n                                    │  │   └─▶ DLQ (malformed events)│   │\n                                    │  │                              │   │\n                                    │  │ fraud-detector (KeyedProcess)│   │\n                                    │  │   └─▶ 5 Rules + ML Score   │   │\n                                    │  │   └─▶ RocksDB State        │   │\n                                    │  │   └─▶ EXACTLY_ONCE         │   │\n                                    │  │                              │   │\n               
                     │  │ realtime-aggregator          │   │\n                                    │  │   └─▶ Windowed Metrics     │   │\n                                    │  └──────────────┬───────────────┘   │\n                                    │                 │                    │\n                                    │  streamflow-data (CloudNativePG)    │\n                                    │  ┌──────────────▼───────────────┐   │\n                                    │  │ PostgreSQL 16                │   │\n                                    │  │  bronze.* (raw events)      │   │\n                                    │  │  silver.* (cleaned, deduped)│   │\n                                    │  │  gold.*   (star schema)     │   │\n                                    │  └──────────────┬───────────────┘   │\n                                    │                 │                    │\n                                    │  streamflow-orchestration (Airflow) │\n                                    │  ┌──────────────▼───────────────┐   │\n                                    │  │ bronze_to_silver  (@hourly)  │   │\n                                    │  │ silver_to_gold    (@hourly)  │   │\n                                    │  │ data_quality      (*/15 min) │   │\n                                    │  │ maintenance       (@daily)   │   │\n                                    │  └─────────────────────────────┘   │\n                                    │                                      │\n                                    │  streamflow-monitoring               │\n                                    │  ┌─────────────────────────────┐    │\n                                    │  │ Prometheus + Grafana         │    │\n                                    │  │ 4 Dashboards | 9 Alert Rules│    │\n                                    │  │ 5 SLO Definitions           │    │\n                                    │  │ Error Budget Tracking        │    │\n             
                       │  └─────────────────────────────┘    │\n                                    └──────────────────────────────────────┘\n```\n\n---\n\n## Fraud Detection Engine\n\n### Hybrid Approach: Rules + ML\n\nThe fraud detector uses a `KeyedProcessFunction` with fault-tolerant state backed by RocksDB. Each customer's state (transaction history, statistics, location) is checkpointed with EXACTLY_ONCE semantics.\n\n**5 Business Rules:**\n\n| Rule | ID | Logic | Weight |\n|------|----|-------|--------|\n| High Value | FR-001 | Amount \u003e 3x customer average (after 5+ txns) | 0.30 |\n| Velocity | FR-002 | \u003e 5 transactions in 10-minute window | 0.25 |\n| Geographic | FR-003 | Impossible travel (\u003e 500km in \u003c 1 hour, Haversine) | 0.20 |\n| Time Anomaly | FR-004 | Unusual hour (z-score \u003e 2.0 via Welford's algorithm) | 0.15 |\n| Blacklist | FR-005 | Customer or store on fraud list | 0.10 |\n\n**ML Scoring (FR-006):**\n\n| Component | Detail |\n|-----------|--------|\n| Algorithm | Isolation Forest (unsupervised anomaly detection) |\n| Features | 6-dimensional vector: amount_zscore, velocity_count, time_deviation, geo_speed_kmh, is_blacklisted, amount_ratio |\n| Training | Offline on synthetic data (10k transactions) |\n| Integration | `score_final = alpha * ml_score + (1 - alpha) * rules_score` (alpha=0.3) |\n| Score Range | Normalized to [0, 1] via decision_function mapping |\n\n**Final Score:** Weighted combination of rules + ML. 
Alert generated when `score \u003e= 0.7`.\n\n### State Management\n\n```\nCustomerFraudState (per customer_id, in RocksDB ValueState)\n├── amount_stats: RunningStats  (Welford's online algorithm, O(1) memory)\n├── hour_stats: RunningStats    (transaction hour distribution)\n├── last_location: GeoLocation  (lat, lon, timestamp)\n├── velocity_window: list[float] (10-min sliding window timestamps)\n├── is_blacklisted: bool\n└── Serialization: JSON via to_bytes()/from_bytes()\n    └── Contract tests ensure checkpoint compatibility\n```\n\n---\n\n## Dead Letter Queue\n\nMalformed events are routed to a Dead Letter Queue instead of being silently dropped:\n\n```\ntransactions.raw ──▶ transaction-processor\n                         │\n                         ├── Valid ──▶ Bronze tables\n                         │\n                         └── Invalid ──▶ streamflow.dlq topic\n                                          │\n                                          ├── original_event (truncated to 10KB)\n                                          ├── error_type + error_message\n                                          ├── source_topic + timestamp\n                                          └── schema_version (forward-compatible)\n```\n\nSchema evolution uses version-aware parsing: unknown schema versions trigger a warning but attempt best-effort parsing, ensuring forward compatibility during rolling upgrades.\n\n---\n\n## Data Model: Medallion Architecture\n\n```\n┌──────────────────┐      ┌──────────────────┐      ┌──────────────────┐\n│   BRONZE LAYER   │      │   SILVER LAYER   │      │    GOLD LAYER    │\n│   (Raw Events)   │      │   (Cleaned)      │      │  (Star Schema)   │\n│ ──────────────── │      │ ──────────────── │      │ ──────────────── │\n│ raw_transactions │ ───▶ │ clean_transactions│ ───▶ │ fact_transactions│\n│ raw_fraud_alerts │      │ customers        │      │ fact_fraud_alerts│\n│                  │      │ stores           │      │ dim_customer     │\n│  
                │      │ products         │      │ dim_store        │\n│                  │      │                  │      │ dim_product      │\n│                  │      │                  │      │ dim_date         │\n│                  │      │                  │      │ agg_hourly_sales │\n│                  │      │                  │      │ agg_daily_fraud  │\n└──────────────────┘      └──────────────────┘      └──────────────────┘\n   Flink JDBC Sink           Airflow @hourly           Airflow @hourly\n   (real-time)               (incremental)             (star schema)\n```\n\nAll layers coexist as PostgreSQL schemas (not separate databases) — single connection, no cross-DB foreign data wrappers needed.\n\n---\n\n## Technology Stack\n\n| Category | Technology | Purpose |\n|----------|-----------|---------|\n| **Streaming** | Apache Kafka (Strimzi, KRaft) | Event backbone, no ZooKeeper |\n| **Processing** | Apache Flink (PyFlink, K8s Operator) | KeyedProcessFunction, RocksDB state, EXACTLY_ONCE |\n| **ML** | scikit-learn (Isolation Forest) | Unsupervised anomaly detection |\n| **Orchestration** | Apache Airflow (LocalExecutor) | Batch DAGs with TaskGroups, SLA, callbacks |\n| **Database** | PostgreSQL 16 (CloudNativePG) | Medallion architecture (Bronze/Silver/Gold) |\n| **Infrastructure** | K3s + Terraform + Terragrunt | Operator-native Kubernetes, IaC |\n| **Deployment** | ArgoCD | GitOps with auto-sync, self-heal, prune |\n| **Monitoring** | Prometheus + Grafana | 4 dashboards, 9 alert rules, 5 SLOs |\n| **Security** | NetworkPolicies + PDBs | Namespace isolation, non-root pods |\n| **CI/CD** | GitHub Actions | Lint, typecheck, test matrix, security audit, K8s validation |\n| **Quality** | ruff + mypy --strict + pytest | Zero warnings, zero type errors, 254+ tests |\n\n---\n\n## Architecture Decisions (ADRs)\n\n| # | Decision | Choice | Rationale |\n|---|----------|--------|-----------|\n| 001 | Kafka Metadata | KRaft (no ZooKeeper) | Saves ~512MB RAM, ZK 
deprecated in Kafka 4.0+ |\n| 002 | Airflow Executor | LocalExecutor | KubernetesExecutor too resource-hungry for single-node |\n| 003 | PostgreSQL HA | Single instance | No replica needed for portfolio project |\n| 004 | Medallion Layers | Schemas (not databases) | Single connection, no cross-DB FDW needed |\n| 005 | Infrastructure | Terraform + Terragrunt | State management, declarative, code-reviewable |\n| 006 | Code Structure | Python monorepo | Shared Pydantic models, single test suite |\n| 007 | Terraform State | Local backend | Single developer, $0 cost |\n| 008 | Fraud State | KeyedProcessFunction + RocksDB | Fault-tolerant, survives restarts, EXACTLY_ONCE |\n| 009 | ML Integration | Offline training, online scoring | Isolation Forest loaded in Flink open(), scored per-event |\n| 010 | Error Handling | Dead Letter Queue | Never lose data, investigate failures later |\n| 011 | Deployment | ArgoCD GitOps | Auto-sync, self-heal, declarative, audit trail |\n| 012 | Monitoring | SLO-based with error budgets | 99.5% availability, p99 latency \u003c 5s |\n\nFull ADR documentation: [ARCHITECTURE.md](ARCHITECTURE.md)\n\n---\n\n## Repository Structure\n\n```\nstreamflow-analytics/\n├── src/\n│   ├── models/                          # Pydantic v2 data models\n│   │   ├── transaction.py               #   Transaction event schema\n│   │   ├── customer.py                  #   Customer profile\n│   │   ├── store.py                     #   Store reference\n│   │   └── fraud_alert.py               #   Alert with FR-001..FR-006\n│   ├── flink_jobs/                      # PyFlink streaming jobs\n│   │   ├── transaction_processor.py     #   Kafka -\u003e validate -\u003e Bronze (DLQ for invalid)\n│   │   ├── fraud_detector.py            #   FraudRuleEvaluator (5 rules, pure Python)\n│   │   ├── fraud_detector_function.py   #   KeyedProcessFunction (Flink state)\n│   │   ├── fraud_pipeline.py            #   Pipeline builder (source -\u003e keyed -\u003e sinks)\n│   │   ├── 
common/\n│   │   │   ├── state.py                 #   CustomerFraudState, RunningStats, GeoLocation\n│   │   │   ├── serialization.py         #   Schema-versioned JSON serialization\n│   │   │   ├── dlq.py                   #   Dead Letter Queue record builder\n│   │   │   └── schemas.py              #   Flink SQL schema definitions\n│   │   └── ml/\n│   │       ├── feature_engineering.py   #   6-feature vector extraction\n│   │       └── model_scorer.py          #   Isolation Forest scorer [0,1]\n│   ├── dags/                            # Airflow DAGs\n│   │   ├── bronze_to_silver.py          #   TaskGroups, SLA, callbacks\n│   │   ├── silver_to_gold.py            #   Per-dimension/fact tasks\n│   │   ├── data_quality.py              #   4 quality checks (null, dup, fresh, amount)\n│   │   ├── maintenance.py               #   Prune + VACUUM with notifications\n│   │   └── common/callbacks.py          #   Shared failure/success/SLA callbacks\n│   ├── generators/                      # Synthetic data generation\n│   │   ├── transaction_generator.py     #   Realistic Brazilian transactions\n│   │   ├── customer_generator.py        #   Customer profiles with risk levels\n│   │   ├── store_generator.py           #   10 Brazilian cities\n│   │   ├── fraud_patterns.py            #   Fraud injection (4 patterns)\n│   │   └── kafka_producer.py            #   Confluent Kafka wrapper\n│   └── utils/\n│       ├── config.py                    #   YAML config loader with merge\n│       ├── logging.py                   #   Structured JSON logging\n│       ├── db.py                        #   PostgreSQL connection helper\n│       └── metrics.py                   #   Business metrics collector\n│\n├── tests/\n│   ├── unit/                            # 13 test files, 160+ tests\n│   │   ├── test_models.py               #   Pydantic model validation\n│   │   ├── test_fraud_detector.py       #   FraudRuleEvaluator rules\n│   │   ├── test_fraud_rule_evaluator.py #   Renamed evaluator 
tests\n│   │   ├── test_fraud_detector_function.py  # KeyedProcessFunction mocks\n│   │   ├── test_feature_engineering.py  #   Feature vector extraction\n│   │   ├── test_model_scorer.py         #   ML scorer range [0,1]\n│   │   ├── test_dlq.py                  #   DLQ record structure\n│   │   ├── test_serialization.py        #   Schema-versioned parsing\n│   │   ├── test_state_serialization.py  #   to_bytes()/from_bytes()\n│   │   ├── test_generators.py           #   Transaction/fraud generation\n│   │   ├── test_config.py               #   Config loading + merge\n│   │   └── test_metrics.py              #   MetricsCollector\n│   ├── integration/                     # Cross-module integration\n│   │   ├── test_full_pipeline.py        #   Generator -\u003e Fraud Engine flow\n│   │   └── test_ml_pipeline.py          #   State -\u003e Features -\u003e ML Score\n│   ├── contract/                        # Schema compatibility contracts\n│   │   ├── test_transaction_schema.py   #   Transaction producer/consumer\n│   │   ├── test_fraud_alert_schema.py   #   Alert schema guarantee\n│   │   ├── test_dlq_schema.py           #   DLQ record structure\n│   │   └── test_state_compat.py         #   State serialization roundtrip\n│   └── conftest.py                      #   Shared fixtures (state, ML, transactions)\n│\n├── sql/\n│   ├── migrations/                      #   001-004: Bronze, Silver, Gold, Indexes\n│   ├── transforms/\n│   │   ├── bronze_to_silver.sql         #   Incremental dedup transform\n│   │   ├── silver_to_gold.sql           #   Star schema build\n│   │   ├── update_customer_stats.sql\n│   │   └── gold/                        #   Per-dimension/fact SQL (7 files)\n│   └── quality/                         #   Per-check SQL (4 files)\n│\n├── infra/\n│   ├── modules/                         #   6 Terraform modules\n│   │   ├── namespaces/                  #   K8s namespace creation\n│   │   ├── strimzi-kafka/               #   Strimzi + Kafka CRD\n│   │   ├── 
flink-operator/              #   Flink K8s Operator\n│   │   ├── cloudnativepg/               #   CNPG + PostgreSQL CRD\n│   │   ├── airflow/                     #   Airflow Helm release\n│   │   └── monitoring/                  #   kube-prometheus-stack\n│   └── environments/dev/                #   Terragrunt DRY configs\n│\n├── k8s/\n│   ├── kafka/kafka-topics.yaml          #   4 KafkaTopics (incl. DLQ)\n│   ├── flink/                           #   3 FlinkDeployments\n│   │   ├── fraud-detector.yaml          #     RocksDB, EXACTLY_ONCE, savepoint\n│   │   ├── transaction-processor.yaml   #     Security context, Prometheus\n│   │   └── realtime-aggregator.yaml     #     Non-root, capabilities dropped\n│   ├── security/\n│   │   ├── network-policies.yaml        #   5 NetworkPolicies (namespace isolation)\n│   │   └── pod-disruption-budgets.yaml  #   3 PDBs (Kafka, PG, Flink)\n│   ├── monitoring/\n│   │   ├── service-monitors.yaml        #   Prometheus ServiceMonitors\n│   │   ├── alerting-rules.yaml          #   9 PrometheusRules\n│   │   └── slo-rules.yaml              #   5 SLO definitions + error budgets\n│   ├── argocd/application.yaml          #   GitOps Application (auto-sync, self-heal)\n│   └── kustomization.yaml              #   Kustomize base for all resources\n│\n├── scripts/\n│   ├── deploy.sh                        #   Deploy (ArgoCD/kubectl) + smoke tests\n│   ├── setup.sh                         #   Cluster bootstrap\n│   ├── generate_events.py               #   CLI event generator\n│   ├── seed_data.py                     #   Seed Silver reference data\n│   ├── run_migrations.py                #   SQL migration runner\n│   ├── verify_pipeline.py               #   E2E health check\n│   └── train_model.py                   #   Offline ML model training\n│\n├── config/                              #   YAML configurations\n├── docs/                                #   ARCHITECTURE, FRAUD_DETECTION, RUNBOOK\n├── .github/workflows/\n│   ├── ci.yaml         
                 #   Lint + Type + Test matrix + Security + K8s\n│   └── deploy.yaml                      #   ArgoCD/kubectl deploy + smoke tests\n├── pyproject.toml                       #   ruff, mypy, pytest config\n└── Makefile                             #   Dev commands (18 targets)\n```\n\n---\n\n## Getting Started\n\n### Prerequisites\n\n- K3s cluster (or any Kubernetes 1.28+)\n- Python 3.11+\n- Terraform 1.7+ and Terragrunt\n- kubectl configured for your cluster\n\n### Quick Setup\n\n```bash\n# 1. Clone and install\ngit clone https://github.com/arthurmgraf/streamflow-analytics.git\ncd streamflow-analytics\npip install -e \".[dev]\"\n\n# 2. Deploy infrastructure (5 namespaces + all operators)\nbash scripts/setup.sh\n\n# 3. Deploy application (security + Kafka + Flink + monitoring)\nmake deploy\n\n# 4. Run database migrations\npython scripts/run_migrations.py --env dev\n\n# 5. Seed reference data\npython scripts/seed_data.py --env dev\n\n# 6. Train ML model (optional — generates models/fraud_model.joblib)\npython scripts/train_model.py\n\n# 7. Start generating events\npython scripts/generate_events.py --rate 100 --duration 600\n\n# 8. 
Verify everything\npython scripts/verify_pipeline.py\n```\n\n### Alternative: ArgoCD GitOps Deploy\n\n```bash\n# One command — ArgoCD handles everything via Git sync\nmake deploy-argocd\n\n# ArgoCD will auto-sync on every git push to main\n# Self-heal if someone manually changes resources\n# Prune resources removed from Git\n```\n\n### Access UIs\n\n```bash\n# Grafana (4 dashboards + SLO tracking)\nkubectl port-forward -n streamflow-monitoring svc/kube-prometheus-stack-grafana 3000:80\n\n# Airflow (5 DAGs with TaskGroups)\nkubectl port-forward -n streamflow-orchestration svc/airflow-webserver 8080:8080\n\n# Flink (job dashboard + checkpoints)\nkubectl port-forward -n streamflow-processing svc/flink-jobmanager 8081:8081\n\n# ArgoCD (GitOps dashboard)\nkubectl port-forward svc/argocd-server -n argocd 8443:443\n```\n\n---\n\n## Testing\n\n```bash\n# Full test suite (254 tests)\nmake test\n\n# By category\nmake test-unit          # Unit tests (160+ tests)\nmake test-contract      # Schema contract tests (48 tests)\nmake test-integration   # Cross-module integration (10 tests)\n\n# With coverage\nmake test-cov           # Fails if \u003c 80% coverage\n\n# Quality gates\nmake lint               # ruff (strict: E, F, I, UP, B, SIM, N, RUF)\nmake typecheck          # mypy --strict (zero errors)\nmake check              # lint + typecheck + test (all in one)\n```\n\n### Test Categories\n\n| Category | Tests | Purpose |\n|----------|-------|---------|\n| **Unit** | 170+ | Individual function/class behavior + chaos tests |\n| **Contract** | 48 | Schema compatibility between producers/consumers |\n| **Integration** | 10 | Cross-module flows (Generator -\u003e Engine -\u003e Alert) |\n| **Chaos** | 34 | State corruption, extreme inputs, ML degradation |\n\n**Contract tests** guarantee that schema changes don't break downstream consumers. 
They validate:\n- Required fields and validation rules\n- Serialization format compatibility\n- State checkpoint roundtrip safety (to_bytes/from_bytes)\n- Forward compatibility for schema evolution\n\n---\n\n## Monitoring \u0026 Observability\n\n### Grafana Dashboards\n\n| Dashboard | Key Metrics |\n|-----------|-------------|\n| **Pipeline Overview** | Total events, throughput, latency, error rate |\n| **Kafka Metrics** | Consumer lag, messages in/out, partition health |\n| **Flink Processing** | Records/sec, checkpoint duration, backpressure |\n| **Fraud Monitoring** | Alert count, fraud rate, rule breakdown, ML scores |\n\n### Alert Rules (9)\n\n| Alert | Condition | Severity |\n|-------|-----------|----------|\n| KafkaConsumerLagHigh | Lag \u003e 10,000 events | Warning |\n| KafkaBrokerDown | Broker offline \u003e 2 min | Critical |\n| FlinkJobNotRunning | Job state != RUNNING \u003e 5 min | Critical |\n| FlinkCheckpointFailing | No checkpoint \u003e 10 min | Warning |\n| PostgresHighConnections | \u003e 80% connections used | Warning |\n| PostgresPVCNearlyFull | Disk \u003e 80% | Critical |\n| HighFraudRate | \u003e 5% fraud in 1 hour | Warning |\n| PipelineLatencyHigh | E2E latency \u003e 60s | Warning |\n| AirflowDAGFailure | DAG failure | Warning |\n\n### SLO Definitions (5)\n\n| SLO | Target | Window |\n|-----|--------|--------|\n| Availability | 99.5% | 30 days |\n| Latency (p99) | \u003c 5 seconds | 5 min |\n| Data Freshness | \u003c 5 minutes | 10 min |\n| Error Budget | Monthly burn rate | 30 days |\n| Fraud Detection (p95) | \u003c 2 seconds | 5 min |\n\n---\n\n## Batch Pipeline (Airflow DAGs)\n\n| DAG | Schedule | Tasks | Features |\n|-----|----------|-------|----------|\n| `bronze_to_silver` | `@hourly` | 6 | TaskGroups, pre/post validation, SLA |\n| `silver_to_gold` | `@hourly` | 9 | Per-dimension/fact tasks, parallel dims |\n| `data_quality` | `*/15 min` | 5 | 4 SQL checks + alert notification |\n| `maintenance` | `Daily 03:00` | 4 | Parallel 
prune + VACUUM ANALYZE |\n\nAll DAGs include: structured callbacks (failure/success/SLA miss), exponential retry, pool-based concurrency.\n\n---\n\n## Kubernetes Security\n\n| Control | Implementation |\n|---------|---------------|\n| **Pod Security** | runAsNonRoot, runAsUser: 9999, capabilities: drop ALL |\n| **Network Isolation** | 5 NetworkPolicies (default-deny + allow-lists) |\n| **Disruption Budget** | PDBs for Kafka, PostgreSQL, Flink JobManager |\n| **GitOps** | ArgoCD auto-sync with self-heal (drift protection) |\n| **CI Security** | pip-audit dependency scanning, kubeval manifest validation |\n\n---\n\n## CI/CD Pipeline\n\n```\nPush to main/PR\n    │\n    ├── Lint (ruff check)\n    ├── Type Check (mypy --strict)\n    ├── Security (pip-audit)\n    └── K8s Validate (kubeval)\n            │\n            ▼\n    Test Matrix (Python 3.11 + 3.12)\n        ├── Unit Tests\n        ├── Contract Tests\n        ├── Integration Tests\n        └── Coverage Report (\u003e80%)\n            │\n            ▼ (manual trigger)\n    Deploy\n        ├── Pre-deploy (contract tests + kubeval)\n        ├── Deploy (ArgoCD sync OR kubectl apply)\n        └── Post-deploy Smoke Tests\n```\n\n---\n\n## Documentation\n\n| Document | Description |\n|----------|-------------|\n| [ARCHITECTURE.md](ARCHITECTURE.md) | ADRs, component details, data flow |\n| [ROADMAP.md](ROADMAP.md) | Staff-level upgrade plan and phases |\n| [docs/SETUP_GUIDE.md](docs/SETUP_GUIDE.md) | Step-by-step setup guide |\n| [docs/FRAUD_DETECTION.md](docs/FRAUD_DETECTION.md) | Fraud rules, ML integration, tuning |\n| [docs/RUNBOOK.md](docs/RUNBOOK.md) | Operations runbook and alert response |\n\n---\n\n## Author\n\n**Arthur Maia Graf**\n\nStaff Data Engineer | Kafka | Flink | Airflow | PostgreSQL | Kubernetes | ML\n\n---\n\n## License\n\nMIT License - see [LICENSE](LICENSE) for 
details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farthurmgraf%2Fstreamflow-analytics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farthurmgraf%2Fstreamflow-analytics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farthurmgraf%2Fstreamflow-analytics/lists"}