{"id":32636430,"url":"https://github.com/mjdevaccount/market-data-store","last_synced_at":"2025-10-31T01:02:06.976Z","repository":{"id":316669916,"uuid":"1064256159","full_name":"mjdevaccount/market-data-store","owner":"mjdevaccount","description":"Production market data infrastructure: TimescaleDB + FastAPI control-plane, async sinks, Python client. Handles OHLCV bars, fundamentals, news, options. Features: RLS isolation, backpressure mgmt, Prometheus metrics, cross-repo testing. Built for scale.","archived":false,"fork":false,"pushed_at":"2025-10-29T23:09:10.000Z","size":786,"stargazers_count":1,"open_issues_count":6,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-10-29T23:42:55.726Z","etag":null,"topics":["async","data-infrastructure","fastapi","financial-data","market-data","postgresql","prometheus","python","time-series","timescaledb"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mjdevaccount.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-25T19:08:48.000Z","updated_at":"2025-10-29T23:09:13.000Z","dependencies_parsed_at":"2025-10-22T23:22:31.467Z","dependency_job_id":"1b0f859b-e688-4848-aab9-cd4cf444e005","html_url":"https://github.com/mjdevaccount/market-data-store","commit_stats":null,"previous_names":["mjdevaccount/market-data-store"],"tags_count":65,"template":false,"template_full_name":null,"purl":"pkg:github/mjd
evaccount/market-data-store","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mjdevaccount%2Fmarket-data-store","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mjdevaccount%2Fmarket-data-store/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mjdevaccount%2Fmarket-data-store/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mjdevaccount%2Fmarket-data-store/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mjdevaccount","download_url":"https://codeload.github.com/mjdevaccount/market-data-store/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mjdevaccount%2Fmarket-data-store/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281734997,"owners_count":26552486,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-30T02:00:06.501Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["async","data-infrastructure","fastapi","financial-data","market-data","postgresql","prometheus","python","time-series","timescaledb"],"created_at":"2025-10-31T01:01:15.466Z","updated_at":"2025-10-31T01:02:06.951Z","avatar_url":"https://github.com/mjdevaccount.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🚀 
market-data-store\n\n\u003e **High-performance market data infrastructure** with TimescaleDB, RLS, production-ready client library, and async sinks\n\n**Hybrid architecture** providing both control-plane and data-plane capabilities:\n\n- 🗄️ **Migrations \u0026 policies** (TimescaleDB)\n- 🔧 **Admin endpoints**: health, readiness, schema/version, migrate, retention/compression, backfills, aggregates\n- 📊 **Prometheus** metrics + observability\n- 🐍 **`mds_client` library**: Production-ready Python client for Market Data Core with sync/async APIs, RLS, and tenant isolation\n- 🚰 **Async sinks** (Phase 4.1): High-throughput ingestion with backpressure awareness\n\n\u003e 💡 The `mds_client` library provides direct in-process access for Market Data Core. No HTTP latency - Core imports and uses the library directly with connection pooling, RLS, and TimescaleDB integration.\n\n\u003e ⚡ NEW: **Async Sinks Layer** (Phase 4.1) - Stream-oriented ingestion with automatic Prometheus metrics, error handling, and flow control readiness.\n\n\u003e 🔥 **LATEST**: **Config-Driven Pipeline Support** (Phase 11.3) - Provider-based OHLCV ingestion with `bars_ohlcv` table, `StoreClient`, audit-grade job tracking, diff-aware upserts, and compression policies. 
Supports 10K+ bars/sec throughput for live and backfill operations.\n\n---\n\n## 🎯 **Dual Ingestion Architecture**\n\nThis store supports **two parallel ingestion paths**:\n\n### **Path 1: Tenant-Based System** (Existing)\n- **Tables**: `bars`, `fundamentals`, `news`, `options_snap`\n- **Client**: `mds_client` (MDS/AMDS) with RLS\n- **Use Case**: Multi-tenant analytics platform\n- **Features**: Row-level security, tenant isolation, comprehensive data types\n\n### **Path 2: Provider-Based Pipeline** (NEW - Phase 11.3)\n- **Tables**: `bars_ohlcv`, `job_runs`\n- **Client**: `datastore.StoreClient` / `AsyncStoreClient`\n- **Use Case**: Config-driven market data pipeline (live + backfill)\n- **Features**:\n  - 🚀 High throughput (10K+ bars/sec)\n  - 🔄 Diff-aware upserts (replay-safe)\n  - 📦 Smart batching (COPY for 1000+ rows)\n  - 📊 Job execution tracking with heartbeats\n  - 🗜️ Automatic compression (90-day hot tier)\n  - 🔍 Prometheus metrics\n\n**Architecture Diagram:**\n\n```\n┌─────────────────────────────────────────────────────────────────────┐\n│                        market_data_store                             │\n├─────────────────────────────────────────────────────────────────────┤\n│                                                                       │\n│  ┌───────────────────────────────┐  ┌──────────────────────────────┐│\n│  │ Tenant-Based (Existing)       │  │ Provider-Based (NEW)         ││\n│  │ ────────────────────────      │  │ ─────────────────────        ││\n│  │                               │  │                              ││\n│  │ mds_client (AMDS)             │  │ StoreClient                  ││\n│  │   ↓                           │  │   ↓                          ││\n│  │ bars (tenant_id, RLS)         │  │ bars_ohlcv (provider-based)  ││\n│  │ fundamentals                  │  │ job_runs (audit trail)       ││\n│  │ news                          │  │                              ││\n│  │ options_snap                  │  │ Features:         
           ││\n│  │                               │  │ • Diff-aware upserts         ││\n│  │ Features:                     │  │ • Smart batching             ││\n│  │ • Multi-tenant isolation      │  │ • Compression (90d)          ││\n│  │ • RLS enforcement             │  │ • Heartbeat tracking         ││\n│  │ • Comprehensive data types    │  │ • Config fingerprinting      ││\n│  └───────────────────────────────┘  └──────────────────────────────┘│\n│                                                                       │\n│  TimescaleDB (Hypertables + Compression)                             │\n└─────────────────────────────────────────────────────────────────────┘\n```\n\n---\n\n## 📂 Project Layout \u0026 Description\n\nThis repository is structured as a **control-plane** with clear separation between infrastructure, schema management, service layer, and automation rules.\n\nBelow is a snapshot of the repo's structure with logical groupings to help new contributors and automation tools (like Cursor) navigate effectively.\n\n### 🏗️ **Infra \u0026 Ops**\n```bash\n├── docker-compose.yml             # Docker services configuration\n├── Dockerfile                     # Container build instructions\n├── Makefile                       # Build and deployment automation\n├── docker/                        # Docker-related files\n│   └── initdb.d/                  # Initial SQL scripts for DB setup\n│       └── 00_timescale.sql       # TimescaleDB initialization script\n└── tools/                         # Auxiliary scripts, CLI utilities\n    └── build_solution_manifest.py # Solution manifest builder\n```\n\n### 🗄️ **Schema \u0026 Migrations**\n```bash\n├── alembic.ini                         # Alembic configuration for migrations\n├── migrations/                         # Alembic migration files\n│   ├── env.py                          # Migration environment config\n│   ├── script.py.mako                  # Migration template\n│   └── versions/                       # 
Migration version files\n├── src/datastore/aggregates.py         # Continuous aggregates definitions\n└── src/datastore/timescale_policies.py # TimescaleDB retention/compression policies\n```\n\n### 🚀 **Service Layer**\n```bash\n├── src/datastore/                 # Control-plane: migrations, policies, admin endpoints\n│   ├── __init__.py                # Package init\n│   ├── cli.py                     # CLI for migrations, policies, seeds\n│   ├── config.py                  # App configuration\n│   ├── idempotency.py             # Conflict/idempotency helpers\n│   ├── reads.py                   # Read ops (ops/tests support)\n│   ├── writes.py                  # Write ops (batch/upserts)\n│   └── service/                   # FastAPI service layer\n│       └── app.py                 # FastAPI app with admin endpoints\n└── src/mds_client/                # Client library for Market Data Core\n    ├── __init__.py                # Library exports (MDS, AMDS, models, batch processors)\n    ├── client.py                  # MDS (sync client facade) with psycopg 3 + ConnectionPool\n    ├── aclient.py                 # AMDS (async client facade) with AsyncConnectionPool\n    ├── models.py                  # Pydantic data models with validation\n    ├── sql.py                     # Canonical SQL with named parameters and ON CONFLICT upserts\n    ├── rls.py                     # Row Level Security helpers (DSN options + context managers)\n    ├── errors.py                  # Structured exception hierarchy with psycopg error mapping\n    ├── utils.py                   # NDJSON processing with gzip support and model coercion\n    ├── batch.py                   # Production-safe batch processing (sync + async)\n    └── cli.py                     # Comprehensive operational CLI with environment variables\n```\n\n### 🤖 **Automation Rules**\n```bash\n├── cursorrules/                   # Cursor rules (automation home base)\n│   ├── index.mdc                  # Main rules index\n│   
├── README.md                  # Rules documentation\n│   ├── solution_manifest.json     # Asset lookup configuration\n│   └── rules/                     # Task-specific rule definitions\n```\n\n### 🧭 **How to Navigate**\n\n🗄️ **DB Migrations** → [`/migrations/versions/`](migrations/versions/)\n\n🚀 **Admin Endpoints** → [`/src/datastore/service/app.py`](src/datastore/service/app.py)\n```bash\n# Run FastAPI service (admin endpoints)\nuvicorn datastore.service.app:app --host 0.0.0.0 --port 8000 --factory\n```\n\n📊 **Policies \u0026 Aggregates** → [`/src/datastore/timescale_policies.py`](src/datastore/timescale_policies.py), [`/src/datastore/aggregates.py`](src/datastore/aggregates.py)\n\n🛠️ **Control-plane CLI** → [`/src/datastore/cli.py`](src/datastore/cli.py)\n\n📦 **Client Library** → [`/src/mds_client/`](src/mds_client/) - For Market Data Core integration\n\n🔧 **Client CLI** → [`/src/mds_client/cli.py`](src/mds_client/cli.py) - Operational commands (`mds` command)\n\n🤖 **Cursor Rules \u0026 Automation** → [`/cursorrules/`](cursorrules/) (Cursor's self-bootstrap home)\n\n🏗️ **Infra \u0026 Deployment** → [`/docker/`](docker/), [`Dockerfile`](Dockerfile), [`docker-compose.yml`](docker-compose.yml)\n\n⚙️ **Project Config** → [`pyproject.toml`](pyproject.toml)\n\n## 📋 Releases\n\n### 🏷️ Current Release: [v0.4.0]\n\n**What's included:**\n- ✅ **Core v1.1.0 contract adoption** - FeedbackEvent extends Core DTO\n- ✅ **Adapter pattern** - Preserves Store fields while maintaining Core compatibility\n- ✅ **Health DTOs** - `/healthz` and `/readyz` return Core `HealthStatus`\n- ✅ Complete `mds_client` library with sync/async APIs\n- ✅ Production-ready batch processing and backup/restore\n- ✅ Comprehensive CLI with all operational commands\n- ✅ Full documentation and troubleshooting guides\n- ✅ RLS security and tenant isolation\n\n### Previous Release: [v0.3.0]\n- ✅ Write coordinator with backpressure feedback\n- ✅ Async sinks layer\n- ✅ HTTP feedback broadcaster\n\n### 📦 
Installation from Release\n```bash\n# Install specific version\npip install git+https://github.com/mjdevaccount/market-data-store.git@v0.1.0#subdirectory=src\n\n# Install latest version\npip install git+https://github.com/mjdevaccount/market-data-store.git#subdirectory=src\n```\n\n## 🚀 Quick Start\n\n### 📦 Installation\n\n#### Option 1: Install from Git (Recommended)\n```bash\n# Install the mds_client library directly from this repository\npip install git+https://github.com/mjdevaccount/market-data-store.git@v0.1.0#subdirectory=src\n\n# Or install the latest version\npip install git+https://github.com/mjdevaccount/market-data-store.git#subdirectory=src\n```\n\n#### Option 2: Development Setup\n```powershell\n# Clone and setup for development\ngit clone https://github.com/mjdevaccount/market-data-store.git\ncd market-data-store\n\n# Create and activate virtual environment\npython -m venv .venv\n.\\.venv\\Scripts\\Activate.ps1\n\n# Install dependencies\npip install -r requirements.txt\n\n# For development\npip install -r requirements-dev.txt\n```\n\n### Prerequisites\n- Python 3.11+\n- PostgreSQL 13+ with **TimescaleDB extension** (required)\n- Virtual environment\n\n### 🎯 Using the Released Package\n\nOnce installed, you can use the `mds_client` library in your projects:\n\n```python\n# Basic usage with cross-platform compatibility\nfrom mds_client import MDS, Bar\nfrom mds_client.runtime import boot_event_loop\nfrom datetime import datetime, timezone\n\n# Configure event loop for Windows/Docker compatibility\nboot_event_loop()\n\n# Configure client\nmds = MDS({\n    \"dsn\": \"postgresql://user:pass@host:port/db\",\n    \"tenant_id\": \"your-tenant-uuid\"\n})\n\n# Write market data\nbar = Bar(\n    tenant_id=\"your-tenant-uuid\",\n    vendor=\"ibkr\",\n    symbol=\"AAPL\",\n    timeframe=\"1m\",\n    ts=datetime.now(timezone.utc),\n    close_price=150.5,\n    volume=1000\n)\n\nmds.upsert_bars([bar])\n```\n\n```bash\n# Use the CLI\nexport 
MDS_DSN=\"postgresql://user:pass@host:port/db\"\nexport MDS_TENANT_ID=\"your-tenant-uuid\"\n\n# Test connection\nmds ping\n\n# Write data\nmds write-bar --vendor ibkr --symbol AAPL --timeframe 1m --ts \"2024-01-01T10:00:00Z\" --close-price 150.5 --volume 1000\n```\n\n---\n\n## 🚰 Async Sinks Layer (Phase 4.1 - NEW)\n\n\u003e **Status**: ✅ Production Ready | **Version**: v0.2.0 | **Released**: October 2025\n\nThe async sinks layer provides **high-throughput, observable ingestion** with automatic Prometheus metrics and backpressure readiness.\n\n### Key Features\n\n- ⚡ **Async-first**: Non-blocking I/O with `asyncio` and `asyncpg`\n- 📊 **Auto-metrics**: Prometheus counters and histograms for all writes\n- 🔄 **Context managers**: Clean resource management with `async with`\n- 🎯 **Type-safe**: Strong typing with Pydantic models\n- 🛡️ **Error handling**: Graceful failures with metric recording\n- 🧪 **Tested**: 12/12 unit tests passing, integration-ready\n\n### Available Sinks\n\n| Sink | Purpose | Model | Wraps |\n|------|---------|-------|-------|\n| **BarsSink** | OHLCV market data | `Bar` | `AMDS.upsert_bars()` |\n| **OptionsSink** | Options snapshots | `OptionSnap` | `AMDS.upsert_options()` |\n| **FundamentalsSink** | Company financials | `Fundamentals` | `AMDS.upsert_fundamentals()` |\n| **NewsSink** | News \u0026 sentiment | `News` | `AMDS.upsert_news()` |\n\n### Quick Start\n\n```python\nimport asyncio\nfrom datetime import datetime, timezone\nfrom mds_client import AMDS\nfrom mds_client.models import Bar\nfrom market_data_store.sinks import BarsSink\n\nasync def main():\n    # Configure AMDS client\n    config = {\n        \"dsn\": \"postgresql://user:pass@localhost:5432/marketdata\",\n        \"tenant_id\": \"your-tenant-uuid\",\n        \"pool_max\": 10\n    }\n\n    # Create sample data\n    bars = [\n        Bar(\n            tenant_id=config[\"tenant_id\"],\n            vendor=\"ibkr\",\n            symbol=\"AAPL\",\n            timeframe=\"1m\",\n           
 ts=datetime.now(timezone.utc),\n            close_price=150.5,\n            volume=1000\n        )\n    ]\n\n    # Write via sink (auto-metrics + error handling)\n    async with AMDS(config) as amds:\n        async with BarsSink(amds) as sink:\n            await sink.write(bars)\n\n    print(\"✅ Bars written successfully\")\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n---\n\n## 🆕 **Config-Driven Pipeline Usage** (Phase 11.3)\n\n### StoreClient - Provider-Based Ingestion\n\nFor config-driven pipeline operations (live/backfill), use `StoreClient` instead of `mds_client`:\n\n```python\nfrom datetime import datetime, timezone\nfrom dataclasses import dataclass\nfrom datastore import StoreClient, JobRunTracker, compute_config_fingerprint\n\n# Your provider returns bars matching this protocol\n@dataclass\nclass Bar:\n    provider: str\n    symbol: str\n    interval: str  # \"5min\", \"1d\", etc.\n    ts: datetime\n    open: float\n    high: float\n    low: float\n    close: float\n    volume: float\n\n# Example: Write bars from IBKR provider\ndef ingest_live_bars(config, bars):\n    \"\"\"Ingest bars with job tracking and config fingerprinting.\"\"\"\n\n    # Start tracking job run\n    tracker = JobRunTracker(config.database_url)\n    fingerprint = compute_config_fingerprint(config.dict())\n\n    run_id = tracker.start_run(\n        job_name=\"live_us_equities_5min\",\n        dataset_name=\"us_equities_5min\",\n        provider=\"ibkr_primary\",\n        mode=\"live\",\n        config_fingerprint=fingerprint,\n        pipeline_version=\"v1.2.0\",\n        metadata={\"git_hash\": \"abc123\", \"container_id\": \"xyz\"}\n    )\n\n    try:\n        # Write bars with diff-aware upserts\n        with StoreClient(config.database_url) as client:\n            count = client.write_bars(bars, batch_size=1000)\n\n        # Update progress with heartbeat\n        symbols = list(set(b.symbol for b in bars))\n        min_ts = min(b.ts for b in bars)\n        
max_ts = max(b.ts for b in bars)\n\n        tracker.update_progress(\n            run_id=run_id,\n            rows_written=count,\n            symbols=symbols,\n            min_ts=min_ts,\n            max_ts=max_ts,\n            heartbeat=True\n        )\n\n        # Mark as success\n        tracker.complete_run(run_id, status=\"success\")\n        print(f\"✅ Wrote {count} bars (run_id={run_id})\")\n\n    except Exception as e:\n        tracker.complete_run(run_id, status=\"failure\", error_message=str(e))\n        raise\n\n# Example bars from IBKR\nbars = [\n    Bar(\n        provider=\"ibkr_primary\",\n        symbol=\"SPY\",\n        interval=\"5min\",\n        ts=datetime(2025, 1, 1, 9, 30, tzinfo=timezone.utc),\n        open=450.0,\n        high=451.0,\n        low=449.0,\n        close=450.5,\n        volume=1000000\n    ),\n    # ... more bars\n]\n\ningest_live_bars(config, bars)\n```\n\n### AsyncStoreClient - High-Performance Async Ingestion\n\n```python\nimport asyncio\nfrom datastore import AsyncStoreClient, JobRunTracker\n\nasync def ingest_bars_async(bars, db_uri):\n    \"\"\"Async ingestion with automatic batching.\"\"\"\n\n    async with AsyncStoreClient(db_uri) as client:\n        count = await client.write_bars(bars, batch_size=1000)\n\n    print(f\"✅ Wrote {count} bars asynchronously\")\n\n# Run async\nasyncio.run(ingest_bars_async(bars, \"postgresql://...\"))\n```\n\n### Key Features\n\n| Feature | Description | Benefit |\n|---------|-------------|---------|\n| **Diff-aware upserts** | `IS DISTINCT FROM` in SQL | Only updates when values change → true idempotency |\n| **Smart batching** | COPY for 1000+, executemany otherwise | Optimal performance for any batch size |\n| **Protocol-based** | Duck typing via `Bar` protocol | No hard dependency on specific classes |\n| **Job tracking** | Full lifecycle with heartbeats | Audit trail + stuck job detection |\n| **Compression** | 90-day hot tier policy | Automatic disk savings for historical data |\n| 
**Metrics** | Prometheus counters/histograms | Observability out of the box |\n\n### CLI - Job Management\n\n```bash\n# List recent job runs\ndatastore job-runs-list --limit 50\n\n# Inspect specific run\ndatastore job-runs-inspect 123\n\n# Find stuck jobs (no heartbeat for 15m)\ndatastore job-runs-stuck --timeout-minutes 15\n\n# View 24h summary\ndatastore job-runs-summary\n\n# Cleanup old runs\ndatastore job-runs-cleanup --older-than-days 90 --confirm\n```\n\n### Tables Created by Migration 0002\n\n#### `bars_ohlcv` - Provider-Based OHLCV Storage\n\n```sql\nCREATE TABLE bars_ohlcv (\n    provider   TEXT NOT NULL,\n    symbol     TEXT NOT NULL CHECK (symbol = UPPER(symbol)),\n    interval   TEXT NOT NULL,\n    ts         TIMESTAMPTZ NOT NULL,\n    open       DOUBLE PRECISION NOT NULL,\n    high       DOUBLE PRECISION NOT NULL,\n    low        DOUBLE PRECISION NOT NULL,\n    close      DOUBLE PRECISION NOT NULL,\n    volume     DOUBLE PRECISION NOT NULL,\n    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),\n    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),\n    PRIMARY KEY (provider, symbol, interval, ts)\n);\n```\n\n**Features:**\n- TimescaleDB hypertable (7-day chunks)\n- Compression after 90 days (segmentby `provider,symbol,interval`)\n- Uppercase symbol constraint\n- No tenant isolation (system-wide)\n\n#### `job_runs` - Audit-Grade Job Tracking\n\n```sql\nCREATE TABLE job_runs (\n    id                  BIGSERIAL PRIMARY KEY,\n    job_name            TEXT NOT NULL,\n    provider            TEXT,\n    status              TEXT NOT NULL CHECK (status IN ('running', 'success', 'failure', 'cancelled')),\n    config_fingerprint  TEXT,\n    pipeline_version    TEXT,\n    rows_written        BIGINT DEFAULT 0,\n    symbols             TEXT[],\n    min_ts              TIMESTAMPTZ,\n    max_ts              TIMESTAMPTZ,\n    metadata            JSONB DEFAULT '{}'::jsonb,\n    started_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),\n    completed_at        TIMESTAMPTZ,\n    
elapsed_ms          BIGINT GENERATED ALWAYS AS (...) STORED  -- auto-computed\n);\n```\n\n**Features:**\n- Heartbeat tracking via `metadata-\u003e\u003e'last_heartbeat'`\n- Config fingerprinting for reproducibility\n- Derived `elapsed_ms` column\n- GIN index on metadata for fast heartbeat queries\n\n---\n\n### Metrics Exported\n\n#### Sinks Metrics (Tenant-Based System)\n\nSinks automatically register metrics to the global Prometheus registry:\n\n```promql\n# Total write attempts (counter)\nsink_writes_total{sink=\"bars|options|fundamentals|news\", status=\"success|failure\"}\n\n# Write latency (histogram)\nsink_write_latency_seconds{sink=\"bars|options|fundamentals|news\"}\n```\n\n#### StoreClient Metrics (Provider-Based Pipeline)\n\n```promql\n# Total bars written (counter)\nstore_bars_written_total{method=\"COPY|UPSERT\", status=\"success|failure\"}\n\n# Write latency (histogram)\nstore_bars_write_latency_seconds{method=\"COPY|UPSERT\"}\n```\n\n**Key Insight**: `method` label shows whether COPY (1000+ rows) or UPSERT (\u003c 1000 rows) was used, enabling performance tuning.\n\n#### JobRunTracker Metrics (Pipeline Audit)\n\n```promql\n# Total job runs tracked (counter)\nstore_job_runs_total{job_name=\"...\", provider=\"...\", mode=\"live|backfill\", status=\"started|success|failure|cancelled\"}\n\n# Job run duration (histogram)\nstore_job_runs_duration_seconds{job_name=\"...\", provider=\"...\", mode=\"live|backfill\", status=\"success|failure|cancelled\"}\n```\n\n**Key Insight**: Track job lifecycle from `status=\"started\"` through final status (`success`, `failure`, `cancelled`). 
Duration histogram only recorded on completion.\n\n**Scrape at**: `http://localhost:8081/metrics` (FastAPI control-plane)\n\n### Example: All Sinks\n\nSee [`examples/run_store_pipeline.py`](examples/run_store_pipeline.py) for a complete example using all four sinks.\n\n```powershell\n# Set environment variables\n$env:MDS_DSN=\"postgresql://user:pass@localhost:5432/marketdata\"\n$env:MDS_TENANT_ID=\"your-tenant-uuid\"\n\n# Run pipeline example\npython examples/run_store_pipeline.py\n```\n\n**Output:**\n```\n🚀 market_data_store Sink Pipeline Example\n   Tenant: 6b6a6a8a...\n\n📊 BarsSink Example\n  ✅ Wrote 2 bars\n📈 OptionsSink Example\n  ✅ Wrote 1 options\n📋 FundamentalsSink Example\n  ✅ Wrote 1 fundamentals\n📰 NewsSink Example\n  ✅ Wrote 1 news items\n\n✅ All sinks completed successfully!\n```\n\n### Benchmarks\n\nRun performance benchmarks with [`examples/benchmark_sinks.py`](examples/benchmark_sinks.py):\n\n```powershell\npython examples/benchmark_sinks.py --batches 50 --batch-size 1000 --parallel 4\n```\n\n**Example Results** (Mock mode, Windows):\n```\n========================================================================\nBenchmark Results (Phase 4.1)\n========================================================================\nBarsSink                 13,674 rec/s   avg latency   14.0 ms   total    2,000\nOptionsSink              12,899 rec/s   avg latency   14.2 ms   total    2,000\nFundamentalsSink         12,886 rec/s   avg latency   15.0 ms   total    2,000\nNewsSink                 12,947 rec/s   avg latency   14.9 ms   total    2,000\n========================================================================\n\nOverall: 8,000 records in 0.61s (13,093 rec/s aggregate)\n```\n\n### Migration from AsyncBatchProcessor\n\nIf you're currently using `mds_client.batch.AsyncBatchProcessor`:\n\n**Before:**\n```python\nfrom mds_client import AMDS, AsyncBatchProcessor, BatchConfig\n\nasync with AsyncBatchProcessor(amds, BatchConfig(max_rows=1000)) as processor:\n    for 
bar in stream:\n        await processor.add_bar(bar)\n```\n\n**After (with sinks):**\n```python\nfrom market_data_store.sinks import BarsSink\n\nasync with BarsSink(amds) as sink:\n    await sink.write(batch_of_bars)\n```\n\n**Key Differences:**\n- ✅ Sinks provide **automatic Prometheus metrics**\n- ✅ Sinks use **standardized logging** (loguru)\n- ⚠️ Sinks expect **pre-batched data** (no auto-flushing)\n- ⚠️ AsyncBatchProcessor provides **incremental adds + auto-flush**\n\n**When to Use:**\n- **Sinks**: Pre-batched data, need metrics/observability\n- **AsyncBatchProcessor**: Streaming data, need auto-batching\n\n### Testing\n\n```powershell\n# Unit tests (fast, no DB)\npytest -v tests/unit/sinks/\n\n# Smoke test\npython tests/smoke_test_sinks.py\n\n# Integration tests (requires DB)\n$env:MDS_DSN=\"postgresql://...\"\n$env:MDS_TENANT_ID=\"uuid\"\npytest -v tests/integration/ -m integration\n\n# All tests\npytest -v tests/\n```\n\n**Test Coverage:**\n- ✅ 12/12 unit tests passing (0.51s)\n- ✅ 6/6 smoke test checks passing\n- ✅ 0 linter errors\n- ✅ Integration tests ready (DB required)\n\n### Architecture\n\nThe sinks layer is part of the **hybrid architecture**:\n\n```\n┌─────────────────────────────────────────┐\n│ market-data-store (Hybrid)              │\n├─────────────────────────────────────────┤\n│ Control-plane (datastore/)              │\n│  • Migrations, policies, admin API      │\n│  • Health, readiness, metrics endpoints │\n├─────────────────────────────────────────┤\n│ Data-plane (market_data_store/sinks/)   │\n│  • BarsSink, OptionsSink, etc.          
│\n│  • Prometheus metrics integration       │\n│  • Backpressure readiness (Phase 4.2+)  │\n├─────────────────────────────────────────┤\n│ Client library (mds_client/)            │\n│  • MDS (sync) + AMDS (async) facades    │\n│  • Connection pooling, RLS, validation  │\n└─────────────────────────────────────────┘\n```\n\n### Documentation\n\n- 📖 **Implementation Guide**: [`PHASE_4_IMPLEMENTATION.md`](PHASE_4_IMPLEMENTATION.md)\n- 📖 **Cursor Rules**: [`cursorrules/rules/sinks_layer.mdc`](cursorrules/rules/sinks_layer.mdc)\n- 📖 **Examples**: [`examples/`](examples/)\n\n### Roadmap\n\n| Phase | Status | Description |\n|-------|--------|-------------|\n| **4.1** | ✅ Complete | Async sinks with metrics |\n| **4.2** | ⏸️ Deferred | Write coordinator + queue |\n| **4.3** | 🚫 Blocked | Backpressure integration |\n\nPhase 4.2+ deferred pending architecture decisions and external dependencies (`market-data-pipeline` v0.8.0).\n\n---\n\n## 🔧 Windows/Docker Compatibility\n\nThis project includes comprehensive cross-platform compatibility for both Windows development and Linux/Docker production environments with **zero resource leaks** and **automatic cleanup**:\n\n### **Event Loop Configuration**\n- **Windows**: Automatically uses `WindowsSelectorEventLoopPolicy` for psycopg compatibility\n- **Linux/macOS**: Uses `uvloop` for enhanced performance when available\n- **Automatic**: No manual configuration required - just call `boot_event_loop()` early in your application\n\n### **Connection Pool Management**\n- **Context managers**: All clients support `with` and `async with` for automatic cleanup\n- **Zero pool warnings**: Explicit pool lifecycle management eliminates cleanup warnings\n- **Resource management**: Proper timeout-based cleanup prevents hanging threads\n- **Production ready**: Guaranteed clean shutdown in all scenarios\n\n### **Production Features**\n- **Health monitoring**: Comprehensive database health checks and Prometheus metrics\n- **Resource management**: 
Centralized resource cleanup with context managers\n- **CLI tools**: Cross-platform command-line interface with health and metrics commands\n- **Performance optimized**: 200+ bars/second processing with clean resource management\n\n### **Usage Examples**\n\n**Sync Client with Context Manager:**\n```python\nfrom mds_client.runtime import boot_event_loop\nfrom mds_client import MDS\n\nboot_event_loop()  # Configure event loop for your platform\n\nwith MDS({'dsn': 'postgresql://...', 'tenant_id': '...'}) as mds:\n    result = mds.upsert_bars([bar])\n    # Pool automatically closed on exit - NO warnings!\n```\n\n**Async Client with Context Manager:**\n```python\nfrom mds_client.runtime import boot_event_loop\nfrom mds_client import AMDS\n\nboot_event_loop()  # Configure event loop for your platform\n\nasync with AMDS({'dsn': 'postgresql://...', 'tenant_id': '...'}) as amds:\n    result = await amds.upsert_bars([bar])\n    # Pool automatically closed on exit - NO warnings!\n```\n\n**Batch Processing with Context Manager:**\n```python\nfrom mds_client.batch import BatchProcessor, BatchConfig\n\nwith MDS({'dsn': 'postgresql://...', 'tenant_id': '...'}) as mds:\n    with BatchProcessor(mds, BatchConfig(max_rows=100)) as processor:\n        processor.add_bar(bar)\n        # Both mds and processor automatically closed on exit - NO warnings!\n```\n\n**Health Monitoring:**\n```bash\n# CLI Health Check\nmds health --dsn \"postgresql://...\" --tenant-id \"...\"\n\n# CLI Metrics\nmds metrics --format prometheus\n```\n\n### **Performance Benchmarks**\n- **Sync processing**: 84+ bars/second with clean shutdown\n- **Async processing**: 77+ bars/second with clean shutdown\n- **Batch processing**: 200+ bars/second with clean shutdown\n- **Zero resource leaks**: All pools properly closed with context managers\n\n### Testing Quickstart\n```bash\n# Run NDJSON round-trip tests\npytest -q tests/test_ndjson_roundtrip.py\n\n# Run all tests (cross-platform)\npytest tests/ -v\n\n# Run Windows 
compatibility tests\npytest tests/test_windows_compatibility.py -v\n```\n\n### Development Commands\n```bash\n# Format and lint code\nmake fmt\nmake lint\n\n# Run tests\nmake test\n\n# Database operations\nmake migrate\nmake seed\nmake policies\n```\n\n### Database Setup\n\n**Option 1: Using Docker initdb (Recommended for fresh setup)**\n```powershell\n# The schema will be automatically applied when the database container starts\n# if using docker-compose with the initdb.d scripts\n```\n\n**Option 2: Manual setup**\n```powershell\n# Run migrations (for existing databases)\npython -m datastore.cli migrate\n\n# Apply seed data\npython -m datastore.cli seed\n\n# Apply TimescaleDB policies (optional)\npython -m datastore.cli policies\n```\n\n**Option 3: Fresh schema setup**\n```powershell\n# For a completely fresh database, you can use the production schema directly\n# After a fresh initdb bootstrap, stamp Alembic to prevent migration conflicts:\npython -m datastore.cli stamp-head\n\n# See DATABASE_SETUP.md for detailed instructions\n```\n\n## 📦 Client Library Usage\n\nThe `mds_client` library provides production-ready APIs for Market Data Core:\n\n### For Market Data Core (Async)\n```python\nfrom mds_client import AMDS, Bar\n\n# Configuration with tenant isolation\namds = AMDS({\n    \"dsn\": \"postgresql://user:pass@host:port/db?options=-c%20app.tenant_id%3D\u003cuuid\u003e\",\n    \"pool_max\": 10\n})\n\n# Write market data\nawait amds.upsert_bars([Bar(\n    tenant_id=\"uuid\", vendor=\"ibkr\", symbol=\"AAPL\", timeframe=\"1m\",\n    ts=now, close_price=150.5, volume=1000\n)])\n\n# Get latest prices for hot cache\nprices = await amds.latest_prices([\"AAPL\", \"MSFT\"], vendor=\"ibkr\")\n```\n\n### For Operations (Sync CLI)\n```bash\n# Health check\nmds ping --dsn \"postgresql://...\" --tenant-id \"uuid\"\n\n# Write data\nmds write-bar --dsn \"...\" --tenant-id \"uuid\" --vendor \"ibkr\" \\\n  --symbol \"AAPL\" --timeframe \"1m\" --ts \"2024-01-01T10:00:00\" \\\n  
--close-price 150.5\n\n# Get latest prices\nmds latest-prices --dsn \"...\" --vendor \"ibkr\" --symbols \"AAPL,MSFT\"\n\n# Environment variable support\nexport MDS_DSN=\"postgresql://user:pass@host:port/db\"\nexport MDS_TENANT_ID=\"uuid-string\"\nmds ping  # Uses env vars automatically\n```\n\n## 🔗 Market Data Store Integration\n\nThe `market-data-store` package is a **core dependency** for Market Data Core, providing:\n\n### 📦 **Python Package Integration**\n```python\n# Import the market data store package\nimport market_data_store\n\n# Access version information\nprint(f\"Market Data Store version: {market_data_store.__version__}\")\n\n# The package provides access to all CLI operations and Python library\nfrom mds_client import MDS, AMDS, Bar, Fundamentals, News, OptionSnap\n```\n\n### 🛠️ **Available Operations**\n\nThe `market-data-store` package provides comprehensive data persistence capabilities:\n\n#### **Data Types Supported**\n- **Bars/OHLCV**: Time-series price data with multiple timeframes\n- **Fundamentals**: Company financial data (assets, liabilities, earnings)\n- **News**: Market news with sentiment analysis\n- **Options**: Options market data with Greeks (delta, gamma, IV)\n\n#### **CLI Operations** (via `mds` command)\n```bash\n# Health \u0026 Schema\nmds ping                    # Database connectivity check\nmds schema-version         # Get current schema version\nmds latest-prices          # Get latest prices for symbols\n\n# Individual Write Operations\nmds write-bar              # Write single OHLCV bar\nmds write-fundamental      # Write company fundamentals\nmds write-news             # Write news article\nmds write-option           # Write options data\n\n# Bulk Operations\nmds ingest-ndjson          # Bulk ingest from NDJSON files\nmds ingest-ndjson-async    # Async bulk ingest\n\n# Export/Import Operations\nmds dump                    # Export to CSV\nmds restore                 # Import from CSV\nmds restore-async           # Async CSV 
import\nmds dump-ndjson             # Export to NDJSON\nmds dump-ndjson-async       # Async NDJSON export\n\n# Job Queue Operations\nmds enqueue-job             # Queue background jobs\n```\n\n#### **Python Library Usage**\n```python\n# Synchronous operations\nfrom mds_client import MDS\nmds = MDS({\"dsn\": \"postgresql://...\", \"tenant_id\": \"uuid\"})\n\n# Write market data\nmds.upsert_bars([bar_data])\nmds.upsert_fundamentals([fundamental_data])\nmds.upsert_news([news_data])\nmds.upsert_options([option_data])\n\n# Read operations\nlatest_prices = mds.latest_prices([\"AAPL\", \"MSFT\"], vendor=\"ibkr\")\n\n# Async operations\nfrom mds_client import AMDS, AsyncBatchProcessor\namds = AMDS({\"dsn\": \"postgresql://...\", \"tenant_id\": \"uuid\", \"pool_max\": 10})\n```\n\n### 🏗️ **Architecture Benefits**\n\n- **Tenant Isolation**: Row Level Security (RLS) ensures data separation\n- **TimescaleDB Integration**: Optimized for time-series data\n- **Connection Pooling**: High-performance async/sync connection management\n- **Batch Processing**: Efficient bulk operations with configurable batching\n- **Idempotent Operations**: Safe retry and upsert semantics\n- **Production Ready**: Comprehensive error handling, logging, and monitoring\n\n### 📋 **Quick Reference**\n\nFor detailed operation documentation, see:\n- **CLI Operations**: [`cursorrules/rules/market_data_store_operations.mdc`](cursorrules/rules/market_data_store_operations.mdc)\n- **Python Library**: [`src/mds_client/`](src/mds_client/)\n- **Data Models**: [`src/mds_client/models.py`](src/mds_client/models.py)\n\n## 📚 Client Library Documentation\n\n### 🏗️ Architecture Overview\n\nThe `mds_client` library provides a production-ready Python client for Market Data Core with two main facades:\n\n- **`MDS`** - Synchronous client for operations and simple integrations\n- **`AMDS`** - Asynchronous client for high-performance Market Data Core\n\nBoth clients support:\n- **Row Level Security (RLS)** with tenant 
isolation via DSN options or context managers\n- **Connection pooling** with psycopg 3 + psycopg_pool (ConnectionPool/AsyncConnectionPool)\n- **TimescaleDB integration** with time-first composite primary keys and idempotent upserts\n- **Statement timeouts** with per-connection configuration\n- **Structured error handling** with psycopg error mapping and retry logic\n- **Job outbox pattern** with idempotency key support\n- **Performance optimization** with multiple write modes: `executemany` (default), `execute_values` (sync), and `COPY` (fastest)\n\n### 📊 Data Models\n\nThe library provides strict Pydantic models for all market data types:\n\n#### [`Bar`](src/mds_client/models.py#L15-L35) - OHLCV Market Data\n```python\nclass Bar(BaseModel):\n    tenant_id: str                    # UUID for tenant isolation (tenants.id)\n    vendor: str                       # Data provider (e.g., \"ibkr\", \"alpha_vantage\")\n    symbol: str                       # Trading symbol (auto-uppercased)\n    timeframe: str                    # Time aggregation (\"1m\", \"5m\", \"1h\", \"1d\")\n    ts: datetime                      # Timestamp (UTC)\n    open_price: Optional[float] = None\n    high_price: Optional[float] = None\n    low_price: Optional[float] = None\n    close_price: Optional[float] = None\n    volume: Optional[int] = None\n    id: Optional[str] = None          # UUID (not globally unique)\n```\n\n#### [`Fundamentals`](src/mds_client/models.py#L38-L50) - Company Financials\n```python\nclass Fundamentals(BaseModel):\n    tenant_id: str                    # UUID for tenant isolation (tenants.id)\n    vendor: str\n    symbol: str\n    asof: datetime                    # As-of date for financial data\n    total_assets: Optional[float] = None\n    total_liabilities: Optional[float] = None\n    net_income: Optional[float] = None\n    eps: Optional[float] = None       # Earnings per share\n    id: Optional[str] = None\n```\n\n#### [`News`](src/mds_client/models.py#L53-L66) - 
Market News \u0026 Sentiment\n```python\nclass News(BaseModel):\n    tenant_id: str                    # UUID for tenant isolation (tenants.id)\n    vendor: str\n    published_at: datetime            # Publication timestamp\n    title: str                        # News headline\n    id: Optional[str] = None\n    symbol: Optional[str] = None      # Related symbol (if applicable)\n    url: Optional[str] = None         # Source URL\n    sentiment_score: Optional[float] = None  # -1.0 to 1.0 sentiment\n```\n\n#### [`OptionSnap`](src/mds_client/models.py#L69-L90) - Options Market Data\n```python\nclass OptionSnap(BaseModel):\n    tenant_id: str                    # UUID for tenant isolation (tenants.id)\n    vendor: str\n    symbol: str\n    expiry: date                      # Option expiration date\n    option_type: str                  # \"C\" for call, \"P\" for put\n    strike: float                     # Strike price\n    ts: datetime                      # Snapshot timestamp\n    iv: Optional[float] = None        # Implied volatility\n    delta: Optional[float] = None     # Option delta\n    gamma: Optional[float] = None     # Option gamma\n    oi: Optional[int] = None          # Open interest\n    volume: Optional[int] = None      # Trading volume\n    spot: Optional[float] = None      # Underlying spot price\n    id: Optional[str] = None\n```\n\n#### [`LatestPrice`](src/mds_client/models.py#L93-L100) - Real-time Price Snapshots\n```python\nclass LatestPrice(BaseModel):\n    tenant_id: str                    # UUID for tenant isolation (tenants.id)\n    vendor: str\n    symbol: str\n    price: float                      # Latest price\n    price_timestamp: datetime         # When price was recorded\n```\n\n### 🔧 Configuration\n\n#### Client Configuration\n```python\n# MDS (sync) configuration with performance tuning\nmds = MDS({\n    \"dsn\": \"postgresql://user:pass@host:port/db?options=-c%20app.tenant_id%3D\u003cuuid\u003e\",\n    \"tenant_id\": 
\"uuid-string\",        # Optional: overrides DSN tenant_id\n    \"app_name\": \"mds_client\",          # Application name for pg_stat_activity\n    \"connect_timeout\": 10.0,           # Connection timeout in seconds\n    \"statement_timeout_ms\": 30000,     # Query timeout in milliseconds\n    \"pool_min\": 1,                     # Minimum connections in pool\n    \"pool_max\": 10,                    # Maximum connections in pool\n    # Performance optimization settings\n    \"write_mode\": \"auto\",              # \"auto\" | \"executemany\" | \"values\" | \"copy\"\n    \"values_min_rows\": 500,           # Use execute_values for \u003e= N rows\n    \"values_page_size\": 1000,         # Page size for execute_values\n    \"copy_min_rows\": 5000,            # Use COPY for \u003e= N rows\n})\n\n# AMDS (async) configuration\namds = AMDS({\n    \"dsn\": \"postgresql://user:pass@host:port/db\",\n    \"tenant_id\": \"uuid-string\",\n    \"app_name\": \"mds_client_async\",\n    \"pool_max\": 10,                    # Async pool typically larger\n    \"write_mode\": \"auto\",              # \"auto\" | \"executemany\" | \"copy\"\n    \"copy_min_rows\": 5000,            # Use COPY for \u003e= N rows\n})\n```\n\n### 🚀 API Reference\n\n#### Synchronous Client (`MDS`)\n\n**Connection \u0026 Health:**\n- [`health()`](src/mds_client/client.py) - Check database connectivity\n- [`schema_version()`](src/mds_client/client.py) - Get current schema version\n- [`close()`](src/mds_client/client.py) - Close connection pool\n\n**Write Operations (Idempotent Upserts):**\n- [`upsert_bars(rows: Sequence[Bar])`](src/mds_client/client.py) - Insert/update OHLCV data with time-first PKs\n- [`upsert_fundamentals(rows: Sequence[Fundamentals])`](src/mds_client/client.py) - Insert/update financial data\n- [`upsert_news(rows: Sequence[News])`](src/mds_client/client.py) - Insert/update news data (auto-generates UUID if missing)\n- [`upsert_options(rows: Sequence[OptionSnap])`](src/mds_client/client.py) 
- Insert/update options data\n\n**Read Operations:**\n- [`latest_prices(symbols: Sequence[str], vendor: str)`](src/mds_client/client.py) - Get latest prices for symbols\n- [`bars_window(symbol, timeframe, start, end, vendor)`](src/mds_client/client.py) - Get bars in time window\n\n**Job Operations:**\n- [`enqueue_job(idempotency_key, job_type, payload, priority)`](src/mds_client/client.py) - Enqueue job with idempotency\n\n#### Asynchronous Client (`AMDS`)\n\nThe async client provides identical methods with `async`/`await` syntax:\n\n- [`async health()`](src/mds_client/aclient.py) - Async health check\n- [`async schema_version()`](src/mds_client/aclient.py) - Async schema version\n- [`async aclose()`](src/mds_client/aclient.py) - Close async connection pool\n- [`async upsert_bars(rows)`](src/mds_client/aclient.py) - Async bar upserts\n- [`async upsert_fundamentals(rows)`](src/mds_client/aclient.py) - Async fundamentals upserts\n- [`async upsert_news(rows)`](src/mds_client/aclient.py) - Async news upserts\n- [`async upsert_options(rows)`](src/mds_client/aclient.py) - Async options upserts\n- [`async latest_prices(symbols, vendor)`](src/mds_client/aclient.py) - Async price queries\n- [`async bars_window(symbol, timeframe, start, end, vendor)`](src/mds_client/aclient.py) - Async bar queries\n- [`async enqueue_job(...)`](src/mds_client/aclient.py) - Async job enqueueing\n\n### 🔒 Row Level Security (RLS)\n\nThe library automatically handles tenant isolation through PostgreSQL's Row Level Security:\n\n#### DSN Options (Recommended)\n```python\n# Tenant ID embedded in connection string\ndsn = \"postgresql://user:pass@host:port/db?options=-c%20app.tenant_id%3D\u003cuuid\u003e\"\nmds = MDS({\"dsn\": dsn})\n```\n\n#### Context Manager (Fallback)\n```python\n# Explicit tenant context for operations (if not using DSN options)\n# Note: Current implementation uses SET app.tenant_id per connection\n# Context managers would be implemented in rls.py if needed\n```\n\n### ⚠️ Error 
Handling\n\nThe library provides structured error handling with automatic retry logic:\n\n#### [`MDSOperationalError`](src/mds_client/errors.py) - Base operational error\n#### [`RetryableError`](src/mds_client/errors.py) - Temporary errors (network, deadlocks, serialization failures)\n#### [`ConstraintViolation`](src/mds_client/errors.py) - Database constraint violations (unique, foreign key, check)\n#### [`RLSDenied`](src/mds_client/errors.py) - Row Level Security policy violations\n#### [`TimeoutExceeded`](src/mds_client/errors.py) - Query or connection timeouts\n\nAll errors are automatically mapped from `psycopg.errors` exceptions for precise error handling.\n\n### 🛠️ Operational CLI\n\nThe library includes a comprehensive CLI for operations and debugging:\n\n\u003e **Exit Codes**: All CLI commands return non-zero exit codes on failure for CI/CD integration.\n\n```bash\n# Health and connectivity\nmds ping --dsn \"postgresql://...\" --tenant-id \"uuid\"\n\n# Schema information\nmds schema-version --dsn \"postgresql://...\"\n\n# Write operations\nmds write-bar --dsn \"...\" --tenant-id \"uuid\" --vendor \"ibkr\" \\\n  --symbol \"AAPL\" --timeframe \"1m\" --ts \"2024-01-01T10:00:00\" \\\n  --close-price 150.5 --volume 1000\n\nmds write-fundamental --dsn \"...\" --tenant-id \"uuid\" --vendor \"alpha\" \\\n  --symbol \"AAPL\" --asof \"2024-01-01\" --eps 1.25\n\nmds write-news --dsn \"...\" --tenant-id \"uuid\" --vendor \"reuters\" \\\n  --title \"AAPL Reports Strong Q4\" --published-at \"2024-01-01T10:00:00\" \\\n  --symbol \"AAPL\" --sentiment-score 0.8\n\nmds write-option --dsn \"...\" --tenant-id \"uuid\" --vendor \"ibkr\" \\\n  --symbol \"AAPL\" --expiry \"2024-12-20\" --option-type \"C\" --strike 200 \\\n  --ts \"2024-01-01T10:00:00\" --iv 0.25 --delta 0.55\n# Note: write-option targets the options_snap table (model: OptionSnap)\n\n# Read operations\nmds latest-prices --dsn \"...\" --vendor \"ibkr\" --symbols \"AAPL,MSFT\"\n\n# Job queue operations\nmds 
enqueue-job --dsn \"...\" --tenant-id \"uuid\" \\\n  --idempotency-key \"job-123\" --job-type \"backfill\" \\\n  --payload '{\"symbol\": \"AAPL\", \"start\": \"2024-01-01\"}' --priority \"high\"\n\n# Sync NDJSON ingest (stdin or file, .gz supported)\nmds ingest-ndjson bars ./bars.ndjson \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\" \\\n  --max-rows 2000 --max-ms 3000\n\n# Or from stdin\ncat bars.ndjson | mds ingest-ndjson bars - \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\"\n\n# Async NDJSON ingest (uses AMDS + AsyncBatchProcessor)\nmds ingest-ndjson-async bars ./bars.ndjson \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\" \\\n  --max-rows 2000 --max-ms 3000\n\n# Backup/Export operations (tenant-aware, RLS-enforced)\nmds dump bars ./bars_export.csv.gz \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\" \\\n  --vendor \"ibkr\" --symbol \"AAPL\" --timeframe \"1m\" \\\n  --start \"2024-01-01T00:00:00Z\" --end \"2024-02-01T00:00:00Z\"\n\n# Restore/Import operations (idempotent upserts)\n# Sync CSV restore\nmds restore bars ./bars_export.csv.gz \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\"\n\n# Async CSV restore (for large files)\nmds restore-async bars ./bars_export.csv.gz \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\" \\\n  --delimiter \",\" --header\n\n# Async CSV restore from STDIN (shell pipelines)\nzcat bars_export.csv.gz | mds restore-async-stdin bars \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\"\n\n# NDJSON dump operations (round-trip with ingest-ndjson)\nmds dump-ndjson bars ./bars_export.ndjson.gz \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\" \\\n  --vendor \"ibkr\" --symbol \"AAPL\" --timeframe \"1m\" \\\n  --start \"2024-01-01T00:00:00Z\" --end \"2024-02-01T00:00:00Z\"\n\n# Async NDJSON dump for large exports\nmds dump-ndjson-async bars ./bars_export.ndjson.gz \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\" \\\n  --vendor \"ibkr\" --symbol \"AAPL\"\n```\n\n### ⚡ Performance Optimization\n\nThe 
library provides multiple write modes for optimal performance across different batch sizes:\n\n#### Write Mode Selection (Auto)\n```python\n# Automatic mode selection based on batch size\nmds = MDS({\n    \"write_mode\": \"auto\",              # Default: intelligent selection\n    \"values_min_rows\": 500,           # Use execute_values for \u003e= 500 rows\n    \"copy_min_rows\": 5000,            # Use COPY for \u003e= 5000 rows\n})\n\n# Behavior:\n# len(rows) \u003e= 5000 → COPY (fastest, sync + async)\n# len(rows) \u003e= 500  → execute_values (fast, sync only)\n# len(rows) \u003c 500   → executemany (safe default)\n```\n\n#### Manual Mode Selection\n```python\n# Force specific write modes\nmds = MDS({\"write_mode\": \"executemany\"})  # Always use executemany\nmds = MDS({\"write_mode\": \"values\"})       # Force execute_values (sync only)\nmds = MDS({\"write_mode\": \"copy\"})         # Force COPY path\n```\n\n#### Environment Variable Configuration\n```bash\n# Set via environment variables\nexport MDS_WRITE_MODE=auto\nexport MDS_VALUES_MIN_ROWS=500\nexport MDS_VALUES_PAGE_SIZE=1000\nexport MDS_COPY_MIN_ROWS=5000\n```\n\n#### Environment Variables Reference\n| Var | Meaning |\n|-----|---------|\n| `MDS_DSN` | PostgreSQL DSN |\n| `MDS_TENANT_ID` | Tenant UUID for RLS (must be tenants.id, not tenants.tenant_id) |\n| `MDS_WRITE_MODE` | `auto` \\| `executemany` \\| `values` \\| `copy` |\n| `MDS_VALUES_MIN_ROWS` | Threshold for execute_values |\n| `MDS_COPY_MIN_ROWS` | Threshold for COPY |\n\n#### Performance Characteristics\n- **`executemany`**: Safe default, good for small batches (\u003c 500 rows)\n- **`execute_values`**: Fast for mid-size batches (500-5000 rows), sync only\n  - Install extras: `pip install \"psycopg[pool,extras]\"`\n- **`COPY`**: Fastest for large batches (5000+ rows), works with RLS and maintains idempotency\n\n#### Troubleshooting\n- **Tenant ID errors**: Use `tenants.id` (UUID), not `tenants.tenant_id` (VARCHAR)\n- **Windows async issues**: 
Use sync `MDS` client; async pools need `SelectorEventLoop`\n- **Foreign key violations**: Ensure tenant exists in `tenants` table with correct UUID\n- **RLS denied**: Verify `app.tenant_id` is set correctly in connection context\n\n### 💾 Backup \u0026 Restore Operations\n\nThe library provides tenant-aware backup and restore operations using PostgreSQL's `COPY` command:\n\n#### Export Operations (Tenant-Aware Dumps)\n```python\nfrom mds_client import MDS\nfrom psycopg import sql as psql\n\nmds = MDS({\"dsn\": \"...\", \"tenant_id\": \"uuid\"})\n\n# Export bars for specific vendor/symbol/timeframe\nsel = psql.SQL(\"\"\"\n    SELECT {cols}\n    FROM bars\n    WHERE vendor = {v} AND symbol = {s} AND timeframe = '1m'\n      AND ts \u003e= {start} AND ts \u003c {end}\n    ORDER BY ts\n\"\"\").format(\n    cols=psql.SQL(\", \").join(psql.Identifier(c) for c in mds.TABLE_PRESETS[\"bars\"][\"cols\"]),\n    v=psql.Literal(\"ibkr\"),\n    s=psql.Literal(\"AAPL\"),\n    start=psql.Literal(\"2024-01-01T00:00:00Z\"),\n    end=psql.Literal(\"2024-02-01T00:00:00Z\"),\n)\n\n# Export to gzipped CSV\nmds.copy_out_csv(select_sql=sel, out_path=\"bars_aapl_2024-01.csv.gz\")\n```\n\n#### Import Operations (Idempotent Upserts)\n```python\n# Restore from CSV with upsert semantics\npreset = MDS.TABLE_PRESETS[\"bars\"]\nmds.copy_restore_csv(\n    target=\"bars\",\n    cols=preset[\"cols\"],\n    conflict_cols=preset[\"conflict\"],\n    update_cols=preset[\"update\"],\n    src_path=\"bars_aapl_2024-01.csv.gz\",\n)\n```\n\n#### CLI Operations\n```bash\n# Export with filters\nmds dump bars ./bars_export.csv.gz \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\" \\\n  --vendor \"ibkr\" --symbol \"AAPL\" --timeframe \"1m\" \\\n  --start \"2024-01-01T00:00:00Z\" --end \"2024-02-01T00:00:00Z\"\n\n# Import with upsert (sync)\nmds restore bars ./bars_export.csv.gz \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\"\n\n# Import with upsert (async - for large files)\nmds restore-async bars 
./bars_export.csv.gz \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\" \\\n  --delimiter \",\" --header\n\n# Import from STDIN (shell pipelines)\nzcat bars_export.csv.gz | mds restore-async-stdin bars \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\"\n```\n\n#### Key Features\n- **RLS Enforcement**: All operations respect Row Level Security via `SET app.tenant_id`\n- **Consistent Snapshots**: Multiple `COPY` operations in single transaction\n- **Idempotent Restores**: `INSERT ... ON CONFLICT DO UPDATE` preserves existing data\n- **Gzip Support**: Automatic compression for `.gz` files\n- **CSV with Headers**: Self-describing format for easy inspection\n- **Streaming**: Memory-efficient for large datasets\n\n### 📄 NDJSON Export Operations\n\nThe library provides NDJSON export functionality that perfectly round-trips with the existing ingest commands:\n\n#### Export Operations (JSON Streaming)\n```python\nfrom mds_client import MDS\nfrom psycopg import sql as psql\n\nmds = MDS({\"dsn\": \"...\", \"tenant_id\": \"uuid\"})\n\n# Export bars as NDJSON\nsel = psql.SQL(\"\"\"\n    SELECT {cols}\n    FROM bars\n    WHERE vendor = {v} AND symbol = {s} AND timeframe = '1m'\n      AND ts \u003e= {start} AND ts \u003c {end}\n    ORDER BY ts\n\"\"\").format(\n    cols=psql.SQL(\", \").join(psql.Identifier(c) for c in mds.TABLE_PRESETS[\"bars\"][\"cols\"]),\n    v=psql.Literal(\"ibkr\"),\n    s=psql.Literal(\"AAPL\"),\n    start=psql.Literal(\"2024-01-01T00:00:00Z\"),\n    end=psql.Literal(\"2024-02-01T00:00:00Z\"),\n)\n\n# Export to gzipped NDJSON\nmds.copy_out_ndjson(select_sql=sel, out_path=\"bars_aapl_2024-01.ndjson.gz\")\n```\n\n#### CLI Operations\n```bash\n# Sync NDJSON export\nmds dump-ndjson bars ./bars_export.ndjson.gz \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\" \\\n  --vendor \"ibkr\" --symbol \"AAPL\" --timeframe \"1m\" \\\n  --start \"2024-01-01T00:00:00Z\" --end \"2024-02-01T00:00:00Z\"\n\n# Async NDJSON export for large datasets\nmds 
dump-ndjson-async bars ./bars_export.ndjson.gz \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\" \\\n  --vendor \"ibkr\" --symbol \"AAPL\"\n\n# Round-trip: export then import\nmds dump-ndjson bars ./bars.ndjson --dsn \"...\" --tenant-id \"uuid\"\nmds ingest-ndjson bars ./bars.ndjson --dsn \"...\" --tenant-id \"uuid\"\n\n# Multi-table exports with template naming\nmds dump-ndjson-all \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\" \\\n  --vendor ibkr --symbol AAPL --timeframe 1m \\\n  --start 2024-01-01T00:00:00Z --end 2024-02-01T00:00:00Z\n\n# Custom naming template with directory structure\nmds dump-ndjson-all \"./exports/{table}/{vendor}-{symbol}-{start}-{end}.ndjson.gz\" \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\" --vendor ibkr\n\n# Async multi-table export for large datasets\nmds dump-ndjson-async-all \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\" \\\n  --start 2024-01-01 --end 2024-02-01\n```\n\n#### Key Features\n- **Round-trip compatibility**: Perfect compatibility with `ingest-ndjson` commands\n- **JSON streaming**: Uses `to_jsonb()` for efficient PostgreSQL JSON serialization\n- **RLS enforcement**: All operations respect tenant isolation\n- **Gzip support**: Automatic compression for `.ndjson.gz` files\n- **Async support**: High-performance async exports for large datasets\n- **ISO timestamps**: Timestamps serialized in ISO-8601 format for clean parsing\n- **Multi-table exports**: Export all tables at once with template-based naming\n- **Template system**: Flexible file naming with variables `{table}`, `{vendor}`, `{symbol}`, `{timeframe}`, `{start}`, `{end}`\n- **Directory creation**: Automatic parent directory creation for organized exports\n\n#### Multi-Table Export Operations\n```bash\n# Export all tables with default naming\nmds dump-ndjson-all \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\" \\\n  --vendor ibkr --symbol AAPL --timeframe 1m \\\n  --start 2024-01-01T00:00:00Z --end 2024-02-01T00:00:00Z\n# Creates: 
./bars-AAPL-2024-01-01T00:00:00Z-2024-02-01T00:00:00Z.ndjson.gz\n#          ./fundamentals-AAPL-2024-01-01T00:00:00Z-2024-02-01T00:00:00Z.ndjson.gz\n#          ./news-AAPL-2024-01-01T00:00:00Z-2024-02-01T00:00:00Z.ndjson.gz\n#          ./options_snap-AAPL-2024-01-01T00:00:00Z-2024-02-01T00:00:00Z.ndjson.gz\n# Note: {timeframe} is ignored for tables without that column (fundamentals/news/options)\n\n# Custom template with directory structure\nmds dump-ndjson-all \"./exports/{table}/{vendor}-{symbol}-{start}-{end}.ndjson.gz\" \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\" --vendor ibkr\n# Creates: ./exports/bars/ibkr-ALL-MIN-MAX.ndjson.gz\n#          ./exports/fundamentals/ibkr-ALL-MIN-MAX.ndjson.gz\n#          etc.\n\n# Async version for large datasets\nmds dump-ndjson-async-all \\\n  --dsn \"postgresql://...\" --tenant-id \"uuid\" \\\n  --start 2024-01-01 --end 2024-02-01\n```\n\n#### Template Variables\n- `{table}`: Table name (bars, fundamentals, news, options_snap)\n- `{vendor}`: Data vendor (e.g., ibkr, reuters) or \"ALL\" if not specified\n- `{symbol}`: Symbol (e.g., AAPL) or \"ALL\" if not specified\n- `{timeframe}`: Timeframe (e.g., 1m, 1h) or \"ALL\" if not specified\n- `{start}`: Start timestamp or \"MIN\" if not specified\n- `{end}`: End timestamp or \"MAX\" if not specified\n\n### 🔄 Batch Processing\n\nFor high-throughput scenarios, the library supports both sync and async batch processing:\n\n#### Sync Batch Processing\n```python\nfrom mds_client import MDS, BatchProcessor, BatchConfig, Bar\n\nmds = MDS({\"dsn\": \"...\", \"tenant_id\": \"...\"})\nbp = BatchProcessor(mds, BatchConfig(max_rows=1000, max_ms=5000))\nfor bar in big_set:\n    bp.add_bar(bar)\nbp.flush()\n```\n\n#### Async Batch Processing\n```python\nfrom mds_client import AMDS, AsyncBatchProcessor, BatchConfig, Bar\n\namds = AMDS({\"dsn\": \"...\", \"tenant_id\": \"...\", \"pool_max\": 10})\nasync with AsyncBatchProcessor(amds, BatchConfig(max_rows=1000, 
max_ms=5000)) as bp:\n    for bar in big_set:\n        await bp.add_bar(bar)\n# Auto-flush on context exit\n```\n\n### Key Features\n- **Dual API**: Sync (`MDS`) and async (`AMDS`) facades with identical interfaces\n- **RLS Integration**: Automatic tenant isolation via DSN options or per-connection SET\n- **TimescaleDB Compatible**: Time-first composite primary keys with idempotent upserts\n- **Connection Pooling**: Production-ready with psycopg 3 + psycopg_pool\n- **Performance Optimization**: Multiple write modes with automatic selection:\n  - `executemany`: Safe default for small batches\n  - `execute_values`: Fast mid-size batches (sync only, requires psycopg extras)\n  - `COPY`: Fastest for large batches (sync + async)\n- **Batch Processing**: High-throughput ingestion with byte-accurate sizing and auto-flush tickers\n- **Structured Errors**: Comprehensive exception hierarchy with psycopg error mapping\n- **Environment Variables**: CLI support for MDS_DSN and MDS_TENANT_ID\n- **NDJSON Support**: Gzip compression, stdin input, and model coercion\n- **Job Outbox**: Idempotent job enqueueing with conflict-free guarantees\n- **Backup/Restore**: Tenant-aware CSV export/import with RLS enforcement and idempotent upserts\n- **NDJSON Export**: Round-trip compatible JSON dumps with `to_jsonb()` streaming\n\n### Dependencies\n\n- **Production**: [`requirements.txt`](requirements.txt) - Core runtime dependencies\n- **Development**: [`requirements-dev.txt`](requirements-dev.txt) - Includes dev tools (ruff, black, pre-commit)\n- **Project Config**: [`pyproject.toml`](pyproject.toml) - Full project metadata and build configuration\n\n\u003e **Cursor**: You can regenerate this section automatically whenever the folder structure changes. The `/cursorrules/` directory is your home base for self-bootstrapping rules and automation.\n\n## License\n\nMIT License - see LICENSE file for details.\n\n## Contributing\n\nContributions welcome! 
Please open an issue or submit a pull request.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmjdevaccount%2Fmarket-data-store","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmjdevaccount%2Fmarket-data-store","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmjdevaccount%2Fmarket-data-store/lists"}