{"id":34975145,"url":"https://github.com/javier/equities-data-generator","last_synced_at":"2026-05-21T01:39:28.426Z","repository":{"id":325062672,"uuid":"1093487244","full_name":"javier/equities-data-generator","owner":"javier","description":"generates equities demo data for questdb demos","archived":false,"fork":false,"pushed_at":"2025-11-25T16:35:54.000Z","size":121,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-28T06:58:11.263Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/javier.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-10T12:48:05.000Z","updated_at":"2025-11-25T16:35:57.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/javier/equities-data-generator","commit_stats":null,"previous_names":["javier/equities-data-generator"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/javier/equities-data-generator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/javier%2Fequities-data-generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/javier%2Fequities-data-generator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/javier%2Fequities-data-generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/javier%2Fequities-data-generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/javier","download_url":"https://codeload.github.com/javier/equities-data-generator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/javier%2Fequities-data-generator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28065691,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-26T02:00:06.189Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-27T00:01:15.250Z","updated_at":"2026-05-21T01:39:28.418Z","avatar_url":"https://github.com/javier.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Equities Synthetic Data Generator for QuestDB (TIMESTAMP_NS)\n\nThis script generates realistic synthetic U.S. equities data with\nmulti-venue L2 books and trades. Arrays appear **only** in the L2 source table;\nall views publish scalars (top of book, NBBO, OHLCV). 
Everything uses\n**TIMESTAMP_NS**, and the writer emits **nanosecond** timestamps.\n\nIt supports:\n- **Modes:** `real-time` (wall-clock) and `faster-than-life` (max throughput)\n- **Session pacing in both modes** with a demo switch to allow off-session trading:\n  `--offsession_trades=none|trickle|full`\n- **Backpressure control** using WAL lag (pause/resume)\n- **Yahoo seeding** for symbol price brackets (with safe fallbacks)\n- **Table-name suffixing and TTLs** consistent with your FX setup\n\n---\n\n## Schema (arrays only in the L2 table)\n\n```sql\n-- L2 snapshots per venue\nCREATE TABLE IF NOT EXISTS ${PREFIX}equities_market_data (\n  timestamp    TIMESTAMP_NS,\n  symbol       SYMBOL CAPACITY 256,\n  venue        SYMBOL CAPACITY 32,\n  bids         DOUBLE[][],   -- [2 x N]: row 1 prices, row 2 sizes\n  asks         DOUBLE[][]    -- [2 x N]: row 1 prices, row 2 sizes\n)\ntimestamp(timestamp)\nPARTITION BY HOUR;\n\n-- Executed trades\nCREATE TABLE IF NOT EXISTS ${PREFIX}equities_trades (\n  timestamp    TIMESTAMP_NS,\n  symbol       SYMBOL CAPACITY 256,\n  venue        SYMBOL CAPACITY 32,\n  price        DOUBLE,\n  size         LONG,\n  side         SYMBOL,       -- \"B\" buyer initiated, \"S\" seller initiated\n  cond         SYMBOL        -- \"T\" trade, \"O\" open, \"C\" close, \"H\" halt resume\n)\ntimestamp(timestamp)\nPARTITION BY DAY;\n```\n\n**Continuous materialized views (live refresh, no timer):**\n\n```sql\n-- Top of book per venue (derived from arrays)\nCREATE MATERIALIZED VIEW IF NOT EXISTS ${PREFIX}top_of_book_1s AS (\n  SELECT\n    timestamp,\n    symbol,\n    venue,\n    last(bids[1][1]) AS bid_price,\n    last(bids[2][1]) AS bid_size,\n    last(asks[1][1]) AS ask_price,\n    last(asks[2][1]) AS ask_size\n  FROM ${PREFIX}equities_market_data\n  SAMPLE BY 1s\n)\nPARTITION BY HOUR;\n\n-- NBBO per symbol\nCREATE MATERIALIZED VIEW IF NOT EXISTS ${PREFIX}nbbo_1s AS (\n  SELECT\n    timestamp,\n    symbol,\n    max(bid_price) AS best_bid,\n    min(ask_price) AS 
best_ask,\n    min(ask_price) - max(bid_price) AS spread\n  FROM ${PREFIX}top_of_book_1s\n  SAMPLE BY 1s\n)\nPARTITION BY HOUR;\n\n-- Trades OHLCV + VWAP per symbol\nCREATE MATERIALIZED VIEW IF NOT EXISTS ${PREFIX}trades_ohlcv_1s AS (\n  SELECT\n    timestamp,\n    symbol,\n    first(price) AS open,\n    max(price)   AS high,\n    min(price)   AS low,\n    last(price)  AS close,\n    sum(size)    AS volume,\n    sum(price * size) / nullif(sum(size), 0) AS vwap\n  FROM ${PREFIX}equities_trades\n  SAMPLE BY 1s\n)\nPARTITION BY HOUR;\n```\n\n**Timed rollups (cheap long-range queries):**\n\n```sql\nCREATE MATERIALIZED VIEW IF NOT EXISTS ${PREFIX}nbbo_1m\nREFRESH EVERY 1m DEFERRED START '2025-06-01T00:00:00.000000Z' AS (\n  SELECT\n    timestamp,\n    symbol,\n    max(best_bid) AS max_bid,\n    min(best_ask) AS min_ask,\n    min(best_ask) - max(best_bid) AS min_spread\n  FROM ${PREFIX}nbbo_1s\n  SAMPLE BY 1m\n)\nPARTITION BY DAY;\n\nCREATE MATERIALIZED VIEW IF NOT EXISTS ${PREFIX}trades_ohlcv_1m\nREFRESH EVERY 1m DEFERRED START '2025-06-01T00:00:00.000000Z' AS (\n  SELECT\n    timestamp,\n    symbol,\n    first(open)  AS open,\n    max(high)    AS high,\n    min(low)     AS low,\n    last(close)  AS close,\n    sum(volume)  AS volume,\n    sum(vwap * volume) / nullif(sum(volume), 0) AS vwap\n  FROM ${PREFIX}trades_ohlcv_1s\n  SAMPLE BY 1m\n)\nPARTITION BY DAY;\n```\n\n\u003e For demo TTLs, add `TTL 3 DAYS` to tables partitioned by HOUR and `TTL 1 MONTH`\n\u003e to tables partitioned by DAY or MONTH, mirroring your FX setup.\n\n---\n\n## Symbols (30) and venues\n\n**Tech / Mega-cap (14):**\nAAPL, MSFT, NVDA, AMZN, GOOGL, META, TSLA, AVGO, ADBE, ORCL, CRM, NFLX, AMD, INTC\n\n**Finance / Market Infra (9):**\nJPM, BAC, GS, MS, V, MA, CME, ICE, NDAQ\n\n**Healthcare (3):**\nLLY, JNJ, UNH\n\n**Energy (2):**\nXOM, CVX\n\n**Consumer (2):**\nCOST, WMT\n\n**Venues:**\nNASDAQ, NYSE, ARCA, BATS, IEX\n\n---\n\n## Spreads and volumes (defaults)\n\n- **Tick size:** 0.01 USD\n- 
**Typical spread:** 0.02–0.05 USD, i.e. 2–5 ticks (tightest for mega-caps, wider during bursts)\n- **Book depth (shares per level):**\n  - L1: 100–400\n  - L2: 150–600\n  - L3: 200–800\n  - L4..N: gently increasing to ~3,000–5,000 by deep levels\n- **Venue size multipliers:** NASDAQ 1.0, NYSE 1.0, ARCA 0.9, BATS 0.9, IEX 0.8\n- **Trade sizes:** odd lots 1–99 (40–55%), round lots 100–1,000 (35–45%),\n  blocks 1,000–10,000 (5–10%); average ~150–250 shares for mega-caps\n\n---\n\n## Session pacing (applies to both modes)\n\n- **Timezone:** `America/New_York`\n- **Phases:** pre-open (quotes only), continuous (normal), close auction (bursty),\n  post-close (taper)\n- **Flag:** `--session_pacing=true|false` (default true)\n- **Off-session control:** `--offsession_trades=none|trickle|full` (default **none**)\n  - `none`: no trades outside regular hours (quotes may continue)\n  - `trickle`: low-rate trades for demos when markets are closed\n  - `full`: ignore the session; always allow trades\n\nFor live demos outside U.S. regular trading hours (for example, during the early UTC morning), use `--offsession_trades=trickle` or `full`.\n\n---\n\n## Usage\n\n**Faster-than-life (time-boxed ingestion):**\n```bash\npython equities_data_generator.py \\\n  --host 127.0.0.1 \\\n  --protocol tcp \\\n  --mode faster-than-life \\\n  --processes 6 \\\n  --market_data_min_eps 6000 \\\n  --market_data_max_eps 10000 \\\n  --trades_min_eps 2000 \\\n  --trades_max_eps 4000 \\\n  --total_market_data_events 100_000_000 \\\n  --start_ts \"2025-07-01T13:00:00Z\" \\\n  --end_ts   \"2025-07-01T16:00:00Z\" \\\n  --session_pacing true \\\n  --offsession_trades none\n```\n\n**Real-time (wall-clock; allow trickle after hours):**\n```bash\npython equities_data_generator.py \\\n  --host 127.0.0.1 \\\n  --protocol http \\\n  --mode real-time \\\n  --processes 1 \\\n  --market_data_min_eps 800 \\\n  --market_data_max_eps 2000 \\\n  --trades_min_eps 300 \\\n  --trades_max_eps 800 \\\n  --session_pacing true \\\n  --offsession_trades trickle\n```\n\n---\n\n## Flags\n\n**Common:**\n`--host`, `--pg_port`, 
`--user`, `--password`, `--protocol [http|tcp]`,\n`--token`, `--token_x`, `--token_y`, `--ilp_user`, `--mode`, `--processes`,\n`--suffix`, `--create_views`, `--short_ttl`, `--yahoo_refresh_secs`\n\n**Rates:**\n`--market_data_min_eps`, `--market_data_max_eps`,\n`--trades_min_eps`, `--trades_max_eps`\n\n**Timing:**\n`--start_ts`, `--end_ts` (faster-than-life only)\n\n**Behavior:**\n`--session_pacing`, `--offsession_trades [none|trickle|full]`,\n`--incremental` (faster-than-life only),\n`--chunk_seconds` (default 900, i.e. 15 minutes; bounds memory use on long runs)\n\n**Notes:**\nThe generator advances `start_ns` past the latest existing row in either base table to\navoid overlapping timestamps. All ingestion uses `TimestampNanos(ns)` with nanosecond offsets.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjavier%2Fequities-data-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjavier%2Fequities-data-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjavier%2Fequities-data-generator/lists"}