{"id":29823811,"url":"https://github.com/javier/fx-data-generator","last_synced_at":"2025-08-19T18:47:37.776Z","repository":{"id":301555044,"uuid":"1009627156","full_name":"javier/fx-data-generator","owner":"javier","description":"generates fx data for demo purposes","archived":false,"fork":false,"pushed_at":"2025-07-18T16:36:57.000Z","size":70,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-18T20:53:34.387Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/javier.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-27T12:47:55.000Z","updated_at":"2025-07-18T16:37:00.000Z","dependencies_parsed_at":"2025-06-27T13:41:17.071Z","dependency_job_id":"3845eb00-a6cf-420e-9ed6-d964f2fd8cd6","html_url":"https://github.com/javier/fx-data-generator","commit_stats":null,"previous_names":["javier/fx-data-generator"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/javier/fx-data-generator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/javier%2Ffx-data-generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/javier%2Ffx-data-generator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/javier%2Ffx-data-generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/javier%2Ffx-data-generator/manifests","owner_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/javier","download_url":"https://codeload.github.com/javier/fx-data-generator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/javier%2Ffx-data-generator/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267617643,"owners_count":24116208,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-29T02:08:52.396Z","updated_at":"2025-07-29T02:08:53.044Z","avatar_url":"https://github.com/javier.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FX Synthetic Data Generator \u0026 Ingestor for QuestDB\n\nThis script generates highly realistic, multi-level FX order book and price tick data and ingests it into QuestDB at high speed.\n\nDesigned for **stress testing**, benchmarking, and live demo scenarios, it supports both wall-clock-paced (“real-time”) and maximum-throughput (“faster-than-life”) simulation. 
Multi-process orchestration, pip-accurate orderbooks, WAL backpressure detection, and robust state management ensure both realism and throughput—without data overlap or out-of-order chaos.\n\n---\n\n## Features\n\n- **Realistic Multi-Level Orderbook Simulation:**\n  For a configurable set of FX pairs, generates L2 snapshots (bids/asks/volumes at multiple levels) and price ticks at a tunable, randomized rate. All values are always valid multiples of the correct pip for each pair (e.g., 0.0001 for EURUSD, 0.01 for USDJPY).\n- **True Tick Precision, No Floating Point Drift:**\n  All prices, spreads, and ladders are always generated as valid pips—never random floats—at every step.\n- **Parallel High-Throughput Ingestion:**\n  Multiprocessing splits both the event plan and pre-evolved state for max CPU utilization and ingest bandwidth. Each process ingests a unique time partition, never overlapping.\n- **WAL Lag Detection and Flow Control:**\n  Dedicated monitor process queries QuestDB’s WAL progress; when lag exceeds a threshold, all workers pause and resume cleanly.\n- **Resumable, Incremental Ingestion:**\n  Supports loading state from the latest DB rows, for continuous/incremental ingestion without data overlap, or generating from scratch.\n- **Start/End Timestamp Enforcement:**\n  The generator will **never** overlap or backfill data into already-populated time ranges; it automatically advances the start time if needed and stops cleanly at the end.\n- **Materialized Views Auto-Setup:**\n  On startup, creates all necessary tables and downstream views (with suffix support) if not already present.\n\n---\n\n## Mechanics\n\n### Table \u0026 View Creation\n\nTables (`market_data`, `core_price`, with optional suffix) and their required materialized views are created at startup.\n\n### Initial State\n\n- By default (incremental mode), loads last-known state from the DB (`core_price LATEST BY symbol`), advancing the simulation timestamp just after the latest row found.\n- 
Otherwise, starts from deterministic pip-aligned midpoints for each pair.\n\n### Event Plan\n\n- In `\"faster-than-life\"` mode, the total number of `market_data` (L2) events is set by `--total_market_data_events` and distributed across simulated seconds, then evenly split among processes. Each process gets a unique time slice—no overlap.\n- In `\"real-time\"` mode, the generator respects wall-clock time, writing only as time passes. If prior data exists in the DB, it waits for wall-clock to advance past the last ingested row, then starts.\n\n### Backpressure Management\n\n- The WAL monitor process checks for lag (`sequencerTxn - writerTxn` on the `market_data` table) and pauses/resumes ingestion globally via a multiprocessing event.\n- The threshold is `3 * num_processes`, with resume only when lag is fully cleared (debounced).\n\n### Exit Logic\n\n- If the requested start timestamp is already covered by existing data (or the end is past), the script exits with a clear message and does **not** overwrite or overlap existing data.\n- All subprocesses, including the WAL monitor, are cleanly terminated at the end.\n\n---\n\n## Arguments\n\n| Argument                     | Type      | Description                                                                                                    |\n|------------------------------|-----------|----------------------------------------------------------------------------------------------------------------|\n| `--host`                     | str       | Host/IP of QuestDB instance. Default: `127.0.0.1`                                                              |\n| `--pg_port`                  | str/int   | PostgreSQL port for QuestDB. Default: `8812`                                                                   |\n| `--user`                     | str       | Database user for metadata. 
Default: `admin`                                                                   |\n| `--password`                 | str       | Password for metadata. Default: `quest`                                                                        |\n| `--token`                    | str       | (Optional) ILP/HTTP authentication token (JWK)                                                                 |\n| `--token_x`                  | str       | (Optional) JWK token X (for tcps)                                                                              |\n| `--token_y`                  | str       | (Optional) JWK token Y (for tcps)                                                                              |\n| `--ilp_user`                 | str       | (Optional) ILP/HTTP ingestion user. Default: `admin`                                                           |\n| `--protocol`                 | str       | `tcp` or `http` (`tcps`/`https` if token present). Default: `http`                                             |\n| `--mode`                     | str       | `\"real-time\"` (wall clock pacing) or `\"faster-than-life\"` (max speed). **Required**                            |\n| `--market_data_min_eps`      | int       | Min events/sec for `market_data` (per simulated second). Default: `1000`                                       |\n| `--market_data_max_eps`      | int       | Max events/sec for `market_data`. Default: `15000`                                                             |\n| `--core_min_eps`             | int       | Min events/sec for `core_price`. Default: `800`                                                                |\n| `--core_max_eps`             | int       | Max events/sec for `core_price`. Default: `1100`                                                               |\n| `--total_market_data_events` | int       | Total `market_data` (L2) events to generate (across all workers). 
Default: `1_000_000`                         |\n| `--start_ts`                 | str       | (Optional) Simulation start time, ISO8601 format. Default: now (UTC)                                           |\n| `--end_ts`                   | str       | (Optional) Simulation end time, ISO8601 format.                                                                |\n| `--processes`                | int       | Number of worker processes. Default: `1`                                                                       |\n| `--min_levels`               | int       | Min orderbook levels (bids/asks). Default: `5`                                                                 |\n| `--max_levels`               | int       | Max orderbook levels. Default: `5`                                                                             |\n| `--incremental`              | bool/flag | If true (default), load last state from DB to continue appending, never overlap existing data.                 |\n| `--create_views`             | bool/flag | If true (default), create required materialized views if not present.                                          |\n| `--short_ttl`                | bool/flag | If set (off by default), enforces a TTL on all tables (3 DAYS) and views (3 DAYS or 1 MONTH, depending on sampling). |\n| `--suffix`                   | str       | (Optional) Suffix to append to all table/view names.                                                           
|\n\n---\n\n## WAL Backpressure Control\n\n- **Pause:** When `sequencerTxn - writerTxn \u003e 3 * num_processes`, all workers pause and print a message.\n- **Resume:** Only resumes after at least one interval with *zero* lag (`sequencerTxn == writerTxn`).\n\n---\n\n## Usage Examples\n\n### High-speed batch ingest (“faster-than-life” mode)\n\nIngests ~900M rows in one day, using TCP/TCPS with authentication:\n\n```bash\npython fx_data_generator.py \\\n  --host 192.21.12.42 \\\n  --market_data_min_eps 8200 \\\n  --market_data_max_eps 11000 \\\n  --core_min_eps 700 \\\n  --core_max_eps 1000 \\\n  --token \"secret_jwk_token\" \\\n  --token_x \"jwk_token_x_public_key\" \\\n  --token_y \"jwk_token_y_public_key\" \\\n  --ilp_user ilp_ingest \\\n  --protocol tcp \\\n  --mode faster-than-life \\\n  --processes 8 \\\n  --total_market_data_events 900_000_000 \\\n  --start_ts \"2025-07-05T00:00:00Z\" \\\n  --end_ts \"2025-07-06T00:00:00Z\"\n```\n\n### Wall-clock pacing (“real-time” mode)\n\nIngests in real time, with up to 100M events. 
Starts now (ignores `--start_ts`):\n\n```bash\npython fx_data_generator.py \\\n  --host 192.21.12.42 \\\n  --market_data_min_eps 1000 \\\n  --market_data_max_eps 2000 \\\n  --core_min_eps 700 \\\n  --core_max_eps 1000 \\\n  --protocol http \\\n  --mode real-time \\\n  --processes 1 \\\n  --total_market_data_events 100_000_000\n```\n\n---\n\n## Notes\n\n- The script will **never** insert data that overlaps existing rows—if there is already data for the requested range, it advances the start timestamp or exits.\n- All state, event counts, and random walks are pip-quantized, so *no invalid ticks or drift* are possible.\n- All subprocesses are shut down cleanly, and WAL monitoring never lingers after completion.\n\n---\n\n**Questions?** See comments in the script or [open an issue](https://github.com/questdb/questdb).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjavier%2Ffx-data-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjavier%2Ffx-data-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjavier%2Ffx-data-generator/lists"}