{"id":50700432,"url":"https://github.com/zd87pl/tsfm-autoresearch","last_synced_at":"2026-06-15T13:30:28.832Z","repository":{"id":356157321,"uuid":"1231054884","full_name":"zd87pl/tsfm-autoresearch","owner":"zd87pl","description":"Empirical validation of per-request autoresearch over frozen TimesFM for multi-tenant resource forecasting","archived":false,"fork":false,"pushed_at":"2026-05-14T14:55:57.000Z","size":484,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-14T16:41:33.260Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zd87pl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-06T15:24:04.000Z","updated_at":"2026-05-14T14:56:02.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/zd87pl/tsfm-autoresearch","commit_stats":null,"previous_names":["zd87pl/tsfm-autoresearch"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zd87pl/tsfm-autoresearch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zd87pl%2Ftsfm-autoresearch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zd87pl%2Ftsfm-autoresearch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zd87pl%2Ftsfm-autoresearch/releases","manifests_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/zd87pl%2Ftsfm-autoresearch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zd87pl","download_url":"https://codeload.github.com/zd87pl/tsfm-autoresearch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zd87pl%2Ftsfm-autoresearch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34365596,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-15T02:00:07.085Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-09T09:02:20.418Z","updated_at":"2026-06-15T13:30:28.825Z","avatar_url":"https://github.com/zd87pl.png","language":"Python","funding_links":[],"categories":["Full list"],"sub_categories":["Software / Systems Optimization"],"readme":"# TSFM-Autoresearch: Empirical Validation\n\n**Per-request autoresearch over frozen TimesFM for multi-tenant resource forecasting.**\n\nProof-of-concept empirically validating the thesis:\n\n\u003e For multi-tenant resource forecasting, a per-request autoresearch loop over a frozen time-series foundation model achieves better cost-asymmetric performance than the same foundation model deployed with any single fixed configuration, while staying within a 200ms inference latency budget.\n\n## Project Status\n\n| Milestone | Status | Branch 
|\n|-----------|--------|--------|\n| M1: Synthetic Workload Generator | ✅ Complete | (merged) |\n| M2: Frozen TimesFM Wrapper | ✅ Complete | (merged) |\n| M3: Autoresearch Harness + Loss Functions | ✅ Complete | (merged) |\n| M4: Archetype Store (FAISS) | ✅ Complete | (merged) |\n| M5: Forecast Baselines | ✅ Complete | (merged) |\n| M6: Headline Experiment | ✅ Complete | (merged) |\n| M7: Latency Budget Sweep | ✅ Complete | (merged) |\n| M8: Cold-Start Experiment | ✅ Complete | (merged) |\n| M9: SLA Tier Asymmetry | ✅ Complete | (merged) |\n| M10: GCP Scale-Out | ✅ Complete | (merged) |\n\n**Tests:** 116 pass, 1 skip (non-TimesFM fast suite)\n**Setup:** `bash setup.sh` — zero to results in one command\n\n## Architecture\n\n```\n┌──────────────────────────────────────────────────┐\n│ OUTER LOOP (karpathy/autoresearch + -mlx layout) │\n│ autoresearch_agent/                              │\n│ ├── program.md   ← Research goals for AI agent   │\n│ ├── train.py     ← Modifiable experiment runner  │\n│ ├── prepare.py   ← Fixed infrastructure (data,   │\n│ │                  metrics, ledger writer)       │\n│ ├── results.tsv  ← Experiment ledger             │\n│ └── NOTICE.md    ← Upstream attribution          │\n│                                                   │\n│   Agent: modify → run → evaluate → log → repeat  │\n└──────────────────────┬───────────────────────────┘\n                       │ calls per tenant/request\n┌──────────────────────▼───────────────────────────┐\n│ INNER LOOP (per-request autoresearch)            │\n│ src/tsfm_autoresearch/                           │\n│ ├── autoresearch.py    ← K-config search loop    │\n│ ├── losses.py          ← Cost-asymmetric loss (α)│\n│ ├── tsfm_client.py     ← Frozen TimesFM wrapper  │\n│ ├── archetype_store.py ← FAISS retrieval         │\n│ └── workload_gen.py    ← Synthetic data generator│\n│                                                   │\n│   Per-request: split history → sample K configs  │\n│   → batch 
forecast → score → select winner       │\n└──────────────────────────────────────────────────┘\n```\n\n## Repository Structure\n\n```\ntsfm-autoresearch/\n├── CLAUDE.md\n├── README.md\n├── pyproject.toml                     # uv-managed\n├── autoresearch_agent/               # karpathy/autoresearch + -mlx layout\n│   ├── program.md                    # Research program\n│   ├── prepare.py                    # Fixed infrastructure (read-only)\n│   ├── train.py                      # Modifiable experiment runner\n│   ├── results.tsv                   # Experiment results ledger\n│   └── NOTICE.md                     # Upstream attribution\n├── src/\n│   ├── tsfm_autoresearch/\n│   │   ├── workload_gen.py           # M1: Synthetic workload generator ✓\n│   │   ├── tsfm_client.py            # M2: Frozen TimesFM wrapper ✓\n│   │   ├── autoresearch.py           # M3: Autoresearch harness ✓\n│   │   ├── losses.py                 # M3: Cost-asymmetric loss ✓\n│   │   └── archetype_store.py        # M4: FAISS archetype store ✓\n│   └── baselines/\n│       ├── protocol.py               # M5: Forecaster protocol\n│       ├── naive_last.py             # M5: Last-value baseline ✓\n│       ├── naive_seasonal.py         # M5: Seasonal baseline ✓\n│       ├── fixed_config.py           # M5: Fixed TimesFM baseline ✓\n│       └── per_tenant_arima.py       # M5: Per-tenant ARIMA ✓\n├── experiments/\n│   ├── 01_workload_characterization.py  ✓\n│   ├── 02_tsfm_wrapper_validation.py    ✓\n│   ├── m6_headline.py                   ✓ (M6)\n│   ├── m7_latency_sweep.py              ✓ (M7)\n│   ├── m8_cold_start.py                 ✓ (M8)\n│   ├── 09_sla_tier_asymmetry.py         (M9)\n│   └── 10_gcp_scale_out.py              (M10)\n├── data/\n│   ├── synthetic/                    # Generated workloads (gitignored)\n│   └── boom/                         # Datadog BOOM benchmark\n├── results/\n├── deploy/\n│   ├── gcp/\n│   └── docker/\n└── tests/\n    ├── test_workload_gen.py          ✓\n    
├── test_tsfm_client.py           ✓\n    ├── test_losses.py                ✓\n    ├── test_autoresearch.py          ✓\n    ├── test_archetype_store.py       ✓\n    ├── test_baselines.py             ✓\n    ├── test_headline_experiment.py   ✓ (M6)\n    ├── test_latency_sweep.py         ✓ (M7)\n    └── test_cold_start.py            ✓ (M8)\n```\n\n## Quick Start\n\n```bash\n# One-command setup (recommended)\nbash setup.sh\n\n# Or manual:\n# Install Python 3.12 + uv\nuv python install 3.12\n\n# Clone and set up\ngit clone https://github.com/zd87pl/tsfm-autoresearch.git\ncd tsfm-autoresearch\nuv sync --extra dev\n\n# Install TimesFM (from source, not PyPI)\ngit clone https://github.com/google-research/timesfm.git /tmp/timesfm\nuv pip install -e \"/tmp/timesfm[torch]\"\n\n# Generate synthetic workloads (M1)\nuv run python -m tsfm_autoresearch.workload_gen --tenants 1000 --days 30 --seed 42\n\n# Run all fast tests (no GPU/TimesFM required)\nuv run pytest tests/ --ignore=tests/test_autoresearch.py --ignore=tests/test_tsfm_client.py\n\n# Run all tests including TimesFM (requires model download, ~15GB RAM)\nuv run pytest tests/\n```\n\n## Milestones\n\n### M1: Synthetic Workload Generator\n1,000 tenants across 8 WordPress-hosting archetypes, each with 30 days of data at 1-min resolution.\n\n| Archetype | Signature |\n|-----------|-----------|\n| `low-traffic-blog` | Low baseline, weak diurnal, occasional comment spikes |\n| `ecommerce-retail` | Strong diurnal+weekly, sharp campaign spikes |\n| `news-publisher` | Diurnal + rare extreme breaking-news bursts |\n| `b2b-saas` | Binary on/off business hours, weekend trough |\n| `wp-cron-heavy` | Flat traffic, frequent uncorrelated CPU spikes |\n| `cache-driven` | High CPU–network correlation, cache-miss cascades |\n| `compute-heavy` | High constant baseline, low variance |\n| `idle-ish` | Near-zero, rare crawler spikes |\n\n```bash\nuv run python -m tsfm_autoresearch.workload_gen --tenants 1000 --days 30 --seed 42\n```\n\n### M2: Frozen TimesFM Wrapper\nThin wrapper around `google/timesfm-2.5-200m-pytorch`. The model is loaded once and its weights are never updated.\n- `forecast(history, config, horizon)`: takes a multivariate history with D dimensions and returns quantile forecasts of shape (horizon, D, quantiles)\n- `forecast_batch(histories, configs, horizon)`: evaluates K configs in a single batched forward pass\n\n### M3: Autoresearch Harness + Cost-Asymmetric Loss\nPer-request 6-stage loop: split history → sample K configs → batch forecast → score with α-weighted loss → select winner → final forecast.\n\n**Cost-asymmetric loss** (the pinball, or quantile, loss at level α): L = α·max(0, y − ŷ) + (1 − α)·max(0, ŷ − y), where y is the observed value and ŷ the forecast\n- Premium (α=0.90): 90% weight on under-prediction (ŷ \u003c y), 10% on over-prediction\n- Standard (α=0.75): 75% / 25%\n- Basic (α=0.65): 65% / 35%\n\n**karpathy-style outer loop:** `autoresearch_agent/` manages experiments autonomously. The agent reads `program.md`, modifies `train.py`, runs experiments, logs results to `results.tsv`, analyzes them, and repeats.\n\n### M4: Archetype Store\nFAISS-backed archetype embeddings: each tenant is summarized by 28 statistical features, normalized with StandardScaler and retrieved by cosine similarity. Achieves **\u003e90% retrieval accuracy** on held-out tenants with full history.\n\n### M5: Forecast Baselines\nFour baselines conforming to the `Forecaster` protocol:\n- **NaiveLast**: predicts the last observed value\n- **NaiveSeasonal**: 24h seasonal lag (1440 steps at 1-min resolution)\n- **PerTenantARIMA**: ARIMA(1,0,1) fit per tenant per dimension (slow by design)\n- **FixedConfigTSFM**: TimesFM with the best single `context_len` found by grid search (strongest baseline)\n\n### M6: Headline Experiment\nCompares FixedConfigTSFM against AutoresearchHarness on cost-asymmetric loss across the synthetic tenant fleet. 
This is the central empirical validation of the thesis.\n\n```bash\nuv run python experiments/m6_headline.py --tenants 200 --horizon 60 --timestamps 50\nuv run python autoresearch_agent/train.py  # via outer loop\n```\n\n### M7: Latency Budget Sweep\nSweeps K ∈ {1, 2, 4, 8, 16, 32} to measure the latency/loss trade-off curve and validate the 200ms latency budget.\n\n```bash\nuv run python experiments/m7_latency_sweep.py --tenants 50 --timestamps 5\n```\n\n### M8: Cold-Start Experiment\nTests whether archetype retrieval from minimal history (30–480 min) closes the gap between cold-start autoresearch and the oracle. This is the first experiment to use the `archetype_embedding` parameter of `AutoresearchHarness.forecast()`.\n\n```bash\nuv run python experiments/m8_cold_start.py --tenants 50 --lengths 30,60,120,240,480\n```\n\n### M9: SLA Tier Asymmetry\nValidates that different α values (premium=0.90, standard=0.75, basic=0.65) produce measurably different forecast behavior: premium forecasts are systematically higher (protective) than basic ones (cost-efficient), and the ordering is monotonic: premium \u003e standard \u003e basic.\n\n```bash\nuv run python experiments/m9_sla_asymmetry.py --tenants 100\n```\n\n### M10: GCP Scale-Out\nProduction deployment scaffold: a Dockerfile for Cloud Run, a FastAPI forecast service with `/forecast` and `/health` endpoints, and a Cloud Build pipeline. 
A Qdrant migration path is documented for horizontal scaling.\n\n```bash\n# gcloud builds submit --tag only builds a Dockerfile at the source root, so build this one with Docker\ndocker build -f deploy/docker/Dockerfile -t gcr.io/PROJECT/tsfm-autoresearch .\ndocker push gcr.io/PROJECT/tsfm-autoresearch\ngcloud run deploy tsfm-autoresearch --image gcr.io/PROJECT/tsfm-autoresearch --gpu 1\n```\n\n## Tech Stack\n\n- **Python 3.12**, `uv` for environment management\n- **PyTorch** for TimesFM inference (frozen, never fine-tuned)\n- **FAISS** for archetype embeddings (PoC; Qdrant for the GCP deployment)\n- **Polars** for time-series data manipulation\n- **Pydantic v2** for all config and request/response schemas\n- **pytest** + **hypothesis** for testing (7 property-based loss invariants)\n- **Ruff** for linting and formatting\n- **scikit-learn** for feature standardization\n- **statsmodels** for ARIMA baselines\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzd87pl%2Ftsfm-autoresearch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzd87pl%2Ftsfm-autoresearch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzd87pl%2Ftsfm-autoresearch/lists"}