{"id":46128090,"url":"https://github.com/npow/metaflow-optuna","last_synced_at":"2026-03-02T03:09:28.751Z","repository":{"id":341233991,"uuid":"1169379293","full_name":"npow/metaflow-optuna","owner":"npow","description":"Parallel hyperparameter tuning for Metaflow with true adaptive TPE — no sequential bottleneck","archived":false,"fork":false,"pushed_at":"2026-02-28T16:33:07.000Z","size":736,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-28T19:26:28.783Z","etag":null,"topics":["hyperparameter-optimization","hyperparameter-tuning","machine-learning","metaflow","mlops","optuna","parallel-training","python","tpe"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/npow.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-28T15:48:24.000Z","updated_at":"2026-02-28T16:33:02.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/npow/metaflow-optuna","commit_stats":null,"previous_names":["npow/metaflow-optuna"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/npow/metaflow-optuna","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/npow%2Fmetaflow-optuna","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/npow%2Fmetaflow-optuna/tags","releases_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/npow%2Fmetaflow-optuna/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/npow%2Fmetaflow-optuna/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/npow","download_url":"https://codeload.github.com/npow/metaflow-optuna/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/npow%2Fmetaflow-optuna/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29991312,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-02T01:47:34.672Z","status":"online","status_checked_at":"2026-03-02T02:00:07.342Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hyperparameter-optimization","hyperparameter-tuning","machine-learning","metaflow","mlops","optuna","parallel-training","python","tpe"],"created_at":"2026-03-02T03:09:27.723Z","updated_at":"2026-03-02T03:09:28.744Z","avatar_url":"https://github.com/npow.png","language":"Python","readme":"# metaflow-optuna\n\n[![PyPI version](https://img.shields.io/pypi/v/metaflow-optuna?style=flat-square)](https://pypi.org/project/metaflow-optuna/)\n[![CI](https://github.com/npow/metaflow-optuna/actions/workflows/ci.yml/badge.svg)](https://github.com/npow/metaflow-optuna/actions/workflows/ci.yml)\n[![Python 
versions](https://img.shields.io/pypi/pyversions/metaflow-optuna?style=flat-square)](https://pypi.org/project/metaflow-optuna/)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue?style=flat-square)](LICENSE)\n\nOptuna hyperparameter tuning that fits naturally into Metaflow — parallel trials as native tasks, adaptive TPE that actually adapts, and results you can read without digging through logs.\n\n---\n\n## The problem\n\nRunning Optuna inside Metaflow works, but it's awkward. The standard approach runs all trials sequentially in a single step, which wastes your `@batch` parallelism. If you fan out with `foreach` to parallelize, you lose TPE adaptivity — all parameters have to be sampled upfront before any trial runs, so the sampler can't learn from early results. And either way, you get a wall of Optuna log spam and no easy way to see results in the Metaflow UI.\n\n## How this fixes it\n\nAn ephemeral coordinator service runs in a parallel branch alongside your trials. Each trial task calls the coordinator over HTTP to get its parameters — so TPE sees every completed result before suggesting the next trial, even though all tasks run in parallel. No external database, no infrastructure to set up. 
When the last trial finishes, the coordinator exits and the study is available as a normal Metaflow artifact.\n\n```\nstart\n  ├── run_coordinator  (Optuna study lives here, serves suggest/tell over HTTP)\n  └── launch_trials\n       ├── run_trial[0]  ─ ask → suggest → tell\n       ├── run_trial[1]  ─ ask → suggest → tell\n       └── run_trial[N]  ─ ask → suggest → tell\n            └── join_trials  (collect trial results)\n                 └── join_study  (study artifact + HTML report + Metaflow card)\n```\n\n---\n\n## Installation\n\n```bash\npip install metaflow-optuna\n```\n\nRequires: `metaflow\u003e=2.9`, `optuna\u003e=3.0`, `fastapi`, `uvicorn`, `httpx`, `plotly`, `matplotlib`\n\n---\n\n## Usage\n\n```python\nfrom metaflow import FlowSpec, Parameter, card, current, step\nfrom metaflow_optuna import hyperparam, optuna_coordinator, render_study_card, render_study_html\n\nclass MyTuningFlow(FlowSpec):\n    n_trials = Parameter(\"n_trials\", default=30, type=int)\n\n    @step\n    def start(self):\n        self.coordinator_id = current.run_id\n        self.n_trials_int   = int(self.n_trials)\n        self.next(self.run_coordinator, self.launch_trials)\n\n    @optuna_coordinator(direction=\"minimize\", sampler=\"tpe\")\n    @step\n    def run_coordinator(self):\n        self.next(self.join_study)\n\n    @step\n    def launch_trials(self):\n        from metaflow_optuna.rendezvous import await_coordinator\n        self.coordinator_url = await_coordinator(self.coordinator_id)\n        self.trial_indices   = list(range(self.n_trials_int))\n        self.next(self.run_trial, foreach=\"trial_indices\")\n\n    @hyperparam(objective=\"val_loss\", direction=\"minimize\")\n    @step\n    def run_trial(self):\n        trial = self.trial  # injected by @hyperparam\n\n        lr      = trial.suggest_float(\"lr\",      1e-4, 1e-1, log=True)\n        dropout = trial.suggest_float(\"dropout\", 0.0,  0.5)\n        layers  = trial.suggest_int(\"layers\",    1,    4)\n\n        self.val_loss = train_and_eval(lr, dropout, layers)\n        
self.next(self.join_trials)\n\n    @step\n    def join_trials(self, inputs):\n        self.trial_results = [inp.trial_result for inp in inputs if hasattr(inp, \"trial_result\")]\n        self.merge_artifacts(inputs, exclude=[\"trial\", \"val_loss\", \"trial_result\"])\n        self.next(self.join_study)\n\n    @card(type=\"blank\")\n    @step\n    def join_study(self, inputs):\n        study = next(inp.study for inp in inputs if hasattr(inp, \"study\"))\n        self.study      = study\n        self.study_html = render_study_html(study)\n        render_study_card(study)\n        self.next(self.end)\n\n    @step\n    def end(self):\n        print(\"Best:\", self.study.best_params, \"→\", self.study.best_value)\n\nif __name__ == \"__main__\":\n    MyTuningFlow()\n```\n\n```bash\npython flow.py run --n_trials 50\n```\n\n---\n\n## Results\n\n### Interactive HTML report (`self.study_html`)\n\n![HTML report](docs/screenshots/study_report_top.png)\n\n![HTML report bottom](docs/screenshots/study_report_html.png)\n\n### Metaflow card\n\n![Metaflow card](docs/screenshots/metaflow_ui_card.png)\n\n### Run timeline\n\n![Metaflow UI](docs/screenshots/metaflow_ui_run.png)\n\n---\n\n## Batch mode (no coordinator, pre-sampled)\n\nIf you don't need adaptive sampling — e.g. you're doing a grid sweep or want fully reproducible parallel trials — you can skip the coordinator entirely. 
Parameters are sampled upfront with QMC and replayed in each task.\n\n```python\nfrom metaflow import FlowSpec, step\nfrom metaflow_optuna import create_study_inputs, rebuild_study, hyperparam\n\nclass BatchFlow(FlowSpec):\n    @step\n    def start(self):\n        self.configs = create_study_inputs(search_space, n_trials=50)\n        self.next(self.train, foreach=\"configs\")\n\n    @hyperparam(objective=\"val_loss\", mode=\"batch\")\n    @step\n    def train(self):\n        trial = self.trial\n        lr    = trial.suggest_float(\"lr\", 1e-4, 1e-1, log=True)\n        ...\n        self.val_loss = ...\n        self.next(self.join)\n\n    @step\n    def join(self, inputs):\n        self.study = rebuild_study(inputs, objective=\"val_loss\", direction=\"minimize\")\n        self.next(self.end)\n```\n\n---\n\n## Crash handling\n\nIf a trial throws an exception, `@hyperparam` catches it, marks the trial as failed, and lets the run continue — so one bad trial doesn't kill 49 others. If a task is hard-killed (OOM, SIGKILL), the coordinator waits up to the timeout and returns a partial study. 
Metaflow `resume` restarts failed tasks and the coordinator picks up where it left off.\n\n---\n\n## API reference\n\n```python\n@optuna_coordinator(\n    direction=\"minimize\",   # or \"maximize\"\n    sampler=\"tpe\",          # \"tpe\" | \"random\" | \"cmaes\" | \"qmc\"\n    port=None,              # auto-assign if None\n    timeout=7200,           # seconds to wait for all trials\n)\n\n@hyperparam(\n    objective=\"val_loss\",   # name of the self.\u003cattr\u003e you set in the step body\n    direction=\"minimize\",\n    suppress_logs=True,     # silences optuna INFO spam\n    mode=\"adaptive\",        # \"adaptive\" (coordinator) | \"batch\" (pre-sampled)\n)\n\nrender_study_card(study)    # renders into current @card(type=\"blank\") step\nrender_study_html(study)    # returns self-contained HTML string\n```\n\n---\n\n## Example\n\n[`examples/sklearn_tuning.py`](examples/sklearn_tuning.py) — tunes a `RandomForestClassifier` on the Wine dataset, 20 trials, adaptive TPE.\n\n```bash\npython examples/sklearn_tuning.py run --n_trials 20\n```\n\n## License\n\nApache 2.0 — see [LICENSE](LICENSE).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnpow%2Fmetaflow-optuna","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnpow%2Fmetaflow-optuna","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnpow%2Fmetaflow-optuna/lists"}