{"id":50786773,"url":"https://github.com/lincc-frameworks/rubin-dash","last_synced_at":"2026-06-12T08:04:16.265Z","repository":{"id":351875693,"uuid":"1212830611","full_name":"lincc-frameworks/rubin-dash","owner":"lincc-frameworks","description":"DRP Afterburner for Super HATS - importing rubin catalogs to HATS","archived":false,"fork":false,"pushed_at":"2026-06-08T09:41:44.000Z","size":114,"stargazers_count":1,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-08T11:24:30.538Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lincc-frameworks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-16T19:21:11.000Z","updated_at":"2026-05-28T21:26:49.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/lincc-frameworks/rubin-dash","commit_stats":null,"previous_names":["lincc-frameworks/rubin-dash"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lincc-frameworks/rubin-dash","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lincc-frameworks%2Frubin-dash","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lincc-frameworks%2Frubin-dash/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lincc-frameworks%2Frubin-dash/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lincc-frameworks%2Frubin-dash/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lincc-frameworks","download_url":"https://codeload.github.com/lincc-frameworks/rubin-dash/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lincc-frameworks%2Frubin-dash/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34234593,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-12T08:04:13.482Z","updated_at":"2026-06-12T08:04:16.258Z","avatar_url":"https://github.com/lincc-frameworks.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# rubin-dash\n\n**D**RP **A**fterburner for **S**uper **HATS** — converts Rubin DRP outputs into\n[HATS](https://hats.readthedocs.io/) catalogs suitable for use with\n[lsdb](https://lsdb.readthedocs.io/).\n\n[![Template](https://img.shields.io/badge/Template-LINCC%20Frameworks%20Python%20Project%20Template-brightgreen)](https://lincc-ppt.readthedocs.io/en/latest/)\n[![PyPI](https://img.shields.io/pypi/v/rubin-dash?color=blue\u0026logo=pypi\u0026logoColor=white)](https://pypi.org/project/rubin-dash/)\n[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/lincc-frameworks/rubin-dash/smoke-test.yml)](https://github.com/lincc-frameworks/rubin-dash/actions/workflows/smoke-test.yml)\n[![Codecov](https://codecov.io/gh/lincc-frameworks/rubin-dash/branch/main/graph/badge.svg)](https://codecov.io/gh/lincc-frameworks/rubin-dash)\n[![Read The Docs](https://img.shields.io/readthedocs/rubin-dash)](https://rubin-dash.readthedocs.io/)\n\n## Overview\n\nThe pipeline runs a sequence of stages that read from a Butler repository and\nwrite HATS catalogs to an output directory:\n\n| Stage | Description                                           |\n|---|-------------------------------------------------------|\n| `butler` | Find catalog parquet files from the Butler repository |\n| `raw_sizes` | Measure raw parquet file sizes                        |\n| `import` | Import catalogs into HATS format                      |\n| `postprocess` | Post-process imported catalogs                        |\n| `nesting` | Build nested (light-curve) catalogs                   |\n| `collections` | Generate HATS collections                             |\n| `crossmatch` | Cross-match against external surveys (e.g. ZTF, PS1)  |\n| `generate_json` | Generate JSON metadata for the HATS collections       |\n\n## Setting up the environment\n\nThis pipeline requires IDAC access and is normally run on USDF SLAC nodes. It\ncannot be run on the login node. It is *highly recommended* to use `tmux` or `screen` so\nyou can detach and reattach without losing your session. The pipeline typically\ntakes at least ~5h and can take closer to ~15h.\n\n### Request a reserved node\n\n\nYour connection path should look like this:\n\n```mermaid\ngraph LR\n    L[\"\u003ci\u003elogin node\u003c/i\u003e\"] --\u003e T(\"\u003ccode\u003etmux/screen\u003c/code\u003e\")\n    T --\u003e I[\"\u003ci\u003einteractive node\u003c/i\u003e\"]\n    I --\u003e R[\"\u003ci\u003ereserved node\u003c/i\u003e\"]\nstyle T fill:lightblue,stroke:darkblue,stroke-width:2px\n```\nFrom an interactive node, request a reserved node:\n\n```shell\nsrun --pty --exclusive --nodes=1 --time=48:00:00 \\\n     --partition=torino --account=rubin:commissioning bash\n```\n\nDo not exit the reserved node shell directly — use `tmux detach` or screen's `ctrl+a -\u003e d` instead so the\njob keeps running.\n\n### Load the LSST stack\n\n```shell\nsource /sdf/group/rubin/sw/loadLSST.sh\nsetup lsst_distrib\n```\n\n### Install rubin-dash\n\n```shell\npip install git+https://github.com/lincc-frameworks/rubin-dash.git\n```\n\n## Running the pipeline\n\n### 1. Create a config file\n\nThe package ships a `default_config.toml` with sensible defaults for all\ncatalogs, nested catalogs, collections, crossmatch surveys, and Dask settings.\nYour config file is merged on top of those defaults — you only need to specify\nwhat changes for your run.\n\nCopy `example_config.toml` and fill in the `[run]` section. The values come\nfrom the JIRA ticket associated with the weekly release. For example, the\ncollection string `LSSTCam/runs/DRP/20250417_20250921/w_2025_49/DM-53545`\nbreaks down as:\n\n```toml\n[run]\ninstrument = \"LSSTCam\"\nrepo       = \"/repo/embargo\"         # Butler repo path\nversion    = \"w_2025_49\"\ncollection = \"DM-53545\"\noutput_dir = \"/sdf/data/rubin/shared/lsdb_commissioning\"\nrun        = \"20250417_20250921\"      # optional — omit for releases without a run segment\n```\n\n#### Overriding stages\n\nBy default all stages run. Restrict to a subset:\n\n```toml\n[stages]\nenabled = [\"butler\", \"raw_sizes\", \"import\", \"postprocess\"]\n```\n\n#### Overriding catalogs\n\nBy default all six catalogs are processed: `dia_object`, `dia_source`,\n`dia_object_forced_source`, `object`, `source`, `object_forced_source`.\nRestrict to a subset:\n\n```toml\n[catalogs]\nenabled = [\"dia_object\", \"object\"]\n```\n\nOverride settings for a specific catalog:\n\n```toml\n[catalogs.object]\nchunksize = 100_000   # DimensionParquetReader batch size (default 250_000 for object)\n\n[catalogs.object.import_args]\npixel_threshold = 500_000   # override any hats-import argument\n```\n\nAdd a custom catalog not in the defaults (all fields required):\n\n```toml\n[catalogs.my_catalog]\ndims            = [\"tract\"]\ngroup_by        = [\"tract\"]\nflux_columns    = []\nadd_mjds        = false\nuse_schema_file = false\nchunksize       = 500_000\n\n[catalogs.my_catalog.import_args]\nra_column       = \"ra\"\ndec_column      = \"dec\"\ncatalog_type    = \"object\"\npixel_threshold = 1_000_000\n```\n\n#### Overriding nested catalogs\n\nThe defaults define two nested catalogs (`dia_object_lc` and `object_lc`).\nOverride settings or restrict which ones are built:\n\n```toml\n[nested]\nenabled = [\"object_lc\"]   # omit to run all\n\n[nested.object_lc]\npixel_threshold       = 20_000   # override any field\nhighest_healpix_order = 10\n```\n\n#### Overriding collections\n\n```toml\n[collections]\nenabled = [\"object_collection\"]   # omit to run all\n\n[collections.object_collection]\nmargin_threshold = 10.0\n```\n\n#### Overriding crossmatch surveys\n\nThe defaults cross-match against ZTF DR22 and PS1. Add, remove, or reconfigure:\n\n```toml\n# Disable all crossmatches by leaving surveys empty\n[crossmatch]\n\n# Or override a survey's search radius\n[crossmatch.surveys.ztf_dr22]\nradius_arcsec = 0.5\n```\n\n#### Overriding Dask settings\n\nGlobal settings apply to all stages; stage-specific sections override them for\nthat stage only:\n\n```toml\n[dask]\nn_workers        = 32\nthreads_per_worker = 1\nmemory_limit     = \"16GB\"\n\n[dask.stages.nesting]\nn_workers    = 8\nmemory_limit = \"32GB\"\n```\n\n#### Layering multiple config files\n\nYou can split settings across files and layer them at run time — later files\noverride earlier ones:\n\n```shell\nrubin-dash run --config base.toml --config this_week.toml --config overrides.toml\n```\n\n### 2. Run the full pipeline\n\n```shell\nrubin-dash run --config my_config.toml\n```\n\n### CLI options\n\n```\nrubin-dash run --config CONFIG [--config CONFIG ...]\n               [--stages butler,import,postprocess]\n               [--from-stage STAGE]\n               [--catalogs dia_object,object]\n               [--nestings object_lc]\n               [--collections object_collection]\n```\n\n| Option | Description |\n|---|---|\n| `--config` | TOML config file. Repeat to layer overrides (later files win). |\n| `--stages` | Comma-separated list of stages to run. |\n| `--from-stage` | Run all enabled stages starting from this one. |\n| `--catalogs` | Restrict to a subset of catalogs. |\n| `--nestings` | Restrict to specific nested catalogs. |\n| `--collections` | Restrict to specific collections. |\n\nExamples:\n\n```shell\n# Re-run only the import and postprocess stages\nrubin-dash run --config my_config.toml --stages import,postprocess\n\n# Resume from the nesting stage onward\nrubin-dash run --config my_config.toml --from-stage nesting\n\n# Layer a base config with per-run overrides\nrubin-dash run --config base.toml --config overrides.toml\n```\n\n### 3. Interactive notebook access\n\nTo open the notebooks interactively from within the processing environment:\n\n```shell\nrubin-dash notebook --port 8769\n```\n\nThis starts a Jupyter server and prints the SSH tunnel command you need to run\non your laptop to forward the port. It will look something like:\n\n```shell\nssh -J user@s3dflogin.slac.stanford.edu,user@sdfiana004 \\\n    -L 8769:localhost:8769 \\\n    user@sdfmilan005\n```\n\n### 4. Rerunning a single stage after a failure\n\nIf the pipeline fails partway through, you can rerun from a specific stage:\n\n```shell\nrubin-dash run --config my_config.toml --from-stage import\n```\n\nOr run a single stage in isolation:\n\n```shell\nrubin-dash run --config my_config.toml --stages import\n```\n\nIf you need to debug interactively, the `notebooks/` directory contains a\nnotebook for each stage. Run them individually after confirming the environment\nvariables are set. If you encounter unexpected issues with upstream data, reach\nout in `#dm-algorithms-pipelines` on the Rubin Observatory Slack.\n\n## Development\n\n```shell\nconda create -n rubin-dash python=3.11\nconda activate rubin-dash\npip install -e \".[dev]\"\nchmod +x .setup_dev.sh\n./.setup_dev.sh\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flincc-frameworks%2Frubin-dash","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flincc-frameworks%2Frubin-dash","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flincc-frameworks%2Frubin-dash/lists"}