{"id":47694955,"url":"https://github.com/togethercomputer/aurora","last_synced_at":"2026-04-02T16:20:22.478Z","repository":{"id":348379536,"uuid":"1189098207","full_name":"togethercomputer/aurora","owner":"togethercomputer","description":null,"archived":false,"fork":false,"pushed_at":"2026-03-31T22:15:26.000Z","size":1694,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-01T01:22:06.860Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/togethercomputer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-23T01:13:29.000Z","updated_at":"2026-04-01T00:53:56.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/togethercomputer/aurora","commit_stats":null,"previous_names":["togethercomputer/aurora"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/togethercomputer/aurora","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/togethercomputer%2Faurora","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/togethercomputer%2Faurora/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/togethercomputer%2Faurora/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/togethercomputer%2Faurora/manife
sts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/togethercomputer","download_url":"https://codeload.github.com/togethercomputer/aurora/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/togethercomputer%2Faurora/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31309834,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-02T12:59:32.332Z","status":"ssl_error","status_checked_at":"2026-04-02T12:54:48.875Z","response_time":89,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-02T16:20:19.867Z","updated_at":"2026-04-02T16:20:22.339Z","avatar_url":"https://github.com/togethercomputer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Aurora\n\nAurora is a unified training-serving system for online speculative decoding. It closes the loop between speculator training and serving by continuously learning a draft model directly from live inference traces — treating online speculator learning as an asynchronous reinforcement-learning problem. 
Aurora is built on top of [TorchSpec](https://github.com/torchspec-project/TorchSpec).\n\nAurora supports **day-0 deployment**: a speculator can be served immediately and rapidly adapted to live traffic, improving system performance while providing immediate utility feedback. Across experiments, Aurora achieves a **1.5x day-0 speedup** on recently released frontier models (e.g., MiniMax-M2.1 and Qwen3-Coder-Next), and adapts effectively to distribution shifts in user traffic, delivering an additional **1.25x speedup** over a well-trained but static speculator on widely used models (e.g., Qwen3).\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/diagram.png\" alt=\"Aurora Architecture\" width=\"100%\"\u003e\n\u003c/p\u003e\n\n\n## Deployment Modes\n\n| Mode | Description |\n|------|-------------|\n| **Online** | Training and inference co-located via Ray controller. Draft model updated continuously from live serving traces with hot-swapped weight sync. |\n| **External with draft** | Standalone SGLang server with EAGLE3 speculative decoding. Training improves the draft and syncs weights back periodically. |\n| **External without draft** | Standalone SGLang server runs target-only inference. Draft model trained from scratch — no pre-existing data or speculator required. 
|\n\n## Setup\n\n```bash\n./tools/build_conda.sh\nmicromamba activate aurora\n```\n\nTo install into your current environment instead:\n\n```bash\n./tools/build_conda.sh current\n```\n\nOptional Flash Attention extras:\n\n```bash\npip install -e \".[fa]\"\n```\n\n## Quick Start\n\n```bash\n# Start training + external SGLang server (Qwen3-4B, day-0 from scratch)\nbash examples/qwen3-4b-external-no-draft/run.sh\n\n# In another terminal, send requests to generate training samples\nbash examples/qwen3-4b-external-no-draft/send_requests.sh\n```\n\nSee [`examples/README.md`](examples/README.md) for the full example catalog, per-model training curves, GPU layout, and config overrides.\n\n## Production Notes\n\n- The example `run.sh` scripts are **single-node oriented**: they manage their own local Ray cluster. For multi-node or Kubernetes deployments, start Ray manually and invoke `python3 -m aurora.train_entry` directly. See [docs/ray.md](docs/ray.md).\n- **External with draft** mode requires a **shared filesystem** between training and the SGLang server for draft weight sync.\n- `online_serving.hidden_states_dtype` must match the serving model's dtype (e.g., set `float16` when serving an FP8 model).\n- Training and inference GPU sets (`CUDA_VISIBLE_DEVICES` vs `SGLANG_GPUS`) **must not overlap**.\n\n## Checkpoint Conversion\n\nConvert an Aurora checkpoint to Hugging Face format:\n\n```bash\npython tools/convert_to_hf.py --input-dir ./outputs/my_experiment/iter_0010000/\n```\n\nVocabulary pruning can be applied either during training (`draft_vocab_size` in config) or at conversion time:\n\n```bash\npython tools/convert_to_hf.py \\\n    --input-dir ./outputs/my_experiment/iter_0010000/ \\\n    --prune-vocab \\\n    --dataset-path Aeala/ShareGPT_Vicuna_unfiltered \\\n    --draft-vocab-size 32000 \\\n    --tokenizer Qwen/Qwen3-8B \\\n    --chat-template qwen \\\n    --prompt-key conversations\n```\n\n## Metrics Reporting\n\nW\u0026B logging is disabled by default (`report_to: 
none`). To enable it, set `report_to: wandb` in your config and supply your W\u0026B API key.\n\n## Troubleshooting\n\n| Issue | Reference |\n|-------|-----------|\n| Stuck or failing distributed runs, Ray actor errors | [docs/debugging_ray_jobs.md](docs/debugging_ray_jobs.md) |\n| Ray cluster setup, actor hierarchy, placement groups | [docs/ray.md](docs/ray.md) |\n| Pipeline bottlenecks, slow steps, throughput analysis | [docs/performance_metrics.md](docs/performance_metrics.md) |\n\nEnable verbose logging:\n\n```bash\nAURORA_LOG_LEVEL=DEBUG bash examples/qwen3-4b-external-with-draft/run.sh\n```\n\n## Citation\n\n```bibtex\n@article{wang2026aurora,\n  title={When RL Meets Adaptive Speculative Training: A Unified Training--Serving System},\n  author={Wang, Junxiong and Bie, Fengxiang and Li, Jisen and Zhou, Zhongzhu and Shao, Zelei and Wang, Yubo and Liu, Yinghui and Wu, Qingyang and May, Avner and Yanamandra, Sri and Zhang, Yineng and Zhang, Ce and Dao, Tri and Liang, Percy and Athiwaratkun, Ben and Song, Shuaiwen Leon and Xu, Chenfeng and Wu, Xiaoxia},\n  journal={arXiv preprint arXiv:2602.06932},\n  year={2026}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftogethercomputer%2Faurora","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftogethercomputer%2Faurora","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftogethercomputer%2Faurora/lists"}