{"id":24416456,"url":"https://github.com/NovaSky-AI/SkyThought","last_synced_at":"2025-09-30T17:31:00.294Z","repository":{"id":271928370,"uuid":"914574966","full_name":"NovaSky-AI/SkyThought","owner":"NovaSky-AI","description":"Sky-T1: Train your own O1 preview model within $450","archived":false,"fork":false,"pushed_at":"2025-01-17T04:28:01.000Z","size":9261,"stargazers_count":1830,"open_issues_count":7,"forks_count":193,"subscribers_count":28,"default_branch":"main","last_synced_at":"2025-01-18T05:46:13.047Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://novasky-ai.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NovaSky-AI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-09T21:37:37.000Z","updated_at":"2025-01-18T05:34:48.000Z","dependencies_parsed_at":"2025-01-10T20:34:46.891Z","dependency_job_id":null,"html_url":"https://github.com/NovaSky-AI/SkyThought","commit_stats":null,"previous_names":["novasky-ai/skythought"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NovaSky-AI%2FSkyThought","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NovaSky-AI%2FSkyThought/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NovaSky-AI%2FSkyThought/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NovaSky-AI%2FSkyThought/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NovaSky-AI",
"download_url":"https://codeload.github.com/NovaSky-AI/SkyThought/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":234757151,"owners_count":18881936,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-20T08:01:45.470Z","updated_at":"2025-09-30T17:31:00.287Z","avatar_url":"https://github.com/NovaSky-AI.png","language":"Python","funding_links":[],"categories":["Project List","Open-source","A01_文本生成_文本对话","Python"],"sub_categories":["\u003cspan id=\"tool\"\u003eLLM (LLM \u0026 Tool)\u003c/span\u003e","Models","大语言对话模型及数据"],"readme":"\u003cdiv align=\"center\"\u003e\n\n# SkyThought\n\n[![Github](https://img.shields.io/badge/SkyThought-000000?style=for-the-badge\u0026logo=github\u0026logoColor=000\u0026logoColor=white)](https://github.com/NovaSky-AI/SkyThought) [![Twitter](https://img.shields.io/badge/NovaSky-white?style=for-the-badge\u0026logo=X\u0026logoColor=000\u0026color=000\u0026labelColor=white)](https://x.com/NovaSkyAI) [![Hugging Face Collection](https://img.shields.io/badge/NovaSky-fcd022?style=for-the-badge\u0026logo=huggingface\u0026logoColor=000\u0026labelColor)](https://huggingface.co/NovaSky-AI) [![Discord](https://img.shields.io/badge/NovaSky-5865F2?style=for-the-badge\u0026logo=discord\u0026logoColor=white)](https://discord.gg/kexQXy5yA3)\n\n\n\u003cdiv align=\"center\" style=\"font-family: Arial, sans-serif;\"\u003e\n  \u003cp\u003e\n    \u003ca href=\"#news\" style=\"text-decoration: none; font-weight: bold;\"\u003eNews\u003c/a\u003e •\n    \u003ca href=\"#links\" 
style=\"text-decoration: none; font-weight: bold;\"\u003eLinks\u003c/a\u003e •\n    \u003ca href=\"#getting-started\" style=\"text-decoration: none; font-weight: bold;\"\u003eGetting Started\u003c/a\u003e •\n    \u003ca href=\"#evaluation\" style=\"text-decoration: none; font-weight: bold;\"\u003eEvaluation\u003c/a\u003e •\n    \u003ca href=\"#citation\" style=\"text-decoration: none; font-weight: bold;\"\u003eCitation\u003c/a\u003e •\n    \u003ca href=\"#acknowledgement\" style=\"text-decoration: none; font-weight: bold;\"\u003eAcknowledgement\u003c/a\u003e \n  \u003c/p\u003e\n\u003c/div\u003e\n\n\u003c/div\u003e\n\n\n# News\n- **[2025/02/21]** 🎉 We released S*: Test time scaling for code generation ([paper](https://arxiv.org/pdf/2502.14382), [code](https://github.com/NovaSky-AI/SkyThought/tree/main/skythought/test-time-scaling)), a simple and extensible test time scaling framework for code generation.\n- **[2025/02/11]** 🎉 We released Sky-T1-7B ([model](https://huggingface.co/NovaSky-AI/Sky-T1-7B)) and Sky-T1-mini ([model](https://huggingface.co/NovaSky-AI/Sky-T1-mini)) to demonstrate the potential of RL in further enhancing model's capability beyond distillation.\n- **[2025/01/23]** ⚡️ We released Sky-T1-32B-Flash ([model](https://huggingface.co/NovaSky-AI/Sky-T1-32B-Flash), [data](https://huggingface.co/datasets/NovaSky-AI/Sky-T1_preference_data_10k)) to tackle overthinking and reduce reasoning sequence lengths while maintaining accuracy.\n- **[2025/01/19]** 🎉 [Chat demo](http://164.152.23.196:3000/) for Sky-T1-32B-Preview is alive! 
Please check it out!\n- **[2025/01/10]** 🎉 We have released our Sky-T1-32B-Preview [model](https://huggingface.co/NovaSky-AI/Sky-T1-32B-Preview) and [data](https://huggingface.co/datasets/NovaSky-AI/Sky-T1_data_17k) through [HuggingFace](https://huggingface.co/NovaSky-AI)!\n\n\n# Links\n\n- 📜 [Sky-T1-7B and Sky-T1-mini Blog Post](https://novasky-ai.github.io/posts/sky-t1-7B/)\n- 📜 [Sky-T1-32B-Flash Blog Post](https://novasky-ai.github.io/posts/reduce-overthinking/)\n- 📜 [Sky-T1-32B-Preview model Blog Post](https://novasky-ai.github.io/posts/sky-t1/)\n- 🤗 [Sky-T1-32B-Preview model](https://huggingface.co/NovaSky-AI)\n\n# Getting Started\n\nWe open-source the code and scripts we used for data curation, training, and evaluation of Sky-T1-32B-Preview. You can find more details in each directory.\n- [`recipes`](./recipes/): Recipes (data curation steps and training strategies) for building the `Sky-T1-32B-Flash`, `Sky-T1-32B-Preview`, and `Sky-T1-7B` series models. \n- [`skythought/evals`](./skythought/evals/): Our data generation and evaluation library. We provide a convenient CLI for evaluation as well as a `Scorer` API for scoring during data curation and training ([example](./examples/scoring.ipynb)). \n- [`skythought/train`](./skythought/train/): Training scripts for Sky-T1. We use [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory) to perform training. 
\n- [`skythought/skythought-rl`](./skythought/skythought-rl/): RL training code for Sky-T1-7B and Sky-T1-mini.\n\n# Evaluation\n\n## Usage\n\nYou can install the latest release from PyPI or from [source](#installing-from-source):\n\n```shell\npip install skythought\n```\n\n### Installing from source\n\n```shell\n# Clone the repository\ngit clone https://github.com/NovaSky-AI/SkyThought.git\ncd SkyThought\n\n# Create and activate a virtual environment (using uv here)\nuv venv --python 3.10\nsource .venv/bin/activate\n\n# Install the package in editable mode\nuv pip install -e .\n```\n\nRunning evaluation is as simple as: \n\n```bash\nskythought evaluate --model NovaSky-AI/Sky-T1-32B-Preview --task aime24\n```\n\nWe support a wide variety of datasets in mathematics, science and coding:\n\n- AIME'24\n- MATH500\n- GPQADiamond\n- MMLU\n- ARC-Challenge\n- OlympiadBench\n- AMC'23 \n- TACO \n- APPS\n- LiveCodeBench\n- MMLU Pro\n- MinervaMath\n- GSM8K\n- AIME'25\n\nFor more details, please refer to our [evaluation guide](examples/evaluate.ipynb) and the [evaluation README](skythought/evals/README.md).\n\n\n### Evaluation results\nThe table below shows our evaluation results for the Sky-T1-32B-Preview model across math, coding, and science benchmarks.\n\n| Metric                | Sky-T1-32B-Preview | Qwen-2.5-32B-Instruct | QwQ   | o1-preview |\n|-----------------------|---------------------|--------|-------|------------|\n| Math500              | 86.4                    | 81.4    | 92.2 | 81.4       |\n| AIME2024             | 43.3                    | 16.7    | 50.0  | 40.0       |\n| LiveCodeBench-Easy   | 86.3                    | 84.6   | 90.7  | 92.9       |\n| LiveCodeBench-Medium | 56.8                    | 40.8   | 56.3  | 54.9       |\n| LiveCodeBench-Hard   | 17.9                    | 9.8   | 17.1  | 16.3       |\n| GPQA-Diamond         | 56.8                    | 45.5   | 52.5  | 75.2       |\n| OlympiadBench (Math, EN)    | 59.79\t           | 46.74\t| 62.17\t | 
59.2      | \n\n#### Results on non-reasoning benchmarks\n\nWe also evaluate on non-reasoning benchmarks (instruction following, QA, etc.) to test whether the model has traded off capability in other domains for better performance on reasoning-related benchmarks. \n\n\n| Metric | Sky-T1-32B-Preview | Qwen-2.5-32B-Instruct | QwQ-32B-Preview | Eval Implementation |\n|---------|-------------------|---------------------|-----------------|-------------------|\n| MMLU (0 shot; no CoT) | **78.36** | 74.14 | 71.23 | [lm_eval](https://github.com/EleutherAI/lm-evaluation-harness) |\n| MMLU (5 shot; no CoT) | 82.46 | **82.62** | 82.32 | [lm_eval](https://github.com/EleutherAI/lm-evaluation-harness) |\n| ARC-C (0 shot; no CoT) | **49.49** | 49.4 | 49.66 | [lm_eval](https://github.com/EleutherAI/lm-evaluation-harness) |\n| IFEval | 75.79 | **78.74** | 42.51 | [lm_eval](https://github.com/EleutherAI/lm-evaluation-harness) |\n| LLM-as-a-Judge | 9.12\t| **9.19** | 8.30 | [fastchat](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) |\n| MGSM (0 shot; `direct`) | 33 | **42.3** | 19.07 | [lm_eval](https://github.com/EleutherAI/lm-evaluation-harness) |\n| MGSM (8-shot; `direct`) | 58.4 | **61.47** | 58.5 | [lm_eval](https://github.com/EleutherAI/lm-evaluation-harness) |\n| BFCL-v3 | 53.18 | **58.92** | 17.41 | [BFCL](https://github.com/ShishirPatil/gorilla/tree/main/berkeley-function-call-leaderboard) |\n| Arena-Hard | **74.79** | 66.51 | 52.6 | [Arena-Hard-Auto](https://github.com/lmarena/arena-hard-auto) |\n\nFor more details, refer to [base_instruct_evals.md](./skythought/evals/base_instruct_evals.md).\n\n# Fully Open-source: Driving Progress Together\nWe believe that open-source collaboration drives progress, and with Sky-T1-32B-Preview, we are fully committed to empowering the community. 
We open-source all details (i.e., data, codes, model weights) to enable the community to replicate and improve on our results *easily*:\n\n\u003ctable\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003eModel\u003c/th\u003e\n      \u003cth style=\"background-color: #f2f2f2;\"\u003e\u003cdiv align=\"center\"\u003eSky-T1-32B-Preview\u003c/div\u003e\u003c/th\u003e\n      \u003cth\u003e\u003cdiv align=\"center\"\u003eSTILL-2\u003c/div\u003e\u003c/th\u003e\n      \u003cth\u003e\u003cdiv align=\"center\"\u003eJourney\u003c/div\u003e\u003c/th\u003e\n      \u003cth\u003e\u003cdiv align=\"center\"\u003eQwQ\u003c/div\u003e\u003c/th\u003e\n      \u003cth\u003e\u003cdiv align=\"center\"\u003eo1\u003c/div\u003e\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eData\u003c/td\u003e\n      \u003ctd style=\"background-color: #f2f2f2;\"\u003e\u003cdiv align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e❌\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e❌\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e❌\u003c/div\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eCode\u003c/td\u003e\n      \u003ctd style=\"background-color: #f2f2f2;\"\u003e\u003cdiv align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e❌\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e❌\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e❌\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e❌\u003c/div\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eReport\u003c/td\u003e\n      \u003ctd 
style=\"background-color: #f2f2f2;\"\u003e\u003cdiv align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e❌\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e❌\u003c/div\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eMath domain\u003c/td\u003e\n      \u003ctd style=\"background-color: #f2f2f2;\"\u003e\u003cdiv align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eCoding domain\u003c/td\u003e\n      \u003ctd style=\"background-color: #f2f2f2;\"\u003e\u003cdiv align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e❌\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e❌\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eModel Weights\u003c/td\u003e\n      \u003ctd style=\"background-color: #f2f2f2;\"\u003e\u003cdiv align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e❌\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv 
align=\"center\"\u003e✅\u003c/div\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003cdiv align=\"center\"\u003e❌\u003c/div\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n# Citation\nThe code in this repository is mostly described in the post below. Please consider citing this work if you find the repository helpful. \n\n```bibtex\n@misc{sky_t1_2025,\n  author       = {NovaSky Team},\n  title        = {Sky-T1: Train your own O1 preview model within $450},\n  howpublished = {https://novasky-ai.github.io/posts/sky-t1},\n  note         = {Accessed: 2025-01-09},\n  year         = {2025}\n}\n```\n\n# Acknowledgement\nThis work is done at [Berkeley Sky Computing Lab](https://sky.cs.berkeley.edu/), with the amazing compute support from [Lambda Labs](https://lambdalabs.com/service/gpu-cloud?srsltid=AfmBOop5FnmEFTkavVtdZDsLWvHWNg6peXtat-OXJ9MW5GMNsk756PE5), [Anyscale](https://www.anyscale.com/), and [Databricks](https://www.databricks.com/). We would like to express our gratitude for the valuable academic feedback and support from the [Still-2 Team](https://arxiv.org/pdf/2412.09413), and Junyang Lin from the [Qwen Team](https://qwenlm.github.io/).\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNovaSky-AI%2FSkyThought","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNovaSky-AI%2FSkyThought","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNovaSky-AI%2FSkyThought/lists"}