https://github.com/open-thoughts/open-thoughts

Fully open data curation for reasoning models

Host: GitHub
URL: https://github.com/open-thoughts/open-thoughts
Owner: open-thoughts
License: apache-2.0
Created: 2025-01-27T22:28:56.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-04-03T06:54:47.000Z (about 1 year ago)
Last Synced: 2025-04-03T07:35:47.713Z (about 1 year ago)
Topics: open-data, reasoning
Language: Python
Homepage: https://open-thoughts.ai
Size: 2.57 MB
Stars: 1,604
Watchers: 24
Forks: 140
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

StarryDivineSky - open-thoughts/open-thoughts
awesome-data-quality - OpenThoughts-114k - Large-scale dataset of reasoning trajectories distilled from DeepSeek R1. (2024) (Large Language Model Data / Cognition Engineering & Test-Time Scaling)
awesome-opensource-ai - OpenThoughts - Fully open data curation for reasoning models. Curated high-quality reasoning datasets for training and evaluating LLMs. Apache 2.0 licensed. (9. Evaluation, Benchmarks & Datasets)

README

Curating the best open reasoning datasets

A collaboration led by Bespoke Labs and the DataComp community

Our first goal is to curate a reasoning dataset to train state-of-the-art small reasoning models that surpass [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) and [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) on math and code reasoning benchmarks.

# News

- **[2025/04/03]** 🎉 We release [OpenThoughts2-1M](https://huggingface.co/datasets/open-thoughts/OpenThoughts2-1M), [OpenThinker2-7B](https://huggingface.co/open-thoughts/OpenThinker2-7B), and [OpenThinker2-32B](https://huggingface.co/open-thoughts/OpenThinker2-32B).

- **[2025/03/13]** 🎉 [OpenThoughts Alice in Wonderland Blogpost](https://www.open-thoughts.ai/blog/aiw) is out.

- **[2025/02/16]** 🎉 [OpenThinker on Ollama](https://ollama.com/library/openthinker) reaches 400k downloads.

- **[2025/02/14]** 🎉 Chat with OpenThinker in the [online playground](https://playground.bespokelabs.ai/).

- **[2025/02/13]** 🎉 OpenThinker is now [available on Ollama](https://ollama.com/library/openthinker) for easy local inference.

- **[2025/02/12]** 🎉 We release [OpenThinker-32B](https://huggingface.co/open-thoughts/OpenThinker-32B), the [best open-data reasoning model](https://www.open-thoughts.ai/blog/scale).

- **[2025/02/02]** 🎉 [OpenThoughts-114k dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) is the #1 trending dataset on Hugging Face.

- **[2025/01/30]** 🎉 Reasoning benchmarks are added to [Evalchemy](https://github.com/mlfoundations/Evalchemy) and [compared](https://www.open-thoughts.ai/blog/measure) to publicly reported scores.

- **[2025/01/28]** 🎉 [Open Thoughts](https://www.open-thoughts.ai/) launches with [OpenThoughts-114k dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) and [OpenThinker-7B model](https://huggingface.co/open-thoughts/OpenThinker-7B).

- **[2025/01/27]** 🎉 [Bespoke-Stratos-17k dataset](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k) is the #2 trending dataset on Hugging Face.

- **[2025/01/22]** 🎉 [Bespoke-Stratos-17k dataset](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k) and [Bespoke-Stratos-32B model](https://huggingface.co/bespokelabs/Bespoke-Stratos-32B) are [announced](https://www.bespokelabs.ai/blog/bespoke-stratos-the-unreasonable-effectiveness-of-reasoning-distillation).

# Results

Our [OpenThinker2-32B](https://huggingface.co/open-thoughts/OpenThinker2-32B) model, trained on [OpenThoughts2-1M](https://huggingface.co/datasets/open-thoughts/OpenThoughts2-1M), is the state-of-the-art open-data reasoning model.

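All numbers in the tables below were computed with our open-source evaluation tool [Evalchemy](https://github.com/mlfoundations/Evalchemy). Scores are percentages on AIME 2024, AIME 2025, AMC 2023, MATH500, GPQA Diamond (GPQA-D), and LiveCodeBench v2 (LCBv2); ✅ marks models whose training data is publicly available.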

[OpenThinker2-32B](https://huggingface.co/open-thoughts/OpenThinker2-32B) vs other 32B models

| Model | Open Data | AIME24 | AIME25 | AMC23 | MATH500 | GPQA-D | LCBv2 |
| ----- | --------- | ------ | ------ | ----- | ------- | ------ | ----- |
| [OpenThinker2-32B](https://huggingface.co/open-thoughts/OpenThinker2-32B) | ✅ | 76.7 | 58.7 | 94.0 | 90.8 | 64.1 | 72.5 |
| [OpenThinker-32B](https://huggingface.co/open-thoughts/OpenThinker-32B) | ✅ | 68.0 | 49.3 | 95.5 | 90.6 | 63.5 | 68.6 |
| [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | ❌ | 74.7 | 50.0 | 96.5 | 90.0 | 65.8 | 72.3 |
| [Light-R1-32B](https://huggingface.co/qihoo360/Light-R1-32B) | ✅ | 74.7 | 58.0 | 96.0 | 90.4 | 62.0 | 56.0 |
| [S1.1-32B](https://huggingface.co/simplescaling/s1.1-32B) | ✅ | 59.3 | 42.7 | 91.5 | 87.4 | 62.0 | 58.7 |

[OpenThinker2-7B](https://huggingface.co/open-thoughts/OpenThinker2-7B) vs other 7B models

| Model | Open Data | AIME24 | AIME25 | AMC23 | MATH500 | GPQA-D | LCBv2 |
| ----- | --------- | ------ | ------ | ----- | ------- | ------ | ----- |
| [OpenThinker2-7B](https://huggingface.co/open-thoughts/OpenThinker2-7B) | ✅ | 50.0 | 33.3 | 89.5 | 88.4 | 49.3 | 55.6 |
| [OpenThinker-7B](https://huggingface.co/open-thoughts/OpenThinker-7B) | ✅ | 31.3 | 23.3 | 74.5 | 83.2 | 42.9 | 38.0 |
| [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) | ❌ | 57.3 | 33.3 | 92.0 | 89.6 | 47.3 | 48.4 |
| [OlympicCoder-7B](https://huggingface.co/open-r1/OlympicCoder-7B) | ✅ | 20.7 | 15.3 | 63.0 | 74.8 | 25.3 | 55.4 |
| [OpenR1-Qwen-7B](https://huggingface.co/open-r1/OpenR1-Qwen-7B) | ✅ | 48.7 | 34.7 | 88.5 | 87.8 | 21.2 | 9.5 |

To reduce variance in evaluation accuracy, we average scores over multiple evaluation runs with different random seeds: 5 runs for AIME and AMC, and 3 runs for the other benchmarks. All evaluations use no system prompt, a maximum token length of 32,768, and a temperature of 0.7.
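
The seed-averaging protocol above amounts to a per-benchmark mean over a fixed number of runs. The following is a minimal sketch, not Evalchemy's implementation; it assumes per-run accuracies have already been collected into a dictionary keyed by benchmark name, and the helper names are hypothetical.

```python
from statistics import mean


def runs_for(benchmark: str) -> int:
    """Expected number of seeded runs: 5 for AIME/AMC, 3 for all other benchmarks."""
    return 5 if benchmark.startswith(("AIME", "AMC")) else 3


def average_over_seeds(runs: dict[str, list[float]]) -> dict[str, float]:
    """Return the mean accuracy per benchmark, rounded to one decimal place.

    Raises ValueError if a benchmark does not have the expected number of runs.
    """
    averaged = {}
    for benchmark, scores in runs.items():
        expected = runs_for(benchmark)
        if len(scores) != expected:
            raise ValueError(f"{benchmark}: expected {expected} runs, got {len(scores)}")
        averaged[benchmark] = round(mean(scores), 1)
    return averaged
```

For example, with hypothetical per-seed scores, `average_over_seeds({"AIME24": [70.0, 73.3, 76.7, 70.0, 73.3], "MATH500": [90.0, 90.4, 90.8]})` returns `{"AIME24": 72.7, "MATH500": 90.4}`. Checking the run count up front catches incomplete sweeps before they silently skew an average.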

We are fully open-source. Our [model weights](https://huggingface.co/open-thoughts), [datasets](https://huggingface.co/open-thoughts), [data generation code](https://github.com/open-thoughts/open-thoughts), [evaluation code](https://github.com/mlfoundations/Evalchemy), and [training code](https://github.com/hiyouga/LLaMA-Factory) are all publicly available. 

# Installation

```
make install
poetry shell
```

Set the DeepSeek API key:

```
export DEEPSEEK_API_KEY=your_api_key
```

Set `HF_ORG` to your Hugging Face organization ID. Set `HF_PRIVATE=true` if you want to push to a private repository.

```
export HF_ORG=your_org_id
export HF_PRIVATE=false
```
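
Before launching generation, it can help to fail fast when these variables are missing. This is a minimal sketch; the `Settings` class and `load_settings` helper are hypothetical and not part of the repository, and treating only the value `true` (any case) as private is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    deepseek_api_key: str
    hf_org: str
    hf_private: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables exported above; raise if a required one is missing or empty."""
    missing = [name for name in ("DEEPSEEK_API_KEY", "HF_ORG") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        deepseek_api_key=env["DEEPSEEK_API_KEY"],
        hf_org=env["HF_ORG"],
        # Assumption: only the literal "true" (case-insensitive) means a private repo.
        hf_private=env.get("HF_PRIVATE", "false").strip().lower() == "true",
    )
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to test without touching the real shell environment.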

# OpenThoughts2-1M Data Generation

The [OpenThoughts2-1M](https://huggingface.co/datasets/open-thoughts/OpenThoughts2-1M) dataset combines [OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k), [OpenR1](https://huggingface.co/open-r1), and our newly generated math and code reasoning data. We generated the additional math and code data by running ablations over 26 different question-generation methods and sampling from the best-performing ones.

[Figure: OpenThoughts2-1M data generation recipe]

More details can be found in our [blog post](https://www.open-thoughts.ai/blog/thinkagain). 
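
The "sample from the best-performing methods" step can be pictured as ranking question-generation methods by an ablation score and splitting a sample budget across the top few. This is a minimal sketch only; the method names, scores, `k`, and the even-split rule are hypothetical, and the actual selection procedure is described in the blog post.

```python
def allocate_budget(scores: dict[str, float], k: int, budget: int) -> dict[str, int]:
    """Split `budget` samples as evenly as possible across the `k` highest-scoring methods."""
    if k <= 0 or budget < 0:
        raise ValueError("k must be positive and budget non-negative")
    # Rank methods by ablation score, best first, and keep the top k.
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    base, remainder = divmod(budget, len(top))
    # Give any leftover samples to the best-ranked methods first.
    return {method: base + (1 if i < remainder else 0) for i, method in enumerate(top)}
```

For instance, with hypothetical scores `{"method_a": 0.52, "method_b": 0.61, "method_c": 0.47, "method_d": 0.58}`, `k=2`, and a budget of 1,001 samples, the sketch assigns 501 samples to `method_b` and 500 to `method_d`. A weighted split proportional to score would be an equally plausible design choice.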

# OpenThoughts-114k Data Generation

For OpenThoughts-114k, we generate data for the following domains:

1. Code

2. Math

3. Science

4. Puzzle

[Figure: OpenThoughts-114k data generation recipe]

More instructions are in [open_thoughts/README.md](open_thoughts/README.md).
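
For math questions with known answers, one common way to filter generated reasoning traces is to compare the final `\boxed{...}` answer against the reference. The sketch below illustrates that general idea only and is not taken from the repository; the function names are hypothetical, and exact string matching after whitespace removal is a simplifying assumption (real pipelines often use symbolic equivalence checks).

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    begin = i = start + len("\\boxed{")
    depth = 1
    while i < len(text):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
        i += 1
    return None  # Unbalanced braces: treat as no answer.


def normalize(answer: str) -> str:
    """Strip all whitespace so that '1 / 2' and '1/2' compare equal."""
    return "".join(answer.split())


def keep_trace(trace: str, reference: str) -> bool:
    """Keep a trace only if its final boxed answer matches the reference answer."""
    predicted = extract_boxed(trace)
    return predicted is not None and normalize(predicted) == normalize(reference)
```

Using the last `\boxed{}` reflects the convention that a reasoning trace states its final answer at the end.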

# Training and Evaluation

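Training and evaluation code for this repository is coming soon. In the meantime, evaluation uses [Evalchemy](https://github.com/mlfoundations/Evalchemy) and training uses [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).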

# Links

- 📊 [OpenThoughts2 and OpenThinker2 Blog Post](https://www.open-thoughts.ai/blog/thinkagain)

- 💻 [Open Thoughts GitHub Repository](https://github.com/open-thoughts/open-thoughts)

- 🧠 [OpenThoughts2-1M dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts2-1M)

- 🤖 [OpenThinker2-7B model](https://huggingface.co/open-thoughts/OpenThinker2-7B)

- 🤖 [OpenThinker2-32B model](https://huggingface.co/open-thoughts/OpenThinker2-32B)

# Citation

```
@misc{openthoughts,
  author = {Open Thoughts Team},
  month = jan,
  title = {{Open Thoughts}},
  howpublished = {https://open-thoughts.ai},
  year = {2025}
}
```

# About Us

We are a team of researchers and engineers from [Bespoke Labs](https://www.bespokelabs.ai/), Stanford, UC Berkeley, University of Washington, UT Austin, Juelich Supercomputing Center (JSC), LAION, UCLA, UNC Chapel Hill, and Toyota Research Institute, united around building the best datasets (and thus the best models). See our previous work at [datacomp.ai](https://www.datacomp.ai/) and [mlfoundations](https://github.com/mlfoundations).

# Sponsors

Open Thoughts is supported by:

- [Bespoke Labs](https://www.bespokelabs.ai/)

- [Lambda Labs](https://lambdalabs.com/)

- [NSF IFML](https://www.ifml.institute/)

- [UT Austin Machine Learning Lab](https://ml.utexas.edu/)

- [Juelich Supercomputing Center](https://www.fz-juelich.de/en/ias/jsc)

- [Toyota Research Institute](https://www.tri.global)
