g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
https://github.com/bklieger-groq/g1
- Host: GitHub
- URL: https://github.com/bklieger-groq/g1
- Owner: bklieger-groq
- License: MIT
- Created: 2024-09-13T21:20:48.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-10-07T16:44:13.000Z (2 months ago)
- Last Synced: 2024-12-04T13:45:18.333Z (9 days ago)
- Language: Python
- Size: 372 KB
- Stars: 3,959
- Watchers: 52
- Forks: 357
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-llm-strawberry - bklieger-groq
- StarryDivineSky - bklieger-groq/g1 - Uses Llama-3.1 70b to create o1-like reasoning chains. This is an early prototype that uses prompting strategies to improve LLM reasoning through o1-like reasoning chains, allowing the LLM to "think" and solve logical problems that usually stump leading models. Unlike o1, all reasoning tokens are shown, and the app uses an open-source model. g1 is experimental and open source to help motivate the open-source community to develop new strategies for producing o1-like reasoning. The experiment helps show the power of prompting reasoning in visualized steps, rather than comparing to or fully replicating o1, which uses different techniques; OpenAI's o1 is instead trained with large-scale reinforcement learning to reason using Chain of Thought, achieving state-of-the-art performance on complex PhD-level problems. g1 demonstrates the potential of prompting alone to overcome simple LLM logic issues (like the Strawberry problem), allowing existing open-source models to benefit from dynamic reasoning chains and an improved interface. Powered by Llama-3.1-70b, g1 creates reasoning chains that are, in principle, a dynamic Chain of Thought that lets the LLM "think" through logical problems that usually stump leading models. At each step, the LLM can choose to continue with another reasoning step or provide a final answer; each step is titled and visible to the user. The system prompt also includes tips for the LLM; there is a full explanation under Prompt Breakdown, but examples include asking the model to "include exploration of alternative answers" and "use at least 3 methods to derive the answer". By combining Chain of Thought with trying multiple methods, exploring alternative answers, and questioning previous draft solutions while considering the LLM's limitations, this alone, without any training, is enough to reach ~70% accuracy on the Strawberry problem (n=10, "How many Rs are in strawberry?"); without this prompting, Llama-3.1-70b scored 0% and ChatGPT-4o scored 30%. (A01_Text Generation_Text Dialogue / Large language dialogue models and data)
- awesome-LLM-resourses - g1 - Using Llama-3.1 70b on Groq to create o1-like reasoning chains. (Inference)
- awesome-llm-reasoning-openai-o1-survey - GitHub
README
# g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
[Video Demo](https://github.com/user-attachments/assets/db2a221f-f8eb-48c3-b5a7-8399c6300243)
This is an early prototype of using prompting strategies to improve an LLM's reasoning capabilities through o1-like reasoning chains. This allows the LLM to "think" and solve logical problems that would otherwise stump leading models. Unlike o1, all of the reasoning tokens are shown, and the app uses an open source model.
g1 is experimental and being open sourced to help inspire the open source community to develop new strategies to produce o1-like reasoning. This experiment helps show the power of prompting reasoning in visualized steps, not a comparison to or full replication of o1, which uses different techniques. OpenAI's o1 is instead trained with large-scale reinforcement learning to reason using Chain of Thought, achieving state-of-the-art performance on complex PhD-level problems.
g1 demonstrates the potential of prompting alone to overcome straightforward LLM logic issues like the Strawberry problem, allowing existing open source models to benefit from dynamic reasoning chains and an improved interface for exploring them.
### How it works
g1, powered by Llama-3.1-70b, creates reasoning chains that function, in principle, as a dynamic Chain of Thought, allowing the LLM to "think" and solve logical problems that would otherwise stump leading models.
At each step, the LLM can choose to continue with another reasoning step or to provide a final answer. Each step is titled and visible to the user. The system prompt also includes tips for the LLM. There is a full explanation under Prompt Breakdown, but a few examples are asking the model to “include exploration of alternative answers” and “use at least 3 methods to derive the answer”.
The reasoning ability of the LLM is therefore improved through combining Chain-of-Thought with the requirement to try multiple methods, explore alternative answers, question previous draft solutions, and consider the LLM’s limitations. This alone, without any training, is sufficient to achieve ~70% accuracy on the Strawberry problem (n=10, "How many Rs are in strawberry?"). Without prompting, Llama-3.1-70b had 0% accuracy and ChatGPT-4o had 30% accuracy.
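To make the loop concrete, here is a minimal sketch of how such a step loop can be driven. It is illustrative only, not g1's actual code: it assumes the `groq` Python client, Groq's JSON mode, and the `llama-3.1-70b-versatile` model name, and the helper name `run_reasoning_chain` is hypothetical.

```
# Illustrative sketch only (not the actual g1 app code).
# Assumes: `pip3 install groq`, GROQ_API_KEY set, and Groq's JSON mode.
import json
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

SYSTEM_PROMPT = "..."  # the full prompt from the Prompting Strategy section

def run_reasoning_chain(question, max_steps=25):
    """Collect titled reasoning steps until the model signals a final answer."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
    steps = []
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model="llama-3.1-70b-versatile",  # model name is an assumption
            messages=messages,
            response_format={"type": "json_object"},
        )
        step = json.loads(response.choices[0].message.content)
        steps.append(step)  # each step: 'title', 'content', 'next_action'
        # Feed the step back so the next call builds on prior reasoning.
        messages.append({"role": "assistant", "content": json.dumps(step)})
        if step.get("next_action") == "final_answer":
            break
    return steps
```

Capping the loop with `max_steps` is a safety choice in this sketch: it bounds token usage if the model never emits `final_answer`.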
### Examples
> [!IMPORTANT]
> g1 is not perfect, but it can perform significantly better than LLMs out-of-the-box. From initial testing, g1 accurately solves simple logic problems that usually stump LLMs 60-80% of the time. However, accuracy has yet to be formally evaluated. See examples below.

##### How many Rs are in strawberry?
Prompt: How many Rs are in strawberry?
Result:
![Strawberry example](examples/strawberry.png)
---
Prompt: Which is larger, .9 or .11?
Result:
![0.9 or 0.11 example](examples/math.png)
### Quickstart
To use the Streamlit UI, follow these instructions:
```
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
export GROQ_API_KEY=gsk...
streamlit run app.py
```

---
Alternatively, follow these additional instructions to use the Gradio UI:
```
cd gradio
pip3 install -r requirements.txt
python3 app.py
```

### Prompting Strategy
The prompt is as follows:
```
You are an expert AI assistant that explains your reasoning step by step. For each step, provide a title that describes what you're doing in that step, along with the content. Decide if you need another step or if you're ready to give the final answer. Respond in JSON format with 'title', 'content', and 'next_action' (either 'continue' or 'final_answer') keys. USE AS MANY REASONING STEPS AS POSSIBLE. AT LEAST 3. BE AWARE OF YOUR LIMITATIONS AS AN LLM AND WHAT YOU CAN AND CANNOT DO. IN YOUR REASONING, INCLUDE EXPLORATION OF ALTERNATIVE ANSWERS. CONSIDER YOU MAY BE WRONG, AND IF YOU ARE WRONG IN YOUR REASONING, WHERE IT WOULD BE. FULLY TEST ALL OTHER POSSIBILITIES. YOU CAN BE WRONG. WHEN YOU SAY YOU ARE RE-EXAMINING, ACTUALLY RE-EXAMINE, AND USE ANOTHER APPROACH TO DO SO. DO NOT JUST SAY YOU ARE RE-EXAMINING. USE AT LEAST 3 METHODS TO DERIVE THE ANSWER. USE BEST PRACTICES.

Example of a valid JSON response:
{
"title": "Identifying Key Information",
"content": "To begin solving this problem, we need to carefully examine the given information and identify the crucial elements that will guide our solution process. This involves...",
"next_action": "continue"
}
```

#### Breakdown
First, a persona is added:
> You are an expert AI assistant that explains your reasoning step by step.
Then, instructions describe the expected step-by-step reasoning process, titling each reasoning step. This includes the ability for the LLM to decide if another reasoning step is needed or if the final answer can be provided.
> For each step, provide a title that describes what you're doing in that step, along with the content. Decide if you need another step or if you're ready to give the final answer.
JSON formatting is introduced with an example provided later.
> Respond in JSON format with 'title', 'content', and 'next_action' (either 'continue' or 'final_answer') keys.
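As a hypothetical illustration (not part of g1), a response in this format could be parsed and checked like so; the helper name `parse_step` is an assumption for this sketch:

```
# Hypothetical helper to validate one reasoning step (illustrative only).
import json

REQUIRED_KEYS = {"title", "content", "next_action"}
VALID_ACTIONS = {"continue", "final_answer"}

def parse_step(raw):
    step = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    missing = REQUIRED_KEYS - step.keys()
    if missing:
        raise ValueError("step is missing keys: %s" % sorted(missing))
    if step["next_action"] not in VALID_ACTIONS:
        raise ValueError("unexpected next_action: %r" % step["next_action"])
    return step
```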
A set of tips and best practices is written in all caps, emphasizing the importance of the instructions to improve prompt compliance:
1. Use as many reasoning steps as possible. At least 3. -> This ensures the LLM actually takes the time to think first, and usually results in about 5-10 steps.
2. Be aware of your limitations as an LLM and what you can and cannot do. -> This helps the LLM remember to use techniques that produce better results, like breaking "strawberry" down into individual letters before counting.
3. Include exploration of alternative answers. Consider you may be wrong, and if you are wrong in your reasoning, where it would be. -> A large part of the gains seems to come from the LLM re-evaluating its initial response to ensure it logically aligns with the problem.
4. When you say you are re-examining, actually re-examine, and use another approach to do so. Do not just say you are re-examining. -> This prevents the LLM from merely saying it re-examined a problem without actually trying a new approach.
5. Use at least 3 methods to derive the answer. -> This helps the LLM come to the right answer by trying multiple methods to derive it.
6. Use best practices. -> This is as simple as the "Do better" prompts that improve LLM code output. By telling the LLM to use best practices, or to do better, it generally performs better!

> USE AS MANY REASONING STEPS AS POSSIBLE. AT LEAST 3. BE AWARE OF YOUR LIMITATIONS AS AN LLM AND WHAT YOU CAN AND CANNOT DO. IN YOUR REASONING, INCLUDE EXPLORATION OF ALTERNATIVE ANSWERS. CONSIDER YOU MAY BE WRONG, AND IF YOU ARE WRONG IN YOUR REASONING, WHERE IT WOULD BE. FULLY TEST ALL OTHER POSSIBILITIES. YOU CAN BE WRONG. WHEN YOU SAY YOU ARE RE-EXAMINING, ACTUALLY RE-EXAMINE, AND USE ANOTHER APPROACH TO DO SO. DO NOT JUST SAY YOU ARE RE-EXAMINING. USE AT LEAST 3 METHODS TO DERIVE THE ANSWER. USE BEST PRACTICES.
Finally, after the problem is added as a user message, an assistant message is loaded to provide a standardized starting point for the LLM's generation.
> Assistant: Thank you! I will now think step by step following my instructions, starting at the beginning after decomposing the problem
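Putting the breakdown together, the conversation sent to the model can be seeded roughly as follows. This is a sketch, not g1's actual code: `build_messages` is a hypothetical helper name, and `system_prompt` stands for the full prompt from the Prompting Strategy section.

```
# Sketch of how the breakdown translates into a message list (illustrative).
def build_messages(system_prompt, question):
    return [
        # 1. Persona, step-by-step instructions, JSON format, all-caps tips.
        {"role": "system", "content": system_prompt},
        # 2. The problem, added as a user message.
        {"role": "user", "content": question},
        # 3. Pre-seeded assistant message: a standardized starting point
        #    for the LLM's generation, as quoted above.
        {"role": "assistant", "content": (
            "Thank you! I will now think step by step following my "
            "instructions, starting at the beginning after decomposing "
            "the problem."
        )},
    ]
```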
### Top Forks
* Huggingface Spaces Demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/xylin/g1-demo)
* Mult1: Using multiple AI providers to create o1-like reasoning chains ([GitHub Repository](https://github.com/tcsenpai/multi1))
* thinkR: o1-like chain of thoughts with local LLMs in R ([GitHub Repository](https://github.com/eonurk/thinkR))

### Credits
This app was developed by [Benjamin Klieger](https://x.com/benjaminklieger).