Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/jackfsuia/open-m1

A roadmap to reproduce OpenAI o1, combining PRM and RLHF to train the model to learn infinite CoT (Chain of Thought).
chatgpt llm openai-o1 reinforcement-learning strawberry

Last synced: 19 days ago

Host: GitHub
URL: https://github.com/jackfsuia/open-m1
Owner: jackfsuia
License: mit
Created: 2024-10-12T03:32:51.000Z (3 months ago)
Default Branch: main
Last Pushed: 2024-10-27T16:32:02.000Z (2 months ago)
Last Synced: 2024-10-27T19:33:49.019Z (2 months ago)
Topics: chatgpt, llm, openai-o1, reinforcement-learning, strawberry
Homepage:
Size: 8.79 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# open-m1
A roadmap to reproduce OpenAI o1.
# A roadmap
**Step 1: A good base model.** Find a strong math base model M0, for example deepseek-math-base. The larger the model, the more likely it is that self-backtracking and self-refinement abilities will emerge.

**Step 2: Format training.** Fine-tune M0 to produce solutions in a newline-delimited, step-by-step format [1], so that each step can be scored during PRM (process reward model) training. The result is model M1.
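
A minimal sketch of what the newline-delimited format could look like as fine-tuning targets. The helper names `format_solution` and `split_steps` are hypothetical, and ending every target with a "Therefore the answer is" line is an assumption borrowed from Step 4, not something the roadmap specifies for training data.

```python
from typing import List


def format_solution(steps: List[str], final_answer: str) -> str:
    """Build one training target: one reasoning step per line.

    Internal whitespace (including stray newlines) inside a step is
    collapsed so that a newline always marks a step boundary.
    The closing answer line is an assumed convention.
    """
    cleaned = [" ".join(s.split()) for s in steps if s.strip()]
    cleaned.append(f"Therefore the answer is {final_answer}")
    return "\n".join(cleaned)


def split_steps(solution: str) -> List[str]:
    """Recover the individual steps so each one can be scored by a PRM."""
    return [line for line in solution.split("\n") if line.strip()]


target = format_solution(["Let x = 2.", "  Then x + 3\n= 5. "], "5")
steps = split_steps(target)
```

Keeping exactly one step per line means the same `split_steps` call can be reused later to feed individual steps to the PRM.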

**Step 3: PRM reinforcement learning.** Model M1 can now produce formatted CoT. Train a PRM as in [1], then use it for reinforcement learning on M1, with the objective of maximizing the expected step rewards. The resulting model is then used to train the PRM again, and so on, for as many iterations as possible. The result is model M2.
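
The objective in Step 3 can be illustrated with a small sketch. It assumes the return of a solution is the sum of its per-step PRM scores (the roadmap only says "expectation of step rewards", so summing is an assumption), and it estimates the expectation by averaging over sampled solutions. The `toy_prm` scorer is a placeholder standing in for a trained PRM; all function names here are hypothetical.

```python
from typing import Callable, List, Sequence

# A PRM is modeled as: (problem, previous steps, current step) -> score.
StepScorer = Callable[[str, Sequence[str], str], float]


def step_rewards(problem: str, steps: Sequence[str], prm: StepScorer) -> List[float]:
    """Score each step given the problem and the steps that precede it."""
    return [prm(problem, steps[:i], step) for i, step in enumerate(steps)]


def rewards_to_go(rewards: Sequence[float]) -> List[float]:
    """Sum of rewards from each step to the end (a common per-step RL signal)."""
    out: List[float] = []
    total = 0.0
    for r in reversed(rewards):
        total += r
        out.append(total)
    return out[::-1]


def estimate_objective(problem: str, solutions: Sequence[str], prm: StepScorer) -> float:
    """Monte Carlo estimate of the expected summed step reward."""
    returns = [
        sum(step_rewards(problem, [s for s in sol.split("\n") if s.strip()], prm))
        for sol in solutions
    ]
    return sum(returns) / len(returns)


def toy_prm(problem: str, previous: Sequence[str], step: str) -> float:
    """Placeholder scorer: rewards steps that contain an equation."""
    return 1.0 if "=" in step else 0.0


samples = ["a = 1\nb = 2\nso done", "guess"]
objective = estimate_objective("toy problem", samples, toy_prm)
```

In an actual training loop, the per-step values from `rewards_to_go` would weight the policy-gradient update for the tokens of each step; that part depends on the RL library used and is left out here.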

**Step 4: Inference.** During inference with model M2, if the time budget runs out, the program appends "\n Therefore the answer is " to the output to force the model to produce the final answer [2]. A separate model then produces a summary of the whole output.
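
The timeout behavior in Step 4 might be sketched as follows. The callables `next_step` and `complete` are hypothetical stand-ins for calls into model M2 (generate one more step, or finish the text after the forced suffix); the injectable `clock` parameter is an assumption added so the loop can be tested without waiting. The summary model is omitted.

```python
import time
from typing import Callable, Optional

# Exact suffix quoted in the roadmap.
FORCE_ANSWER_SUFFIX = "\n Therefore the answer is "


def generate_with_budget(
    prompt: str,
    next_step: Callable[[str], Optional[str]],  # hypothetical: returns None when done
    complete: Callable[[str], str],             # hypothetical: finishes the answer
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Generate steps until the model stops or the time budget is spent."""
    deadline = clock() + budget_seconds
    text = prompt
    while clock() < deadline:
        step = next_step(text)
        if step is None:
            # The model finished on its own; no forcing needed.
            return text
        text += step + "\n"
    # Time is out: append the suffix and let the model fill in the answer.
    text = text.rstrip("\n") + FORCE_ANSWER_SUFFIX
    return text + complete(text)


# Usage with a fake clock that advances one "second" per call.
ticks = iter(range(100))
result = generate_with_budget(
    "Q: 1+1?\n",
    next_step=lambda text: "thinking",
    complete=lambda text: "2",
    budget_seconds=2.5,
    clock=lambda: next(ticks),
)
```

The budget is only checked between steps, so a single slow step can overrun it; a stricter version would also cap the time or tokens per step.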

# References
[1] Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe. Let’s Verify Step by Step. 2023.

[2] Nat McAleese, Rai Michael Pokorny, Juan Felipe Ceron Uribe, Evgenia Nitishinskaya, Maja Trebacz, Jan Leike. LLM Critics Help Catch LLM Bugs. 2024.