Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-llm-strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
https://github.com/hijkzzz/awesome-llm-strawberry
OpenAI Docs
Blogs
Talks
Twitter
Papers
Relevant Papers from OpenAI o1 [contributors](https://openai.com/openai-o1-contributions/)
- Training Verifiers to Solve Math Word Problems
- Generative Language Modeling for Automated Theorem Proving
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Let's Verify Step by Step
- LLM Critics Help Catch LLM Bugs
- Self-critiquing models for assisting human evaluators
- Scalable Online Planning via Reinforcement Learning Fine-Tuning
- MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
- Self-Consistency Improves Chain of Thought Reasoning in Language Models
- From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond
- Improving Policies via Search in Cooperative Partially Observable Games
- Deliberative alignment: reasoning enables safer language models
2024
- Planning In Natural Language Improves LLM Search For Code Generation
- Training Language Models to Self-Correct via Reinforcement Learning
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
- An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
- Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
- Generative Verifiers: Reward Modeling as Next-Token Prediction
- Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
- Improve Mathematical Reasoning in Language Models by Automated Process Supervision
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
- Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
- Self-Rewarding Language Models
- Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
- Advancing LLM Reasoning Generalists with Preference Trees
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
- AlphaMath Almost Zero: Process Supervision Without Process
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
- MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
- ReFT: Reasoning with Reinforced Fine-Tuning
- Chain-of-Thought Reasoning Without Prompting
- Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
- V-STaR: Training Verifiers for Self-Taught Reasoners
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
- VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
- Stream of Search (SoS): Learning to Search in Language
- GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
- Evaluation of OpenAI o1: Opportunities and Challenges of AGI
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability
- Not All LLM Reasoners Are Created Equal
- LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench
- Thinking LLMs: General Instruction Following with Thought Generation
- CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks
- RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
- On Designing Effective RL Reward at Training Time for LLM Reasoning
- Does RLHF Scale? Exploring the Impacts From Data, Model, and Method
- Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
- Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning Through Trap Problems
- Mixture-of-Agents Enhances Large Language Model Capabilities
- When is Tree Search Useful for LLM Planning? It Depends on the Discriminator
- A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
- Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
- Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
- Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
- Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
- Evaluating LLMs at Detecting Errors in LLM Responses
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
- MEDEC: A Benchmark for Medical Error Detection and Correction in Clinical Notes
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
- AFlow: Automating Agentic Workflow Generation
- Interpretable Contrastive Monte Carlo Tree Search Reasoning
- Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems
2023
- Training Chain-of-Thought via Latent-Variable Inference
- Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training
- Reasoning with Language Model is Planning with World Model
- Don’t throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding
- Certified reasoning with language models
- OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning
- Large Language Models Cannot Self-Correct Reasoning Yet
2022
- Chain of Thought Imitation with Procedure Cloning
- STaR: Bootstrapping Reasoning With Reasoning
- Solving math word problems with process- and outcome-based feedback
2021
2017
2025
Open-source
Projects
2017
Evaluation
2017
News