https://github.com/tkellogg/lrm-reasoning
LRM answers to "if you're flying over a desert in a canoe and your wheels fall off, how many pancakes does it take to cover a dog house?"
- Host: GitHub
- URL: https://github.com/tkellogg/lrm-reasoning
- Owner: tkellogg
- Created: 2024-11-24T23:34:36.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-03T13:46:55.000Z (over 1 year ago)
- Last Synced: 2025-01-03T14:43:31.757Z (over 1 year ago)
- Size: 32.2 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# LRMs vs. The Absurdist Dog House
My family has long had this absurd riddle:
> If you're flying over a desert in a canoe and your wheels fall off, how many pancakes does it take to cover a dog house?
The answer is also absurd: "8 because lemons don't dance".
Initially, I sent this to [DeepSeek R1](deepseek-r1.md) and was blown away by how
extensive its thought trace was. I was also struck by how unconfident it was during the
trace, and how it flipped into a different mode, exuding confidence, when giving its final answer.
I've been using this question as a test whenever I see a new LRM (Large Reasoning Model,
an LLM that does chain-of-thought reasoning before answering). It's not quantifiable as
a benchmark, but it gives a feel for how the model behaves.
Click through the files in this repo. Each one is a different model, with the exception
of [7yo-nephew.md](7yo-nephew.md), which is my 7-year-old nephew. Also notable: [phi4](phi4.md)
is a regular LLM but seems to exhibit LRM behavior, and [SuperPrompt.gemini](SuperPrompt.gemini.md)
is regular Gemini but prompted to do chain-of-thought reasoning.