https://github.com/tkellogg/lrm-reasoning

LRM answers to "if you're flying over a desert in a canoe and your wheels fall off, how many pancakes does it take to cover a dog house?"



Host: GitHub
URL: https://github.com/tkellogg/lrm-reasoning
Owner: tkellogg
Created: 2024-11-24T23:34:36.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-01-03T13:46:55.000Z (over 1 year ago)
Last Synced: 2025-01-03T14:43:31.757Z (over 1 year ago)
Size: 32.2 KB
Stars: 4
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# LRMs vs. The Absurdist Dog House

My family has long had this absurd riddle:

> If you're flying over a desert in a canoe and your wheels fall off, how many pancakes does it take to cover a dog house?

The answer is also absurd: "8 because lemons don't dance".

Initially, I sent this to [DeepSeek R1](deepseek-r1.md) and was blown away by how
extensive its thought trace was. I was also struck by how unconfident it was during the
trace, yet how it flipped into a different mode, exuding confidence, when giving its final answer.

I've been using this question as a test whenever I see a new LRM (Large Reasoning Model,
an LLM that does chain-of-thought reasoning before answering). It's not quantifiable as
a benchmark, but it gives a feel for how the model behaves.

Click through the files in this repo. Each one is a different model, with the exception
of [7yo-nephew.md](7yo-nephew.md), which is my 7-year-old nephew. Also notable: [phi4](phi4.md)
is a regular LLM but seems to exhibit LRM behavior, and [SuperPrompt.gemini](SuperPrompt.gemini.md)
is regular Gemini prompted to do chain-of-thought reasoning.
