https://github.com/lennart-finke/picturebooks
Which objects are visible through the holes in a picture book? This visual task is easy for adults, doable for primary schoolers, but hard for vision transformers.
https://github.com/lennart-finke/picturebooks
evals inspect vision-transformer
Last synced: 19 days ago
JSON representation
Which objects are visible through the holes in a picture book? This visual task is easy for adults, doable for primary schoolers, but hard for vision transformers.
- Host: GitHub
- URL: https://github.com/lennart-finke/picturebooks
- Owner: lennart-finke
- License: mit
- Created: 2024-07-25T11:06:16.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-07-26T11:00:14.000Z (over 1 year ago)
- Last Synced: 2024-07-26T15:42:41.230Z (over 1 year ago)
- Topics: evals, inspect, vision-transformer
- Language: Jupyter Notebook
- Homepage:
- Size: 2.08 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Picturebooks
Which objects are visible through the holes in a picture book? This visual task is easy for adults, pretty doable for primary school children, but perhaps hard for current (July 2024) vision transformers.
The task in question is closely adapted from a problem for grades 3 and 4 of the international Kangoroo math event. It goes like this: "In the following image, you can see an open picture book. There are holes in its cover. On the first page, there are pictures of objects. Which pictures are visible when the book is closed?" We also give a list of the images to be chosen from, so in this case `"Avocado", "Ginger", "Kiwi", "Watermelon"`.

And the correct answer here is "Watermelon, Ginger" (in any order). Let's see what GPT-4o has to say:
> To determine which pictures are visible when the book is closed, follow these steps: 1. Identify the position of the hole on the left side based on the grid layout. 2. Observe how this corresponds to the position of the objects on the right side. 3. Note which objects on the right side would align with the hole when the book is closed. Based on the grid in the image: - The hole on the left side spans the coordinates (1,2) to (3,6). - When the book is closed, the objects that align with this hole are in the corresponding positions on the right side, which are avocado, watermelon, and ginger. ANSWER: avocado, watermelon, ginger
Apparently, the problem was understood, and even the steps towards a solution make sense. But the spatial computation is not quite there. This repository contains code to generate more instances of the above problem, and as the scores below indicate, this solution attempt indicated a pattern across all those.
### Scores
With a sample size of `n=50`, we obtain
| | GPT-4o-mini | GPT-4o-mini + Chain of Thought | GPT-4o | **GPT-4o + Chain of Thought** | Random Baseline |
| - | ----------- | -------------------------- | ------ | ------------------------- | -------------------|
| Accuracy | 0.02 | 0.04 | 0.08 | 0.12 | 0.04 |
| Jaccard | 0.44 | 0.43 | 0.38 | 0.45 | 0.2 |
### How to run
This eval is implemented in the UK AI Safety Institute's [Inspect](https://inspect.ai-safety-institute.org.uk) framework. After installing that, you can do a test run with `inspect eval picturebooks --model openai/gpt-4o-mini --limit 1` in the main directory and remove the `limit` parameter to go through all samples.
### Heritage
Many thanks to the Kangoroo math competition team for the task idea. You can read an interview with mathematician Meike Akveld, in which the problem is featured, [here](https://johnian.joh.cam.ac.uk/news/career-spotlight-meike-akveld/).
### License
Code and images are the author's, and licensed under MIT.