https://github.com/ahwang16/grounded-intuition-gpt-vision

Resources for Grounded Intuition of GPT-Vision's Abilities with Scientific Images

cv gpt-4 grounded-theory hci images llms nlp qualitative-analysis thematic-analysis vision-language

Host: GitHub
URL: https://github.com/ahwang16/grounded-intuition-gpt-vision
Owner: ahwang16
Created: 2023-11-03T06:55:11.000Z (over 1 year ago)
Default Branch: master
Last Pushed: 2023-11-11T01:46:38.000Z (over 1 year ago)
Last Synced: 2025-02-28T22:35:39.318Z (3 months ago)
Topics: cv, gpt-4, grounded-theory, hci, images, llms, nlp, qualitative-analysis, thematic-analysis, vision-language
Language: Jupyter Notebook
Homepage:
Size: 8.58 MB
Stars: 4
Watchers: 1
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md

README

![Grounded Intuition of GPT-Vision's Abilities with Scientific Images](grounded_intuition_github.png)

## Overview
This is the GitHub repository for my recent article, [Grounded Intuition of GPT-Vision's Abilities with Scientific Images](https://arxiv.org/abs/2311.02069).

**~Coming soon: Colab notebook for running GPT-Vision on the API.~ Now available!**

This paper contributes:

- an in-depth qualitative analysis of the passages GPT-Vision generates about images from scientific papers,
- a formalized procedure for qualitative analysis based on grounded theory and thematic analysis in social science/HCI literature, and
- our images and generated passages for further research and reproducibility.

We used two prompts to generate passages for each image:

- Write alt text to describe this [figure/page/image].
- Describe this [figure/page/image] as though you are speaking with someone who cannot see it.

The bracketed word was "figure" (photos, diagrams, graphs, tables), "page" (full page), or "image" (code, math), depending on the image type.
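The substitution described above can be sketched in Python. This is a minimal illustration, not code from the repository: the category keys in `PROMPT_NOUN`, the function name `build_prompts`, and the `alt`/`desc` prompt labels (borrowed from the file suffixes used in `generated_passages`) are all assumptions made for the example.

```python
# Hypothetical mapping from image category to the noun used in the prompt.
# The categories and nouns follow the README text; the dict itself is an assumption.
PROMPT_NOUN = {
    "photo": "figure",
    "diagram": "figure",
    "graph": "figure",
    "table": "figure",
    "page": "page",
    "code": "image",
    "math": "image",
}

# The two prompts from the README, with a placeholder for the noun.
PROMPT_TEMPLATES = {
    "alt": "Write alt text to describe this {noun}.",
    "desc": (
        "Describe this {noun} as though you are speaking "
        "with someone who cannot see it."
    ),
}


def build_prompts(category: str) -> dict[str, str]:
    """Return both prompts for one image category (raises KeyError if unknown)."""
    noun = PROMPT_NOUN[category]
    return {
        name: template.format(noun=noun)
        for name, template in PROMPT_TEMPLATES.items()
    }


if __name__ == "__main__":
    for name, prompt in build_prompts("photo").items():
        print(f"{name}: {prompt}")
```

Keeping the noun mapping separate from the templates means a new image category only needs one new dictionary entry.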

The images can be found in the `images` directory. Each file is named with the following convention:

```
[image type]_[image ID]_[description].png
```

with decimals in image IDs replaced by hyphens. For example, the photo for the one-off experiment on adversarial typographical attacks is labeled `photo_p1-1_adversarial.png`.

The generated passages for each prompt and image are located in the `generated_passages` directory and follow a similar naming convention, with the prompt name appended at the end. The passages for `photo_p1-1_adversarial.png` can be found in `photo_p1-1_adversarial_alt.png` and `photo_p1-1_adversarial_desc.png`.
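As a rough sketch of the naming convention, the helpers below build an image file name and derive the two passage file names from it. The function names and the field names (`image_type`, `image_id`, `description`) are hypothetical, inferred from the example file name rather than taken from the repository, and the passage extension is kept exactly as the README shows it, so check the directory listing before relying on it.

```python
from pathlib import Path

# Prompt suffixes as they appear in the README's example file names.
PROMPT_SUFFIXES = ("alt", "desc")


def image_filename(image_type: str, image_id: str, description: str) -> str:
    """Build an image file name, replacing decimals in the ID with hyphens."""
    safe_id = image_id.replace(".", "-")
    return f"{image_type}_{safe_id}_{description}.png"


def passage_filenames(image_name: str) -> list[str]:
    """Derive the passage file names by appending each prompt suffix."""
    path = Path(image_name)
    # Assumption: the extension is reused unchanged, as in the README example.
    return [f"{path.stem}_{suffix}{path.suffix}" for suffix in PROMPT_SUFFIXES]


if __name__ == "__main__":
    name = image_filename("photo", "p1.1", "adversarial")
    print(Path("images") / name)
    for passage in passage_filenames(name):
        print(Path("generated_passages") / passage)
```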

## We're in the news!

- As OpenAI's Multimodal API Launches Broadly, Research Shows It's Still Flawed, [TechCrunch](https://techcrunch.com/2023/11/06/openai-gpt-4-with-vision-release-research-flaws/)
- ChatGPT-Maker OpenAI Hosts its First Big Tech Showcase as the AI Startup Faces Growing Competition, [Associated Press](https://apnews.com/article/chatgpt-openai-tech-showcase-da850be425aaa269e2915e9e0b1c726a)

## Suggested citation

If you would like to cite the paper or repository, you can use

```
@misc{hwang_grounded_2023,
title={Grounded Intuition of GPT-Vision's Abilities with Scientific Images},
author={Alyssa Hwang and Andrew Head and Chris Callison-Burch},
year={2023},
eprint={2311.02069},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
