https://github.com/ahwang16/grounded-intuition-gpt-vision
Resources for Grounded Intuition of GPT-Vision's Abilities with Scientific Images
- Host: GitHub
- URL: https://github.com/ahwang16/grounded-intuition-gpt-vision
- Owner: ahwang16
- Created: 2023-11-03T06:55:11.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2023-11-11T01:46:38.000Z (about 1 year ago)
- Last Synced: 2024-10-11T19:13:16.116Z (3 months ago)
- Topics: cv, gpt-4, grounded-theory, hci, images, llms, nlp, qualitative-analysis, thematic-analysis, vision-language
- Language: Jupyter Notebook
- Homepage:
- Size: 8.58 MB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Metadata Files:
  - Readme: README.md
![Grounded Intuition of GPT-Vision's Abilities with Scientific Images](grounded_intuition_github.png)
## Overview
This is the GitHub repository for my recent article, [Grounded Intuition of GPT-Vision's Abilities with Scientific Images](https://arxiv.org/abs/2311.02069). **~~Coming soon: Colab notebook for running GPT-Vision on the API.~~ Now available!**
This paper contributes:
- an in-depth qualitative analysis of GPT-Vision's generations of images from scientific papers,
- a formalized procedure for qualitative analysis based on grounded theory and thematic analysis in social science/HCI literature, and
- our images and generated passages for further research and reproducibility.

We used two prompts to generate passages for each image:
- Write alt text to describe this \.
- Describe this \ as though you are speaking with someone who cannot see it.

We replaced \ with "figure" (photos, diagrams, graphs, tables), "page" (full page), or "image" (code, math) depending on the image type.
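The substitution scheme above can be sketched in Python. This is a minimal illustration, not code from the paper; the helper and dictionary names are hypothetical:

```python
# Two prompt templates with a slot for the image-type word.
# Names here are illustrative, not from the paper's code.
PROMPTS = {
    "alt": "Write alt text to describe this {item}.",
    "desc": "Describe this {item} as though you are speaking with "
            "someone who cannot see it.",
}

# Image categories mapped to the word substituted into the prompts,
# following the figure/page/image grouping described above.
ITEM_WORD = {
    "photo": "figure", "diagram": "figure", "graph": "figure", "table": "figure",
    "page": "page",
    "code": "image", "math": "image",
}

def build_prompt(prompt_name: str, image_type: str) -> str:
    """Fill the template slot with the word for this image type."""
    return PROMPTS[prompt_name].format(item=ITEM_WORD[image_type])
```

For example, `build_prompt("alt", "photo")` yields `"Write alt text to describe this figure."`.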
The images can be found in the `images` directory. Each file is named with the following convention:
```
<type>_<id>_<descriptor>.png
```

with decimals in image IDs replaced by hyphens. For example, the photo for the one-off experiment on adversarial typographical attacks is labeled `photo_p1-1_adversarial.png`.
The generated passages for each prompt and image are located in the `generated_passages` directory and follow a similar naming convention, with the prompt name at the end. The passages for `photo_p1-1_adversarial.png` can be found in `photo_p1-1_adversarial_alt.png` and `photo_p1-1_adversarial_desc.png`.
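Parsing this naming convention back into its parts can be sketched as follows, assuming the three underscore-separated fields suggested by the `photo_p1-1_adversarial.png` example (the helper name is hypothetical, not from this repository):

```python
from pathlib import Path

def parse_image_name(filename: str) -> dict:
    """Split a filename of the form type_id_descriptor.png.

    Hyphens in the image ID stand in for decimals, so they are
    mapped back to dots here.
    """
    stem = Path(filename).stem  # drop the .png extension
    image_type, image_id, descriptor = stem.split("_", 2)
    return {
        "type": image_type,
        "id": image_id.replace("-", "."),
        "descriptor": descriptor,
    }
```

For example, `parse_image_name("photo_p1-1_adversarial.png")` recovers type `photo`, ID `p1.1`, and descriptor `adversarial`.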
## We're on the news!
- As OpenAI's Multimodal API Launches Broadly, Research Shows It's Still Flawed, [TechCrunch](https://techcrunch.com/2023/11/06/openai-gpt-4-with-vision-release-research-flaws/)
- ChatGPT-Maker OpenAI Hosts its First Big Tech Showcase as the AI Startup Faces Growing Competition, [Associated Press](https://apnews.com/article/chatgpt-openai-tech-showcase-da850be425aaa269e2915e9e0b1c726a)

## Suggested citation
If you would like to cite the paper or repository, you can use:
```bibtex
@misc{hwang_grounded_2023,
  title={Grounded Intuition of GPT-Vision's Abilities with Scientific Images},
  author={Alyssa Hwang and Andrew Head and Chris Callison-Burch},
  year={2023},
  eprint={2311.02069},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```