Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/EGCap/awesome-gpt4-vision

A collection of awesome GPT4 vision use cases
https://github.com/EGCap/awesome-gpt4-vision

List: awesome-gpt4-vision

Last synced: 3 months ago
JSON representation

A collection of awesome GPT4 vision use cases

Awesome Lists containing this project

README

        

# Awesome GPT4 Vision [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)

A collection of awesome GPT4 vision use cases.

- [Animal Classification](https://twitter.com/Helghardt/status/1709485359887794486)
- [Web design agent](https://twitter.com/mattshumer_/status/1707480439793840402): writes code, looks at the resulting site, improves the code accordingly, repeat.
- [Image to Replit website](https://twitter.com/skirano/status/1706823089487491469)
- [Picture to Lightroom settings](https://twitter.com/skirano/status/1709333011135681012)
- [Figma to HTML](https://twitter.com/benhylak/status/1709584171398529065)
- [Image to JSON](https://twitter.com/mckaywrigley/status/1708557028149673990): turn an image of groceries into a JSON list of objects
- [Mockup feedback](https://twitter.com/ammaar/status/1709430616524259445)
- [Reference pop culture and movies](https://twitter.com/petergyang/status/1707125696550858784)
- [Explain humor and memes](https://twitter.com/rcweston/status/1706893312588746943)
- [Understand diagrams](https://twitter.com/youraimarketer/status/1706461715078975778)
- [Understand complex powerpoint slides](https://twitter.com/seanspriggens/status/1706785470862995934)
- [Name new architectural styles](https://twitter.com/skirano/status/1707130007599116289)
- [Screenshot of website to code](https://twitter.com/mckaywrigley/status/1707047423863136687)
- [Solve math and physics problems](https://twitter.com/skirano/status/1707468861929381959/photo/1)
- [Solve chess puzzles](https://twitter.com/skirano/status/1706858014110826562)
- [Give interior design feedback](https://twitter.com/skirano/status/1707466657176637709)
- [Create recipe from image of food dish](https://twitter.com/DeeperThrill/status/1707510560814662093)
- [Find Waldo](https://twitter.com/skirano/status/1707591973572387223)
- [Whiteboard logic to code](https://twitter.com/mckaywrigley/status/1707101465922453701)
- [Decipher handwriting](https://twitter.com/emollick/status/1707076651320770870)
- [Translate between languages](https://twitter.com/emollick/status/1707077645530177775)
- [Understand parking signs](https://twitter.com/petergyang/status/1707169696049668472)
- [Sketch to logo using DALL•E 3](https://twitter.com/dr_cintas/status/1708917098817175857)
- [Photography feedback](https://twitter.com/emollick/status/1707634298507956569)
- [Stock price trajectory analysis](https://twitter.com/saana_ai/status/1707582774679576585)
- [Generate workout routines given image of workout equipment](https://twitter.com/ABeanSits/status/1709801622854148398)
- [Analyze X-rays](https://twitter.com/Saboo_Shubham_/status/1710171316819476872)
- [Estimate calories of food](https://twitter.com/Scobleizer/status/1710067078265188614)
- [Classify physical locations and landmarks](https://twitter.com/_seanliu/status/1709371559704498184)
- [Create stories from movie stills](https://twitter.com/emollick/status/1708908960948851038)
- [Decide how to play video games](https://twitter.com/emollick/status/1709382225748226191)

From the [The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) paper](https://arxiv.org/abs/2309.17421?fbclid=IwAR0-lMZOw1sl444ryLVJ1bKxGBGlXKNo_8oaboZS7uYT37sGKMNWzGYEQdE):

- [Analyze receipts](data/2309.17421-gpt4vision/receipt.png)
- [Calculate order total price](data/2309.17421-gpt4vision/calc-prices.png)
- [Celebrity Recognition](/data/2309.17421-gpt4vision/celebrity-recognition.png)
- [landmark recognition](/data/2309.17421-gpt4vision/image-description-landmarks.png)
- [General text prompts](/data/2309.17421-gpt4vision/general-text-following.png)
- [Smart prompting](/data/2309.17421-gpt4vision/general-text-following-conditioning.png)
- [Extract driver's license data](/data/2309.17421-gpt4vision/license.png)
- [Object localization](/data/2309.17421-gpt4vision/object-localization.png)
- [Count things](/data/2309.17421-gpt4vision/object-counting.png)
- [Bounding boxes / object localization](/data/2309.17421-gpt4vision/bounding-boxes.png)
- [Jokes and memes](/data/2309.17421-gpt4vision/meme-jokes.png)
- [Science and knowledge](/data/2309.17421-gpt4vision/science-knowledge.png)
- [Text understanding](/data/2309.17421-gpt4vision/scene-text-ocr.png)
- [Scene understanding](/data/2309.17421-gpt4vision/scene-understanding.png)
- [Visual math understanding](/data/2309.17421-gpt4vision/visual-math.png)
- [Flow chart understanding](/data/2309.17421-gpt4vision/diagrams-flowcharts-2.png)
- [Chart understanding](/data/2309.17421-gpt4vision/chart-understanding.png)
- [Table to code](/data/2309.17421-gpt4vision/table-to-code.png)
- [Table reasoning](/data/2309.17421-gpt4vision/table-understanding.png)
- [Document understanding: Floor plan, posters, diagrams](/data/2309.17421-gpt4vision/document-understanding.png)
- [Understand research papers and diagrams](/data/2309.17421-gpt4vision/paper-understanding.png)
- [Multilingual culture descriptions](/data/2309.17421-gpt4vision/multilingual-culture.png)
- [Multilingual image descriptions](/data/2309.17421-gpt4vision/multilingual-image-descriptions.png)
- [Multilingual text](/data/2309.17421-gpt4vision/multilingual-text.png)
- [Transcribe math to LaTeX](/data/2309.17421-gpt4vision/latex-code.png)
- [Generate code to draw graphics](/data/2309.17421-gpt4vision/graphics-code.png)
- [Visual prompting + text](/data/2309.17421-gpt4vision/visual-text-prompting.png)
- [Visual prompting](/data/2309.17421-gpt4vision/visual-prompting.png)
- [Understand pointing inputs](/data/2309.17421-gpt4vision/coord-pointing-inputs.png)
- [Generate pointing outputs](/data/2309.17421-gpt4vision/generate-pointing-outputs.png)
- [Understanding abstract visual stimuli](data/2309.17421-gpt4vision/abstract-visual-stimuli.png)
- [Discover and associate parts and objects](/data/2309.17421-gpt4vision/association-parts.png)
- [Read emotions from facial expressions](/data/2309.17421-gpt4vision/emotion-faces.png)
- [Emotional effects of images](/data/2309.17421-gpt4vision/emotional-effects.png)
- [Emotional conditioning on images](/data/2309.17421-gpt4vision/emotion-conditioned.png)
- [Few shot prompting](/data/2309.17421-gpt4vision/few-shot-2.png)
- [Food recognition](/data/2309.17421-gpt4vision/food-recognition.png)
- [Spot the difference](/data/2309.17421-gpt4vision/spot-diff.png)
- [Defect detection](/data/2309.17421-gpt4vision/defect-analysis.png)
- [Safety inspection](/data/2309.17421-gpt4vision/safety-inspection.png)
- [Grocery checkout](/data/2309.17421-gpt4vision/grocery-checkout.png)
- [Medical image description](/data/2309.17421-gpt4vision/medical-description.png)
- [Medical reports / diagnosis](/data/2309.17421-gpt4vision/medical-report-diag.png)
- [Insurance damage evaluation](/data/2309.17421-gpt4vision/insurance-damage.png)
- [Insurance report generation](/data/2309.17421-gpt4vision/insurance-report.png)
- [Customized image captioning: use set of people](/data/2309.17421-gpt4vision/customized-captioner.png)
- [Customized image captioning: use set of objects](/data/2309.17421-gpt4vision/customized-captioning.png)
- [Image counterfactuals: disagree with the false prompt](/data/2309.17421-gpt4vision/image-counterfactuals.png)
- [Logo recognition](/data/2309.17421-gpt4vision/logo-recognition.png)
- [Image complex logo and brands](/data/2309.17421-gpt4vision/image-description-complex-logos.png)
- [Dense captioning](/data/2309.17421-gpt4vision/dense-captioning.png)
- [Image generation: editing](/data/2309.17421-gpt4vision/image-generation-editing.png)
- [Image generation: evaluation](/data/2309.17421-gpt4vision/image-generation-eval.png)
- [Image generation: prompts](/data/2309.17421-gpt4vision/image-generation-prompts.png)
- [Image sequences](/data/2309.17421-gpt4vision/image-sequences.png)
- [Agentic actions: Use coffee machines](/data/2309.17421-gpt4vision/agent-actions.png)
- [Navigate around a house as a robot](/data/2309.17421-gpt4vision/robot-agent.png)
- [Browse the web](/data/2309.17421-gpt4vision/gui-navigation-2.png)
- [Shop on Amazon](/data/2309.17421-gpt4vision/gui-navigation-14.png)
- [Use windows OS](/data/2309.17421-gpt4vision/gui-navigation.png)
- [Watch TikToks](/data/2309.17421-gpt4vision/watch-tiktoks.png)
- [Use plugins](/data/2309.17421-gpt4vision/agents-plugins.png)
- [Multimodal chains](/data/2309.17421-gpt4vision/multimodal-chains.png)
- [Multimodal commonsense](/data/2309.17421-gpt4vision/multimodal-commonsense.png)
- [Raven's progressive matrices](/data/2309.17421-gpt4vision/ravens-progressive-matrices.png)
- [Self reflection for image generation](/data/2309.17421-gpt4vision/self-reflection-image-gen.png)
- [Self reflection for coding](/data/2309.17421-gpt4vision/self-reflection.png)
- [Self consistency and voting](/data/2309.17421-gpt4vision/self-consistency-voting.png)
- [Video: anticipate the next actions](/data/2309.17421-gpt4vision/video-anticipation.png)
- [Video: localization reasoning](/data/2309.17421-gpt4vision/video-localization-reasoning.png)
- [Video: order the steps](/data/2309.17421-gpt4vision/video-ordering.png)
- [Video: visual prompting](/data/2309.17421-gpt4vision/video-visual-prompting.png)
- [Wechsler Adult Intelligence Scale](/data/2309.17421-gpt4vision/wechsler-adult-intelligence-scale.png)