https://github.com/gptscript-ai/gpt4-v-vision
Last synced: 12 months ago
- Host: GitHub
- URL: https://github.com/gptscript-ai/gpt4-v-vision
- Owner: gptscript-ai
- Created: 2024-02-21T23:44:17.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-11T14:07:14.000Z (over 1 year ago)
- Last Synced: 2025-06-28T23:42:48.322Z (12 months ago)
- Language: JavaScript
- Size: 1.81 MB
- Stars: 9
- Watchers: 4
- Forks: 5
- Open Issues: 3
Metadata Files:
- Readme: README.md
# gpt4-v-vision
`gpt4-v-vision` is a simple OpenAI CLI and GPTScript Tool for interacting with vision models.
## Prerequisites
- NodeJS
- OpenAI API key
## Usage
Import `vision` into any `.gpt` script by referencing this GitHub repo.
```yaml
Tools: github.com/gptscript-ai/gpt4-v-vision

Describe the images at the following locations:
- examples/eiffel-tower.png
- https://avatars.githubusercontent.com/u/158112119?s=400&u=d2c6ae055a80ced8209f4aab2562986a97d79e9f&v=4
```
You will be prompted to enter your OpenAI API key if you have not provided it before.
## Testing Changes
1. Clone this repository or download the source code:
```bash
git clone git@github.com:gptscript-ai/gpt4-v-vision.git
cd gpt4-v-vision
```
2. Install the `npm` dependencies:
```bash
npm install
```
3. Import the local `tool.gpt` file to test local changes.
Here's a simple example:
```yaml
# The tool script import path is relative to the directory of the script importing it; in this case ./examples
Tools: ../tool.gpt
Description: This script is used to test local changes to the vision tool by invoking it with a simple prompt and image references.

Describe the images at the following locations:
- examples/eiffel-tower.png
- https://avatars.githubusercontent.com/u/158112119?s=400&u=d2c6ae055a80ced8209f4aab2562986a97d79e9f&v=4
```
Run the script from the root directory of this repo:
```sh
# Disable response caching to ensure the tool is always called for testing purposes
gptscript --disable-cache examples/test.gpt
```
## Running the CLI
```console
$ node index.js --help
Usage: index [options]

Utility for processing images with the OpenAI API

Arguments:
  prompt             Prompt to send to the vision model
  images             List of image URIs to process. Supports file:// and https:// protocols. Images must be jpeg or png.

Options:
  --openai-api-key   OpenAI API Key (env: OPENAI_API_KEY)
  --openai-base-url  OpenAI base URL (env: OPENAI_BASE_URL)
  --openai-org-id    OpenAI Org ID to use (env: OPENAI_ORG_ID)
  --max-tokens       Max tokens to use (default: 2048, env: MAX_TOKENS)
  --model            Model to process images with (choices: "gpt-4o", "gpt-4-turbo", default: "gpt-4o", env: MODEL)
  --detail           Fidelity to use when processing images (choices: "low", "high", "auto", default: "auto", env: DETAIL)
  -h, --help         display help for command
```
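The environment variables, defaults, and choices listed in the help output can be read as a small resolution step. The following is a minimal sketch of that step, assuming a hypothetical `resolveOptions` helper. It is not taken from `index.js`, and the real CLI may parse and validate its options differently.

```javascript
// Hypothetical sketch: mirrors the defaults and choices shown in the help text.
// None of these names come from index.js.
const MODELS = ['gpt-4o', 'gpt-4-turbo'];
const DETAILS = ['low', 'high', 'auto'];

function resolveOptions(env = process.env) {
  // Fall back to the documented defaults when a variable is unset.
  const model = env.MODEL ?? 'gpt-4o';
  const detail = env.DETAIL ?? 'auto';
  const maxTokens = Number.parseInt(env.MAX_TOKENS ?? '2048', 10);

  if (!MODELS.includes(model)) {
    throw new Error(`MODEL must be one of: ${MODELS.join(', ')}`);
  }
  if (!DETAILS.includes(detail)) {
    throw new Error(`DETAIL must be one of: ${DETAILS.join(', ')}`);
  }
  if (!Number.isInteger(maxTokens) || maxTokens <= 0) {
    throw new Error('MAX_TOKENS must be a positive integer');
  }

  return {
    model,
    detail,
    maxTokens,
    apiKey: env.OPENAI_API_KEY,       // may be undefined if not set
    baseURL: env.OPENAI_BASE_URL,     // may be undefined if not set
    organization: env.OPENAI_ORG_ID,  // may be undefined if not set
  };
}
```

For example, `resolveOptions({ DETAIL: 'low' })` keeps the default model and token limit and only overrides the image fidelity.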
### Ask a question about an image in a local file
```bash
node index.js 'Describe the picture' 'file://examples/eiffel-tower.png'
```
### Ask a question about an image at a remote URL
```bash
node index.js 'Describe the picture' 'https://github.com/gptscript-ai/vision/blob/main/examples/eiffel-tower.png?raw=true'
```
### Ask a question related to multiple images
```bash
node index.js 'Do you think these two portraits are by the same artist?' 'https://github.com/gptscript-ai/vision/blob/main/examples/eiffel-tower.png?raw=true' 'file://examples/eiffel-tower.png'
```
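The three invocations above differ only in how many image URIs follow the prompt and which protocol each one uses. As a rough illustration, the sketch below shows one way a prompt and a mixed list of `https://` and `file://` URIs could be turned into a single user message in the OpenAI Chat Completions format, where each image is an `image_url` content part. The helper names `toImageUrl` and `buildMessages` are hypothetical, and treating everything after `file://` as a filesystem path is an assumption; the actual `index.js` may handle this differently.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// The help text states that images must be jpeg or png.
const MIME_TYPES = { '.png': 'image/png', '.jpg': 'image/jpeg', '.jpeg': 'image/jpeg' };

// Hypothetical helper: remote URLs pass through unchanged, local files are
// inlined as base64 data URLs so they can be sent in the request body.
function toImageUrl(uri) {
  if (uri.startsWith('https://')) return uri;
  if (!uri.startsWith('file://')) {
    throw new Error(`Unsupported protocol: ${uri}`);
  }
  // Assumption: everything after "file://" is a path (relative or absolute).
  const filePath = uri.slice('file://'.length);
  const mime = MIME_TYPES[path.extname(filePath).toLowerCase()];
  if (!mime) throw new Error(`Images must be jpeg or png: ${uri}`);
  const base64 = fs.readFileSync(filePath).toString('base64');
  return `data:${mime};base64,${base64}`;
}

// Hypothetical helper: one user message holding the prompt plus one
// image_url part per image, all sharing the same detail setting.
function buildMessages(prompt, imageUris, detail = 'auto') {
  return [
    {
      role: 'user',
      content: [
        { type: 'text', text: prompt },
        ...imageUris.map((uri) => ({
          type: 'image_url',
          image_url: { url: toImageUrl(uri), detail },
        })),
      ],
    },
  ];
}
```

The resulting array could then be passed as `messages` in a chat completions request, together with the selected model and `max_tokens`. Putting every image into one message is what lets the model compare them, as in the multi-image example.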