https://github.com/mapluisch/gpt-4-vision-for-hololens
Capture images with HoloLens and receive descriptive responses from OpenAI's GPT-4V(ision).
- Host: GitHub
- URL: https://github.com/mapluisch/gpt-4-vision-for-hololens
- Owner: mapluisch
- License: MIT
- Created: 2023-11-18T10:08:09.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-29T15:06:38.000Z (over 1 year ago)
- Last Synced: 2025-04-08T06:30:25.475Z (7 months ago)
- Topics: gpt-4, gpt-4-vision, gpt-4-vision-preview, gpt4vision, hololens, hololens-applications, hololens2, openai, openai-api, unity3d
- Language: ShaderLab
- Homepage:
- Size: 107 MB
- Stars: 14
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# GPT-4 Vision for HoloLens
## Overview
This project demonstrates the integration of OpenAI's GPT-4 Vision API with a HoloLens application. Users can capture images using the HoloLens camera and receive descriptive responses from the GPT-4V model.
### Demo
https://github.com/mapluisch/GPT-4-Vision-for-HoloLens/assets/31780571/03260bce-97c2-481d-b0e8-6c04e4cf496d
#### Screenshot of demo result
'A laptop displaying a webpage with the header "Let's build from here" is placed next to a spiral notebook and a pen on a dark surface.'
## Dependencies
- Newtonsoft.Json
- MRTK Foundation
- MRTK Standard Assets
## Setup
1. Open the `GPT4 Vision Example` scene
2. Specify your OpenAI API key under `GPT4Vision` > `OpenAIWrapper` (or hardcode it in the `OpenAIWrapper.cs` class)
3. Specify your base prompt, which is sent alongside the captured image to OpenAI, e.g. "Describe this image."
4. Specify the max tokens, sampling temperature, and image detail for the OpenAI API call (see the payload sketch below)
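For reference, the request body such a call sends to the `gpt-4-vision-preview` endpoint looks roughly like the sketch below, built with the Newtonsoft.Json dependency. The helper name `VisionPayload` and the default values are illustrative, not part of this project:

```csharp
using Newtonsoft.Json;

// Illustrative helper only: builds a GPT-4 Vision chat-completion request body
// from the parameters configured in steps 3 and 4 above.
public static class VisionPayload
{
    public static string Build(string basePrompt, string base64Jpeg,
                               int maxTokens = 300, float temperature = 0.7f,
                               string imageDetail = "auto")
    {
        var payload = new
        {
            model = "gpt-4-vision-preview",
            max_tokens = maxTokens,
            temperature,
            messages = new object[]
            {
                new
                {
                    role = "user",
                    content = new object[]
                    {
                        new { type = "text", text = basePrompt },
                        new
                        {
                            type = "image_url",
                            image_url = new
                            {
                                url = $"data:image/jpeg;base64,{base64Jpeg}",
                                detail = imageDetail // "low", "high", or "auto"
                            }
                        }
                    }
                }
            }
        };
        return JsonConvert.SerializeObject(payload);
    }
}
```

The `detail` setting trades response quality against token cost: `"low"` is cheapest, while `"high"` lets the model analyze the image at higher resolution.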
### Running the application
1. Build the app as an `.appx` (or deploy directly to the HoloLens, e.g. via Visual Studio) and install it on your device
2. Run the app and press the camera button to capture a photo with the HoloLens' PV camera; the photo is then sent to OpenAI's API (see the capture sketch below)
3. See the inference result (based on your prompt) displayed on the label
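For orientation, capturing a single PV-camera frame with Unity's `PhotoCapture` API looks roughly like this; a minimal sketch whose class and member names are illustrative rather than taken from this project:

```csharp
using System.Linq;
using UnityEngine;
using UnityEngine.Windows.WebCam;

// Hedged sketch of a single PV-camera capture; the project's actual
// capture code may differ in structure and naming.
public class PVCameraCapture : MonoBehaviour
{
    PhotoCapture photoCapture;
    Resolution resolution;

    public void CapturePhoto()
    {
        PhotoCapture.CreateAsync(false, capture =>
        {
            photoCapture = capture;
            // Pick the highest supported resolution.
            resolution = PhotoCapture.SupportedResolutions
                .OrderByDescending(r => r.width * r.height).First();
            var parameters = new CameraParameters
            {
                hologramOpacity = 0f,
                cameraResolutionWidth = resolution.width,
                cameraResolutionHeight = resolution.height,
                pixelFormat = CapturePixelFormat.BGRA32
            };
            photoCapture.StartPhotoModeAsync(parameters,
                _ => photoCapture.TakePhotoAsync(OnPhotoCaptured));
        });
    }

    void OnPhotoCaptured(PhotoCapture.PhotoCaptureResult result, PhotoCaptureFrame frame)
    {
        var texture = new Texture2D(resolution.width, resolution.height,
                                    TextureFormat.BGRA32, false);
        frame.UploadImageDataToTexture(texture);
        byte[] jpeg = texture.EncodeToJPG(); // base64-encode these bytes for the API payload
        photoCapture.StopPhotoModeAsync(_ => photoCapture.Dispose());
    }
}
```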
### Using the .unitypackage
1. Make sure you have the dependencies listed above installed.
2. Import the package via `Assets > Import Package`.
3. Either open the `GPT4 Vision Example` scene, or import the `GPT4Vision` prefab into your own scene.
4. Edit the base prompt, max tokens, temperature, and image detail as described above.
5. Optional: call `CapturePhoto()` on the `GPT4Vision` prefab yourself, in case you do not want to use the button and label within the prefab (see the sketch below).
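Triggering a capture from your own script could then look like the following sketch; the component type `GPT4Vision` stands in for whichever script on the prefab actually exposes `CapturePhoto()`:

```csharp
using UnityEngine;

// Illustrative only: wires your own trigger (UI, gesture, voice command)
// to the prefab's capture method.
public class CaptureTrigger : MonoBehaviour
{
    [SerializeField] GPT4Vision gpt4Vision; // assign the prefab instance in the Inspector

    public void TriggerCapture() => gpt4Vision.CapturePhoto();
}
```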
## Performance improvements
For some reason, the built-in `UnityEngine.Windows.WebCam` approach provided by Microsoft is really slow (~1.2 s per captured photo on average, regardless of resolution), and inference speed on OpenAI's servers can vary quite a bit as well. If you need this to work in (near) real-time, skip `PhotoCapture` altogether (e.g. by grabbing frames via Research Mode) and think about hosting your own LMM; a rough sketch of the latter follows below. Feel free to message me if you need some pointers.
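For the self-hosted route, a coroutine that posts the captured JPEG to an OpenAI-compatible endpoint might look like this sketch; the endpoint URL is a placeholder, and `VisionPayload` refers to the illustrative helper from the Setup section:

```csharp
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

public class LocalLMMClient : MonoBehaviour
{
    // Placeholder address of a self-hosted, OpenAI-compatible vision model server.
    const string Endpoint = "http://192.168.0.10:8000/v1/chat/completions";

    public IEnumerator Describe(byte[] jpegBytes, string prompt)
    {
        string body = VisionPayload.Build(prompt, System.Convert.ToBase64String(jpegBytes));
        using (var request = new UnityWebRequest(Endpoint, "POST"))
        {
            request.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(body));
            request.downloadHandler = new DownloadHandlerBuffer();
            request.SetRequestHeader("Content-Type", "application/json");
            yield return request.SendWebRequest();

            if (request.result == UnityWebRequest.Result.Success)
                Debug.Log(request.downloadHandler.text); // raw JSON response from the model
            else
                Debug.LogError(request.error);
        }
    }
}
```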
## Disclaimer
This project is a barebones prototype for now and still a work in progress. Feel free to create a PR.