Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/arslanstack/gemini-vision-pro-implementation
Gemini Vision Pro API with Multimodal Prompts in JavaScript (Node.js & Express.js)
gemini gemini-api gemini-pro-api gemini-pro-vision gemini-vision-pro javascri nodejs rest-api
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/arslanstack/gemini-vision-pro-implementation
- Owner: arslanstack
- Created: 2024-01-04T16:22:14.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-04T18:29:50.000Z (about 1 year ago)
- Last Synced: 2024-01-05T18:31:04.986Z (about 1 year ago)
- Topics: gemini, gemini-api, gemini-pro-api, gemini-pro-vision, gemini-vision-pro, javascri, nodejs, rest-api
- Language: JavaScript
- Homepage: https://arslanstack.com/
- Size: 230 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
# Gemini Vision Pro API with Multimodal Prompts Integration with JavaScript (Node.js & Express.js)
This project uses the Google Generative AI library with the Gemini Pro Vision model to process text and images together and produce relevant text responses. The Gemini Pro Vision model handles multimodal tasks such as visual understanding, classification, summarization, and content creation from images and videos.
![image](./postman.png)
## About Gemini Vision Pro
Gemini Pro Vision is a large multimodal model that interprets text and visual input (images and videos) and generates contextually relevant text responses. It serves as a foundation model for a range of multimodal tasks, such as visual understanding, object identification, and content extraction from images. It can process photographs, documents, infographics, screenshots, and similar visual input alongside text.
## Use Cases
- **Visual Information Seeking:** Utilize external knowledge combined with information extracted from the input image or video to answer questions.
- **Object Recognition:** Answer questions related to fine-grained identification of objects in images and videos.
- **Digital Content Understanding:** Answer questions and extract information from visual content like infographics, charts, figures, tables, and web pages.
- **Structured Content Generation:** Generate responses based on multimodal inputs in formats like HTML and JSON.
- **Captioning and Description:** Generate descriptions of images and videos with varying levels of detail.
- **Reasoning:** Compositionally infer new information without memorization or retrieval.
## Installation
1. Clone the repository
2. Install the dependencies
```sh
npm install
```
## Usage
1. Add your Google API key to the `.env` file
```env
GOOGLE_API_KEY=your_google_api_key
```
2. Run the script with Node.js
```sh
node index.js
```
3. Or start the server and call the API from Postman
```sh
npm start
```
## Functionality
The script uses the Google Generative AI library to generate content from a text template and an image. It calls `model.generateContent` with an array containing the template and the image data, then logs the generated text to the console or returns it in the API response.
## Snapshots
![image](./postman.png)
![image](./terminal.png)
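## Example Sketch
The `model.generateContent` call described under Functionality might look roughly like the sketch below. It is not the repository's actual code: it assumes the official `@google/generative-ai` package, the model identifier `gemini-pro-vision` (taken from the repository topics), and a `GOOGLE_API_KEY` environment variable. The helper names `toImagePart` and `describeImage`, the prompt text, and the file name `./image.png` are hypothetical.

```javascript
// Sketch only: helper names are hypothetical, not taken from the repository.

// Convert raw image bytes into the inline-data part that generateContent accepts.
function toImagePart(buffer, mimeType) {
  return {
    inlineData: {
      data: Buffer.from(buffer).toString("base64"),
      mimeType,
    },
  };
}

// Send a text template plus one image to a model and return the reply text.
// `model` is any object exposing generateContent(parts), for example the one
// returned by genAI.getGenerativeModel({ model: "gemini-pro-vision" }).
async function describeImage(model, template, imagePart) {
  const result = await model.generateContent([template, imagePart]);
  const response = await result.response;
  return response.text();
}

// Example wiring (left as comments: it needs network access and a valid key):
//
//   const fs = require("fs");
//   const { GoogleGenerativeAI } = require("@google/generative-ai");
//   const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY);
//   const model = genAI.getGenerativeModel({ model: "gemini-pro-vision" });
//   const part = toImagePart(fs.readFileSync("./image.png"), "image/png");
//   describeImage(model, "Describe this image.", part).then(console.log);
```

Passing the model in as an argument keeps the two helpers independent of the network, so the same function can log to the console from a script or feed an Express response handler.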