Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/RMNCLDYO/gemini-ai-toolkit

Unlock the potential of Google's Gemini AI models with this versatile toolkit. Offering seamless chat, text generation, and multimodal interactions, supporting various file types, including PDF's, images, videos, audio, text and more. Enjoy real-time responses, customizable parameters, and easy integration for diverse AI tasks.
https://github.com/RMNCLDYO/gemini-ai-toolkit

artificial-intelligence audio-transcribing chatbot conversational-ai gemini gemini-advanced gemini-api gemini-flash gemini-pro gemini-pro-1-5-experimental gemini-pro-api gemini-pro-flash gemini-pro-vision google google-api google-deepmind google-gemini image-analysis text-processing video-processing

Last synced: about 2 months ago
JSON representation

Unlock the potential of Google's Gemini AI models with this versatile toolkit. Offering seamless chat, text generation, and multimodal interactions, supporting various file types, including PDF's, images, videos, audio, text and more. Enjoy real-time responses, customizable parameters, and easy integration for diverse AI tasks.

Awesome Lists containing this project

README

        



Google Gemini AI



Gemini AI Toolkit


maintained - yes
contributions - welcome






Google Gemini AI


> [!NOTE]
> *This toolkit supports Google's newest Gemini 1.5 Pro and 1.5 Flash stable & experimental models (as of October 3, 2024)*

The Gemini AI Toolkit is the easiest way for developers to build with Google's Gemini AI models. It offers seamless integration for chat, text generation, and multimodal interactions, allowing you to process and analyze text, images, audio, video, code, and more—all in one comprehensive package with minimal dependencies.

## 🚀 Features

- **Multimodal Interaction**: Effortlessly process and analyze a wide array of file types—including PDFs, images, videos, audio files, text documents, and code snippets—unlocking new dimensions of AI-assisted understanding.
- **Interactive Chat**: Engage in dynamic, context-aware conversations with Gemini, enabling real-time dialogue that adapts to your needs.
- **Smart File Handling**: Seamlessly upload and process files from local paths or URLs, with automatic temporary storage management to keep your workspace clutter-free.
- **Command Support**: Utilize intuitive commands to control the toolkit's functionality, enhancing efficiency and user experience.
- **Customizable Parameters**: Tailor your AI interactions by enabling structured JSON output for automated processing, using streaming responses for faster interactions, and adjusting temperature, token limits, and safety thresholds and more to suit your needs
- **Lightweight Design**: Enjoy a streamlined experience with minimal dependencies—primarily leveraging the requests package—making setup and deployment a breeze.

## 📋 Table of Contents

- [Installation](#-installation)
- [API Key Configuration](#-configuration)
- [Usage](#-usage)
- [Special Commands](#-special-commands)
- [Advanced Configuration](#%EF%B8%8F-advanced-configuration)
- [Supported Models](#-supported-models)
- [Error Handling and Safety](#-error-handling-and-safety)
- [Supported File Types](#-supported-file-types)
- [Caching and Cleanup](#-caching-and-cleanup)
- [Contributing](#-contributing)
- [Reporting Issues](#-issues-and-support)
- [Submitting Pull Requests](#-feature-requests)
- [Versioning and Changelog](#-versioning-and-changelog)
- [Security](#-security)
- [License](#-license)

## 🛠 Installation

1. Clone the repository:
```bash
git clone https://github.com/RMNCLDYO/gemini-ai-toolkit.git
```

2. Navigate to the repository folder:
```bash
cd gemini-ai-toolkit
```

3. Install the required dependencies:
```bash
pip install -r requirements.txt
```

## 🔑 Configuration
1. Obtain an API key from [Google AI Studio](https://aistudio.google.com/app/apikey).
2. You have three options for managing your API key:

Click here to view the API key configuration options

- **Setting it as an environment variable on your device (recommended for everyday use)**
- Navigate to your terminal.
- Add your API key like so:
```shell
export GEMINI_API_KEY=your_api_key
```
This method allows the API key to be loaded automatically when using the wrapper or CLI.

- **Using an .env file (recommended for development):**
- Install python-dotenv if you haven't already: `pip install python-dotenv`.
- Create a .env file in the project's root directory or rename `example.env` in the root folder to `.env` and replace `your_api_key_here` with your API key.
- Add your API key to the .env file like so:
```makefile
GEMINI_API_KEY=your_api_key
```
This method allows the API key to be loaded automatically when using the wrapper or CLI, assuming you have python-dotenv installed and set up correctly.

- **Direct Input:**
- If you prefer not to use a `.env` file, you can directly pass your API key as an argument to the CLI or the wrapper functions.

***CLI***
```shell
--api_key "your_api_key"
```
***Wrapper***
```shell
api_key="your_api_key"
```
This method requires manually inputting your API key each time you initiate an API call, ensuring flexibility for different deployment environments.

## 💻 Usage

### Multimodal Mode
*For processing multiple input types including audio, video, text, images, code and a wide range of files. This mode allows you to upload files (from local paths or URLs), chat with the AI about the content, and maintain a knowledge base throughout the conversation.*

***CLI***
```bash
python cli.py --multimodal --prompt "Analyze both of these files and provide a summary of each, one by one. Don't overlook any details." --files file1.jpg https://example.com/file2.pdf
```

***Wrapper***
```python
from gemini import Multimodal

Multimodal().run(prompt="Analyze both of these files and provide a summary of each, one by one. Don't overlook any details.", files=["file1.jpg", "https://example.com/file2.pdf"])
```

### Chat Mode
*For interactive conversations with the AI model.*

***CLI***
```bash
python cli.py --chat
```

***Wrapper***
```python
from gemini import Chat

Chat().run()
```

### Text Mode
*For generating text based on a prompt or a set of instructions.*

***CLI***
```bash
python cli.py --text --prompt "Write a story about a magic backpack."
```

***Wrapper***
```python
from gemini import Text

Text().run(prompt="Write a story about a magic backpack.")
```

## 🔧 Special Commands
During interaction with the toolkit, you can use the following special commands:

- `/exit` or `/quit`: End the conversation and exit the program.
- `/clear`: Clear the conversation history (useful for saving API credits).
- `/upload`: Upload a file for multimodal processing.
- Usage: `/upload file_path_and_or_url [optional prompt]`
- Example: `/upload file1.jpg https://example.com/file2.pdf Analyze the files and provide a summary of each`

## ⚙️ Advanced Configuration

| Description | CLI Flags | CLI Usage | Wrapper Usage |
|-------------|-----------|-----------|---------------|
| Chat mode | `-c`, `--chat` | `--chat` | *See mode usage above.* |
| Text mode | `-t`, `--text` | `--text` | *See mode usage above.* |
| Multimodal mode | `-m`, `--multimodal` | `--multimodal` | *See mode usage above.* |
| User prompt | `-p`, `--prompt` | `--prompt "Your prompt here"` | `prompt="Your prompt here"` |
| File inputs | `-f`, `--files` | `--files file1.jpg https://example.com/file2.pdf` | `files=["file1.jpg", "https://example.com/file2.pdf"]` |
| Enable streaming | `-s`, `--stream` | `--stream` | `stream=True` |
| Enable JSON output | `-js`, `--json` | `--json` | `json=True` |
| API Key | `-ak`, `--api_key` | `--api_key "your_api_key"` | `api_key="your_api_key"` |
| Model name | `-md`, `--model` | `--model "gemini-1.5-flash-8b"` | `model="gemini-1.5-flash-8b"` |
| System prompt | `-sp`, `--system_prompt` | `--system_prompt "Set custom instructions"` | `system_prompt="Set custom instructions"` |
| Max tokens | `-mt`, `--max_tokens` | `--max_tokens 1024` | `max_tokens=1024` |
| Temperature | `-tm`, `--temperature` | `--temperature 0.7` | `temperature=0.7` |
| Top-p | `-tp`, `--top_p` | `--top_p 0.9` | `top_p=0.9` |
| Top-k | `-tk`, `--top_k` | `--top_k 40` | `top_k=40` |
| Candidate count | `-cc`, `--candidate_count` | `--candidate_count 1` | `candidate_count=1` |
| Stop sequences | `-ss`, `--stop_sequences` | `--stop_sequences ["\n", "."]` | `stop_sequences=["\n", "."]` |
| Safety categories | `-sc`, `--safety_categories` | `--safety_categories ["HARM_CATEGORY_HARASSMENT"]` | `safety_categories=["HARM_CATEGORY_HARASSMENT"]` |
| Safety thresholds | `-st`, `--safety_thresholds` | `--safety_thresholds ["BLOCK_NONE"]` | `safety_thresholds=["BLOCK_NONE"]` |

## 📊 Supported Models

### Base Models
| **Model** | **Inputs** | **Context Length** |
|---|---|---|
| `gemini-1.5-pro-002` (*stable*) | Text, images, audio, video | 8192 |
| `gemini-1.5-pro` | Text, images, audio, video | 8192 |
| `gemini-1.5-flash-002` (*stable*) | Text, images, audio, video | 8192 |
| `gemini-1.5-flash` | Text, images, audio, video | 8192 |
| `gemini-1.5-flash-8b` | Text, images, audio, video | 8192 |
| `gemini-1.0-pro` | Text | 2048 |

> [!NOTE]
> *On October 3rd, Google released a new Gemini Flash model, `gemini-1.5-flash-8b` which is now available for production usage. On September 24th, Google released two new stable Gemini models, `gemini-1.5-pro-002` and `gemini-1.5-flash-002`. The `gemini-1.5-pro` and `gemini-1.5-flash` base models will default to use the `-002` versions automatically on October 8, 2024.

### Experimental Models
| **Model** | **Inputs** | **Context Length** |
|---|---|---|
| `gemini-1.5-pro-exp-0827` | Text, images, audio, video | 8192 |
| `gemini-1.5-flash-exp-0827` | Text, images, audio, video | 8192 |
| `gemini-1.5-flash-8b-exp-0827` | Text, images, audio, video | 8192 |

> [!NOTE]
> *The availability of specific models may be subject to change. Always refer to Google's official documentation for the most up-to-date information on model availability and capabilities. See base models docs [here](https://ai.google.dev/gemini-api/docs/models/gemini) and experimental model docs [here](https://ai.google.dev/gemini-api/docs/models/experimental-models).*

## 🔒 Error Handling and Safety

The Gemini AI Toolkit now includes robust error handling to help you diagnose and resolve issues quickly. Here are some common error codes and their solutions:

| HTTP Code | Status | Description | Solution |
|-----------|--------|-------------|----------|
| 400 | INVALID_ARGUMENT | Malformed request body | Check API reference for correct format and supported versions |
| 400 | FAILED_PRECONDITION | API not available in your country | Enable billing on your project in Google AI Studio |
| 403 | PERMISSION_DENIED | API key lacks permissions | Verify API key and access rights |
| 404 | NOT_FOUND | Resource not found | Check if all parameters are valid for your API version |
| 429 | RESOURCE_EXHAUSTED | Rate limit exceeded | Ensure you're within model rate limits or request a quota increase |
| 500 | INTERNAL | Unexpected error on Google's side | Retry after a short wait; report persistent issues |
| 503 | UNAVAILABLE | Service temporarily overloaded/down | Retry after a short wait; report persistent issues |

For rate limit errors (429), the toolkit will automatically pause for 15 seconds before retrying the request.

## 📁 Supported File Types

The Gemini AI Toolkit supports a wide range of file types for multimodal processing. Here are the supported file extensions:

| Category | File Extensions |
|--------------------|-----------------|
| **Images** | `jpg`, `jpeg`, `png`, `webp`, `gif`, `heic`, `heif` |
| **Videos** | `mp4`, `mpeg`, `mpg`, `mov`, `avi`, `flv`, `webm`, `wmv`, `3gp` |
| **Audio** | `wav`, `mp3`, `aiff`, `aac`, `ogg`, `flac` |
| **Text/Documents** | `txt`, `html`, `css`, `js`, `ts`, `csv`, `md`, `py`, `json`, `xml`, `rtf`, `pdf` |

> [!NOTE]
> *Google's Files API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours.*

## 💾 Caching and Cleanup

The Gemini AI Toolkit implements a caching mechanism for downloaded files to improve performance and reduce unnecessary network requests. Here's how it works:

1. When a file is downloaded from a URL, it's stored in a temporary cache folder (`.gemini_ai_toolkit_cache`).
2. The file will be used to process the request and will be stored locally due to Google's upload requirements.
3. The cache is automatically cleaned up at the end of each session to prevent accumulation of temporary files.

You don't need to manage this cache manually, but it's good to be aware of its existence, especially if you're processing large files or have limited storage space.

## 🤝 Contributing
Contributions are welcome!

Please refer to [CONTRIBUTING.md](.github/CONTRIBUTING.md) for detailed guidelines on how to contribute to this project.

## 🐛 Issues and Support
Encountered a bug? We'd love to hear about it. Please follow these steps to report any issues:

1. Check if the issue has already been reported.
2. Use the [Bug Report](.github/ISSUE_TEMPLATE/bug_report.md) template to create a detailed report.
3. Submit the report [here](https://github.com/RMNCLDYO/gemini-ai-toolkit/issues).

Your report will help us make the project better for everyone.

## 💡 Feature Requests
Got an idea for a new feature? Feel free to suggest it. Here's how:

1. Check if the feature has already been suggested or implemented.
2. Use the [Feature Request](.github/ISSUE_TEMPLATE/feature_request.md) template to create a detailed request.
3. Submit the request [here](https://github.com/RMNCLDYO/gemini-ai-toolkit/issues).

Your suggestions for improvements are always welcome.

## 🔁 Versioning and Changelog
Stay up-to-date with the latest changes and improvements in each version:

- [CHANGELOG.md](.github/CHANGELOG.md) provides detailed descriptions of each release.

## 🔐 Security
Your security is important to us. If you discover a security vulnerability, please follow our responsible disclosure guidelines found in [SECURITY.md](.github/SECURITY.md). Please refrain from disclosing any vulnerabilities publicly until said vulnerability has been reported and addressed.

## 📄 License
Licensed under the MIT License. See [LICENSE](LICENSE) for details.