{"id":16056065,"url":"https://github.com/ksylvest/omniai-google","last_synced_at":"2026-01-24T01:12:45.887Z","repository":{"id":315147598,"uuid":"1058302736","full_name":"ksylvest/omniai-google","owner":"ksylvest","description":"An implementation of the OmniAI interface for Google.","archived":false,"fork":false,"pushed_at":"2025-09-16T23:25:42.000Z","size":155,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-17T01:24:36.489Z","etag":null,"topics":["gemini","omniai","ruby","vertex"],"latest_commit_sha":null,"homepage":"https://omniai-google.ksylvest.com","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ksylvest.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-16T23:03:32.000Z","updated_at":"2025-09-16T23:25:27.000Z","dependencies_parsed_at":"2025-09-17T01:25:10.737Z","dependency_job_id":"df12035c-9b52-44e9-a1db-e46503b4ce47","html_url":"https://github.com/ksylvest/omniai-google","commit_stats":null,"previous_names":["ksylvest/omniai-google"],"tags_count":50,"template":false,"template_full_name":null,"purl":"pkg:github/ksylvest/omniai-google","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ksylvest%2Fomniai-google","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ksylvest%2Fomniai-google/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ksylvest%2Fom
niai-google/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ksylvest%2Fomniai-google/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ksylvest","download_url":"https://codeload.github.com/ksylvest/omniai-google/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ksylvest%2Fomniai-google/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275658692,"owners_count":25504776,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-17T02:00:09.119Z","response_time":84,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gemini","omniai","ruby","vertex"],"created_at":"2024-10-09T02:40:25.626Z","updated_at":"2025-09-17T20:31:40.177Z","avatar_url":"https://github.com/ksylvest.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 
OmniAI::Google\n\n[![LICENSE](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/ksylvest/omniai-google/blob/main/LICENSE)\n[![RubyGems](https://img.shields.io/gem/v/omniai-google)](https://rubygems.org/gems/omniai-google)\n[![GitHub](https://img.shields.io/badge/github-repo-blue.svg)](https://github.com/ksylvest/omniai-google)\n[![Yard](https://img.shields.io/badge/docs-site-blue.svg)](https://omniai-google.ksylvest.com)\n[![CircleCI](https://img.shields.io/circleci/build/github/ksylvest/omniai-google)](https://circleci.com/gh/ksylvest/omniai-google)\n\nA Google implementation of the [OmniAI](https://github.com/ksylvest/omniai) APIs.\n\n## Installation\n\n```sh\ngem install omniai-google\n```\n\n## Usage\n\n### Client\n\nA client is set up as follows if `ENV['GOOGLE_API_KEY']` exists:\n\n```ruby\nclient = OmniAI::Google::Client.new\n```\n\nA client may also be passed the following options:\n\n- `api_key` (required - default is `ENV['GOOGLE_API_KEY']`)\n- `credentials` (optional)\n- `host` (optional)\n- `version` (optional - options are `v1` or `v1beta`)\n\n### Configuration\n\nVertex AI and Google AI offer different options for interacting with Google's AI APIs. Check out the [Vertex AI and Google AI differences](https://cloud.google.com/vertex-ai/generative-ai/docs/overview#how-gemini-vertex-different-gemini-aistudio) to determine which option best fits your requirements.\n\n#### Configuration w/ Google AI\n\nIf using Gemini, simply provide an `api_key`:\n\n```ruby\nOmniAI::Google.configure do |config|\n  config.api_key = 'sk-...' 
# default is `ENV['GOOGLE_API_KEY']`\nend\n```\n\n#### Configuration w/ Vertex AI\n\nIf using Vertex, supply the `credentials`, `host`, `location_id` and `project_id`:\n\n```ruby\nOmniAI::Google.configure do |config|\n  config.credentials = File.open(\"./credentials.json\") # default is `ENV['GOOGLE_CREDENTIALS_PATH']` / `ENV['GOOGLE_CREDENTIALS_JSON']`\n  config.host = 'https://us-east4-aiplatform.googleapis.com' # default is `ENV['GOOGLE_HOST']`\n  config.location_id = 'us-east4' # default is `ENV['GOOGLE_LOCATION_ID']`\n  config.project_id = '...' # default is `ENV['GOOGLE_PROJECT_ID']`\nend\n```\n\n**Note for Transcription**: When using transcription features, ensure your service account has the necessary permissions for the Google Cloud Speech-to-Text API and Google Cloud Storage (for automatic file uploads). See the [GCS Setup](#gcs-setup-for-transcription) section below for detailed configuration.\n\nCredentials may be configured using:\n\n1. A `File` / `String` / `Pathname`.\n2. Assigning `ENV['GOOGLE_CREDENTIALS_PATH']` to the path of `credentials.json`.\n3. Assigning `ENV['GOOGLE_CREDENTIALS_JSON']` to the contents of `credentials.json`.\n\n### Chat\n\nA chat completion is generated by passing in a simple text prompt:\n\n```ruby\ncompletion = client.chat('Tell me a joke!')\ncompletion.text # 'Why did the chicken cross the road? 
To get to the other side.'\n```\n\nA chat completion may also be generated by using the prompt builder:\n\n```ruby\ncompletion = client.chat do |prompt|\n  prompt.system('You are an expert in geography.')\n  prompt.user('What is the capital of Canada?')\nend\ncompletion.text # 'The capital of Canada is Ottawa.'\n```\n\n#### Model\n\n`model` takes an optional string (default is `gemini-1.5-pro`):\n\n```ruby\ncompletion = client.chat('How fast is a cheetah?', model: OmniAI::Google::Chat::Model::GEMINI_FLASH)\ncompletion.text # 'A cheetah can reach speeds over 100 km/h.'\n```\n\n[Google API Reference `model`](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versioning#gemini-model-versions)\n\n#### Temperature\n\n`temperature` takes an optional float between `0.0` and `2.0`:\n\n```ruby\ncompletion = client.chat('Pick a number between 1 and 5', temperature: 2.0)\ncompletion.text # '3'\n```\n\n[Google API Reference `temperature`](https://ai.google.dev/api/rest/v1/GenerationConfig)\n\n#### Stream\n\n`stream` takes an optional proc that receives the response in chunks as they arrive, instead of waiting for the complete response:\n\n```ruby\nstream = proc do |chunk|\n  print(chunk.text) # 'Better', 'three', 'hours', ...\nend\nclient.chat('Be poetic.', stream:)\n```\n\n### Upload\n\nAn upload is especially useful when processing audio / image / video / text files. 
To use:\n\n```ruby\nCAT_URL = 'https://images.unsplash.com/photo-1472491235688-bdc81a63246e?fm=jpg'\nDOG_URL = 'https://images.unsplash.com/photo-1517849845537-4d257902454a?fm=jpg'\n\nbegin\n  cat_upload = client.upload(CAT_URL)\n  dog_upload = client.upload(DOG_URL)\n\n  completion = client.chat(stream: $stdout) do |prompt|\n    prompt.user do |message|\n      message.text 'What are these photos of?'\n      message.url(cat_upload.uri, cat_upload.mime_type)\n      message.url(dog_upload.uri, dog_upload.mime_type)\n    end\n  end\nensure\n  # `&.` skips cleanup for an upload that failed before it was assigned\n  cat_upload&.delete!\n  dog_upload&.delete!\nend\n```\n\n[Google API Reference `stream`](https://ai.google.dev/gemini-api/docs/api-overview#stream)\n\n### Transcribe\n\nAudio files can be transcribed using Google's Speech-to-Text API. The implementation automatically handles both synchronous and asynchronous recognition based on file size and model type.\n\n#### Basic Usage\n\n```ruby\n# Transcribe a local audio file\nresult = client.transcribe(\"path/to/audio.mp3\")\nresult.text # \"Hello, this is the transcribed text...\"\n\n# Transcribe with a specific model\nresult = client.transcribe(\"path/to/audio.mp3\", model: \"latest_long\")\nresult.text # \"Hello, this is the transcribed text...\"\n```\n\n#### Multi-Language Detection\n\nTranscription automatically detects multiple languages when no specific language is provided:\n\n```ruby\n# Auto-detect English and Spanish\nresult = client.transcribe(\"bilingual_audio.mp3\", model: \"latest_long\")\nresult.text # \"Hello, how are you? 
Hola, ¿cómo estás?\"\n\n# Specify expected languages explicitly\nresult = client.transcribe(\"audio.mp3\", language: [\"en-US\", \"es-US\"], model: \"latest_long\")\n```\n\n#### Detailed Transcription with Timestamps\n\nUse `VERBOSE_JSON` format to get detailed timing information, confidence scores, and language detection per segment:\n\n```ruby\nresult = client.transcribe(\"audio.mp3\", \n  model: \"latest_long\", \n  format: OmniAI::Transcribe::Format::VERBOSE_JSON\n)\n\n# Access the full transcript\nresult.text # \"Complete transcribed text...\"\n\n# Access detailed segment information\nresult.segments.each do |segment|\n  puts \"Segment #{segment[:segment_id]}: #{segment[:text]}\"\n  puts \"Language: #{segment[:language_code]}\"\n  puts \"Confidence: #{segment[:confidence]}\"\n  puts \"End time: #{segment[:end_time]}\"\n  \n  # Word-level timing (if available)\n  segment[:words].each do |word|\n    puts \"  #{word[:word]} (#{word[:start_time]} - #{word[:end_time]})\"\n  end\nend\n\n# Total audio duration\nputs \"Total duration: #{result.total_duration}\"\n```\n\n#### Models\n\nThe transcription supports various models optimized for different use cases:\n\n```ruby\n# For short audio (\u003c 60 seconds)\nclient.transcribe(\"short_audio.mp3\", model: OmniAI::Google::Transcribe::Model::LATEST_SHORT)\n\n# For long-form audio (\u003e 60 seconds) - automatically uses async processing\nclient.transcribe(\"long_audio.mp3\", model: OmniAI::Google::Transcribe::Model::LATEST_LONG)\n\n# For phone/telephony audio\nclient.transcribe(\"phone_call.mp3\", model: OmniAI::Google::Transcribe::Model::TELEPHONY_LONG)\n\n# For medical conversations\nclient.transcribe(\"medical_interview.mp3\", model: OmniAI::Google::Transcribe::Model::MEDICAL_CONVERSATION)\n\n# Other available models\nclient.transcribe(\"audio.mp3\", model: OmniAI::Google::Transcribe::Model::CHIRP_2) # Enhanced model\nclient.transcribe(\"audio.mp3\", model: OmniAI::Google::Transcribe::Model::CHIRP)   # Universal 
model\n```\n\n**Available Model Constants:**\n- `OmniAI::Google::Transcribe::Model::LATEST_SHORT` - Optimized for audio \u003c 60 seconds\n- `OmniAI::Google::Transcribe::Model::LATEST_LONG` - Optimized for long-form audio\n- `OmniAI::Google::Transcribe::Model::TELEPHONY_SHORT` - For short phone calls\n- `OmniAI::Google::Transcribe::Model::TELEPHONY_LONG` - For long phone calls  \n- `OmniAI::Google::Transcribe::Model::MEDICAL_CONVERSATION` - For medical conversations\n- `OmniAI::Google::Transcribe::Model::MEDICAL_DICTATION` - For medical dictation\n- `OmniAI::Google::Transcribe::Model::CHIRP_2` - Enhanced universal model\n- `OmniAI::Google::Transcribe::Model::CHIRP` - Universal model\n\n#### Supported Formats\n\n- **Input**: MP3, WAV, FLAC, and other common audio formats\n- **GCS URIs**: Direct transcription from Google Cloud Storage\n- **File uploads**: Automatic upload to GCS for files \u003e 10MB or long-form models\n\n#### Advanced Features\n\n**Automatic Processing Selection:**\n- Files \u003c 60 seconds: Uses synchronous recognition\n- Files \u003e 60 seconds or long-form models: Uses asynchronous batch recognition\n- Large files: Automatically uploaded to Google Cloud Storage\n\n**GCS Integration:**\n- Automatic file upload and cleanup\n- Support for existing GCS URIs\n- Configurable bucket names\n\n**Error Handling:**\n- Automatic retry logic for temporary failures\n- Clear error messages for common issues\n- Graceful handling of network timeouts\n\n[Google Speech-to-Text API Reference](https://cloud.google.com/speech-to-text/docs)\n\n#### GCS Setup for Transcription\n\nFor transcription to work properly with automatic file uploads, you need to set up Google Cloud Storage and configure the appropriate permissions.\n\n##### 1. 
Create a GCS Bucket\n\nYou must create a bucket named `{project_id}-speech-audio` manually before using transcription features:\n\n```bash\n# Using the gsutil CLI\ngsutil mb gs://your-project-id-speech-audio\n\n# Or create via Google Cloud Console\n# Navigate to Cloud Storage \u003e Browser \u003e Create Bucket\n```\n\n##### 2. Service Account Permissions\n\nYour service account needs the following IAM roles for transcription to work:\n\n**Required Roles:**\n- **Cloud Speech Editor** - Grants access to edit resources in Speech-to-Text\n- **Storage Bucket Viewer** - Grants permission to view buckets and their metadata, excluding IAM policies\n- **Storage Object Admin** - Grants full control over objects, including listing, creating, viewing, and deleting objects\n\n**To assign roles via gcloud CLI:**\n\n```bash\n# Replace the SERVICE_ACCOUNT and PROJECT_ID values below with your own\nSERVICE_ACCOUNT=\"your-service-account@your-project-id.iam.gserviceaccount.com\"\nPROJECT_ID=\"your-project-id\"\n\n# Grant Speech-to-Text permissions\ngcloud projects add-iam-policy-binding \"$PROJECT_ID\" \\\n    --member=\"serviceAccount:$SERVICE_ACCOUNT\" \\\n    --role=\"roles/speech.editor\"\n\n# Grant Storage permissions\ngcloud projects add-iam-policy-binding \"$PROJECT_ID\" \\\n    --member=\"serviceAccount:$SERVICE_ACCOUNT\" \\\n    --role=\"roles/storage.objectAdmin\"\n\ngcloud projects add-iam-policy-binding \"$PROJECT_ID\" \\\n    --member=\"serviceAccount:$SERVICE_ACCOUNT\" \\\n    --role=\"roles/storage.legacyBucketReader\"\n```\n\n**Or via Google Cloud Console:**\n1. Go to IAM \u0026 Admin \u003e IAM\n2. Find your service account\n3. Click \"Edit Principal\"\n4. Add the required roles listed above\n\n##### 3. Enable Required APIs\n\nEnsure the following APIs are enabled in your Google Cloud Project:\n\n```bash\n# Enable Speech-to-Text API\ngcloud services enable speech.googleapis.com\n\n# Enable Cloud Storage API\ngcloud services enable storage.googleapis.com\n```\n\n##### 4. 
Bucket Configuration (Optional)\n\nYou can use a different bucket by passing `bucket_name` to the transcription call:\n\n```ruby\n# Custom bucket name in your transcription calls\n# The bucket must exist and your service account must have access\nclient.transcribe(\"audio.mp3\", bucket_name: \"my-custom-audio-bucket\")\n```\n\n**Important Notes:**\n- The default bucket name follows the pattern: `{project_id}-speech-audio`\n- You must create the bucket manually before using transcription features\n- Choose an appropriate region for your bucket based on your location and compliance requirements\n- Audio files are automatically deleted after successful transcription\n- If transcription fails, temporary files may remain and should be cleaned up manually\n\n### Embed\n\nText can be converted into a vector embedding for similarity comparisons:\n\n```ruby\nresponse = client.embed('The quick brown fox jumps over a lazy dog.')\nresponse.embedding # [0.0, ...]\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fksylvest%2Fomniai-google","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fksylvest%2Fomniai-google","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fksylvest%2Fomniai-google/lists"}