{"id":28581186,"url":"https://github.com/hormold/voiceoverbot","last_synced_at":"2025-08-17T01:32:53.731Z","repository":{"id":294361024,"uuid":"985017896","full_name":"Hormold/voiceoverbot","owner":"Hormold","description":"VoiceOverBot is a Telegram bot that transcribes voice messages sent to it using Google's Generative AI (Gemini)","archived":false,"fork":false,"pushed_at":"2025-05-23T21:13:37.000Z","size":114,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-11T04:16:24.278Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Hormold.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-16T23:09:00.000Z","updated_at":"2025-05-23T21:13:40.000Z","dependencies_parsed_at":"2025-05-20T04:24:15.740Z","dependency_job_id":"100c5667-77d4-400e-b37a-94e5084d1530","html_url":"https://github.com/Hormold/voiceoverbot","commit_stats":null,"previous_names":["hormold/voiceoverbot"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Hormold/voiceoverbot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hormold%2Fvoiceoverbot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hormold%2Fvoiceoverbot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hormold%2Fvoiceoverbot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hormold%2Fvoiceov
erbot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Hormold","download_url":"https://codeload.github.com/Hormold/voiceoverbot/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hormold%2Fvoiceoverbot/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270796221,"owners_count":24647319,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-16T02:00:11.002Z","response_time":91,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-11T04:16:18.279Z","updated_at":"2025-08-17T01:32:53.701Z","avatar_url":"https://github.com/Hormold.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# VoiceOverBot\n\nVoiceOverBot is a Telegram bot that transcribes voice messages and video notes (video circles) sent to it using Google's Generative AI (Gemini). 
It's built with Node.js, TypeScript, and the `node-telegram-bot-api` library.\n\n**Author:** Gemini (via Google)\n\n## Features\n\n*   Receives voice messages and video notes (video circles) in Telegram chats.\n*   Downloads voice messages directly or extracts audio from video notes using FFmpeg.\n*   Transcribes the audio using Google's Gemini Pro model (specifically `gemini-2.5-pro-preview-05-06` by default) via the Vercel AI SDK.\n*   Replies to the original voice message or video note with the transcribed text.\n*   Supports various audio file formats sent as documents (MP3, M4A, OGG, WAV, AAC).\n*   Handles chat member updates: greets when added to a new chat and informs about the need for admin rights to read messages.\n*   Includes basic error handling and retry mechanisms.\n\n## Project Structure\n\n```\n/\n├── dist/                     # Compiled JavaScript files\n├── src/\n│   └── index.ts              # Main application logic\n├── .env                      # Environment variables (create this file)\n├── .gitignore                # Git ignore file\n├── package.json              # Project dependencies and scripts\n├── README.md                 # This file\n└── tsconfig.json             # TypeScript compiler options\n```\n\n## Prerequisites\n\n*   Node.js (v18 or higher recommended)\n*   pnpm (or npm/yarn)\n*   A Telegram Bot Token\n*   A Google Generative AI API Key\n*   FFmpeg (for video note audio extraction) - Install from [https://ffmpeg.org/](https://ffmpeg.org/)\n\n## Setup\n\n1.  **Clone the repository (or set up your existing project):**\n    ```bash\n    # If you have a git repo already, skip this\n    git clone https://github.com/hormold/voiceoverbot.git\n    cd voiceoverbot\n    ```\n\n2.  **Install dependencies:**\n    ```bash\n    pnpm install\n    ```\n\n3.  
**Create a `.env` file** in the root of the project and add your API keys and bot token:\n    ```env\n    BOT_TOKEN=YOUR_TELEGRAM_BOT_TOKEN\n    GOOGLE_GENERATIVE_AI_API_KEY=YOUR_GOOGLE_GENERATIVE_AI_API_KEY\n\n    # Optional: Specify a Gemini model ID (defaults to gemini-2.5-pro-preview-05-06 in the code)\n    # GEMINI_MODEL_ID=gemini-2.5-pro-preview-05-06\n    ```\n    *   Replace `YOUR_TELEGRAM_BOT_TOKEN` with your actual Telegram bot token.\n    *   Replace `YOUR_GOOGLE_GENERATIVE_AI_API_KEY` with your Google AI API key.\n\n4.  **Build the project (compile TypeScript to JavaScript):**\n    ```bash\n    pnpm build\n    ```\n\n## Running the Bot\n\n*   **To start the bot for development (with auto-reloading via nodemon):**\n    ```bash\n    pnpm dev\n    ```\n    This command uses `nodemon` to watch for changes in `src/index.ts` and automatically restarts the bot.\n\n*   **To start the bot for production:**\n    ```bash\n    pnpm start\n    ```\n    This command runs the compiled JavaScript from the `dist` directory.\n\n## How it Works\n\n1.  The bot connects to Telegram using the `node-telegram-bot-api` library.\n2.  When a voice message is received, the bot downloads the audio file into a buffer.\n3.  When a video note (video circle) is received, the bot:\n    *   Downloads the MP4 video file\n    *   Uses FFmpeg via `fluent-ffmpeg` to extract the audio track from the video\n    *   Converts the audio to MP3 format for compatibility\n4.  For audio documents (MP3, M4A, OGG, WAV, AAC), the bot downloads the file directly.\n5.  The audio data is structured as a `CoreMessage` part with `type: \"file\"`, appropriate `mimeType`, and the audio `Buffer`. This, along with a text prompt, is sent to the specified Google Gemini model using the `generateText` function from the Vercel AI SDK (`ai` package) with the `@ai-sdk/google` provider.\n6.  
A system prompt instructs the AI on how to behave: transcribe accurately, preserve the original language, apply proper formatting, avoid extraneous content, and strictly use the `outputTranscription` tool for its response.\n7.  The AI is forced (via `toolChoice`) to use the `outputTranscription` tool. This tool is defined with a Zod schema ensuring the AI provides the transcribed text in the expected string format.\n8.  When the AI calls the tool, the `execute` function within the tool definition resolves with the transcribed text.\n9.  The bot then sends this text back to the Telegram chat as a reply to the original voice message, video note, or audio document.\n10. The bot also handles being added to new chats by sending a welcome message and mentioning the need for admin permissions to function correctly.\n\n## Dependencies\n\n*   `node-telegram-bot-api`: For interacting with the Telegram Bot API.\n*   `ai`: Vercel AI SDK for streamlined access to AI models.\n*   `@ai-sdk/google`: Google provider for the Vercel AI SDK.\n*   `dotenv`: For loading environment variables from a `.env` file.\n*   `zod`: For schema validation (used for defining the AI tool's parameters).\n*   `fluent-ffmpeg`: For extracting audio from video notes (video circles).\n\n## Development Dependencies\n\n*   `typescript`: For TypeScript language support.\n*   `ts-node`: To run TypeScript files directly.\n*   `nodemon`: To automatically restart the application during development.\n*   `@types/*`: Type definitions for various libraries.\n\n## Contributing\n\nContributions are welcome! 
If you have suggestions or improvements, feel free to open an issue or submit a pull request.\n\n---\n\n*This project was generated with assistance from Gemini.* ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhormold%2Fvoiceoverbot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhormold%2Fvoiceoverbot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhormold%2Fvoiceoverbot/lists"}