https://github.com/stawa/gtts

This project converts written material into speech by using Google AI (Gemini) for text creation or internet searches.

ai gemini google-gemini stt tts typescript

Host: GitHub
URL: https://github.com/stawa/gtts
Owner: Stawa
License: mit
Created: 2024-03-05T17:02:28.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-08-28T01:09:25.000Z (10 months ago)
Last Synced: 2024-08-28T23:27:35.822Z (10 months ago)
Topics: ai, gemini, google-gemini, stt, tts, typescript
Language: TypeScript
Homepage: https://stawa.github.io/GTTS/
Size: 261 KB
Stars: 7
Watchers: 0
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

Gemini Text-To-Speech

Transform written content into speech using Google AI (Gemini) for text generation and internet-based information retrieval.

📜 Table of Contents

- How It Works
- Project Note
- Project Installation
- Project Examples
- Contributors


❓ How It Works

This project is based on an example in test/app.ts. It performs the following steps:

1. Fetches a voice input
2. Sends a request to the Google Gemini API to receive an AI-generated response
3. Automatically converts the response to speech using Text-To-Speech (TTS) technology
4. Plays the generated audio
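
The four steps above can be sketched as a small pipeline. This is only an illustration of the flow, not the code in test/app.ts: the `VoicePipeline` interface, the stage names (`listen`, `chat`, `synthesize`, `play`), and the stub implementations are all hypothetical and are not part of the GTTS API. The sketch also assumes the voice input has already been transcribed to text by the first stage.

```ts
// Hypothetical stage signatures for illustration; NOT the GTTS library's API.
interface VoicePipeline {
  listen: () => Promise<string>; // fetches voice input, returns its transcript (assumption)
  chat: (prompt: string) => Promise<string>; // asks the AI model for a response
  synthesize: (text: string) => Promise<Uint8Array>; // text-to-speech
  play: (audio: Uint8Array) => Promise<void>; // audio playback
}

interface PipelineResult {
  question: string;
  answer: string;
  audioBytes: number;
}

// Runs the steps in order: voice input -> AI response -> speech -> playback.
async function runVoicePipeline(stages: VoicePipeline): Promise<PipelineResult> {
  const question = await stages.listen();
  const answer = await stages.chat(question);
  const speech = await stages.synthesize(answer);
  await stages.play(speech);
  return { question, answer, audioBytes: speech.length };
}

// Stub stages so the sketch runs without a microphone, API keys, or SoX.
const played: number[] = [];
const stubStages: VoicePipeline = {
  listen: async () => "When was Facebook launched?",
  chat: async (prompt) => `You asked: ${prompt}`,
  // Fake "audio": one byte per character of the response text.
  synthesize: async (text) => Uint8Array.from(text, (ch) => ch.charCodeAt(0)),
  play: async (audio) => {
    played.push(audio.length); // record the length instead of playing sound
  },
};

runVoicePipeline(stubStages).then((result) => {
  console.log(`Question: ${result.question}`);
  console.log(`Answer: ${result.answer}`);
});
```

Keeping each stage behind a function type makes it possible to swap a stub for a real speech, chat, or playback implementation without changing the order of the steps.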





📌 Project Note

This project has been tested on Linux (Ubuntu 24.04 LTS x86_64). Windows users can install SoX from SourceForge. macOS-specific information is currently unavailable.


  

| Task | Priority | Status |
| --- | --- | --- |
| Implement Gemini Chat | High | ✅ Completed |
| Develop Voice Recognition | High | ✅ Completed |
| Implement Audio Language Detection | High | ✅ Completed |
| Implement Text Language Detection | Medium | ✅ Completed |
| Implement an Audio Player | Low | ✅ Completed |
| Define Enums | Low | ✅ Completed |
| Integrate Debugging | Low | ✅ Completed |

  



📦 Project Installation

Before using this repository, ensure the following dependencies are installed on your system:


Linux

- SoX: `sudo apt-get install sox`
- libsox-fmt-all: `sudo apt-get install libsox-fmt-all`
- FFmpeg: `sudo apt install ffmpeg`

Windows

- SoX: Download from SourceForge
- FFmpeg: `choco install ffmpeg` (using Chocolatey) or download from the official website

macOS

macOS-specific installation instructions are not available at this time.
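
Once the dependencies are installed, a quick preflight check can confirm that they are reachable from your PATH. The following is a minimal sketch using Node's built-in `child_process` module; the helper names `isCommandAvailable` and `findMissingTools` are hypothetical and not part of this project.

```ts
import { spawnSync } from "node:child_process";

// Returns true when `command` can be started from the current PATH.
// spawnSync sets `error` (for example ENOENT) when the executable cannot be launched.
function isCommandAvailable(command: string, versionFlag: string): boolean {
  const result = spawnSync(command, [versionFlag], { stdio: "ignore" });
  return result.error === undefined;
}

// Each entry pairs a tool with the flag that prints its version and exits.
const requiredTools: Array<[string, string]> = [
  ["sox", "--version"],
  ["ffmpeg", "-version"],
];

// Returns the names of the tools that could not be launched.
function findMissingTools(tools: Array<[string, string]>): string[] {
  return tools
    .filter(([command, flag]) => !isCommandAvailable(command, flag))
    .map(([command]) => command);
}

const missing = findMissingTools(requiredTools);
if (missing.length > 0) {
  console.warn(`Missing tools: ${missing.join(", ")}`);
} else {
  console.log("SoX and FFmpeg were found on PATH.");
}
```

The check only verifies that each executable can be launched; it does not confirm that optional format support such as libsox-fmt-all is present.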


To install the package, use one of the following commands based on your preferred package manager:


```bash
# npm
$ npm install git+https://github.com/Stawa/GTTS.git --legacy-peer-deps

# Bun
$ bun install git+https://github.com/Stawa/GTTS.git --trust
```



📄 Project Examples

Before diving into the examples, ensure you have the following API keys and credentials:



  

- Google Gemini API Key (`lib.GoogleGemini`)
  - Obtain from Google Cloud Console
- TikTok SessionID (`lib.TextToSpeech`)
  - Extract from TikTok browser cookies after logging in
- Google Speech API Key (`lib.VoiceRecognition.fetchTranscriptGoogle`)
  - Generate from Google Cloud Console Credentials
- Deepgram API Key (`lib.VoiceRecognition.fetchTranscriptDeepgram`)
  - Create an account and obtain from Deepgram Console
- EdenAI API Key (`lib.SummarizeText`)
  - Sign up and retrieve from EdenAI Dashboard



  



Store these API keys securely and never commit them to version control. Consider using environment variables or a secure key management system.
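
One way to follow the environment-variable approach is to read each key through a helper that fails fast when the variable is missing, and to mask the value before logging it. This is a minimal sketch: `requireEnv`, `maskSecret`, and the `demoEnv` object are hypothetical helpers written for illustration and are not provided by this package. Only the `GEMINI_API_KEY` name is taken from the example in this README.

```ts
// Reads a required environment variable and throws when it is missing or blank.
// The `env` parameter defaults to process.env and exists so the helper can be tested.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Hides all but the last four characters so a key can appear in logs.
// Secrets of four characters or fewer are masked completely.
function maskSecret(secret: string): string {
  if (secret.length <= 4) return "*".repeat(secret.length);
  return "*".repeat(secret.length - 4) + secret.slice(-4);
}

// Demo with a fake environment object; real code would rely on process.env.
const demoEnv: Record<string, string | undefined> = {
  GEMINI_API_KEY: "example-key-1234",
};
const apiKey = requireEnv("GEMINI_API_KEY", demoEnv);
console.log(`Loaded key: ${maskSecret(apiKey)}`);
```

Failing at startup with the variable's name in the message tends to be easier to diagnose than an authentication error returned later by a remote API.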


Here's a concise example demonstrating how to generate a response using the Google Gemini API:


```ts
import { GoogleGemini } from "@stawa/gtts";
import dotenv from "dotenv";

// Load GEMINI_API_KEY from a local .env file into process.env.
dotenv.config();

const gemini = new GoogleGemini({
  apiKey: process.env.GEMINI_API_KEY,
  model: "gemini-1.5-flash",
  enableLogging: true,
});

// Send a single question to Gemini and print the reply.
async function main() {
  try {
    const question = "When was Facebook launched?";
    console.log(`Question: ${question}`);
    const response = await gemini.chat(question);
    console.log(`Gemini's response: ${response}`);
  } catch (error) {
    console.error("An error occurred:", error);
  }
}

main();
```



👥 Contributors

We appreciate the contributions of all our collaborators. Each person's effort helps make this project better.