https://github.com/stawa/gtts

This project converts written material into speech by using Google AI (Gemini) for text creation or internet searches.
Gemini Text-To-Speech

Transform written content into speech using Google AI (Gemini) for text generation and internet-based information retrieval.



Google Gemini | Made with TypeScript | Powered by Bun | Documentation | SonarCloud Reliability Rating


📜 Table of Contents



  1. How It Works
  2. Project Note
  3. Project Installation
  4. Project Examples
  5. Contributors


❓ How It Works


This project is based on an example in test/app.ts. It performs the following steps:



  1. Captures voice input
  2. Sends a request to the Google Gemini API and receives an AI-generated response
  3. Converts the response to speech using Text-To-Speech (TTS)
  4. Plays the generated audio
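
The four steps above can be sketched as a small pipeline. This is a minimal illustration, not the code in test/app.ts: the `VoicePipelineSteps` interface, its method names, and `runVoicePipeline` are hypothetical and stand in for whatever the library's voice recognition, Gemini, Text-To-Speech, and audio player classes actually expose.

```typescript
// Hypothetical step signatures; the real GTTS classes may use different names.
interface VoicePipelineSteps {
  transcribe(audio: Uint8Array): Promise<string>; // step 1: voice input to text
  chat(prompt: string): Promise<string>; // step 2: AI-generated response
  synthesize(text: string): Promise<Uint8Array>; // step 3: response to speech
  play(audio: Uint8Array): Promise<void>; // step 4: audio playback
}

interface VoicePipelineResult {
  transcript: string;
  reply: string;
  audioBytes: number;
}

// Runs the steps in order and stops early if nothing was said.
async function runVoicePipeline(
  input: Uint8Array,
  steps: VoicePipelineSteps,
): Promise<VoicePipelineResult> {
  const transcript = (await steps.transcribe(input)).trim();
  if (transcript.length === 0) {
    throw new Error("No speech detected in the voice input");
  }

  const reply = await steps.chat(transcript);
  const speech = await steps.synthesize(reply);
  await steps.play(speech);

  return { transcript, reply, audioBytes: speech.length };
}
```

Passing the steps in as an object keeps each stage replaceable, so a stub can stand in for any network call while the rest of the flow is tested offline.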


📌 Project Note


This project has been tested on Linux (Ubuntu 24.04 LTS x86_64). Windows users can install SoX from SourceForge. No macOS-specific information is currently available.


| Task | Priority | Status |
| --- | --- | --- |
| Implement Gemini Chat | High | ✅ Completed |
| Develop Voice Recognition | High | ✅ Completed |
| Implement Audio Language Detection | High | ✅ Completed |
| Implement Text Language Detection | Medium | ✅ Completed |
| Implement an Audio Player | Low | ✅ Completed |
| Define Enums | Low | ✅ Completed |
| Integrate Debugging | Low | ✅ Completed |


📦 Project Installation


Before using this repository, ensure the following dependencies are installed on your system:

Linux

  • SoX: `sudo apt-get install sox`
  • libsox-fmt-all: `sudo apt-get install libsox-fmt-all`
  • FFmpeg: `sudo apt install ffmpeg`

Windows

  • SoX: install via SourceForge

macOS

macOS-specific installation instructions are not available at this time.

To install the package, use one of the following commands based on your preferred package manager:

```bash
# npm
$ npm install git+https://github.com/Stawa/GTTS.git --legacy-peer-deps
# Bun
$ bun install git+https://github.com/Stawa/GTTS.git --trust
```


📄 Project Examples


Before diving into the examples, ensure you have the following API keys and credentials:




  • Google Gemini API Key (lib.GoogleGemini)
  • TikTok SessionID (lib.TextToSpeech)
    • Extract from TikTok browser cookies after logging in
  • Google Speech API Key (lib.VoiceRecognition.fetchTranscriptGoogle)
  • Deepgram API Key (lib.VoiceRecognition.fetchTranscriptDeepgram)
  • EdenAI API Key (lib.SummarizeText)


Store these API keys securely and never commit them to version control. Consider using environment variables or a secure key management system.
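
One way to follow the environment-variable advice is to validate the keys once at startup and fail fast when any are missing. The sketch below is an assumption about how that could look, not part of the library: `loadKeys` and the `Env` type are hypothetical helpers, and the error message lists only variable names so that no secret values end up in logs.

```typescript
// A plain key/value view of the environment, e.g. process.env in Node or Bun.
type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the trimmed value of every requested variable,
// or throws one error naming all variables that are unset or blank.
function loadKeys<K extends string>(env: Env, names: readonly K[]): Record<K, string> {
  const missing = names.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    // Report names only; never include the values themselves.
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  const keys = {} as Record<K, string>;
  for (const name of names) {
    keys[name] = env[name]!.trim();
  }
  return keys;
}
```

With dotenv loaded, a call such as `loadKeys(process.env, ["GEMINI_API_KEY"])` would return an object whose `GEMINI_API_KEY` property is guaranteed to be a non-empty string. Names for the other credentials are left to the reader, since the README only shows `GEMINI_API_KEY`.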

Here's a concise example demonstrating how to generate a response using the Google Gemini API:

```ts
import { GoogleGemini } from "@stawa/gtts";
import dotenv from "dotenv";
dotenv.config();

const gemini = new GoogleGemini({
  apiKey: process.env.GEMINI_API_KEY,
  model: "gemini-1.5-flash",
  enableLogging: true,
});

async function main() {
  try {
    const question = "When was Facebook launched?";
    console.log(`Question: ${question}`);

    const response = await gemini.chat(question);
    console.log(`Gemini's response: ${response}`);
  } catch (error) {
    console.error("An error occurred:", error);
  }
}

main();
```


👥 Contributors


We appreciate the contributions of all our collaborators. Special thanks to everyone who has helped shape this project!

