https://github.com/cypher-o/voice-bridge

APIs for Text-to-Speech (TTS), Speech-to-Text (STT), and Document Reading (DOCX/PDF)
Host: GitHub
URL: https://github.com/cypher-o/voice-bridge
Owner: Cypher-O
License: mit
Created: 2024-11-12T21:34:25.000Z (8 months ago)
Default Branch: main
Last Pushed: 2024-11-12T22:56:30.000Z (8 months ago)
Last Synced: 2025-01-29T21:50:46.850Z (6 months ago)
Topics: document-reader, stt-api, tts-api, typescript
Language: TypeScript
Homepage:
Size: 81.1 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Document Processing and Speech API

A TypeScript-based Express API for extracting text from documents (PDF/DOCX) and for speech-to-text and text-to-speech conversion using Google Cloud services.

## Features

- 📄 Document Processing (PDF & DOCX)

- 🎤 Speech to Text Conversion

- 🔊 Text to Speech Conversion

- ⚡ Rate Limiting

- 🔒 Type Safety

- 📝 Standardized API Responses

## Prerequisites

- Node.js (v14 or higher)

- TypeScript (v4 or higher)

- Google Cloud account with the Speech-to-Text and Text-to-Speech APIs enabled

- Service Account Key from Google Cloud

## Installation

1. Clone the repository:

```bash
git clone 
cd document-speech-api
```

2. Install dependencies:

```bash
npm install
```

3. Set up environment variables:

Create a `.env` file in the root directory:

```env
PORT=3000
GOOGLE_APPLICATION_CREDENTIALS="path/to/your/service-account-key.json"
RATE_LIMIT_WINDOW_MS=900000
RATE_LIMIT_MAX_REQUESTS=100
```
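The server presumably reads these variables at startup. A minimal sketch of parsing them with fallbacks, assuming the example values above as defaults; `loadConfig` and `AppConfig` are hypothetical names, not taken from the project source. It takes a plain env record so it can be tested without touching `process.env`; in the app you would pass `process.env` after loading `.env` (for example with `import 'dotenv/config'`).

```typescript
// Hypothetical shape of the settings read from .env (names mirror the variables above).
interface AppConfig {
  port: number;
  credentialsPath: string | undefined;
  rateLimitWindowMs: number;
  rateLimitMaxRequests: number;
}

// Parse a positive integer; fall back when the value is missing, empty, or invalid.
function parsePositiveInt(value: string | undefined, fallback: number): number {
  const n = Number(value);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Build the config from an env record; the defaults are the example .env values.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  return {
    port: parsePositiveInt(env.PORT, 3000),
    credentialsPath: env.GOOGLE_APPLICATION_CREDENTIALS,
    rateLimitWindowMs: parsePositiveInt(env.RATE_LIMIT_WINDOW_MS, 900000), // 15 minutes
    rateLimitMaxRequests: parsePositiveInt(env.RATE_LIMIT_MAX_REQUESTS, 100),
  };
}

// Usage: env values arrive as strings; missing ones fall back to the defaults.
const config = loadConfig({ PORT: "8080", RATE_LIMIT_MAX_REQUESTS: "50" });
console.log(config.port, config.rateLimitWindowMs, config.rateLimitMaxRequests);
```

Parsing defensively matters because every environment variable is a string, and a typo such as `RATE_LIMIT_WINDOW_MS=15m` would otherwise become `NaN`.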

## Project Structure

```
src/
├── controllers/
│   ├── documentReaderController.ts
│   ├── speechToTextController.ts
│   └── textToSpeechController.ts
├── services/
│   ├── documentReaderService.ts
│   ├── speechToTextService.ts
│   └── textToSpeechService.ts
├── routes/
│   └── apiRoutes.ts
├── types/
│   ├── express.d.ts
│   └── api_response.ts
├── utils/
│   ├── api_response.ts
│   └── logger.ts
├── middlewares/
│   ├── errorHandlerMiddleware.ts
│   └── rateLimitMiddleware.ts
├── app.ts
└── server.ts
```

## Dependencies

```json
{
  "dependencies": {
    "@google-cloud/speech": "latest",
    "@google-cloud/text-to-speech": "latest",
    "express": "latest",
    "multer": "latest",
    "pdf-parse": "latest",
    "mammoth": "latest",
    "express-rate-limit": "latest",
    "dotenv": "latest"
  },
  "devDependencies": {
    "@types/express": "latest",
    "@types/multer": "latest",
    "@types/node": "latest",
    "typescript": "latest",
    "ts-node": "latest",
    "nodemon": "latest"
  }
}
```

## API Endpoints

### 1. Document Reading

```http
POST /api/read-document
Content-Type: multipart/form-data
```

#### Request

- `file`: PDF or DOCX file
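Because the endpoint accepts two formats, the server has to choose an extractor for each upload. A minimal sketch of that dispatch based on the uploaded file's MIME type, with the file extension as a fallback; `detectDocumentKind` is a hypothetical helper, and the project may check types differently. The MIME strings are the standard types for PDF and DOCX. Judging by the dependency list, the PDF branch would presumably use `pdf-parse` and the DOCX branch `mammoth`.

```typescript
// Standard MIME types for the two supported formats.
const PDF_MIME = "application/pdf";
const DOCX_MIME =
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

type DocumentKind = "pdf" | "docx";

// Hypothetical helper: map an upload's MIME type (or, failing that, its extension) to an extractor.
function detectDocumentKind(mimeType: string, fileName: string): DocumentKind | null {
  if (mimeType === PDF_MIME) return "pdf";
  if (mimeType === DOCX_MIME) return "docx";
  // Some clients send a generic type such as application/octet-stream; fall back to the extension.
  const ext = fileName.toLowerCase().split(".").pop();
  if (ext === "pdf") return "pdf";
  if (ext === "docx") return "docx";
  return null; // unsupported; the route would presumably answer 400 Bad Request
}

console.log(detectDocumentKind("application/pdf", "document.pdf")); // pdf
console.log(detectDocumentKind("application/octet-stream", "Report.DOCX")); // docx
console.log(detectDocumentKind("text/plain", "notes.txt")); // null
```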

#### Response

```json
{
  "code": 0,
  "status": "success",
  "message": "Document read successfully",
  "data": {
    "text": "extracted text content",
    "fileName": "document.pdf",
    "fileType": "application/pdf"
  }
}
```
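All JSON bodies in this README share a `code` / `status` / `message` / `data` envelope, and the project structure lists `types/api_response.ts` and `utils/api_response.ts` for it. A minimal sketch of such a typed envelope with builder functions, assuming `code` is `0` on success and `1` on error as in the examples here; `ApiResponse`, `successResponse`, and `errorResponse` are hypothetical names, not the project's actual exports.

```typescript
// Envelope shared by the JSON responses in this README.
interface ApiResponse<T = undefined> {
  code: number;
  status: "success" | "error";
  message: string;
  data?: T;
}

// Payload of POST /api/read-document, as shown in the response example.
interface ReadDocumentData {
  text: string;
  fileName: string;
  fileType: string;
}

// Build a success envelope carrying a typed payload.
function successResponse<T>(message: string, data: T): ApiResponse<T> {
  return { code: 0, status: "success", message, data };
}

// Build an error envelope; no data field is included.
function errorResponse(message: string, code = 1): ApiResponse {
  return { code, status: "error", message };
}

// Usage: reproduces the document-reading response example.
const ok = successResponse<ReadDocumentData>("Document read successfully", {
  text: "extracted text content",
  fileName: "document.pdf",
  fileType: "application/pdf",
});
console.log(JSON.stringify(ok));
console.log(JSON.stringify(errorResponse("Error message")));
```

Typing the payload per endpoint lets the compiler catch a controller that, say, forgets `fileType`.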

### 2. Speech to Text

```http
POST /api/speech-to-text
Content-Type: multipart/form-data
```

#### Request

- `audio`: Audio file (MP3)

#### Response

```json
{
  "code": 0,
  "status": "success",
  "message": "Audio transcribed successfully",
  "data": {
    "text": "transcribed text",
    "audioFileName": "audio.mp3",
    "duration": 10.5
  }
}
```

### 3. Text to Speech

```http
POST /api/text-to-speech
Content-Type: application/json
```

#### Request

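```json
{
  "text": "Text to convert to speech",
  "voice": "en-US",
  "speed": 1.0
}
```

- `text`: text to convert (required)

- `voice`: voice or language, for example `en-US` (optional)

- `speed`: speaking speed, for example `1.0` (optional)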
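Since only `text` is required, the controller has to validate the body and fill in the optional fields before calling Google Cloud. A minimal sketch, assuming `speed` is passed to Google Text-to-Speech's `speakingRate` (documented range 0.25 to 4.0) and using the example values above as fallbacks; the server's real defaults and validation rules are not documented here, and `parseTtsRequest` is a hypothetical name.

```typescript
interface TtsRequest {
  text: string;
  voice: string;
  speed: number;
}

type ParseResult = { ok: true; value: TtsRequest } | { ok: false; error: string };

// Hypothetical validator for the POST /api/text-to-speech body.
function parseTtsRequest(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "Request body must be a JSON object" };
  }
  const { text, voice, speed } = body as Record<string, unknown>;
  if (typeof text !== "string" || text.trim() === "") {
    return { ok: false, error: "text is required" };
  }
  if (voice !== undefined && typeof voice !== "string") {
    return { ok: false, error: "voice must be a string" };
  }
  // Assumed mapping to Google's speakingRate (0.25 to 4.0); the negated check also rejects NaN.
  if (speed !== undefined && (typeof speed !== "number" || !(speed >= 0.25 && speed <= 4.0))) {
    return { ok: false, error: "speed must be a number between 0.25 and 4.0" };
  }
  return {
    ok: true,
    value: {
      text,
      voice: typeof voice === "string" ? voice : "en-US", // assumed fallback
      speed: typeof speed === "number" ? speed : 1.0, // assumed fallback
    },
  };
}

console.log(parseTtsRequest({ text: "Hello" }));
console.log(parseTtsRequest({ text: "Hello", speed: 10 }));
```

Returning a result object instead of throwing keeps the controller in charge of turning validation failures into a `400` error response.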

#### Response

- Audio stream (audio/mpeg) if successful

- Error response if failed:

```json
{
  "code": 1,
  "status": "error",
  "message": "Error message"
}
```

## Error Codes

- 0: Success

- 400: Bad Request

- 401: Unauthorized

- 403: Forbidden

- 404: Not Found

- 500: Internal Server Error

## Usage Examples

### Using axios

```typescript
import axios from 'axios';

// Document Reading
const readDocument = async (file: File) => {
  const formData = new FormData();
  formData.append('file', file);

  try {
    const response = await axios.post('/api/read-document', formData, {
      headers: {
        'Content-Type': 'multipart/form-data'
      }
    });
    return response.data;
  } catch (error) {
    console.error('Error reading document:', error);
    throw error;
  }
};

// Speech to Text
const convertSpeechToText = async (audioFile: File) => {
  const formData = new FormData();
  formData.append('audio', audioFile);

  try {
    const response = await axios.post('/api/speech-to-text', formData, {
      headers: {
        'Content-Type': 'multipart/form-data'
      }
    });
    return response.data;
  } catch (error) {
    console.error('Error converting speech to text:', error);
    throw error;
  }
};

// Text to Speech
const convertTextToSpeech = async (text: string) => {
  try {
    const response = await axios.post(
      '/api/text-to-speech',
      { text },
      { responseType: 'blob' }
    );
    return response.data;
  } catch (error) {
    console.error('Error converting text to speech:', error);
    throw error;
  }
};
```

## Running the Application

1. Development mode:

```bash
npm run dev
```

2. Production mode:

```bash
npm run build
npm start
```

## Setting Up Google Cloud Credentials

1. Create a project in Google Cloud Console

2. Enable Speech-to-Text and Text-to-Speech APIs

3. Create a service account and download the key file

4. Set the path to your key file in the `GOOGLE_APPLICATION_CREDENTIALS` environment variable
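The Google Cloud client libraries find the key file through Application Default Credentials, which read `GOOGLE_APPLICATION_CREDENTIALS`, so a wrong path typically only surfaces on the first API call. A minimal sketch of a fail-fast startup check using Node's `fs.existsSync`; `checkCredentials` is a hypothetical helper, not part of the project.

```typescript
import { existsSync } from "node:fs";

// Hypothetical startup check: returns a list of problems with the credentials setup.
// `fileExists` is injectable so the check can be tested without real files.
function checkCredentials(
  env: Record<string, string | undefined>,
  fileExists: (path: string) => boolean = existsSync,
): string[] {
  const path = env.GOOGLE_APPLICATION_CREDENTIALS;
  if (!path) return ["GOOGLE_APPLICATION_CREDENTIALS is not set"];
  if (!fileExists(path)) return [`Key file not found at ${path}`];
  return [];
}

// Usage at startup: report problems before serving requests.
const startupProblems = checkCredentials(process.env);
if (startupProblems.length > 0) {
  console.error(startupProblems.join("\n"));
}
```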

## Rate Limiting

The API includes rate limiting to prevent abuse. Default settings:

- 100 requests per 15-minute window

- Customize these values with `RATE_LIMIT_WINDOW_MS` and `RATE_LIMIT_MAX_REQUESTS` in the `.env` file
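With the example `.env`, `RATE_LIMIT_WINDOW_MS=900000` is 15 × 60 × 1000 ms, i.e. 15 minutes, and `RATE_LIMIT_MAX_REQUESTS=100` caps the requests allowed in that window. The project lists `express-rate-limit` for this; the sketch below is not that library, just a minimal in-memory fixed-window counter, assuming clients are keyed by IP address, to illustrate the semantics. `FixedWindowLimiter` is a hypothetical name.

```typescript
// Minimal fixed-window limiter: each key gets `max` requests per `windowMs`.
class FixedWindowLimiter {
  private readonly windowMs: number;
  private readonly max: number;
  private readonly hits = new Map<string, { count: number; windowStart: number }>();

  constructor(windowMs: number, max: number) {
    this.windowMs = windowMs;
    this.max = max;
  }

  // Returns true if the request is allowed; `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First request from this key, or its previous window expired: start a new window.
      this.hits.set(key, { count: 1, windowStart: now });
      return true;
    }
    if (entry.count < this.max) {
      entry.count += 1;
      return true;
    }
    return false; // over the limit; an HTTP API would typically answer 429 Too Many Requests
  }
}

// Usage with the default settings: the 101st request in one window is rejected.
const limiter = new FixedWindowLimiter(900000, 100);
let allowed = 0;
for (let i = 0; i < 101; i++) {
  if (limiter.allow("203.0.113.7", 0)) allowed++;
}
console.log(allowed); // 100
console.log(limiter.allow("203.0.113.7", 900000)); // true: a new window has started
```

A fixed window is the simplest scheme; it lets a client send up to twice the limit across a window boundary, which is usually acceptable for abuse prevention.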

## Contributing

1. Fork the repository

2. Create your feature branch (`git checkout -b feature/amazing-feature`)

3. Commit your changes (`git commit -m 'Add some amazing feature'`)

4. Push to the branch (`git push origin feature/amazing-feature`)

5. Open a Pull Request

## License

This project is licensed under the MIT License - see the LICENSE file for details.
