Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cypher-o/voice-bridge
APIs for Text-to-Speech (TTS), Speech-to-Text (STT), and Document Reading (DOCX/PDF)
- Host: GitHub
- URL: https://github.com/cypher-o/voice-bridge
- Owner: Cypher-O
- License: mit
- Created: 2024-11-12T21:34:25.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-12T22:21:11.000Z (2 months ago)
- Last Synced: 2024-11-12T22:32:29.151Z (2 months ago)
- Topics: document-reader, stt-api, tts-api, typescript
- Language: TypeScript
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Document Processing and Speech API
A TypeScript-based Express API for document processing (PDF/DOCX), plus speech-to-text and text-to-speech conversion using Google Cloud services.
## Features
- 📄 Document Processing (PDF & DOCX)
- 🎤 Speech to Text Conversion
- 🔊 Text to Speech Conversion
- ⚡ Rate Limiting
- 🔒 Type Safety
- 📝 Standardized API Responses
## Prerequisites
- Node.js (v14 or higher)
- TypeScript (v4 or higher)
- Google Cloud Account with Speech & Text-to-Speech APIs enabled
- Service Account Key from Google Cloud
## Installation
1. Clone the repository:
```bash
git clone
cd document-speech-api
```
2. Install dependencies:
```bash
npm install
```
3. Set up environment variables:
Create a `.env` file in the root directory:
```env
PORT=3000
GOOGLE_APPLICATION_CREDENTIALS="path/to/your/service-account-key.json"
RATE_LIMIT_WINDOW_MS=900000
RATE_LIMIT_MAX_REQUESTS=100
```
## Project Structure
```
src/
├── controllers/
│ ├── documentReaderController.ts
│ ├── speechToTextController.ts
│ └── textToSpeechController.ts
├── services/
│ ├── documentReaderService.ts
│ ├── speechToTextService.ts
│ └── textToSpeechService.ts
├── routes/
│ └── apiRoutes.ts
├── types/
│ ├── express.d.ts
│ └── api_response.ts
├── utils/
│ ├── api_response.ts
│ └── logger.ts
├── middlewares/
│ ├── errorHandlerMiddleware.ts
│ └── rateLimitMiddleware.ts
├── app.ts
└── server.ts
```
## Dependencies
```json
{
  "dependencies": {
    "@google-cloud/speech": "latest",
    "@google-cloud/text-to-speech": "latest",
    "express": "latest",
    "multer": "latest",
    "pdf-parse": "latest",
    "mammoth": "latest",
    "express-rate-limit": "latest",
    "dotenv": "latest"
  },
  "devDependencies": {
    "@types/express": "latest",
    "@types/multer": "latest",
    "@types/node": "latest",
    "typescript": "latest",
    "ts-node": "latest",
    "nodemon": "latest"
  }
}
```
## API Endpoints
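Every JSON response in this section follows one envelope: a numeric `code` (`0` on success, `1` in the error example), a `status` string, a human-readable `message`, and, on success, a `data` payload. A minimal sketch of one way to type and build that envelope in TypeScript; `ApiResponse`, `successResponse`, and `errorResponse` are hypothetical names for illustration, and the repository's `utils/api_response.ts` may be structured differently.

```typescript
// Hypothetical sketch of the shared response envelope (not the repository's code).
type ApiStatus = "success" | "error";

interface ApiResponse<T = undefined> {
  code: number; // 0 on success, non-zero on failure
  status: ApiStatus;
  message: string;
  data?: T; // present only on successful responses
}

function successResponse<T>(message: string, data: T): ApiResponse<T> {
  return { code: 0, status: "success", message, data };
}

function errorResponse(message: string, code = 1): ApiResponse {
  return { code, status: "error", message };
}

// Example: the document-reading success payload.
const docResult = successResponse("Document read successfully", {
  text: "extracted text content",
  fileName: "document.pdf",
  fileType: "application/pdf",
});
console.log(JSON.stringify(docResult));
```

Keeping `data` optional on the type lets the same envelope describe both the success bodies and the data-less error body.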
### 1. Document Reading
```http
POST /api/read-document
Content-Type: multipart/form-data
```
#### Request
- `file`: PDF or DOCX file
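The `file` field is expected to be a PDF or DOCX. Because the client controls the declared MIME type, a server might also check the file's leading bytes before handing it to an extractor. A minimal sketch, assuming the upload is available as a MIME type string plus a `Uint8Array` of its contents; `detectDocumentKind` is a hypothetical helper, not code from this repository.

```typescript
// Hypothetical helper: choose an extractor for an uploaded document and
// confirm that the file's leading bytes match its declared MIME type.
type DocumentKind = "pdf" | "docx";

const MIME_TO_KIND = new Map<string, DocumentKind>([
  ["application/pdf", "pdf"],
  ["application/vnd.openxmlformats-officedocument.wordprocessingml.document", "docx"],
]);

// PDFs start with "%PDF-"; DOCX files are ZIP archives starting with "PK\x03\x04".
const SIGNATURES: Record<DocumentKind, readonly number[]> = {
  pdf: [0x25, 0x50, 0x44, 0x46, 0x2d],
  docx: [0x50, 0x4b, 0x03, 0x04],
};

function hasSignature(bytes: Uint8Array, signature: readonly number[]): boolean {
  return bytes.length >= signature.length && signature.every((b, i) => bytes[i] === b);
}

// Returns the document kind, or null when the MIME type is unsupported
// or the contents do not match it.
function detectDocumentKind(mimetype: string, bytes: Uint8Array): DocumentKind | null {
  const kind = MIME_TO_KIND.get(mimetype);
  if (kind === undefined) return null;
  return hasSignature(bytes, SIGNATURES[kind]) ? kind : null;
}
```

The ZIP signature matches any ZIP archive, so for DOCX this only confirms the container format; the declared MIME type and the extractor itself still have to reject other ZIP files.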
#### Response
```json
{
  "code": 0,
  "status": "success",
  "message": "Document read successfully",
  "data": {
    "text": "extracted text content",
    "fileName": "document.pdf",
    "fileType": "application/pdf"
  }
}
```
### 2. Speech to Text
```http
POST /api/speech-to-text
Content-Type: multipart/form-data
```
#### Request
- `audio`: Audio file (MP3)
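The endpoint expects MP3 audio. A quick heuristic check, assuming the upload is available as a `Uint8Array`: MP3 files usually begin either with an ID3v2 tag (the ASCII bytes `ID3`) or directly with an MPEG audio frame header, whose first 11 bits are all set and whose layer bits are `01` for Layer III. `looksLikeMp3` is a hypothetical helper for illustration.

```typescript
// Hypothetical pre-transcription check that a buffer looks like an MP3 file.
function looksLikeMp3(bytes: Uint8Array): boolean {
  if (bytes.length < 3) return false; // too short to judge

  // ID3v2 tag at the start of the file: ASCII "ID3".
  if (bytes[0] === 0x49 && bytes[1] === 0x44 && bytes[2] === 0x33) return true;

  // MPEG audio frame header: 11-bit sync word (0xFF, then top 3 bits of byte 1).
  const frameSync = bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0;
  // Layer bits (bits 2-1 of byte 1) equal "01" for Layer III.
  const layerIII = (bytes[1] & 0x06) === 0x02;
  return frameSync && layerIII;
}
```

This only inspects the first few bytes, so it will not catch a truncated or corrupt file; it simply rejects obvious mismatches, such as WAV (`RIFF`) or ADTS AAC uploads, before they reach the speech service.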
#### Response
```json
{
  "code": 0,
  "status": "success",
  "message": "Audio transcribed successfully",
  "data": {
    "text": "transcribed text",
    "audioFileName": "audio.mp3",
    "duration": 10.5
  }
}
```
### 3. Text to Speech
```http
POST /api/text-to-speech
Content-Type: application/json
```
#### Request
```json
{
  "text": "Text to convert to speech",
  "voice": "en-US",
  "speed": 1.0
}
```
`voice` and `speed` are optional.
#### Response
- Audio stream (audio/mpeg) if successful
- Error response if failed:
```json
{
  "code": 1,
  "status": "error",
  "message": "Error message"
}
```
## Error Codes
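The list below mixes the success `code` used in response bodies (`0`) with standard HTTP status codes for failures. A hypothetical sketch of how a central error handler could map thrown errors onto those statuses while keeping the error body from the text-to-speech example above; it assumes the body's `code` stays `1` and the specific failure is carried by the HTTP status. `HttpError` and `toErrorReply` are illustrative names, not necessarily what `errorHandlerMiddleware.ts` does.

```typescript
// Hypothetical error type carrying one of the HTTP statuses listed below.
type ErrorStatus = 400 | 401 | 403 | 404 | 500;

class HttpError extends Error {
  readonly status: ErrorStatus;

  constructor(status: ErrorStatus, message: string) {
    super(message);
    this.name = "HttpError";
    this.status = status;
  }
}

interface ErrorReply {
  httpStatus: ErrorStatus;
  body: { code: number; status: "error"; message: string };
}

// Converts anything thrown by a controller into an HTTP status plus JSON body.
// Unknown errors become 500 with a generic message.
function toErrorReply(err: unknown): ErrorReply {
  if (err instanceof HttpError) {
    return { httpStatus: err.status, body: { code: 1, status: "error", message: err.message } };
  }
  return { httpStatus: 500, body: { code: 1, status: "error", message: "Internal Server Error" } };
}
```

Returning a generic message for unexpected errors avoids leaking internal details, such as stack traces or file paths, to API clients.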
- 0: Success
- 400: Bad Request
- 401: Unauthorized
- 403: Forbidden
- 404: Not Found
- 500: Internal Server Error
## Usage Examples
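The text-to-speech endpoint takes `text` plus optional `voice` and `speed`. A hypothetical client-side helper that builds that JSON body and rejects obviously invalid input before sending it. The speed bounds are an assumption: they mirror the 0.25 to 4.0 range that Google Cloud Text-to-Speech documents for its `speakingRate` setting, on the guess that this API forwards `speed` there.

```typescript
// Hypothetical helper for the POST /api/text-to-speech request body.
interface TextToSpeechBody {
  text: string;
  voice?: string;
  speed?: number;
}

// Assumed bounds, borrowed from Google Cloud's documented speakingRate range.
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;

function buildTextToSpeechBody(text: string, voice?: string, speed?: number): TextToSpeechBody {
  if (text.trim().length === 0) throw new Error("text must not be empty");
  const body: TextToSpeechBody = { text };
  // Only include optional fields the caller actually set.
  if (voice !== undefined) body.voice = voice;
  if (speed !== undefined) {
    if (!Number.isFinite(speed) || speed < MIN_SPEED || speed > MAX_SPEED) {
      throw new RangeError(`speed must be between ${MIN_SPEED} and ${MAX_SPEED}`);
    }
    body.speed = speed;
  }
  return body;
}
```

Leaving unset fields out of the object keeps the payload identical to the minimal `{ text }` body sent by `convertTextToSpeech` below.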
### Using axios
```typescript
import axios from 'axios';

// Document Reading
const readDocument = async (file: File) => {
  const formData = new FormData();
  formData.append('file', file);
  try {
    const response = await axios.post('/api/read-document', formData, {
      headers: {
        'Content-Type': 'multipart/form-data'
      }
    });
    return response.data;
  } catch (error) {
    console.error('Error reading document:', error);
    throw error;
  }
};

// Speech to Text
const convertSpeechToText = async (audioFile: File) => {
  const formData = new FormData();
  formData.append('audio', audioFile);
  try {
    const response = await axios.post('/api/speech-to-text', formData, {
      headers: {
        'Content-Type': 'multipart/form-data'
      }
    });
    return response.data;
  } catch (error) {
    console.error('Error converting speech to text:', error);
    throw error;
  }
};

// Text to Speech
const convertTextToSpeech = async (text: string) => {
  try {
    const response = await axios.post('/api/text-to-speech',
      { text },
      { responseType: 'blob' }
    );
    return response.data;
  } catch (error) {
    console.error('Error converting text to speech:', error);
    throw error;
  }
};
```
## Running the Application
1. Development mode:
```bash
npm run dev
```
2. Production mode:
```bash
npm run build
npm start
```
## Setting Up Google Cloud Credentials
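The Google Cloud client libraries locate the service account key through the file path in `GOOGLE_APPLICATION_CREDENTIALS`. A hypothetical fail-fast check the server could run at startup on that file's contents (reading the file, for example with `fs.readFileSync`, is left to the caller). The checked fields (`type`, `project_id`, `client_email`, `private_key`) are part of the JSON key format Google Cloud generates for service accounts; `parseServiceAccountKey` itself is an illustrative name.

```typescript
// Hypothetical startup check for a service account key file's contents.
// Field names follow the JSON key format Google Cloud issues for service accounts.
interface ServiceAccountKey {
  type: "service_account";
  project_id: string;
  client_email: string;
  private_key: string;
}

const REQUIRED_STRING_FIELDS = ["project_id", "client_email", "private_key"] as const;

function parseServiceAccountKey(json: string): ServiceAccountKey {
  const parsed: unknown = JSON.parse(json); // throws SyntaxError on malformed JSON
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("key file must contain a JSON object");
  }
  const key = parsed as Record<string, unknown>;
  if (key["type"] !== "service_account") {
    throw new Error('expected "type": "service_account"');
  }
  for (const field of REQUIRED_STRING_FIELDS) {
    const value = key[field];
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`missing or empty field: ${field}`);
    }
  }
  return key as unknown as ServiceAccountKey;
}
```

Failing at startup surfaces a wrong path or a user-credential file immediately, instead of on the first speech request.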
1. Create a project in Google Cloud Console
2. Enable Speech-to-Text and Text-to-Speech APIs
3. Create a service account and download the key file
4. Set the path to your key file in the `GOOGLE_APPLICATION_CREDENTIALS` environment variable
## Rate Limiting
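The two `.env` settings describe a per-window cap: each client may make at most `RATE_LIMIT_MAX_REQUESTS` requests per `RATE_LIMIT_WINDOW_MS` milliseconds. The standalone sketch below illustrates fixed-window counting with an in-memory `Map`; it is not the implementation of `express-rate-limit` (listed under Dependencies), and `readRateLimitConfig` and `FixedWindowLimiter` are hypothetical names.

```typescript
// Hypothetical illustration of fixed-window rate limiting (not express-rate-limit's code).
interface RateLimitConfig {
  windowMs: number;
  max: number;
}

// Reads the .env values, falling back to the documented defaults (15 minutes, 100 requests).
function readRateLimitConfig(env: Record<string, string | undefined>): RateLimitConfig {
  const windowMs = Number(env["RATE_LIMIT_WINDOW_MS"] ?? 900000);
  const max = Number(env["RATE_LIMIT_MAX_REQUESTS"] ?? 100);
  if (!Number.isFinite(windowMs) || windowMs <= 0 || !Number.isInteger(max) || max <= 0) {
    throw new Error("invalid rate limit configuration");
  }
  return { windowMs, max };
}

class FixedWindowLimiter {
  private readonly config: RateLimitConfig;
  private readonly hits = new Map<string, { count: number; resetAt: number }>();

  constructor(config: RateLimitConfig) {
    this.config = config;
  }

  // Returns true if a request from `clientKey` (e.g. an IP address) is allowed at time `now`.
  allow(clientKey: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(clientKey);
    if (entry === undefined || now >= entry.resetAt) {
      // First request in a new window for this client.
      this.hits.set(clientKey, { count: 1, resetAt: now + this.config.windowMs });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.config.max;
  }
}
```

A single-process `Map` resets when the server restarts and is not shared between instances, which is why multi-instance deployments usually keep the counters in a shared store.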
The API includes rate limiting to prevent abuse. Default settings:
- 100 requests per 15-minute window (`RATE_LIMIT_MAX_REQUESTS=100`, `RATE_LIMIT_WINDOW_MS=900000`)
- Customize these values in the `.env` file
## Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the LICENSE file for details.