https://github.com/abhi347/vid-to-speech-api-json

Generates google speech api response from any video using google cloud functions and google cloud storage
https://github.com/abhi347/vid-to-speech-api-json

ffmpeg google-api google-cloud google-cloud-functions google-cloud-platform google-cloud-storage google-speech javascript nodejs

Last synced: about 1 month ago
JSON representation

Generates google speech api response from any video using google cloud functions and google cloud storage

Host: GitHub
URL: https://github.com/abhi347/vid-to-speech-api-json
Owner: Abhi347
License: mit
Created: 2018-03-18T15:50:30.000Z (about 7 years ago)
Default Branch: main
Last Pushed: 2023-01-25T04:02:17.000Z (over 2 years ago)
Last Synced: 2025-03-26T19:51:15.811Z (about 2 months ago)
Topics: ffmpeg, google-api, google-cloud, google-cloud-functions, google-cloud-platform, google-cloud-storage, google-speech, javascript, nodejs
Language: JavaScript
Homepage:
Size: 168 KB
Stars: 53
Watchers: 3
Forks: 19
Open Issues: 61
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # Vid to Speech API json

This project when run will generate a speech api json file from the provided videos.

## Current Status

The Project is not updated in a very long time. I am being really busy, but I am trying to revive this. Feel free to provide suggestions and improvements. I am available on Twitter [@Abhi347](https://twitter.com/abhi347).

## Prerequisites

- A Google Cloud Platform project

- gcloud sdk installed and initialised with proper GCP project configuration. Check gcloud configuration using `gcloud config list` command. If no configuration is found, run `gcloud init`.

- Three separate google cloud storage buckets with regional or multiregional settings in the same gcp project. Add the name of the buckets to `app-config.json`.

- Access to Google Cloud Functions.

## Steps to initialize

- Run `npm install`.

- Replace the `video`, `audio` and `speechResponse` bucket names in `app-config.json` file with the name of your own GCS bucket names.

- Deploy both functions using `npm run deploy` command.

- Upload a video to the GCS bucket the name of which you have mentioned as your video bucket in `app-config.json`.

- After some time the audio of the video file will be extracted in your `audio` bucket as a flac file of the same name as the video file.

- After few more minutes depending on the size of the video the google speech api response json will be available in your `speechResponse` bucket.

- Upload more videos to have multiple responses.

## Process flow

- The cloud function `extractAudio` will watch for any new upload in the `video` bucket.

- As soon as a new file is uploaded, the `extractAudio` function will extract the audio from the file and upload the audio as a flac file with the same name as the video file to the `audio` bucket.

- The second cloud function `transcribeAudio` will watch any new upload in the `audio` bucket.

- As soon as the `extractAudio` function uploads the extracted flac audio file to the `audio` bucket, the `transcribeAudio` function will be triggered.

- `transcribeAudio` function will then upload the audio file to the google speech api.

- As soon as the response is received from google speech api, the response is converted to the json file and is uploaded to the `speechResponse` bucket.

## Known Issues

- Currently both the functions doesn't check whether the uploaded files are of video or audio type. It'll run and throw error if you try to upload any other file than a video file to `video` bucket or any other file than an flac audio file to `audio` bucket.

- Extracting the audio is fast, although same can't be said about transcribing the audio using google speech api. The function will timeout, thus you'll have to go to your GCP console and increase the timeout of at least the `transcribeAudio` cloud function by editing it.

## Source

The original idea and most of the source is taken from the Hackernoon article mentioned below. I have created this repo, because they didn't provided any working code, just code fragments. The code is modularised and the deploy script is created by me.

- [HackerNoon Article](https://hackernoon.com/making-audio-searchable-with-cloud-speech-36ce63b6b4d3)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/abhi347/vid-to-speech-api-json

Awesome Lists containing this project

README