{"id":20786071,"url":"https://github.com/thomassuedbroecker/watson-stt-invocation","last_synced_at":"2026-05-06T19:36:16.430Z","repository":{"id":127030672,"uuid":"567217066","full_name":"thomassuedbroecker/watson-stt-invocation","owner":"thomassuedbroecker","description":"Simple example to invoke Watson STT by command line.","archived":false,"fork":false,"pushed_at":"2022-12-03T12:12:34.000Z","size":3529,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-18T09:52:51.243Z","etag":null,"topics":["bash","curl","ibmcloudcli","stt","watson-speech-to-text"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thomassuedbroecker.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-17T10:21:18.000Z","updated_at":"2023-02-03T10:27:13.000Z","dependencies_parsed_at":null,"dependency_job_id":"9a7b110b-a785-405f-825b-49dc823882f3","html_url":"https://github.com/thomassuedbroecker/watson-stt-invocation","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomassuedbroecker%2Fwatson-stt-invocation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomassuedbroecker%2Fwatson-stt-invocation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomassuedbroecker%2Fwatson-stt-invocation/releases","manifests_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/thomassuedbroecker%2Fwatson-stt-invocation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thomassuedbroecker","download_url":"https://codeload.github.com/thomassuedbroecker/watson-stt-invocation/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243129790,"owners_count":20241068,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bash","curl","ibmcloudcli","stt","watson-speech-to-text"],"created_at":"2024-11-17T14:50:32.924Z","updated_at":"2026-05-06T19:36:16.387Z","avatar_url":"https://github.com/thomassuedbroecker.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Watson STT invocation\n\nRelated blog post: [WATSON SPEECH TO TEXT LANGUAGE MODEL CUSTOMIZATION](https://suedbroecker.net/2022/11/19/watson-speech-to-text-language-model-customization/).\n\nThis project contains a bash script automation example for the IBM Cloud Watson Speech to Text service.\n\nThe automation contains two flows:\n\n1. Basic usage: extracting the text from an audio file saved in [FLAC format](https://simple.wikipedia.org/wiki/FLAC) using a base language model.\n2. 
Customization of an existing language model for a specific domain (drums, in this example) ;-) \n\n_Note:_ If you record your own voice, for example in [M4A format](https://en.wikipedia.org/wiki/MP4_file_format), you can convert M4A to [FLAC format](https://simple.wikipedia.org/wiki/FLAC) for free with [Convertio](https://convertio.co/m4a-flac/).\n\n### Prerequisites\n\n* [IBM Cloud CLI](https://cloud.ibm.com/docs/cli?topic=cli-getting-started) installed\n* A [Watson Speech to Text](https://cloud.ibm.com/catalog/services/speech-to-text#about) service instance created with a [Plus plan](https://cloud.ibm.com/docs/billing-usage?topic=billing-usage-changing\u0026interface=ui)\n* The cURL command line tool installed on the local computer\n\nExecute the following steps to run the example.\n\n### Step 1: Clone the project\n\n```sh\ngit clone https://github.com/thomassuedbroecker/watson-stt-invocation.git\ncd watson-stt-invocation\n```\n\n### Step 2: Configure the `.env` file\n\n```sh\ncp ./code/.env-template ./code/.env\n```\n\n### Step 3: Set the correct values in the `.env` file\n\n* [Create an IBM Cloud APIKEY](https://www.ibm.com/docs/en/app-connect/containers_cd?topic=servers-creating-cloud-api-key)\n\n```sh\nROOTFOLDER=\"YOUR_PATH\"\nRESOURCE_GROUP=\"default\"\nREGION=\"us-south\"\nAPIKEY=\"YOUR_IBMCLOUD_APIKEY\"\nS2T_SERVICE_INSTANCE_NAME=\"YOUR_S2T_SERVICE_NAME\"\n```\n\n### Step 4: Invoke the bash automation\n\n```sh\nsh code/use-speech-to-text.sh\n```\n\n* Example output\n\n```sh\n#*******************\n# Customization flow\n#*******************\n#------------------\n# Create and train a Custom Language Model\n#------------------\n  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n100   160  100    61  100    99    170    277 --:--:-- --:--:-- --:--:--   458\n\ncustomization_id: {\"customization_id\": 
\"7868e363-4afa-4d64-96fd-c506774eebca\"}\n{\"customizations\": [{\n   \"owner\": \"d3443a47-877c-496d-95b9-f62bce50bb38\",\n   \"base_model_name\": \"en-US_BroadbandModel\",\n   \"customization_id\": \"7868e363-4afa-4d64-96fd-c506774eebca\",\n   \"dialect\": \"en-US\",\n   \"versions\": [\"en-US_BroadbandModel.v2020-01-16\"],\n   \"created\": \"2022-11-18T13:32:44.945Z\",\n   \"name\": \"MyDrums-1\",\n   \"description\": \"MyDrums-demo\",\n   \"progress\": 0,\n   \"language\": \"en-US\",\n   \"updated\": \"2022-11-18T13:32:44.945Z\",\n   \"status\": \"pending\"\n}]}\n{}\n  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n100   104  100   104    0     0    319      0 --:--:-- --:--:-- --:--:--   330\nResponse: {\n   \"out_of_vocabulary_words\": 1,\n   \"total_words\": 43,\n   \"name\": \"drums1\",\n   \"status\": \"analyzed\"\n}\nStatus: %-15s ( %d )\n analyzed 10\n{\"corpora\": [{\n   \"out_of_vocabulary_words\": 1,\n   \"total_words\": 43,\n   \"name\": \"drums1\",\n   \"status\": \"analyzed\"\n}]}\n\nTrain ...\n{}\n  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n100   449  100   449    0     0   1537      0 --:--:-- --:--:-- --:--:--  1586\nResponse: {\n   \"owner\": \"d3443a47-XXX-XXXX-95b9-f62bce50bb38\",\n   \"base_model_name\": \"en-US_BroadbandModel\",\n   \"customization_id\": \"7868e363-XXX-XXXX-96fd-c506774eebca\",\n   \"dialect\": \"en-US\",\n   \"versions\": [\"en-US_BroadbandModel.v2020-01-16\"],\n   \"created\": \"2022-11-XXX-XXXX\",\n   \"name\": \"MyDrums-1\",\n   \"description\": \"MyDrums-demo\",\n   \"progress\": 0,\n   \"language\": \"en-US\",\n   \"updated\": \"2022-11-XXX-XXXX\",\n   \"status\": \"training\"\n}\nStatus (training)\nStatus: %-15s ( %d )\n training 10\n  % Total    % Received % Xferd  Average Speed   Time    
Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n100   452  100   452    0     0   1293      0 --:--:-- --:--:-- --:--:--  1333\nResponse: {\n   \"owner\": \"d3443a47-XXX-XXXX-95b9-f62bce50bb38\",\n   \"base_model_name\": \"en-US_BroadbandModel\",\n   \"customization_id\": \"7868e363-XXX-XXXX-96fd-c506774eebca\",\n   \"dialect\": \"en-US\",\n   \"versions\": [\"en-US_BroadbandModel.v2020-01-16\"],\n   \"created\": \"2022-11-XXX-XXXX\",\n   \"name\": \"MyDrums-1\",\n   \"description\": \"MyDrums-demo\",\n   \"progress\": 100,\n   \"language\": \"en-US\",\n   \"updated\": \"2022-11-XXX-XXXX\",\n   \"status\": \"available\"\n}\nStatus (available)\nStatus: %-15s ( %d )\n available 20\n{\"words\": [{\n   \"display_as\": \"paradiddles\",\n   \"sounds_like\": [\"paradiddles\"],\n   \"count\": 1,\n   \"source\": [\"drums1\"],\n   \"word\": \"paradiddles\"\n}]}\n{\n   \"owner\": \"d3443a47-XXX-XXXX-95b9-f62bce50bb38\",\n   \"base_model_name\": \"en-US_BroadbandModel\",\n   \"customization_id\": \"7868e363-XXX-XXXX-96fd-c506774eebca\",\n   \"dialect\": \"en-US\",\n   \"versions\": [\"en-US_BroadbandModel.v2020-01-16\"],\n   \"created\": \"2022-11-XXX-XXXX\",\n   \"name\": \"MyDrums-1\",\n   \"description\": \"MyDrums-demo\",\n   \"progress\": 100,\n   \"language\": \"en-US\",\n   \"updated\": \"2022-11-XXX-XXXX\",\n   \"status\": \"available\"\n}\n#------------------\n# Verify a trained model by using an audio\n#------------------\ncustomization_id: 7868e363-XXX-XXXX-96fd-c506774eebca\nbasic_model: en-US_BroadbandModel\n\nTest audio ...\n {\n   \"result_index\": 0,\n   \"results\": [\n      {\n         \"final\": true,\n         \"alternatives\": [\n            {\n               \"transcript\": \"it's great to play the drums The hi hat is something very special \",\n               \"confidence\": 0.98\n            }\n         ]\n      },\n      {\n         \"final\": true,\n         \"alternatives\": [\n            {\n    
           \"transcript\": \"it forms the basis for many rhythms syncopations are sometimes distributed with paradiddles and they are creating a fantastic rhythm together with the snare and the bass drum and a splash \",\n               \"confidence\": 0.94\n            }\n         ]\n      }\n   ]\n}\n#*******************\n# Basic flow\n#*******************\n{\n   \"result_index\": 0,\n   \"results\": [\n      {\n         \"final\": true,\n         \"alternatives\": [\n            {\n               \"transcript\": \"hi this is my test for Watson \",\n               \"confidence\": 0.94\n            }\n         ]\n      },\n      {\n         \"final\": true,\n         \"alternatives\": [\n            {\n               \"transcript\": \"speech to text \",\n               \"confidence\": 0.99\n            }\n         ]\n      },\n      {\n         \"final\": true,\n         \"alternatives\": [\n            {\n               \"transcript\": \"check it out \",\n               \"confidence\": 0.99\n            }\n         ]\n      }\n   ]\n} \n...\n```\n### Additional information\n\nList of used API calls:\n\n* [recognize](https://cloud.ibm.com/apidocs/speech-to-text#recognize)\n* [models](https://cloud.ibm.com/apidocs/speech-to-text#listmodels)\n* [customizations - custom language models](https://cloud.ibm.com/apidocs/speech-to-text#createlanguagemodel)\n* [corpora - custom](https://cloud.ibm.com/apidocs/speech-to-text#listcorpora)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomassuedbroecker%2Fwatson-stt-invocation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthomassuedbroecker%2Fwatson-stt-invocation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomassuedbroecker%2Fwatson-stt-invocation/lists"}