{"id":21482799,"url":"https://github.com/ibmstreams/streamsx.sttgateway","last_synced_at":"2025-03-17T09:22:42.772Z","repository":{"id":74839700,"uuid":"145131652","full_name":"IBMStreams/streamsx.sttgateway","owner":"IBMStreams","description":"This toolkit does Speech To Text transcription using an external provider such as the IBM Watson STT cloud service.","archived":false,"fork":false,"pushed_at":"2022-05-16T22:15:09.000Z","size":20571,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":12,"default_branch":"develop","last_synced_at":"2025-01-23T18:50:36.772Z","etag":null,"topics":["ibm-cloud","ibm-cloud-private","ibm-streams","speech-to-text","stream-processing","stt","toolkit","watson-speech-to-text"],"latest_commit_sha":null,"homepage":"https://ibmstreams.github.io/streamsx.sttgateway/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IBMStreams.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-08-17T14:38:51.000Z","updated_at":"2022-01-10T18:45:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"f56b1d95-2339-4b54-8e2b-4897182b7d9d","html_url":"https://github.com/IBMStreams/streamsx.sttgateway","commit_stats":null,"previous_names":[],"tags_count":33,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBMStreams%2Fstreamsx.sttgateway","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBMStreams%2Fstreamsx.sttgateway/tags",
"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBMStreams%2Fstreamsx.sttgateway/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBMStreams%2Fstreamsx.sttgateway/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IBMStreams","download_url":"https://codeload.github.com/IBMStreams/streamsx.sttgateway/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244006303,"owners_count":20382443,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ibm-cloud","ibm-cloud-private","ibm-streams","speech-to-text","stream-processing","stt","toolkit","watson-speech-to-text"],"created_at":"2024-11-23T12:37:15.812Z","updated_at":"2025-03-17T09:22:42.756Z","avatar_url":"https://github.com/IBMStreams.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# STT Gateway toolkit for IBM Streams\n\n## Purpose\nThis toolkit is designed to ingest audio data either stored in files (.wav, .mp3, etc., for a batch workload) or streamed through a telephony infrastructure (for a real-time workload). It then transcribes that audio into text via the IBM Watson STT (Speech To Text) service running on the IBM public cloud or on the IBM Cloud Pak for Data (CP4D, i.e., private cloud).\n\nIt provides the following two operators for that purpose.\n\n**IBMVoiceGatewaySource** is a source operator that can be used to ingest speech data from the IBM Voice Gateway product v1.0.3.0 or higher. 
Such speech data comes from multiple live telephone conversations between different pairs of speakers, e.g., customers and call center agents.\n\n**WatsonSTT** is an analytic operator that can be used to transcribe speech data into text either in real time or in batch mode.\n\n## Architectural patterns enabled by this toolkit\n1. For **real-time** speech to text transcription, the following architectural patterns are possible.\n\n- \u003cspan style=\"color:green\"\u003eYour Telephony SIPREC--\u003eIBM Voice Gateway--\u003eIBM Streams\u003c--\u003eWatson Speech To Text on IBM Public Cloud\u003c/span\u003e\n\n- \u003cspan style=\"color:blue\"\u003eYour Telephony SIPREC--\u003eIBM Voice Gateway--\u003eIBM Streams\u003c--\u003eWatson Speech To Text on IBM Cloud Pak for Data (CP4D)\u003c/span\u003e\n\n- \u003cspan style=\"color:purple\"\u003eYour Telephony SIPREC--\u003eIBM Voice Gateway--\u003eIBM Streams\u003c--\u003eWatson Speech To Text engine embedded inside an IBM Streams operator\u003c/span\u003e\n\n2. For **batch (post-call)** speech to text transcription, the following architectural patterns are possible.\n\n- \u003cspan style=\"color:green\"\u003eSpeech data files in a directory--\u003eIBM Streams\u003c--\u003eWatson Speech To Text on IBM Public Cloud\u003c/span\u003e\n\n- \u003cspan style=\"color:blue\"\u003eSpeech data files in a directory--\u003eIBM Streams\u003c--\u003eWatson Speech To Text on IBM Cloud Pak for Data (CP4D)\u003c/span\u003e\n\n- \u003cspan style=\"color:purple\"\u003eSpeech data files in a directory--\u003eIBM Streams\u003c--\u003eWatson Speech To Text engine embedded inside an IBM Streams operator\u003c/span\u003e\n\n## All-in-one Speech to text analytics, Call Recording and Call Replay\nAs described above, Speech To Text is the core feature of this toolkit. In addition, this toolkit enables call recording and call replay. 
It includes two tested, real-world examples that show how to record live voice calls and how to replay pre-recorded calls. Many other vendors provide proprietary, rigid black-box call recording solutions at a hefty price, with either no call replay facility or a minimal one. This toolkit, by contrast, provides both features for free in a completely open and flexible manner. Customers can control where the recorded data, kept in the standard Mu-Law format, is stored, and they can access and use that data for other purposes. Combined, the IBM Voice Gateway, IBM Streams and IBM Watson Speech To Text offerings put the customer in the driver's seat to gather real-time intelligence from their voice infrastructure.\n\n## A visual description of this toolkit's architecture\n![STT Gateway Architecture Diagram](https://github.com/IBMStreams/streamsx.sttgateway/blob/develop/samples/VoiceGatewayToStreamsToWatsonSTT/etc/stt-arch.png)\n\n## Documentation\n1. The official toolkit documentation, with extensive details, is available at this URL: https://ibmstreams.github.io/streamsx.sttgateway/\n\n2. A file named sttgateway-tech-brief.txt in this toolkit's top-level directory also provides a good amount of information about what this toolkit does, how it can be built and how it can be used in IBM Streams applications.\n\n3. The official documentation for the IBM Voice Gateway product is available [here](https://www.ibm.com/support/knowledgecenter/SS4U29/whatsnew.html).\n\n4. The official documentation for the IBM Watson Speech To Text service is available [here](https://www.ibm.com/watson/services/speech-to-text).\n\n## Requirements\nSeveral important requirements must be satisfied to use the IBM Streams STT Gateway toolkit in Streams applications. 
Such requirements are explained below.\n\n**Note:** This toolkit is **not** supported on Red Hat Enterprise Linux Workstation release **6.x**.\n\n**Note:** This toolkit requires C++11 support.\n\n1. The IBM Streams Linux machines where this toolkit will be used need network connectivity to the IBM Watson Speech To Text (STT) service running either on the public cloud or on Cloud Pak for Data (CP4D). The same applies to the IBM Voice Gateway product in use cases that ingest speech data from live voice calls.\n\n2. This toolkit uses WebSocket to communicate with the IBM Voice Gateway and the Watson STT service. A valid IAM access token is needed to use the Watson STT service on the public cloud, and a valid access token is needed to use the Watson STT service on CP4D. Therefore, users of this toolkit must provide either their public cloud STT service instance's API key or their CP4D STT service instance's access token when launching the Streams application(s) that depend on this toolkit. When an API key from the public cloud is used, a utility SPL composite named IAMAccessTokenGenerator, available in this toolkit, can generate the IAM access token and subsequently refresh that token to keep it valid. A Streams application employing this toolkit can use that utility composite to obtain the IAM access token needed in the public cloud. More information about the IAM access token is available [here](https://cloud.ibm.com/docs/services/speech-to-text?topic=speech-to-text-websockets#WSopen).\n\n3. On the IBM Streams application development machine(s) (where the application code is compiled to create the application bundle), it is necessary to download and install the toolkit release bundle. The toolkit release bundle contains the Ant build script that downloads the required external libraries: boost, websocketpp and rapidjson. 
For the essential steps to meet this requirement, please refer to the above-mentioned documentation URL or to the sttgateway-tech-brief.txt file in this toolkit's top-level directory.\n\n4. On the IBM Streams application development machine(s), the following toolkits are required:\n* com.ibm.streamsx.inet version 2.3.6 or higher\n* com.ibm.streamsx.json version 1.4.6 or higher\n* com.ibm.streamsx.websocket version 1.0.6 or higher\n\n5. On the IBM Streams application machines, please ensure that openssl and libcurl are installed, including the openssl-devel and libcurl-devel packages. They are required by the streamsx.websocket and streamsx.inet toolkits that this toolkit depends on. This toolkit also needs them to generate and refresh the IAM access token, which is a must for the STT service on public cloud, and for TLS support.\n\n6. For the IBM Streams and IBM Voice Gateway products to work together, certain configuration steps must be performed in both products. For more details, please refer to this toolkit's documentation URL or to the sttgateway-tech-brief.txt file in this toolkit's top-level directory.\n\n## External libraries used\n* boost 1.73.0\n* websocketpp 0.8.2\n* rapidjson 1.1.0\n\n## Example usage of this toolkit inside a Streams application\nHere is a code snippet that shows how to invoke the **WatsonSTT** operator available in this toolkit with a subset of supported features:\n\n```\nuse com.ibm.streamsx.sttgateway.watson::*;\n\n/*\nInvoke one or more instances of the WatsonSTT operator.\nYou can send the audio data to this operator all at once or,\nfor the live use case, as it becomes available\nfrom your telephony infrastructure.\nAvoid feeding audio data coming from more than one data source into this\nparallel region, as that may cause erroneous transcription results.\n\nNOTE: The WatsonSTT operator allows fusing multiple instances of\nthis operator into a single PE. 
This helps reduce the\ntotal number of CPU cores used by the application.\nFuse only when there are at most\nten WatsonSTT operator instances. Beyond that, do not\nfuse them, so that the application logic works correctly.\n*/\n@parallel(width = $numberOfSTTEngines, \npartitionBy=[{port=ABC, attributes=[conversationId]}], broadcast=[AT])\n(stream\u003cSTTResult_t\u003e STTResult) as STT = WatsonSTT(AudioBlobContent as ABC; IamAccessToken, AccessTokenForCP4D as AT) {\n   param\n      uri: $sttUri;\n      baseLanguageModel: $sttBaseLanguageModel;\n\n   output\n      STTResult: conversationId = conversationId, \n                 utteranceNumber = getUtteranceNumber(),\n                 utteranceText = getUtteranceText(),\n                 utteranceStartTime = getUtteranceStartTime(),\n                 utteranceEndTime = getUtteranceEndTime(),\n                 finalizedUtterance = isFinalizedUtterance(),\n                 transcriptionCompleted = isTranscriptionCompleted(),\n                 sttErrorMessage = getSTTErrorMessage();\n}\n```\n\nA built-in example in this toolkit can be compiled and launched with the default STT options to use the STT service on public cloud, as shown below. The AudioFileWatsonSTT sample requires the STT service connection details to be provided as application configuration properties. 
To create the application configuration, you can use the following command.\n\n```\nstreamtool mkappconfig --description 'connection configuration for IBM Cloud Watson stt service' \\\n\t--property 'apiKey=\u003cyour api key\u003e' \\\n\t--property 'iamTokenURL=https://iam.cloud.ibm.com/identity/token' \\\n\t--property 'url=\u003cyour stt instance uri\u003e' \\\n\tsttConnection\n```\n\n```\ncd streamsx.sttgateway/samples/AudioFileWatsonSTT\nmake\nst submitjob -d \u003cYOUR_STREAMS_DOMAIN\u003e -i \u003cYOUR_STREAMS_INSTANCE\u003e output/com.ibm.streamsx.sttgateway.sample.watsonstt.AudioFileWatsonSTT.sab\n```\n\nThe following IBM Streams job submission command shows how to override the default values of the various STT options with your own:\n\n```\ncd streamsx.sttgateway/samples/AudioRawWatsonSTT\nmake\nst submitjob -d \u003cYOUR_STREAMS_DOMAIN\u003e -i \u003cYOUR_STREAMS_INSTANCE\u003e \\\n  output/com.ibm.streamsx.sttgateway.sample.watsonstt.AudioRawWatsonSTT.sab \\\n  -P sttApiKey=\u003cYOUR_WATSON_STT_SERVICE_API_KEY\u003e \\\n  -P sttBaseLanguageModel=en-US_NarrowbandModel \\\n  -P contentType=\"audio/wav\" \\\n  -P filterProfanity=true \\\n  -P keywordsSpottingThreshold=0.294 \\\n  -P keywordsToBeSpotted=\"['country', 'learning', 'IBM', 'model']\" \\\n  -P smartFormattingNeeded=true \\\n  -P wordAlternativesThreshold=0.251 \\\n  -P maxUtteranceAlternatives=5 \\\n  -P audioBlobFragmentSize=32768 \\\n  -P sttLiveMetricsUpdateNeeded=true \\\n  -P audioDir=\u003cYOUR_AUDIO_FILES_DIRECTORY\u003e \\\n  -P numberOfSTTEngines=50\n```\n\nThe AudioFileWatsonSTT application can also be run against the STT service on IBM Cloud Pak for Data (CP4D), as follows. 
The STT URI shown below is for illustrative purposes only; you must use a valid STT URI obtained from your CP4D cluster.\n\n```\nst submitjob -d \u003cYOUR_STREAMS_DOMAIN\u003e -i \u003cYOUR_STREAMS_INSTANCE\u003e \\\n  output/com.ibm.streamsx.sttgateway.sample.watsonstt.AudioFileWatsonSTT.sab \\\n  -P sttOnCP4DAccessToken=\u003cYOUR_CP4D_STT_SERVICE_ACCESS_TOKEN\u003e \\\n  -P sttUri=wss://b0610b07:31843/speech-to-text/ibm-wc/instances/1567608964/api/v1/recognize\n```\n\nIf you plan to ingest speech data from live voice calls, you can invoke the **IBMVoiceGatewaySource** operator as shown below.\n\n```\n(stream\u003cBinarySpeech_t\u003e BinarySpeechData as BSD) as VoiceGatewayInterface =\n   IBMVoiceGatewaySource() {\n   logic\n      state: {\n         // Initialize the default TLS certificate file name if the\n         // user didn't provide his or her own.\n         rstring _certificateFileName =\n            ($certificateFileName != \"\") ?\n             $certificateFileName : getThisToolkitDir() + \"/etc/ws-server.pem\";\n      }\n\n   param\n      tlsPort: $tlsPort;\n      certificateFileName: _certificateFileName;\n      initDelay: $initDelayBeforeSendingDataToSttEngines;\n\n   // Get these values via custom output functions provided by this operator.\n   output\n      BSD: vgwSessionId = getIBMVoiceGatewaySessionId(),\n           callStartDateTime = getCallStartDateTime(),\n           isCustomerSpeechData = isCustomerSpeechData(),\n           vgwVoiceChannelNumber = getVoiceChannelNumber(),\n           callerPhoneNumber = getCallerPhoneNumber(),\n           agentPhoneNumber = getAgentPhoneNumber(),\n           speechDataFragmentCnt = getTupleCnt(),\n           totalSpeechDataBytesReceived = getTotalSpeechDataBytesReceived();\n}\n```\n\nIn addition to the IBMVoiceGatewaySource invocation shown above, an application needs logic to allocate a dedicated WatsonSTT operator instance 
for each voice channel in a given call. A demo application in this toolkit contains that logic, which can be reused in other applications. That example can be compiled and launched as shown below to ingest speech data from the IBM Voice Gateway for seven concurrent voice calls and send it to the WatsonSTT operator, which runs with mostly default STT options against the STT service on public cloud. Because each call has two voice channels and each channel gets its own WatsonSTT instance, seven calls require numberOfSTTEngines=14.\n\n```\ncd streamsx.sttgateway/samples/VoiceGatewayToStreamsToWatsonSTT\nmake\nst submitjob -d \u003cYOUR_STREAMS_DOMAIN\u003e -i \u003cYOUR_STREAMS_INSTANCE\u003e \\\n  output/com.ibm.streamsx.sttgateway.sample.watsonstt.VoiceGatewayToStreamsToWatsonSTT.sab \\\n  -P tlsPort=9443 \\\n  -P numberOfSTTEngines=14 \\\n  -P sttApiKey=\u003cYOUR_WATSON_STT_SERVICE_API_KEY\u003e \\\n  -P contentType=\"audio/mulaw;rate=8000\"\n```\n\n**Special Note**\nFor customers who use the speech to text engine embedded in the com.ibm.streams.speech2text.watson::WatsonS2T operator, the following example is available as a reference application for using that operator in a real-time voice call analytics scenario. It can be compiled and executed as shown below. 
You must replace the hard-coded paths and IP addresses to suit your environment.\n\n```\ncd streamsx.sttgateway/samples/VoiceGatewayToStreamsToWatsonS2T\nmake\nst submitjob \\\n  -P tlsPort=9443 \\\n  -P vgwSessionLoggingNeeded=false \\\n  -P numberOfS2TEngines=80 \\\n  -P WatsonS2TConfigFile=/home/streamsadmin/toolkit.speech2text-v2.12.0/model/en_US.8kHz.general.diarization.low_latency.pset \\\n  -P WatsonS2TModelFile=$HOME/toolkit.speech2text-v2.12.0/model/en_US.8kHz.general.pkg \\\n  -P ipv6Available=false \\\n  -P writeTranscriptionResultsToFiles=true \\\n  -P sendTranscriptionResultsToHttpEndpoint=true \\\n  -P httpEndpointForSendingTranscriptionResults=http://172.30.105.11:9080 \\\n  -P callRecordingWriteDirectory=/homes/hny5/sen/call-recording-write \\\n  -P callRecordingReadDirectory=/homes/hny5/sen/call-recording-read \\\n  -P numberOfCallReplayEngines=15 \\\n  -C fusionScheme=legacy \\\n  output/com.ibm.streamsx.sttgateway.sample.watsons2t.VoiceGatewayToStreamsToWatsonS2T.sab\n```\n\n## Examples that showcase this toolkit's features\nThis toolkit includes many examples that can be compiled and tested. A couple of them are generic real-world solutions, running in production, that can be customized and used as needed.\n\nIf you do not need the call recording and call replay features, you can use the two examples below whose names end with Mini. 
These examples omit the extra logic for those features and therefore use fewer operators overall.\n\n* [AccessTokenGenerator](https://github.com/IBMStreams/streamsx.sttgateway/tree/develop/samples/AccessTokenGenerator)\n* [AudioFileWatsonSTT](https://github.com/IBMStreams/streamsx.sttgateway/tree/develop/samples/AudioFileWatsonSTT)\n* [AudioFileWatsonSTTAllOutput](https://github.com/IBMStreams/streamsx.sttgateway/tree/develop/samples/AudioFileWatsonSTTAllOutput)\n* [AudioRawWatsonSTT](https://github.com/IBMStreams/streamsx.sttgateway/tree/develop/samples/AudioRawWatsonSTT)\n* [AudioRawWatsonSTTAllOutput](https://github.com/IBMStreams/streamsx.sttgateway/tree/develop/samples/AudioRawWatsonSTTAllOutput)\n* [VoiceGatewayToStreamsToWatsonSTT](https://github.com/IBMStreams/streamsx.sttgateway/tree/develop/samples/VoiceGatewayToStreamsToWatsonSTT)\n* [VoiceGatewayToStreamsToWatsonS2T](https://github.com/IBMStreams/streamsx.sttgateway/tree/develop/samples/VoiceGatewayToStreamsToWatsonS2T)\n* [STTGatewayUtils](https://github.com/IBMStreams/streamsx.sttgateway/tree/develop/samples/STTGatewayUtils)\n* [VgwDataRouter](https://github.com/IBMStreams/streamsx.sttgateway/tree/develop/samples/VgwDataRouter)\n* [VgwDataRouterToWatsonS2T](https://github.com/IBMStreams/streamsx.sttgateway/tree/develop/samples/VgwDataRouterToWatsonS2T)\n* [VgwDataRouterToWatsonSTT](https://github.com/IBMStreams/streamsx.sttgateway/tree/develop/samples/VgwDataRouterToWatsonSTT)\n* [VgwDataRouterMini](https://github.com/IBMStreams/streamsx.sttgateway/tree/develop/samples/VgwDataRouterMini)\n* [VgwDataRouterToWatsonSTTMini](https://github.com/IBMStreams/streamsx.sttgateway/tree/develop/samples/VgwDataRouterToWatsonSTTMini)\n* [stt_results_http_receiver](https://github.com/IBMStreams/streamsx.sttgateway/tree/develop/samples/stt_results_http_receiver)\n* [audio_files](https://github.com/IBMStreams/streamsx.sttgateway/tree/develop/samples/audio-files)\n* 
[VoiceDataSimulator](https://github.com/IBMStreams/streamsx.sttgateway/tree/develop/samples/VoiceDataSimulator)\n\n## WHAT'S NEW\n\nSee [CHANGELOG.md](com.ibm.streamsx.sttgateway/CHANGELOG.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fibmstreams%2Fstreamsx.sttgateway","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fibmstreams%2Fstreamsx.sttgateway","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fibmstreams%2Fstreamsx.sttgateway/lists"}