# Batch Processing with Azure Speech to Text + Speaker Identification
Whether speaker recognition is available in Azure Speech to Text batch transcription (Python) is not clearly documented. [doc](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/batch-transcription)
Some answers suggest that the feature is not available in the Azure service, but this is not accurate. [ref](https://stackoverflow.com/questions/65491550/how-to-identify-speaker-using-python-sdk-in-using-azure-cognitive-speech-transla)
In fact, it can be enabled simply by adding a few parameters to the batch client.
## Setup Steps
1. Install the required dependencies.

```cmd
pip install -r requirements.txt
```

2. Install the generated API client library. Before proceeding with this step, refer to the steps in the `Download and install the API client library` section below.

```cmd
pip install .\python_client
```

## Parameters for Speaker Identification
In `speech.py`, configuring the parameters listed below causes a `"speaker"` attribute to be included in each recognized phrase of the result JSON:
```json
{
  "recognitionStatus": "Success",
  "channel": 0,
  "speaker": 1,
  "offset": "PT3.5S",
  "duration": "PT2.08S",
  "offsetInTicks": 35000000.0,
  "durationInTicks": 20800000.0,
  "nBest": [
    {
      "confidence": 0.5747593,
      "lexical": "Hello speech",
      "itn": "Hello speech",
      "maskedITN": "Hello speech",
      "display": "Hello speech."
    }
  ]
}
```
1. Set `diarization_enabled` to `True`.
2. `DiarizationSpeakersProperties` specifies the expected number of speakers.

```python
properties.diarization_enabled = True
properties.diarization = swagger_client.DiarizationProperties(
    swagger_client.DiarizationSpeakersProperties(min_count=1, max_count=10))
```
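
For context, the sketch below shows roughly how these properties can be attached to a batch transcription request with the Swagger-generated client. The class and method names (`TranscriptionProperties`, `Transcription`, `transcriptions_create_with_http_info`) follow the generated `swagger_client` package and the placeholder values are assumptions for illustration only, so verify them against your generated code:

```python
import swagger_client

# Sketch: submit a batch transcription with diarization enabled.
# `api` is assumed to be an already-initialized transcriptions API instance
# from the swagger-generated client (see the installation section below).
properties = swagger_client.TranscriptionProperties()
properties.diarization_enabled = True
properties.diarization = swagger_client.DiarizationProperties(
    swagger_client.DiarizationSpeakersProperties(min_count=1, max_count=10))

transcription_definition = swagger_client.Transcription(
    display_name="speaker-diarization-sample",          # assumed placeholder
    locale="en-US",                                      # assumed placeholder
    content_urls=["https://<storage>/<audio>.wav"],      # assumed placeholder
    properties=properties)

created, status, headers = api.transcriptions_create_with_http_info(
    transcription=transcription_definition)
```
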
## Download and install the API client library
To run the sample you need the Python client library for the Speech to Text REST API, which is generated with [Swagger](https://swagger.io).
Follow these steps for the installation:
1. Go to https://editor.swagger.io.
1. Click **File**, then click **Import URL**.
1. Enter the Swagger URL for the Speech Services API: `https://raw.githubusercontent.com/Azure/azure-rest-api-specs/main/specification/cognitiveservices/data-plane/Speech/SpeechToText/stable/v3.1/speechtotext.json`.
1. Click **Generate Client** and select **Python**.
1. Save the client library.
1. Extract the downloaded python-client-generated.zip somewhere in your file system.
1. Install the extracted python-client module in your Python environment using pip: `pip install path/to/package/python-client`.
1. The installed package is named `swagger_client`. You can check that the installation worked using the command `python -c "import swagger_client"`. A minimal client-initialization sketch follows after this list.
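
As a rough orientation, initializing the generated client typically looks like the snippet below. `Configuration`, `ApiClient`, and the API class name are produced by the Swagger code generator, so treat them as assumptions that may differ slightly in your generated package; the key and region values are hypothetical placeholders:

```python
import swagger_client

# Assumed placeholders: substitute your own Speech resource key and region.
SUBSCRIPTION_KEY = "<your-speech-key>"
SERVICE_REGION = "<your-region>"

configuration = swagger_client.Configuration()
configuration.api_key["Ocp-Apim-Subscription-Key"] = SUBSCRIPTION_KEY
configuration.host = (
    f"https://{SERVICE_REGION}.api.cognitive.microsoft.com/speechtotext/v3.1")

# The transcriptions API class name depends on the generated code;
# inspect the generated package (e.g. dir(swagger_client)) to confirm it.
client = swagger_client.ApiClient(configuration)
api = swagger_client.CustomSpeechTranscriptionsApi(api_client=client)
```
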
## Output

The transcribed text is written to the file specified in the logging settings.
```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    handlers=[
        logging.FileHandler("output3.txt", encoding='utf-8'),
        logging.StreamHandler()
    ]
)
```
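
For reference, the results end up in that log file by downloading the transcription's result files and logging their contents, roughly as sketched below. The `transcriptions_list_files` call, the `.values` collection, and the `links.content_url` attribute follow the swagger-generated client, but the exact names are assumptions and should be verified against your generated `swagger_client`; pagination is ignored for brevity.

```python
import logging
import requests  # assumed dependency for downloading the result file

# Sketch: `api` is an initialized transcriptions API instance and
# `transcription_id` identifies a completed batch transcription.
files = api.transcriptions_list_files(transcription_id)
for file_data in files.values:
    if file_data.kind != "Transcription":
        continue  # skip report files, keep only transcription results
    results_url = file_data.links.content_url
    results = requests.get(results_url)
    logging.info("Transcription results:\n%s", results.content.decode("utf-8"))
```
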
Results generated by Azure Speech to Text batch processing:

```json
{
  "source": "...",
  "timestamp": "2023-07-10T14:28:16Z",
  "durationInTicks": 25800000,
  "duration": "PT2.58S",
  "combinedRecognizedPhrases": [
    {
      "channel": 0,
      "lexical": "hello world",
      "itn": "hello world",
      "maskedITN": "hello world",
      "display": "Hello world."
    }
  ],
  "recognizedPhrases": [
    {
      "recognitionStatus": "Success",
      "channel": 0,
      "offset": "PT0.76S",
      "duration": "PT1.32S",
      "offsetInTicks": 7600000.0,
      "durationInTicks": 13200000.0,
      "nBest": [
        {
          "confidence": 0.5643338,
          "lexical": "hello world",
          "itn": "hello world",
          "maskedITN": "hello world",
          "display": "Hello world.",
          "displayWords": [
            {
              "displayText": "Hello",
              "offset": "PT0.76S",
              "duration": "PT0.76S",
              "offsetInTicks": 7600000.0,
              "durationInTicks": 7600000.0
            }
          ]
        }
      ]
    }
  ]
}
```
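
Since the `speaker` field appears on each entry of `recognizedPhrases` when diarization is enabled, a small post-processing step can group the recognized text by speaker. The snippet below is an illustrative sketch that only assumes the JSON layout shown above; the file name is a placeholder.

```python
import json
from collections import defaultdict

# Sketch: group the best hypothesis of each phrase by its speaker label.
# "batch_result.json" is a placeholder for a downloaded result file.
with open("batch_result.json", encoding="utf-8") as f:
    result = json.load(f)

by_speaker = defaultdict(list)
for phrase in result.get("recognizedPhrases", []):
    speaker = phrase.get("speaker", 0)      # present when diarization is enabled
    best = phrase["nBest"][0]["display"]    # top-ranked hypothesis
    by_speaker[speaker].append((phrase["offset"], best))

for speaker, lines in sorted(by_speaker.items()):
    print(f"Speaker {speaker}:")
    for offset, text in lines:
        print(f"  [{offset}] {text}")
```
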
## Description

- cli_conversation_transcribe.py: Streams MP3 audio using GStreamer and sends it to Azure Speech to Text for transcription.
- cli_multiproc.py: Splits MP3 files into multiple chunks using PyDub's silence detection and submits them to Azure Speech to Text for transcription, allowing faster processing.
- cli_s2t_console.py: Performs batch processing using Azure Speech to Text with speaker identification. `Please note: Use this code for batch processing with speaker recognition.`
- speech.py: Swagger Python client interface.
- web_conversation_transcribe.py: `Please note: Do not use this code.` Discontinued due to a Streamlit thread-context issue.
- web_main.py: Performs batch processing with Azure Speech to Text and speaker identification using a Streamlit web-based user interface.