https://github.com/kimtth/azure-speech-text-batch-speaker

Azure Speech Services Batch Transcription API (Python) with speaker recognition

# Batch Processing with Azure Speech to Text + Speaker Identification

The official documentation does not clearly explain whether speaker recognition is available in Azure Speech to Text batch processing from Python. [doc](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/batch-transcription)

Available information about this feature suggests that the Azure service does not support it, but this is not accurate. [ref](https://stackoverflow.com/questions/65491550/how-to-identify-speaker-using-python-sdk-in-using-azure-cognitive-speech-transla)

In fact, it can be easily achieved by adding a few parameters to the batch client.

## Setup Steps

1. Install the required dependencies.

```cmd
pip install -r requirements.txt
```

2. Install the generated API client library.

Before this step, generate the client by following the steps in the "Download and install the API client library" section.

```cmd
pip install .\python_client
```

## Parameters for Speaker Identification

In `speech.py`, setting the parameters below adds a `speaker` attribute to each recognized phrase in the JSON output:

```json
{
  "recognitionStatus": "Success",
  "channel": 0,
  "speaker": 1,
  "offset": "PT3.5S",
  "duration": "PT2.08S",
  "offsetInTicks": 35000000.0,
  "durationInTicks": 20800000.0,
  "nBest": [
    {
      "confidence": 0.5747593,
      "lexical": "Hello speech",
      "itn": "Hello speech",
      "maskedITN": "Hello speech",
      "display": "Hello speech。"
    }
  ]
}
```

1. Set `diarization_enabled` to `True`.
1. Pass a `DiarizationSpeakersProperties` object to set the minimum and maximum number of speakers.

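```python
# Turn on speaker diarization for the batch transcription.
properties.diarization_enabled = True
# Bound the number of speakers the service should try to distinguish.
properties.diarization = swagger_client.DiarizationProperties(
    swagger_client.DiarizationSpeakersProperties(min_count=1, max_count=10))
```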
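For readers who call the REST API directly, the same settings can be expressed as the JSON body of a transcription request. The following is a minimal sketch using only the standard library. The field names (`diarizationEnabled`, `diarization.speakers.minCount`, `diarization.speakers.maxCount`) are assumed from the v3.1 REST schema and should be checked against the Swagger file; the helper name `build_transcription_body`, the display name, and the audio URL are hypothetical.

```python
import json


def build_transcription_body(content_urls, locale="en-US",
                             min_speakers=1, max_speakers=10):
    """Build a request body with diarization enabled (hypothetical helper)."""
    if not 1 <= min_speakers <= max_speakers:
        raise ValueError("expected 1 <= min_speakers <= max_speakers")
    return {
        "contentUrls": list(content_urls),
        "locale": locale,
        "displayName": "Batch transcription with diarization",
        "properties": {
            # Assumed REST equivalents of the swagger_client attributes above.
            "diarizationEnabled": True,
            "diarization": {
                "speakers": {"minCount": min_speakers, "maxCount": max_speakers}
            },
        },
    }


# Placeholder URL; replace with a location the service can read.
body = build_transcription_body(["https://example.com/audio.wav"])
print(json.dumps(body, indent=2))
```

Validating the speaker bounds before sending the request catches a swapped minimum and maximum locally.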

## Download and install the API client library

To run the sample, you need the Python client library for the REST API, which is generated with [Swagger](https://swagger.io).

Follow these steps for the installation:

1. Go to https://editor.swagger.io.
1. Click **File**, then click **Import URL**.
1. Enter the Swagger URL for the Speech Services API: `https://raw.githubusercontent.com/Azure/azure-rest-api-specs/main/specification/cognitiveservices/data-plane/Speech/SpeechToText/stable/v3.1/speechtotext.json`.
1. Click **Generate Client** and select **Python**.
1. Save the client library.
1. Extract the downloaded python-client-generated.zip somewhere in your file system.
1. Install the extracted python-client module in your Python environment using pip: `pip install path/to/package/python-client`.
1. The installed package has the name `swagger_client`. You can check that the installation worked using the command `python -c "import swagger_client"`.
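The check in the last step can also be done from a script without importing the package. This is a small sketch using the standard library's `importlib.util.find_spec`; the helper name `is_module_installed` is hypothetical.

```python
import importlib.util


def is_module_installed(name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


if not is_module_installed("swagger_client"):
    print("swagger_client not found; generate and install the client first.")
```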

## Output

The transcribed text is output to the file specified in the log settings.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    handlers=[
        logging.FileHandler("output3.txt", encoding='utf-8'),
        logging.StreamHandler()
    ]
)
```

Example result generated by Azure Speech to Text batch processing:

```json
{
  "source": "...",
  "timestamp": "2023-07-10T14:28:16Z",
  "durationInTicks": 25800000,
  "duration": "PT2.58S",
  "combinedRecognizedPhrases": [
    {
      "channel": 0,
      "lexical": "hello world",
      "itn": "hello world",
      "maskedITN": "hello world",
      "display": "Hello world."
    }
  ],
  "recognizedPhrases": [
    {
      "recognitionStatus": "Success",
      "channel": 0,
      "offset": "PT0.76S",
      "duration": "PT1.32S",
      "offsetInTicks": 7600000.0,
      "durationInTicks": 13200000.0,
      "nBest": [
        {
          "confidence": 0.5643338,
          "lexical": "hello world",
          "itn": "hello world",
          "maskedITN": "hello world",
          "display": "Hello world.",
          "displayWords": [
            {
              "displayText": "Hello",
              "offset": "PT0.76S",
              "duration": "PT0.76S",
              "offsetInTicks": 7600000.0,
              "durationInTicks": 7600000.0
            }
          ]
        }
      ]
    }
  ]
}
```
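A result in this shape can be turned into a per-speaker transcript with a few lines of Python. The sketch below is not part of the repository; the helper name `phrases_by_speaker` is hypothetical. It assumes the first `nBest` entry is the preferred hypothesis and that ticks are 100-nanosecond units, which matches the `"PT3.5S"` / `35000000.0` pair shown earlier. The sample data reuses the two phrases from this README, trimmed to the fields that are read.

```python
import json
from collections import defaultdict

TICKS_PER_SECOND = 10_000_000  # assumes 100-nanosecond ticks


def phrases_by_speaker(result):
    """Map each speaker number to a list of (start_seconds, text) tuples."""
    grouped = defaultdict(list)
    for phrase in result.get("recognizedPhrases", []):
        if phrase.get("recognitionStatus") != "Success" or not phrase.get("nBest"):
            continue
        # "speaker" is absent when diarization was not enabled, so the key is None.
        speaker = phrase.get("speaker")
        start = phrase["offsetInTicks"] / TICKS_PER_SECOND
        grouped[speaker].append((start, phrase["nBest"][0]["display"]))
    return dict(grouped)


sample = json.loads("""
{
  "recognizedPhrases": [
    {"recognitionStatus": "Success", "speaker": 1, "offsetInTicks": 35000000.0,
     "nBest": [{"display": "Hello speech."}]},
    {"recognitionStatus": "Success", "offsetInTicks": 7600000.0,
     "nBest": [{"display": "Hello world."}]}
  ]
}
""")

grouped = phrases_by_speaker(sample)
for speaker, items in grouped.items():
    for start, text in items:
        print(f"speaker={speaker} t={start:.2f}s {text}")
```

Phrases without a `speaker` field are collected under `None`, which makes it easy to spot a job that was submitted without diarization.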

## Description

- `cli_conversation_transcribe.py`: Streams MP3 audio using GStreamer and sends it to Azure Speech to Text for transcription.
- `cli_multiproc.py`: Splits MP3 files into chunks using PyDub's silence detection, then submits the chunks to Azure Speech to Text for transcription, which speeds up processing.
- `cli_s2t_console.py`: Performs batch processing using Azure Speech to Text with speaker identification. **Use this script for batch processing with speaker recognition.**
- `speech.py`: Swagger Python client interface.
- `web_conversation_transcribe.py`: **Do not use this script.** It has been discontinued due to a Streamlit thread context issue.
- `web_main.py`: Performs batch processing with Azure Speech to Text and speaker identification using a Streamlit web-based user interface.