{"id":15600586,"url":"https://github.com/jason-fox/fox.jason.audiobook","last_synced_at":"2026-03-18T16:56:08.981Z","repository":{"id":54521606,"uuid":"182962292","full_name":"jason-fox/fox.jason.audiobook","owner":"jason-fox","description":"Transform DITA to speech","archived":false,"fork":false,"pushed_at":"2024-02-18T07:40:56.000Z","size":874,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-18T19:49:12.952Z","etag":null,"topics":["audio-processing","audiobook","bing-speech","dita","dita-ot","dita-ot-plugin","ffmpeg-script","mp3","speech-synthesis","ssml","text-to-speech","watson-speech"],"latest_commit_sha":null,"homepage":"https://jason-fox.github.io/dita-ot-plugins/audiobook","language":"XSLT","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jason-fox.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-04-23T07:46:37.000Z","updated_at":"2023-07-18T09:09:52.000Z","dependencies_parsed_at":"2024-02-17T16:28:47.282Z","dependency_job_id":"6647bf74-8cf5-4ac8-92ce-fb2c15abe154","html_url":"https://github.com/jason-fox/fox.jason.audiobook","commit_stats":{"total_commits":121,"total_committers":4,"mean_commits":30.25,"dds":0.05785123966942152,"last_synced_commit":"1c779680b9b2e06fa273d75764b9968d97e94de2"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/jason-fox/fox.jason.audiobook","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jason-fox%2Ffox.jason.audiobook","tags_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jason-fox%2Ffox.jason.audiobook/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jason-fox%2Ffox.jason.audiobook/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jason-fox%2Ffox.jason.audiobook/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jason-fox","download_url":"https://codeload.github.com/jason-fox/fox.jason.audiobook/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jason-fox%2Ffox.jason.audiobook/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28952578,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-31T18:30:42.805Z","status":"ssl_error","status_checked_at":"2026-01-31T18:30:19.593Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-processing","audiobook","bing-speech","dita","dita-ot","dita-ot-plugin","ffmpeg-script","mp3","speech-synthesis","ssml","text-to-speech","watson-speech"],"created_at":"2024-10-03T02:04:39.975Z","updated_at":"2026-01-31T20:01:15.401Z","avatar_url":"https://github.com/jason-fox.png","language":"XSLT","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Audiobook for DITA-OT [\u003cimg 
src=\"https://jason-fox.github.io/fox.jason.audiobook/audiobook.png\" align=\"right\" width=\"300\"\u003e](https://audiobookdita-ot.rtfd.io/)\n\n[![license](https://img.shields.io/github/license/jason-fox/fox.jason.audiobook.svg)](http://www.apache.org/licenses/LICENSE-2.0)\n[![DITA-OT 4.2](https://img.shields.io/badge/DITA--OT-4.2-green.svg)](http://www.dita-ot.org/4.2)\n[![CI](https://github.com/jason-fox/fox.jason.audiobook/workflows/CI/badge.svg)](https://github.com/jason-fox/fox.jason.audiobook/actions?query=workflow%3ACI)\n[![Coverage Status](https://coveralls.io/repos/github/jason-fox/fox.jason.audiobook/badge.svg?branch=master)](https://coveralls.io/github/jason-fox/fox.jason.audiobook?branch=master)\n[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=fox.jason.audiobook\u0026metric=alert_status)](https://sonarcloud.io/dashboard?id=fox.jason.audiobook)\n\nThis [DITA-OT Plug-in](https://www.dita-ot.org/plugins) transforms DITA to speech in the form of an audiobook.\n\n### DITA Topic\n\n```xml\n\u003ctask id=\"replacecover\" xml:lang=\"en-us\"\u003e\n  \u003ctitle\u003eReplace the cover of your system.\u003c/title\u003e\n  \u003cshortdesc\u003eThe cover needs to be put back on to reduce problems from dust.\u003c/shortdesc\u003e\n  \u003ctaskbody\u003e\n    \u003csteps\u003e\n      \u003cstep\u003e\n        \u003ccmd\u003eRetrieve the computer's cover from its safe place. Put it back on.\u003c/cmd\u003e\n      \u003c/step\u003e\n      \u003cstep\u003e\n        \u003ccmd\u003eRetrieve the screws from the safe place. 
Put them back in.\u003c/cmd\u003e\n      \u003c/step\u003e\n      \u003cstep\u003e\n        \u003ccmd\u003ePut away your screwdriver before you lose it.\u003c/cmd\u003e\n      \u003c/step\u003e\n    \u003c/steps\u003e\n  \u003c/taskbody\u003e\n\u003c/task\u003e\n```\n\n### MP3 Output File\n\n\u003caudio controls\u003e\n  \u003csource src=\"https://jason-fox.github.io/fox.jason.audiobook/replacecover.mp3\" type=\"audio/mpeg\"\u003e\n  \u003ca href=\"https://jason-fox.github.io/fox.jason.audiobook/replacecover.mp3\"\u003e\n    \u003cimg src=\"https://jason-fox.github.io/fox.jason.audiobook/mp3.png\"/\u003e\n  \u003c/a\u003e\n\u003c/audio\u003e\n\n:arrow_forward: [Video from DITA-OT Day 2019](https://youtu.be/icbLaNGdV8c)\n\n[![](https://jason-fox.github.io/fox.jason.audiobook/cloud-video.png)](https://youtu.be/icbLaNGdV8c)\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eTable of Contents\u003c/strong\u003e\u003c/summary\u003e\n\n-   [Install](#install)\n    -   [Installing DITA-OT](#installing-dita-ot)\n    -   [Installing the Plug-in](#installing-the-plug-in)\n    -   [Installing the FFMpeg tool](#installing-the-ffmpeg-tool)\n    -   [Signing up for a Text-to-Speech Service](#signing-up-for-a-text-to-speech-service)\n        -   [IBM Cloud Services](#ibm-cloud-services)\n        -   [Microsoft Azure](#microsoft-azure)\n-   [Usage](#usage)\n    -   [Invocation from the Command line](#invocation-from-the-command-line)\n        -   [Obtaining a series of SSML Files](#obtaining-a-series-of-ssml-files)\n        -   [Obtaining a series of MP3 Files](#obtaining-a-series-of-mp3-files)\n        -   [Creating an audiobook](#creating-an-audiobook)\n        -   [Parameter Reference](#parameter-reference)\n    -   [Selecting a voice to use](#selecting-a-voice-to-use)\n    -   [Marking up SSML tags.](#marking-up-ssml-tags)\n-   [Contribute](#contribute)\n-   [License](#license)\n\n\u003c/details\u003e\n\n## Install\n\nThe audiobook plug-in has been tested against 
[DITA-OT 3.x](http://www.dita-ot.org/download). It is recommended that you\nupgrade to the latest version.\n\n### Installing DITA-OT\n\n\u003ca href=\"https://www.dita-ot.org\"\u003e\u003cimg src=\"https://www.dita-ot.org/images/dita-ot-logo.svg\" align=\"right\" height=\"55\"\u003e\u003c/a\u003e\n\nThe DITA-OT Audiobook transform is a plug-in for the DITA Open Toolkit.\n\n-   Full installation instructions for downloading DITA-OT can be found\n    [here](https://www.dita-ot.org/4.0/topics/installing-client.html).\n\n    1.  Download the `dita-ot-4.2.zip` package from the project website at\n        [dita-ot.org/download](https://www.dita-ot.org/download)\n    2.  Extract the contents of the package to the directory where you want to install DITA-OT.\n    3.  **Optional**: Add the absolute path for the `bin` directory to the _PATH_ system variable.\n\n    This defines the necessary environment variable to run the `dita` command from the command line.\n\n```console\ncurl -LO https://github.com/dita-ot/dita-ot/releases/download/4.2/dita-ot-4.2.zip\nunzip -q dita-ot-4.2.zip\nrm dita-ot-4.2.zip\n```\n\n### Installing the Plug-in\n\n-   Run the plug-in installation command:\n\n```console\ndita install https://github.com/jason-fox/fox.jason.audiobook/archive/master.zip\n```\n\nThe `dita` command line tool requires no additional configuration.\n\n---\n\n### Installing the FFMpeg tool\n\n\u003ca href=\"https://ffmpeg.org\"\u003e\u003cimg src=\"https://tecadmin.net/wp-content/uploads/2013/11/ffmpeg-logo-370x250.png\" align=\"right\" height=\"80\"\u003e\u003c/a\u003e\n\nFFmpeg is a free software project consisting of a software suite of libraries and programs for handling video, audio,\nand other multimedia files and streams. 
FFmpeg is published under the GNU Lesser General Public License 2.1+ or GNU\nGeneral Public License 2+ (depending on which options are enabled).\n\nTo download a copy, follow the instructions on the [Download page](https://ffmpeg.org/download.html).\n\n---\n\n### Signing up for a Text-to-Speech Service\n\nSeveral publicly available **text-to-speech** cloud services can be used. They typically offer a\n_try-before-you-buy_ option and generally provide sample access to the service without cost. Upgrading to a paid\nversion will be necessary when transforming larger documents.\n\n---\n\n#### IBM Cloud Services\n\n\u003ca href=\"https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-gettingStarted\"\u003e\u003cimg src=\"https://www.nasuni.com/wp-content/uploads/2017/06/ibm-cloud.png\" align=\"right\" height=\"85\"\u003e\u003c/a\u003e\n\nThe IBM Text to Speech service processes text and natural language to generate synthesized audio output complete with\nappropriate cadence and intonation. It is available in several voices.\n\nIntroduction: [Getting Started](https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-gettingStarted)\n\nCreate an instance of the service:\n\n1.  Go to the [Text to Speech](https://cloud.ibm.com/catalog/services/text-to-speech) page in the IBM\n    Cloud Catalog.\n2.  Sign up for a free IBM Cloud account or log in.\n3.  Click Create.\n\nCopy the credentials to authenticate to your service instance:\n\n1.  From the [IBM Cloud dashboard](https://cloud.ibm.com/dashboard/apps), click on your **Text to\n    Speech** service instance to go to the **Text to Speech** service dashboard page.\n2.  On the Manage page, click Show to view your credentials.\n3.  Copy the `API Key` and `URL` values.\n4.  
Within the plug-in alter the file `cfg/configuration.properties` to hold your `API Key` and `URL`.\n\n---\n\n#### Microsoft Azure\n\n\u003ca href=\"https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started\"\u003e\u003cimg src=\"https://www.confluent.io/wp-content/uploads/MS-Azure_logo_stacked_c-gray_rgb.png\" align=\"right\" height=\"85\"\u003e\u003c/a\u003e\n\nThe Speech Services allow you to convert text into synthesized speech and get a list of supported voices for a region\nusing a set of REST APIs. Each available endpoint is associated with a region. A subscription key for the\nendpoint/region you plan to use is required.\n\nIntroduction: [Getting Started](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started)\n\nCreate an instance of the service:\n\n1.  Go to Try [Cognitive Services](https://azure.microsoft.com/try/cognitive-services/)\n2.  Select the Speech APIs tab.\n3.  Under Speech Services, select the Get API Key button.\n4.  Agree to the terms and select your locale from the drop-down menu.\n5.  Sign in by using your Microsoft, Facebook, LinkedIn, or GitHub account.\n\nYou can sign up for a free Microsoft account at the Microsoft account portal. To get started, click Sign in with\nMicrosoft and then, when asked to sign in, click Create one. Follow the steps to create and verify your new Microsoft\naccount.\n\nAfter you sign in to Try Cognitive Services, your free trial begins. The displayed webpage lists all the Azure Cognitive\nServices services for which you currently have trial subscriptions. Two subscription keys are listed beside Speech\nServices. You can use either key in your applications.\n\nCopy the credentials to authenticate to your service instance:\n\n1.  Copy either of the `API Key` and `Endpoint` values.\n2.  
Within the plug-in, alter the file `cfg/configuration.properties` to hold your `API Key` and `URL`.\n\n---\n\n## Usage\n\n### Invocation from the Command line\n\n#### Obtaining a series of SSML Files\n\nTo run, use the `ssml` transform.\n\n```console\nPATH_TO_DITA_OT/bin/dita -f ssml  -o out -i PATH_TO_DITAMAP\n```\n\nOnce the command has run, a `list.txt` and a series of `*.ssml` files will be available in the output directory.\n\n#### Obtaining a series of MP3 Files\n\nTo run, use the `mp3` transform.\n\n```console\nPATH_TO_DITA_OT/bin/dita -f mp3  -o out -i PATH_TO_DITAMAP --ssml.service=[bing|watson]\n```\n\nOnce the command has run, a `list.txt` and a series of `*.mp3` files will be available in the output directory.\n\n#### Creating an audiobook\n\nTo run, use the `audiobook` transform.\n\n```console\nPATH_TO_DITA_OT/bin/dita -f audiobook  -o out -i PATH_TO_DITAMAP --ssml.service=[bing|watson]\n```\n\nOnce the command has run, an `*.m4a` file will be created in the output directory.\n\n### Parameter Reference\n\n-   `ssml.service` - Decides which text-to-speech service to use:\n    -   `dummy` - Avoids accessing a Text-to-Speech service, uses a dummy MP3 file for all outputs\n    -   `custom` - Sends the SSML to an arbitrary URL using POST - use this to connect to proxies for Amazon\n        [Polly](https://docs.aws.amazon.com/polly/) or Google Cloud\n        [Text-to-Speech](https://cloud.google.com/text-to-speech/)\n    -   `watson` - Connects to the IBM Cloud Text-to-Speech service\n    -   `bing` - Connects to the Microsoft Text-to-Speech service\n-   `ssml.gender` - Preferred voice gender:\n    -   `male` - Use a male voice for text-to-speech where available.\n    -   `female` - Use a female voice for text-to-speech where available.\n-   `ssml.authentication.url` - URL for creating an OAuth token if needed for a service. Defaults to the value in\n    `configuration.properties`\n-   `ssml.output.format` - Output format override for a Text-to-Speech service. 
Defaults to the value in\n    `configuration.properties`\n-   `ssml.apikey` - API Key for the Text-to-Speech service. Defaults to the value in `configuration.properties`\n-   `ssml.url` - URL for a Text-to-Speech service. Defaults to the value in `configuration.properties`\n-   `mp3.cachefile` - Specifies the location of a cache file to be used. If the SSML file matches a previously\n    generated MP3 file in the cache, the MP3 file will be copied over and the Text-to-Speech service will not be called.\n-   `mp3.cover.art.add` - Specifies whether or not cover art is to be added to an album (default `no`)\n-   `mp3.cover.art.image` - Specifies the cover art to be used for an album. The default is the image within the plug-in,\n    the file `cfg/cover-art.png`\n-   `audiobook.format` - mp4 Output Format (with or without DRM)\n    -   `m4a` - audio file created in the MPEG-4 format (default)\n    -   `m4b` - audio file created in the MPEG-4 format with DRM\n\n### Selecting a voice to use\n\nWhen running the `mp3` or `audiobook` transforms, the _male voice_ corresponding to the `xml:lang` attribute of the root\ntopic will be chosen to render the speech. Use the `--ssml.gender=female` parameter to switch to the _female voice_. 
If\nno voice of the preferred gender can be found, the default will be used.\n\nA list of available voices can be found within the following files:\n\n-   `cfg/attrs/bing.voice-attr.xsl`\n-   `cfg/attrs/watson.voice-attr.xsl`\n\nEach listing shows the default male and female voices for a language, plus any regional variants which are available:\n\n```xml\n\u003c!-- Voices speaking in English --\u003e\n\u003cxsl:attribute-set name=\"__voice__en__male\"\u003e\n    \u003cxsl:attribute name=\"voice\"\u003een-US_MichaelVoice\u003c/xsl:attribute\u003e\n\u003c/xsl:attribute-set\u003e\n\u003cxsl:attribute-set name=\"__voice__en__female\"\u003e\n    \u003cxsl:attribute name=\"voice\"\u003een-US_AllisonVoice\u003c/xsl:attribute\u003e\n\u003c/xsl:attribute-set\u003e\n\u003c!-- Voices speaking in Regional English --\u003e\n\u003cxsl:attribute-set name=\"__voice__en-us__female\"\u003e\n    \u003cxsl:attribute name=\"voice\"\u003een-US_AllisonVoice\u003c/xsl:attribute\u003e\n\u003c/xsl:attribute-set\u003e\n\u003c!--xsl:attribute-set name=\"__voice__en-us__female\"\u003e\n    \u003cxsl:attribute name=\"voice\"\u003een-US_LisaVoice\u003c/xsl:attribute\u003e\n\u003c/xsl:attribute-set--\u003e\n\u003cxsl:attribute-set name=\"__voice__en-gb__female\"\u003e\n    \u003cxsl:attribute name=\"voice\"\u003een-GB_KateVoice\u003c/xsl:attribute\u003e\n\u003c/xsl:attribute-set\u003e\n```\n\nAs you can see, `en-US_AllisonVoice` is currently the preferred female voice for all documents marked up as\n`xml:lang=\"en\"` and `xml:lang=\"en-US\"`.\n\n-   To alter the `en` preferences, replace the text within the `\u003cxsl:attribute name=\"voice\"\u003e` element with the preferred\n    voice.\n-   To alter the `en-us` preferences, comment out the existing selection and uncomment the new preferred voice.\n\n### Marking up SSML tags.\n\nSome DITA tags such as `\u003cp\u003e` and `\u003cb\u003e` translate directly to SSML; however, there is a rich vocabulary of audio effects\nthat is missing from the 
vanilla DITA specification. These can be accommodated using the `props` attribute added to\n`\u003cph\u003e` tag. Examples are given below. The listing is mainly based on the\n[IBM Text to Speech Programming Guide](https://www.ibm.com/support/knowledgecenter/SSMQSV_6.1.1/com.ibm.voicetools.ssml.doc/tts_ssml.pdf),\nhowever the DITA plug-in is not service specific so some additional tags can be used. Obviously common substitutions\nshould be replaced with `\u003ckeyword\u003e` elements for consistency of reuse.\n\n**Note**: Not all tags and attributes will be supported by every provider.\n\n#### `\u003csay-as\u003e` Element\n\nThe `say-as` tag allows the author to indicate information on the type of text contained within the tag and to help\nspecify the level of detail for rendering the text. The required attribute for this tag is `interpret-as` . There are\ntwo optional attributes, `format` and `detail`, which are only used with particular values within the `interpret-as`\nattribute. These optional attributes are illustrated within the entries for their associated values.\n\n-   `letters`: This value spells out the characters in a given word within the enclosed tag.\n\n##### Example (This will spell out _\"HELLO\"_):\n\n```xml\n\u003cph props=\"say-as interpret-as(letters)\"\u003eHello\u003c/ph\u003e\n```\n\n-   `digits`: This value spells out the digits in a given number within the enclosed tag.\n\n##### Example (This will spell out _\"123456\"_):\n\n```xml\n\u003cph props=\"say-as interpret-as(digits)\"\u003e123456\u003c/ph\u003e\n```\n\n-   `vxml:digits`: This value performs the same function as the digits value.\n\n##### Example\n\n```xml\n\u003cph props=\"say-as interpret-as(vxml:digits)\"\u003e123456\u003c/ph\u003e\n```\n\n-   `date` This value will speak the date within the enclosed tag, using the format given in the associated `format`\n    attribute. 
The `format` attribute is required for use with the date value of `interpret-as`, but if `format` is not\n    present, the engine will still attempt to pronounce the date.\n\n##### Example (This gives a list of dates in all the various formats: )\n\n```xml\n\u003cph props=\"say-as interpret-as(date) format(mdy)\"\u003e12/17/2005\u003c/ph\u003e\n\u003cph props=\"say-as interpret-as(date) format(ymd)\"\u003e2005/12/17\u003c/ph\u003e\n\u003cph props=\"say-as interpret-as(date) format(dmy)\"\u003e17/12/2005\u003c/ph\u003e\n\u003cph props=\"say-as interpret-as(date) format(ydm)\"\u003e2005/17/12\u003c/ph\u003e\n\u003cph props=\"say-as interpret-as(date) format(my)\"\u003e12/2005\u003c/ph\u003e\n\u003cph props=\"say-as interpret-as(date) format(md)\"\u003e12/17\u003c/ph\u003e\n\u003cph props=\"say-as interpret-as(date) format(ym)\"\u003e2005/12\u003c/ph\u003e\n```\n\n-   `ordinal` - This value will speak the ordinal value for the given digit within the enclosed tag.\n\n##### Example (This will say _\"second first\"_):\n\n```xml\n\u003cph props=\"say-as interpret-as(ordinal)\"\u003e2\u003c/ph\u003e\n\u003cph props=\"say-as interpret-as(ordinal)\"\u003e1\u003c/ph\u003e\n```\n\n-   `cardinal` - This value will speak the cardinal number corresponding to the Roman numeral within the enclosed tag.\n\n##### Example (This will say _\"Super Bowl thirty-nine\"_):\n\n```xml\nSuper Bowl \u003cph props=\"say-as interpret-as(cardinal)\"\u003eXXXIX\u003c/ph\u003e\n```\n\n-   `number` - This value is an alternative to using the values given above. Using the `format` attribute to determine\n    how the number is to be interpreted, you can enter one series of number and have it pronounced several different\n    ways, as in the example. The example also includes two different ways of pronouncing a series of numbers as a\n    telephone number. 
To have the series pronounced with the punctuation included, you must add the `detail` attribute.\n\n##### Examples\n\n```xml\n\u003cph props=\"say-as interpret-as(number)\"\u003e123456\u003c/ph\u003e\n\u003cph props=\"say-as interpret-as(number) format(ordinal)\"\u003e123456\u003c/ph\u003e\n\u003cph props=\"say-as interpret-as(number) format(cardinal)\"\u003e123456\u003c/ph\u003e\n\u003cph props=\"say-as interpret-as(number) format(telephone)\"\u003e555-555-5555\u003c/ph\u003e\n\u003cph props=\"say-as interpret-as(number) format(telephone) detail(punctuation)\"\u003e555-555-5555\u003c/ph\u003e\n```\n\n-   `vxml:boolean` - This value will speak `yes` or `no` depending on the value given within the enclosed tag.\n\n##### Examples\n\n```xml\n\u003cph props=\"say-as interpret-as(vxml:boolean)\"\u003etrue\u003c/ph\u003e\n\u003cph props=\"say-as interpret-as(vxml:boolean)\"\u003efalse\u003c/ph\u003e\n```\n\n-   `vxml:date` - This value works like the date value, except that the format is predefined as `YYYYMMDD`. When a value\n    is not known, or you do not wish it to be displayed, a question mark is used to replace that value, as shown in the\n    example.\n\n##### Examples\n\n```xml\n\u003cph props=\"say-as interpret-as(vxml:date)\"\u003e20050720\u003c/ph\u003e\n\u003cph props=\"say-as interpret-as(vxml:date)\"\u003e????0720\u003c/ph\u003e\n\u003cph props=\"say-as interpret-as(vxml:date)\"\u003e200507??\u003c/ph\u003e\n```\n\n-   `vxml:currency` - This value is used to control the synthesis of monetary quantities. 
The string must be written in\n    the `UUUmm.nn` format, where `UUU` is the three character currency indicator specified by ISO standard 4217, and\n    `mm.nn` is the amount.\n\n##### Example (This will say _\"forty-five dollars and thirty cents\"_):\n\n```xml\n\u003cph props=\"say-as interpret-as(vxml:currency)\"\u003eUSD45.30\u003c/ph\u003e\n```\n\nIf there are more than two decimal places in the number within the enclosed tag, the amount will be synthesized as a\ndecimal number followed by the currency indicator. If the three character currency indicator is not present, the number\nwill be synthesized as a decimal only, with no pronunciation of currency type.\n\n##### Example (This will say _\"forty-five point three two nine US dollars\"_):\n\n```xml\n\u003cph props=\"say-as interpret-as(vxml:currency)\"\u003eUSD45.329\u003c/ph\u003e\n```\n\n-   `vxml:phone` - This value will speak a phone number with both digits and punctuation, similar to the `number` value\n    used with `format(telephone)`.\n\n```xml\n\u003cph props=\"say-as interpret-as(vxml:phone)\"\u003e555-555-5555\u003c/ph\u003e\n```\n\n#### `\u003cphoneme\u003e` Element\n\nThe SSML phoneme tag enables users to provide a phonetic pronunciation for the enclosed text. This tag has two\nattributes:\n\n-   `alphabet` - This attribute specifies the phonology used. The supported alphabets to designate are `ipa` for the\n    International Phonetic Alphabet, and `ibm` for the SPR representation.\n\n-   `ph` - This attribute specifies the pronunciation. It is a required attribute. 
This example shows how a\n    pronunciation for _\"tomato\"_ is specified using the IPA phonology, where the symbols are given using Unicode:\n\n##### Examples\n\n```xml\n\u003cph props=\"phoneme alphabet(ipa) ph(t\u0026#x259;mei\u0026#x27E;ou\u0026#x325;)\"\u003etomato\u003c/ph\u003e\n```\n\nThis example shows how a pronunciation for _\"tomato\"_ is specified using the SPR phonology:\n\n```xml\n\u003cph props=\"phoneme alphabet(ibm) ph(.0tx.1me.0fo)\"\u003etomato\u003c/ph\u003e\n```\n\n#### `\u003csub\u003e` Element\n\nThis tag is used to indicate that the text included in the alias attribute is to replace the text enclosed within the\ntag when speech is synthesized. The only attribute for this tag is the `alias` attribute, and it is required.\n\n##### Example\n\n```xml\n\u003cph props=\"sub alias(International Business Machines)\"\u003eIBM\u003c/ph\u003e\n```\n\n#### `\u003cvoice\u003e` Element\n\nThis tag is used when a change in voice is required. Although all attributes listed are optional, without any attributes\ndefined an error will result. 
The optional attributes are:\n\n-   `age` - Accepted values are integers from 14 to 60, for both male and female voices.\n-   `gender` - Accepted values are `male` and `female`.\n-   `name` - Accepted values are the installed voices’ names.\n-   `variant` - Accepted values are positive integers.\n\n##### Examples\n\n```xml\n\u003cph props=\"voice age(60)\"\u003eSixty year-old's voice.\u003c/ph\u003e\n\u003cph props=\"voice gender(female)\"\u003eThis is a female voice.\u003c/ph\u003e\n\u003cph props=\"voice name(Allison)\"\u003eUse the IBM TTS voice named Allison.\u003c/ph\u003e\n\u003cph props=\"voice name(Allison, Andrew, Tyler)\"\u003eUse the first available IBM TTS voice named in the given list.\u003c/ph\u003e\n```\n\n#### `\u003cemphasis\u003e` Element\n\nThe `\u003cemphasis\u003e` element requests that the contained text be spoken with emphasis (also referred to as prominence or\nstress).\n\n-   `level`: the optional level attribute indicates the strength of emphasis to be applied. Defined values are `strong`,\n    `moderate`, `none` and `reduced`. The default level is `moderate`. The meaning of `strong` and `moderate` emphasis\n    is interpreted according to the language being spoken (languages indicate emphasis using a possible combination of\n    pitch change, timing changes, loudness and other acoustic differences). The `reduced` level is effectively the\n    opposite of emphasizing a word. For example, when the phrase \"going to\" is reduced it may be spoken as \"gonna\". 
The\n    `none` level is used to prevent the synthesis processor from emphasizing words that it might typically emphasize.\n\n##### Examples\n\n```xml\nThat is a \u003cph props=\"emphasis\"\u003e big \u003c/ph\u003e car!\nThat is a \u003cph props=\"emphasis level(strong)\"\u003e huge \u003c/ph\u003ebank account!\n```\n\nEmphasis can also be achieved using the `\u003cb\u003e` tag:\n\n```xml\nThat is a \u003cb\u003e big \u003c/b\u003e car!\nThat is a \u003cb props=\"level(strong)\"\u003e huge \u003c/b\u003ebank account!\n```\n\n#### `\u003cbreak\u003e` Element\n\nThis tag inserts pauses into the spoken text. It has the following optional attributes:\n\n-   `strength` - This attribute specifies the length of a pause in terms of varying strength values: `none`, `x-weak`,\n    `weak`, `medium`, `strong`, or `x-strong`.\n-   `time` - This attribute specifies the length of the pause in terms of seconds or milliseconds. The value formats\n    are `NNNs` for seconds or `NNNms` for milliseconds.\n\n##### Examples\n\n```xml\nDifferent sized \u003cph props=\"break strength(none)\"/\u003e pauses.\nDifferent sized \u003cph props=\"break strength(x-weak)\"/\u003e pauses.\nDifferent sized \u003cph props=\"break strength(weak)\"/\u003e pauses.\nDifferent sized \u003cph props=\"break strength(medium)\"/\u003e pauses.\nDifferent sized \u003cph props=\"break strength(strong)\"/\u003e pauses.\nDifferent sized \u003cph props=\"break strength(x-strong)\"/\u003e pauses.\nDifferent sized \u003cph props=\"break time(1s)\"/\u003e pauses.\nDifferent sized \u003cph props=\"break time(1000ms)\"/\u003e pauses.\n```\n\n#### `\u003cprosody\u003e` Element\n\nThis tag controls the pitch, range, speaking rate, and volume of the text. All attributes are optional, but if no\nattribute is given an error results.\n\nHere is a description of the optional attributes:\n\n-   `pitch` - This attribute modifies the baseline pitch for the text enclosed within the tag. 
Accepted values are\n    a number followed by the Hz designation, a relative change, or one of `x-low`, `low`, `medium`, `high`, `x-high`,\n    `default`.\n\n-   `range` - This attribute modifies the pitch range for the text enclosed within the tag. Accepted values for this\n    attribute are the same as the accepted values for `pitch`.\n-   `rate` - This attribute indicates a change in the speaking rate for contained text. Accepted values are a\n    relative change, a positive number, or one of `x-slow`, `slow`, `medium`, `fast`, `x-fast`, `default`.\n\nThe `rate` is specified in terms of words per minute. If the speaking rate is 50 words per minute, then `rate=50`. If\nthe setting is `rate=+10`, the speaking rate will be 10 words per minute faster than your current `rate` setting.\n\n-   `volume` - This attribute modifies the volume for the contained text. Accepted values are numbers from `0.0` to `100.0` or the\n    relative values `silent`, `x-soft`, `soft`, `medium`, `loud`, `x-loud`, `default`.\n\n##### Examples\n\n```xml\n\u003cph props=\"prosody pitch(150Hz)\"\u003e Modified pitch \u003c/ph\u003e\n\u003cph props=\"prosody pitch(-20Hz)\"\u003e Modified pitch \u003c/ph\u003e\n\u003cph props=\"prosody pitch(+20Hz)\"\u003e Modified pitch \u003c/ph\u003e\n\u003cph props=\"prosody pitch(-12st)\"\u003e Modified pitch \u003c/ph\u003e\n\u003cph props=\"prosody pitch(+12st)\"\u003e Modified pitch \u003c/ph\u003e\n\u003cph props=\"prosody pitch(x-low)\"\u003e Modified pitch \u003c/ph\u003e\n\u003cph props=\"prosody range(150Hz)\"\u003e Modified pitch range\u003c/ph\u003e\n\u003cph props=\"prosody range(-20Hz)\"\u003e Modified pitch range\u003c/ph\u003e\n\u003cph props=\"prosody range(+20Hz)\"\u003e Modified pitch range\u003c/ph\u003e\n\u003cph props=\"prosody range(-12st)\"\u003e Modified pitch range\u003c/ph\u003e\n\u003cph props=\"prosody range(+12st)\"\u003e Modified pitch range\u003c/ph\u003e\n\u003cph props=\"prosody range(x-high)\"\u003e Modified pitch 
range\u003c/ph\u003e\n\u003cph props=\"prosody rate(slow)\"\u003e Modified speaking rate\u003c/ph\u003e\n\u003cph props=\"prosody rate(+25)\"\u003e Modified speaking rate\u003c/ph\u003e\n\u003cph props=\"prosody rate(-25)\"\u003e Modified speaking rate\u003c/ph\u003e\n\u003cph props=\"prosody volume(88.9)\"\u003eModified volume\u003c/ph\u003e\n\u003cph props=\"prosody volume(loud)\"\u003eModified volume\u003c/ph\u003e\n```\n\n#### `\u003caudio\u003e` Element\n\nThis tag inserts recorded elements into the generated audio. The only attribute is `src` and is required. This attribute\nspecifies the location of the file to be inserted.\n\n##### Example\n\n```xml\n\u003cph props=\"audio src(http://www.myfiles.com/files/beep.wav)\"/\u003e\n```\n\n## Contribute\n\nPRs accepted.\n\n## License\n\n[Apache 2.0](LICENSE) © 2019 - 2024 Jason Fox\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjason-fox%2Ffox.jason.audiobook","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjason-fox%2Ffox.jason.audiobook","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjason-fox%2Ffox.jason.audiobook/lists"}