{"id":26119162,"url":"https://github.com/stdlib-js/datasets-cmudict","last_synced_at":"2025-08-24T16:43:00.445Z","repository":{"id":41422748,"uuid":"377249088","full_name":"stdlib-js/datasets-cmudict","owner":"stdlib-js","description":"The Carnegie Mellon Pronouncing Dictionary (CMUdict).","archived":false,"fork":false,"pushed_at":"2025-03-10T02:05:06.000Z","size":5985,"stargazers_count":15,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-12T20:22:16.637Z","etag":null,"topics":["data","dataset","datasets","dictionary","en","english","javascript","language","nlp","node","node-js","nodejs","pronounciation","speech","spelling","stdlib","words"],"latest_commit_sha":null,"homepage":"https://github.com/stdlib-js/stdlib","language":"JavaScript","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stdlib-js.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":["stdlib-js"],"open_collective":"stdlib","tidelift":"npm/@stdlib/stdlib"}},"created_at":"2021-06-15T17:49:40.000Z","updated_at":"2025-04-09T10:18:46.000Z","dependencies_parsed_at":"2023-02-17T07:15:29.856Z","dependency_job_id":"6fc53d88-33d7-49e6-b351-6fe48addb973","html_url":"https://github.com/stdlib-js/datasets-cmudict","commit_stats":{"total_commits":54,"total_committers":1,"mean_commits":54.0,"dds":0.0,"last_synced_commit":"9192aaa550deadfe76e85dce1f7cf2b485e061f5"},"previous_names":[],"tags_count":28,"template":false,"template_full_name":null,"reposi
tory_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stdlib-js%2Fdatasets-cmudict","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stdlib-js%2Fdatasets-cmudict/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stdlib-js%2Fdatasets-cmudict/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stdlib-js%2Fdatasets-cmudict/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stdlib-js","download_url":"https://codeload.github.com/stdlib-js/datasets-cmudict/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248696229,"owners_count":21147093,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","dataset","datasets","dictionary","en","english","javascript","language","nlp","node","node-js","nodejs","pronounciation","speech","spelling","stdlib","words"],"created_at":"2025-03-10T12:16:05.643Z","updated_at":"2025-04-13T10:21:10.285Z","avatar_url":"https://github.com/stdlib-js.png","language":"JavaScript","readme":"\u003c!--\n\n@license Apache-2.0\n\nCopyright (c) 2018 The Stdlib Authors.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n   http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either 
express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n\n--\u003e\n\n\n\u003cdetails\u003e\n  \u003csummary\u003e\n    About stdlib...\n  \u003c/summary\u003e\n  \u003cp\u003eWe believe in a future in which the web is a preferred environment for numerical computation. To help realize this future, we've built stdlib. stdlib is a standard library, with an emphasis on numerical and scientific computation, written in JavaScript (and C) for execution in browsers and in Node.js.\u003c/p\u003e\n  \u003cp\u003eThe library is fully decomposable, being architected in such a way that you can swap out and mix and match APIs and functionality to cater to your exact preferences and use cases.\u003c/p\u003e\n  \u003cp\u003eWhen you use stdlib, you can be absolutely certain that you are using the most thorough, rigorous, well-written, studied, documented, tested, measured, and high-quality code out there.\u003c/p\u003e\n  \u003cp\u003eTo join us in bringing numerical computing to the web, get started by checking us out on \u003ca href=\"https://github.com/stdlib-js/stdlib\"\u003eGitHub\u003c/a\u003e, and please consider \u003ca href=\"https://opencollective.com/stdlib\"\u003efinancially supporting stdlib\u003c/a\u003e. 
We greatly appreciate your continued support!\u003c/p\u003e\n\u003c/details\u003e\n\n# CMUdict\n\n[![NPM version][npm-image]][npm-url] [![Build Status][test-image]][test-url] [![Coverage Status][coverage-image]][coverage-url] \u003c!-- [![dependencies][dependencies-image]][dependencies-url] --\u003e\n\n\u003e The Carnegie Mellon Pronouncing Dictionary.\n\n\u003csection class=\"intro\"\u003e\n\nThe [Carnegie Mellon University Pronouncing Dictionary (CMUDict)][cmudict], created by the Speech Group in the School of Computer Science at CMU, is \"an open-source machine-readable pronunciation dictionary for North American English that contains over 134,000 words\".\n\n\u003c/section\u003e\n\n\u003c!-- /.intro --\u003e\n\n\u003csection class=\"installation\"\u003e\n\n## Installation\n\n```bash\nnpm install @stdlib/datasets-cmudict\n```\n\nAlternatively,\n\n-   To load the package in a website via a `script` tag without installation and bundlers, use the [ES Module][es-module] available on the [`esm`][esm-url] branch (see [README][esm-readme]).\n-   If you are using Deno, visit the [`deno`][deno-url] branch (see [README][deno-readme] for usage instructions).\n-   For use in Observable, or in browser/node environments, use the [Universal Module Definition (UMD)][umd] build available on the [`umd`][umd-url] branch (see [README][umd-readme]).\n-   To use as a general utility for the command line, install the corresponding [CLI package][cli-section] globally.\n\nThe [branches.md][branches-url] file summarizes the available branches and displays a diagram illustrating their relationships.\n\nTo view installation and usage instructions specific to each branch build, be sure to explicitly navigate to the respective README files on each branch, as linked to above.\n\n\u003c/section\u003e\n\n\u003csection class=\"usage\"\u003e\n\n## Usage\n\n```javascript\nvar cmudict = require( '@stdlib/datasets-cmudict' );\n```\n\n#### cmudict( \\[options] )\n\nReturns datasets from the [Carnegie 
Mellon Pronouncing Dictionary (CMUdict)][cmudict].\n\n```javascript\nvar data = cmudict();\n/* returns\n    {\n        'dict': {...},\n        'phones': {...},\n        'symbols': [...],\n        'vp': {...}\n    }\n*/\n```\n\nThe function accepts the following `options`:\n\n-   **data**: dataset name. The following names are recognized:\n\n    -   **dict**: the main pronouncing dictionary.\n    -   **phones**: manners of articulation for each sound.\n    -   **symbols**: complete list of ARPABET symbols used by the dictionary.\n    -   **vp**: verbal pronunciations of punctuation marks.\n\nTo only return the main pronouncing dictionary, set the `data` option to `dict`.\n\n```javascript\nvar opts = {\n    'data': 'dict'\n};\n\nvar data = cmudict( opts );\n/* returns\n    {\n        'A': 'AH0',\n        'A(1)': 'EY1',\n        'A\\'S': 'EY1 Z',\n        // ...\n    }\n*/\n```\n\nTo return only sound articulation manners, set the `data` option to `phones`.\n\n```javascript\nvar opts = {\n    'data': 'phones'\n};\n\nvar data = cmudict( opts );\n/* returns\n    {\n        'AA': 'vowel',\n        'AE': 'vowel',\n        'AH': 'vowel',\n        // ...\n    }\n*/\n```\n\nTo return only ARPABET symbols used by the dictionary, set the `data` option to `symbols`.\n\n```javascript\nvar opts = {\n    'data': 'symbols'\n};\n\nvar data = cmudict( opts );\n/* returns\n    [\n        'AA',\n        'AA0',\n        'AA1',\n        // ...\n    ]\n*/\n```\n\nTo return only the verbal pronunciations of punctuation marks, set the `data` option to `vp`.\n\n```javascript\nvar opts = {\n    'data': 'vp'\n};\n\nvar data = cmudict( opts );\n/* returns\n    {\n        '!exclamation-point': 'EH2 K S K L AH0 M EY1 SH AH0 N P OY2 N T',\n        '\"close-quote': 'K L OW1 Z K W OW1 T',\n        '\"double-quote': 'D AH1 B AH0 L K W OW1 T',\n        // ...\n    }\n*/\n```\n\n\u003c/section\u003e\n\n\u003c!-- /.usage --\u003e\n\n\u003csection class=\"notes\"\u003e\n\n## Notes\n\n-   Vowels carry a 
lexical stress marker (0: No stress, 1: Primary stress, 2: Secondary stress).\n-   The phoneme set is based on the [ARPAbet symbol set][arpabet] developed for speech recognition.\n\n\u003c/section\u003e\n\n\u003c!-- /.notes --\u003e\n\n\u003csection class=\"examples\"\u003e\n\n## Examples\n\n\u003c!-- eslint no-undef: \"error\" --\u003e\n\n```javascript\nvar cmudict = require( '@stdlib/datasets-cmudict' );\n\nvar opts = {};\n\nopts.data = 'phones';\nconsole.dir( cmudict( opts ) );\n\nopts.data = 'symbols';\nconsole.dir( cmudict( opts ) );\n\nopts.data = 'dict';\nconsole.dir( cmudict( opts ) );\n```\n\n\u003c/section\u003e\n\n\u003c!-- /.examples --\u003e\n\n* * *\n\n\u003csection class=\"cli\"\u003e\n\n## CLI\n\n\u003csection class=\"installation\"\u003e\n\n## Installation\n\nTo use as a general utility, install the CLI package globally\n\n```bash\nnpm install -g @stdlib/datasets-cmudict-cli\n```\n\n\u003c/section\u003e\n\n\u003c!-- CLI usage documentation. --\u003e\n\n\u003csection class=\"usage\"\u003e\n\n### Usage\n\n```text\nUsage: cmudict [options]\n\nOptions:\n\n  -h,    --help                Print this message.\n  -V,    --version             Print the package version.\n         --data name           Dataset name: dict, phones, symbols, vp.\n```\n\n\u003c/section\u003e\n\n\u003c!-- /.usage --\u003e\n\n\u003csection class=\"notes\"\u003e\n\n### Notes\n\n-   If the `--data` option is set to a supported dataset name, the CLI prints the contents of the respective dataset file as plain text. 
Otherwise, the output format is newline-delimited JSON ([NDJSON][ndjson]).\n\n\u003c/section\u003e\n\n\u003c!-- /.notes --\u003e\n\n\u003csection class=\"examples\"\u003e\n\n### Examples\n\n```bash\n$ cmudict --data symbols\nAA\nAA0\nAA1\nAA2\n...\n```\n\n\u003c/section\u003e\n\n\u003c!-- /.examples --\u003e\n\n\u003c/section\u003e\n\n\u003c!-- /.cli --\u003e\n\n* * *\n\n\u003c!-- \u003clicense\u003e --\u003e\n\n## License\n\nThe data files (databases) and their contents are licensed under a [BSD-2-Clause license][bsd-license]. The software is licensed under [Apache License, Version 2.0][apache-license].\n\n\u003c!-- \u003c/license\u003e --\u003e\n\n\u003c!-- Section for related `stdlib` packages. Do not manually edit this section, as it is automatically populated. --\u003e\n\n\u003csection class=\"related\"\u003e\n\n\u003c/section\u003e\n\n\u003c!-- /.related --\u003e\n\n\u003c!-- Section for all links. Make sure to keep an empty line after the `section` element and another before the `/section` close. --\u003e\n\n\n\u003csection class=\"main-repo\" \u003e\n\n* * *\n\n## Notice\n\nThis package is part of [stdlib][stdlib], a standard library for JavaScript and Node.js, with an emphasis on numerical and scientific computing. The library provides a collection of robust, high performance libraries for mathematics, statistics, streams, utilities, and more.\n\nFor more information on the project, filing bug reports and feature requests, and guidance on how to develop [stdlib][stdlib], see the main project [repository][stdlib].\n\n#### Community\n\n[![Chat][chat-image]][chat-url]\n\n---\n\n## Copyright\n\nCopyright \u0026copy; 2016-2025. The Stdlib [Authors][stdlib-authors].\n\n\u003c/section\u003e\n\n\u003c!-- /.stdlib --\u003e\n\n\u003c!-- Section for all links. Make sure to keep an empty line after the `section` element and another before the `/section` close. 
--\u003e\n\n\u003csection class=\"links\"\u003e\n\n[npm-image]: http://img.shields.io/npm/v/@stdlib/datasets-cmudict.svg\n[npm-url]: https://npmjs.org/package/@stdlib/datasets-cmudict\n\n[test-image]: https://github.com/stdlib-js/datasets-cmudict/actions/workflows/test.yml/badge.svg?branch=main\n[test-url]: https://github.com/stdlib-js/datasets-cmudict/actions/workflows/test.yml?query=branch:main\n\n[coverage-image]: https://img.shields.io/codecov/c/github/stdlib-js/datasets-cmudict/main.svg\n[coverage-url]: https://codecov.io/github/stdlib-js/datasets-cmudict?branch=main\n\n\u003c!--\n\n[dependencies-image]: https://img.shields.io/david/stdlib-js/datasets-cmudict.svg\n[dependencies-url]: https://david-dm.org/stdlib-js/datasets-cmudict/main\n\n--\u003e\n\n[chat-image]: https://img.shields.io/gitter/room/stdlib-js/stdlib.svg\n[chat-url]: https://app.gitter.im/#/room/#stdlib-js_stdlib:gitter.im\n\n[stdlib]: https://github.com/stdlib-js/stdlib\n\n[stdlib-authors]: https://github.com/stdlib-js/stdlib/graphs/contributors\n\n[cli-section]: https://github.com/stdlib-js/datasets-cmudict#cli\n[cli-url]: https://github.com/stdlib-js/datasets-cmudict/tree/cli\n[@stdlib/datasets-cmudict]: https://github.com/stdlib-js/datasets-cmudict/tree/main\n\n[umd]: https://github.com/umdjs/umd\n[es-module]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules\n\n[deno-url]: https://github.com/stdlib-js/datasets-cmudict/tree/deno\n[deno-readme]: https://github.com/stdlib-js/datasets-cmudict/blob/deno/README.md\n[umd-url]: https://github.com/stdlib-js/datasets-cmudict/tree/umd\n[umd-readme]: https://github.com/stdlib-js/datasets-cmudict/blob/umd/README.md\n[esm-url]: https://github.com/stdlib-js/datasets-cmudict/tree/esm\n[esm-readme]: https://github.com/stdlib-js/datasets-cmudict/blob/esm/README.md\n[branches-url]: https://github.com/stdlib-js/datasets-cmudict/blob/main/branches.md\n\n[cmudict]: http://www.speech.cs.cmu.edu/cgi-bin/cmudict#about\n\n[arpabet]: 
https://en.wikipedia.org/wiki/ARPABET\n\n[ndjson]: http://specs.frictionlessdata.io/ndjson/\n\n[bsd-license]: https://opensource.org/licenses/bsd-license.html\n\n[apache-license]: https://www.apache.org/licenses/LICENSE-2.0\n\n\u003c/section\u003e\n\n\u003c!-- /.links --\u003e\n","funding_links":["https://github.com/sponsors/stdlib-js","https://opencollective.com/stdlib","https://tidelift.com/funding/github/npm/@stdlib/stdlib"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstdlib-js%2Fdatasets-cmudict","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstdlib-js%2Fdatasets-cmudict","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstdlib-js%2Fdatasets-cmudict/lists"}