https://github.com/prosegrinder/python-cmudict
A versioned python wrapper package for cmudict (https://github.com/cmusphinx/cmudict).
- Host: GitHub
- URL: https://github.com/prosegrinder/python-cmudict
- Owner: prosegrinder
- License: gpl-3.0
- Created: 2018-02-03T13:49:46.000Z (over 8 years ago)
- Default Branch: main
- Last Pushed: 2025-04-12T15:21:18.000Z (about 1 year ago)
- Last Synced: 2025-06-07T18:07:18.888Z (about 1 year ago)
- Topics: cmudict, counting, python, syllables
- Language: Python
- Homepage:
- Size: 343 KB
- Stars: 63
- Watchers: 4
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# CMUdict: Python wrapper for cmudict
[PyPI](https://pypi.python.org/pypi/cmudict)
[Python CI](https://github.com/prosegrinder/python-cmudict/actions?query=workflow%3A%22Python+CI%22+branch%3Amain)
CMUdict is a versioned Python wrapper package for
[The CMU Pronouncing Dictionary](https://github.com/cmusphinx/cmudict) data
files. Its main purpose is to expose the data while making few or no assumptions
about how it will be used.
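Because the package leaves interpretation of the data to the caller, a common task such as counting syllables happens in user code. Below is a minimal sketch, assuming `cmudict.dict()` returns an NLTK-style mapping from lowercase words to lists of pronunciations, where each pronunciation is a list of ARPAbet phones. The `SAMPLE` entries and the `syllable_counts` helper are illustrative and not part of the package:

```python
# Illustrative entries shaped like the NLTK-style mapping assumed for
# cmudict.dict(): word -> list of pronunciations (each a list of phones).
SAMPLE = {
    "python": [["P", "AY1", "TH", "AA0", "N"]],
    "either": [["IY1", "DH", "ER0"], ["AY1", "DH", "ER0"]],
    "fire": [["F", "AY1", "ER0"], ["F", "AY1", "R"]],
}


def syllable_counts(word, pronouncing_dict):
    """Return the syllable count of each known pronunciation of `word`.

    In cmudict's ARPAbet, only vowel phones carry a stress digit (0, 1 or 2),
    so every phone ending in a digit marks one syllable nucleus.
    """
    # .get() avoids inserting missing keys if the mapping is a defaultdict.
    prons = pronouncing_dict.get(word.lower(), [])
    return [sum(1 for phone in pron if phone[-1].isdigit()) for pron in prons]


print(syllable_counts("fire", SAMPLE))
```

With the package installed, `cmudict.dict()` could be passed in place of `SAMPLE`. Words missing from the dictionary yield an empty list, so out-of-vocabulary words need a separate fallback.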
## Installation
`cmudict` is available on PyPI. Simply install it with `pip`:
```bash
pip install cmudict
```
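Because the package is versioned, it can be useful to record which release `pip` installed. This sketch uses only the standard library's `importlib.metadata` (Python 3.8+), not any cmudict API. The helper name `cmudict_version` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def cmudict_version():
    """Return the installed cmudict distribution version, or None if absent."""
    try:
        return version("cmudict")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(cmudict_version() or "cmudict is not installed")
```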
## Usage
The cmudict data set includes four data files: `cmudict.dict`, `cmudict.phones`,
`cmudict.symbols`, and `cmudict.vp`. See
[The CMU Pronouncing Dictionary](https://github.com/cmusphinx/cmudict) for
details on the data. Chances are, if you're here, you already know what's in the
files.
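As we understand the upstream data, the phones and symbols files are related. `cmudict.phones` lists each ARPAbet phone with its type (for example `AA` followed by `vowel`). `cmudict.symbols` lists every phone plus stress-marked variants (`0`, `1`, `2`) of the vowels. The sketch below rebuilds that relationship from a small hypothetical excerpt. `SAMPLE_PHONES`, `parse_phones`, and `expand_symbols` are illustrative. `cmudict.phones_string()` is a candidate real input, but the exact layout it returns is an assumption here:

```python
# Hypothetical excerpt in the assumed "PHONE<tab>TYPE" layout of cmudict.phones.
SAMPLE_PHONES = "AA\tvowel\nB\tstop\nCH\taffricate\nER\tvowel\n"


def parse_phones(text):
    """Parse "PHONE TYPE" lines into a {phone: type} dict, skipping others."""
    phones = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            phone, kind = parts
            phones[phone] = kind
    return phones


def expand_symbols(phones):
    """List each phone, followed by stress-marked variants for vowels."""
    symbols = []
    for phone, kind in phones.items():
        symbols.append(phone)
        if kind == "vowel":
            symbols.extend(phone + digit for digit in "012")
    return symbols


print(expand_symbols(parse_phones(SAMPLE_PHONES)))
```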
Each file can be accessed through three functions: one that does minimal
processing of the file into an appropriate structure, one that returns the raw
contents as a string, and one that returns a binary stream of the file:
```python
>>> import cmudict
>>> cmudict.dict() # Compatible with NLTK
>>> cmudict.dict_string()
>>> cmudict.dict_stream()
>>> cmudict.phones()
>>> cmudict.phones_string()
>>> cmudict.phones_stream()
>>> cmudict.symbols()
>>> cmudict.symbols_string()
>>> cmudict.symbols_stream()
>>> cmudict.vp()
>>> cmudict.vp_string()
>>> cmudict.vp_stream()
```
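The string and stream variants leave parsing to the caller. The sketch below shows one way to turn a binary stream into the same kind of word-to-pronunciations mapping that `dict()` is described as producing. It assumes the upstream `cmudict.dict` line format (`word PHONE PHONE ...`, with alternates marked `word(2)` and optional trailing `#` comments) and UTF-8 encoding. `parse_dict_stream` and `SAMPLE_DICT` are hypothetical, not the package's implementation:

```python
import io
from collections import defaultdict

# Hypothetical excerpt in the assumed cmudict.dict line format.
SAMPLE_DICT = b"""\
either IY1 DH ER0
either(2) AY1 DH ER0
d'artagnan D AH0 R T AE1 NG Y AH0 N # foreign french
"""


def parse_dict_stream(stream):
    """Build {word: [[phones], ...]} from a binary stream of dict lines."""
    pronunciations = defaultdict(list)
    # Decode the binary stream as UTF-8 text, one line at a time.
    for line in io.TextIOWrapper(stream, encoding="utf-8"):
        line = line.split("#", 1)[0].strip()  # drop any trailing comment
        if not line:
            continue
        word, *phones = line.split()
        # Strip an alternate-pronunciation suffix such as "(2)".
        word = word.split("(", 1)[0]
        pronunciations[word].append(phones)
    return dict(pronunciations)


print(parse_dict_stream(io.BytesIO(SAMPLE_DICT))["either"])
```

If `cmudict.dict_stream()` yields a binary file-like object as described above, it could be passed in place of the `BytesIO` sample. For everyday use, `cmudict.dict()` already does this processing.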
Three additional functions are included to maintain compatibility with NLTK:
`cmudict.entries()`, `cmudict.raw()`, and `cmudict.words()`. See the
[nltk.corpus.reader.cmudict](http://www.nltk.org/_modules/nltk/corpus/reader/cmudict.html)
documentation for details:
```python
>>> cmudict.entries() # Compatible with NLTK
>>> cmudict.raw() # Compatible with NLTK
>>> cmudict.words() # Compatible with NLTK
```
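As we understand NLTK's reader, `entries()` yields one `(word, phones)` pair per pronunciation, so a word with several pronunciations appears several times. `dict()` groups those pairs by word. Assuming `cmudict.entries()` follows the same shape, the two views can be related as in this sketch. `SAMPLE_ENTRIES`, `group_entries`, and `unique_words` are hypothetical:

```python
from collections import defaultdict

# Hypothetical NLTK-style entries: one (word, phones) pair per pronunciation.
SAMPLE_ENTRIES = [
    ("either", ["IY1", "DH", "ER0"]),
    ("either", ["AY1", "DH", "ER0"]),
    ("python", ["P", "AY1", "TH", "AA0", "N"]),
]


def group_entries(entries):
    """Group (word, phones) pairs into {word: [phones, ...]}, keeping order."""
    grouped = defaultdict(list)
    for word, phones in entries:
        grouped[word].append(phones)
    return dict(grouped)


def unique_words(entries):
    """Distinct words in first-seen order (entries may repeat a word)."""
    return list(dict.fromkeys(word for word, _ in entries))


print(unique_words(SAMPLE_ENTRIES))
```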
Finally, the license for the cmudict data set is also available:
```python
>>> cmudict.license_string() # Returns the cmudict license as a string
```
## Credits
Built on or modeled after the following open source projects:
- [The CMU Pronouncing Dictionary](https://github.com/cmusphinx/cmudict)
- [NLTK](https://github.com/nltk/nltk)