{"id":20589296,"url":"https://github.com/ciscodevnet/g2p_seq2seq_pytorch","last_synced_at":"2025-07-27T04:02:52.227Z","repository":{"id":47664863,"uuid":"395778341","full_name":"CiscoDevNet/g2p_seq2seq_pytorch","owner":"CiscoDevNet","description":"Grapheme to phoneme model for PyTorch","archived":false,"fork":false,"pushed_at":"2022-07-21T21:41:39.000Z","size":1554,"stargazers_count":41,"open_issues_count":0,"forks_count":11,"subscribers_count":15,"default_branch":"main","last_synced_at":"2025-05-22T17:34:52.426Z","etag":null,"topics":["asr","g2p","g2p-seq2seq","grapheme-to-phoneme","pytorch","transformer","transformer-architecture"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CiscoDevNet.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null}},"created_at":"2021-08-13T19:55:07.000Z","updated_at":"2025-04-03T09:31:25.000Z","dependencies_parsed_at":"2022-08-24T13:31:18.045Z","dependency_job_id":null,"html_url":"https://github.com/CiscoDevNet/g2p_seq2seq_pytorch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CiscoDevNet/g2p_seq2seq_pytorch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CiscoDevNet%2Fg2p_seq2seq_pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CiscoDevNet%2Fg2p_seq2seq_pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CiscoDevNet%2Fg2p_seq2seq_pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CiscoDevNet%2Fg2p_seq2seq_pytorch/manifests","owner
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CiscoDevNet","download_url":"https://codeload.github.com/CiscoDevNet/g2p_seq2seq_pytorch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CiscoDevNet%2Fg2p_seq2seq_pytorch/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267298242,"owners_count":24065878,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-27T02:00:11.917Z","response_time":82,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","g2p","g2p-seq2seq","grapheme-to-phoneme","pytorch","transformer","transformer-architecture"],"created_at":"2024-11-16T07:28:39.598Z","updated_at":"2025-07-27T04:02:52.182Z","avatar_url":"https://github.com/CiscoDevNet.png","language":"Python","readme":"# Sequence-to-Sequence G2P toolkit for PyTorch\n\nGrapheme to Phoneme (G2P) is a function that generates pronunciations (phonemes) for words based on their written form (graphemes).\nIt plays an important role in automatic speech recognition systems, natural language processing, and text-to-speech engines.\nThis G2P model implements a transformer architecture in Python, built on [PyTorch](https://pytorch.org/) and [FairSeq](https://fairseq.readthedocs.io/en/latest/).\nThis repo implements a G2P model with two APIs:\n1. load_g2p_model: Loads the G2P model from disk.\n2. decode_word: Outputs phonemes given a word. 
It optionally exposes phoneme stress information.\n\n## Installation\n\nThis repo works on Python\u003e=3.7.8 and uses poetry to install dependencies. Assuming `pyenv` and `poetry` are installed, the repo can be downloaded as follows:\n```bash\ncd g2p_seq2seq_pytorch/\npyenv virtualenv 3.7.8 g2p\npyenv activate g2p\npoetry install\n```\n\n## Download the model\n\nWe provide a pretrained 3x3 layer transformer model with 256 hidden units [here](https://developer.cisco.com/fileMedia/download/5b20821d-f092-3b57-a438-546046ffaa61/).\nThe model should be named `20210722.pt`. Place the model file in the `g2p_seq2seq_pytorch/g2p_seq2seq_pytorch/models/` folder.\n\n## How to use the APIs\n\n```python\nfrom g2p_seq2seq_pytorch.g2p import G2PPytorch\nmodel = G2PPytorch()\nmodel.load_model()\nmodel.decode_word(\"amsterdam\") # \"AE M S T ER D AE M\"\nmodel.decode_word(\"amsterdam\", with_stress=True) # \"AE1 M S T ER0 D AE2 M\"\n```\n\n## How to train/test the model\n\nWe use [CMUDict latest](https://github.com/cmusphinx/cmudict) for training and validation. Validation is ~10% of the total dataset.\nNote that CMUDict latest doesn't have any test splits. Note also that CMUDict latest has phoneme stress information.\n\nWe use the [CMUDict PRONASYL 2007](https://sourceforge.net/projects/cmusphinx/files/G2P%20Models/phonetisaurus-cmudict-split.tar.gz/download)\ntest set for testing. Note that CMUDict PRONASYL 2007 doesn't have stress information.\n\n1. Prepare the training/validation/test data for model ingestion. This step involves tokenization,\n   removal of stop words, and binarization of the data.\n   \n2. Train the model on the binarized data and generate predictions on the test data.\n\nWe cannot directly evaluate the generated output against the test set, since the test set lacks the stress information.\nWe have to remove the stress information from the generated output before comparing it to the test set. 
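Removing the stress amounts to deleting the trailing stress digit (0, 1, or 2) from each phoneme in the generated ARPAbet sequence. A minimal sketch of this post-processing step (the `strip_stress` helper is our own illustration, not a script shipped with this repo):

```python
import re

def strip_stress(phonemes: str) -> str:
    # Drop the trailing stress digit (0/1/2) from each ARPAbet phoneme,
    # e.g. 'AE1 M S T ER0 D AE2 M' -> 'AE M S T ER D AE M'.
    return ' '.join(re.sub(r'\d$', '', p) for p in phonemes.split())

print(strip_stress('AE1 M S T ER0 D AE2 M'))  # AE M S T ER D AE M
```

The stress-free prediction can then be compared directly against the PRONASYL 2007 reference entries.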
We do this because we want the model to learn from the stress information while still quantifying its performance on the stress-free test set.\n\n```bash\ncd scripts/\nsh prepare-g2p.sh\nsh train-and-generate.sh\n```\n\n## Evaluation of the model\n\nWe benchmarked the PyTorch model against the [CMUSphinx](https://github.com/cmusphinx/g2p-seq2seq) TensorFlow model with the following metrics:\n- Phonetic error rate (%): For each word, calculate the percentage of predicted phonemes that differ from the gold phonemes. Average this across all words.\n- Word error rate (%): For each word, compare the entire sequence of predicted phonemes to the gold phonemes. We calculate the percentage of words whose predicted phonemes are not an exact match to the gold phonemes.\n- CPU Latency (milliseconds): Time taken to execute the G2P function on a CPU instance.\n- GPU Latency (milliseconds): Time taken to execute the G2P function on a GPU instance.\n\n| Architecture   | PER (%)  | WER (%)  | CPU Latency (ms)  | GPU Latency (ms)  |\n|----------------|----------|----------|-------------------|-------------------|\n| CMUSphinx      | 4.16     | 19.91    | 13.76             | -                 |\n| PyTorch        | 5.26     | 23.80    | 10.19             | 5.41              |\n\nMore details on the benchmarking datasets can be found in our [blog post](https://blogs.cisco.com/developer/graphemephoneme01).","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fciscodevnet%2Fg2p_seq2seq_pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fciscodevnet%2Fg2p_seq2seq_pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fciscodevnet%2Fg2p_seq2seq_pytorch/lists"}