{"id":20021740,"url":"https://github.com/bshall/vectorquantizedcpc","last_synced_at":"2025-11-01T09:03:46.086Z","repository":{"id":119398875,"uuid":"242967119","full_name":"bshall/VectorQuantizedCPC","owner":"bshall","description":"Vector-Quantized Contrastive Predictive Coding for Acoustic Unit Discovery and Voice Conversion","archived":false,"fork":false,"pushed_at":"2020-09-01T07:26:36.000Z","size":10894,"stargazers_count":141,"open_issues_count":5,"forks_count":23,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-08T14:45:48.808Z","etag":null,"topics":["acoustic-features","contrastive-predictive-coding","pytorch","speech-synthesis","voice-conversion","vq-vae","zerospeech"],"latest_commit_sha":null,"homepage":"https://bshall.github.io/VectorQuantizedCPC/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bshall.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-25T10:00:32.000Z","updated_at":"2025-02-27T11:10:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"29e6e849-01ea-4598-a914-d08d2266dbc9","html_url":"https://github.com/bshall/VectorQuantizedCPC","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bshall%2FVectorQuantizedCPC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bshall%2FVectorQuantizedCPC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bshall%2FVectorQuantizedCPC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bshall%2FVectorQuantizedCPC/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bshall","download_url":"https://codeload.github.com/bshall/VectorQuantizedCPC/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252423013,"owners_count":21745531,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["acoustic-features","contrastive-predictive-coding","pytorch","speech-synthesis","voice-conversion","vq-vae","zerospeech"],"created_at":"2024-11-13T08:38:05.103Z","updated_at":"2025-11-01T09:03:46.045Z","avatar_url":"https://github.com/bshall.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Vector-Quantized Contrastive Predictive Coding\n\nTrain and evaluate the VQ-CPC model for our submission to the [ZeroSpeech 2020 challenge](https://zerospeech.com/).\nVoice conversion samples can be found [here](https://bshall.github.io/VectorQuantizedCPC/).\nPretrained weights for the 2019 English and Indonesian datasets can be found [here](https://github.com/bshall/VectorQuantizedCPC/releases/tag/v0.1).\nThe leaderboard for the ZeroSpeech 2020 challenge can be found [here](https://zerospeech.com/2020/results.html).\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"784\" height=\"340\" alt=\"VQ-CPC model summary\"\n    src=\"https://raw.githubusercontent.com/bshall/VectorQuantizedCPC/master/model.png\"\u003e\u003cbr\u003e\n  \u003csup\u003e\u003cstrong\u003eFig 1:\u003c/strong\u003e VQ-CPC model 
architecture.\u003c/sup\u003e\n\u003c/p\u003e\n\n## Requirements\n\n1.  Ensure you have Python 3 and PyTorch 1.4 or greater.\n\n2.  Install [NVIDIA/apex](https://github.com/NVIDIA/apex) for mixed-precision training.\n\n3.  Install pip dependencies:\n    ```\n    pip install -r requirements.txt\n    ```\n    \n4.  For evaluation, install [bootphon/zerospeech2020](https://github.com/bootphon/zerospeech2020).\n\n## Data and Preprocessing\n\n1.  Download and extract the [ZeroSpeech2020 datasets](https://download.zerospeech.com/).\n\n2.  Download the train/test splits [here](https://github.com/bshall/VectorQuantizedCPC/releases/tag/v0.1)\n    and extract them in the root directory of the repo.\n    \n3.  Preprocess the audio and extract train/test log-Mel spectrograms:\n    ```\n    python preprocess.py in_dir=/path/to/dataset dataset=[2019/english or 2019/surprise]\n    ```\n    Note: `in_dir` must be the path to the `2019` folder.\n    For `dataset`, choose either `2019/english` or `2019/surprise`.\n    Other datasets will be added in the future.\n    \n    Example usage:\n    ```\n    python preprocess.py in_dir=../datasets/2020/2019 dataset=2019/english\n    ```\n    \n## Training\n   \n1.  Train the VQ-CPC model (or download pretrained weights [here](https://github.com/bshall/VectorQuantizedCPC/releases/tag/v0.1)):\n    ```\n    python train_cpc.py checkpoint_dir=path/to/checkpoint_dir dataset=[2019/english or 2019/surprise]\n    ```\n    Example usage:\n    ```\n    python train_cpc.py checkpoint_dir=checkpoints/cpc/english2019 dataset=2019/english\n    ```\n    \n2.  
Train the vocoder:\n    ```\n    python train_vocoder.py cpc_checkpoint=path/to/cpc/checkpoint checkpoint_dir=path/to/checkpoint_dir dataset=[2019/english or 2019/surprise]\n    ```\n    Example usage:\n    ```\n    python train_vocoder.py cpc_checkpoint=checkpoints/cpc/english2019/model.ckpt-24000.pt checkpoint_dir=checkpoints/vocoder/english2019 dataset=2019/english\n    ```\n\n## Evaluation\n\n### Voice conversion\n\n```\npython convert.py cpc_checkpoint=path/to/cpc/checkpoint vocoder_checkpoint=path/to/vocoder/checkpoint in_dir=path/to/wavs out_dir=path/to/out_dir synthesis_list=path/to/synthesis_list dataset=[2019/english or 2019/surprise]\n```\nNote: `synthesis_list` is a JSON file\n```\n[\n    [\n        \"english/test/S002_0379088085\",\n        \"V002\",\n        \"V002_0379088085\"\n    ]\n]\n```\ncontaining a list of items, each with a) the path (relative to `in_dir`) of the source `wav` file, without the extension;\nb) the target speaker (see `datasets/2019/english/speakers.json` for a list of options);\nand c) the target file name.\n\nExample usage:\n```\npython convert.py cpc_checkpoint=checkpoints/cpc/english2019/model.ckpt-25000.pt vocoder_checkpoint=checkpoints/vocoder/english2019/model.ckpt-150000.pt in_dir=../datasets/2020/2019 out_dir=submission/2019/english/test synthesis_list=datasets/2019/english/synthesis.json dataset=2019/english\n```\nVoice conversion samples are available [here](https://bshall.github.io/VectorQuantizedCPC/).\n\n### ABX Score\n\n1.  Encode the test data for evaluation:\n    ```\n    python encode.py checkpoint=path/to/checkpoint out_dir=path/to/out_dir dataset=[2019/english or 2019/surprise]\n    ```\n    Example usage:\n    ```\n    python encode.py checkpoint=checkpoints/cpc/english2019/model.ckpt-500000.pt out_dir=submission/2019/english/test dataset=2019/english\n    ```\n    \n2.  
Run the ABX evaluation script (see [bootphon/zerospeech2020](https://github.com/bootphon/zerospeech2020)).\n\nThe ABX score for the pretrained English model is:\n```\n{\n    \"2019\": {\n        \"english\": {\n            \"scores\": {\n                \"abx\": 13.444869807551896,\n                \"bitrate\": 421.3347459545065\n            },\n            \"details_bitrate\": {\n                \"test\": 421.3347459545065,\n                \"auxiliary_embedding1\": 817.3706731019037,\n                \"auxiliary_embedding2\": 817.6857350383482\n            },\n            \"details_abx\": {\n                \"test\": {\n                    \"cosine\": 13.444869807551896,\n                    \"KL\": 50.0,\n                    \"levenshtein\": 27.836903478166363\n                },\n                \"auxiliary_embedding1\": {\n                    \"cosine\": 12.47147337307366,\n                    \"KL\": 50.0,\n                    \"levenshtein\": 43.91132599798928\n                },\n                \"auxiliary_embedding2\": {\n                    \"cosine\": 12.29162067184495,\n                    \"KL\": 50.0,\n                    \"levenshtein\": 44.29540315886812\n                }\n            }\n        }\n    }\n}\n```\n\n## References\n\nThis work is based on:\n\n1.  Aaron van den Oord, Yazhe Li, and Oriol Vinyals. [\"Representation learning with contrastive predictive coding.\"](https://arxiv.org/abs/1807.03748)\n    arXiv preprint arXiv:1807.03748 (2018).\n\n2.  Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. [\"Neural discrete representation learning.\"](https://arxiv.org/abs/1711.00937)\n    Advances in Neural Information Processing Systems (2017).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbshall%2Fvectorquantizedcpc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbshall%2Fvectorquantizedcpc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbshall%2Fvectorquantizedcpc/lists"}