https://github.com/qualcomm/voiceai-dataset
This project releases the voice sample datasets used in the VoiceAI models.
- Host: GitHub
- URL: https://github.com/qualcomm/voiceai-dataset
- Owner: qualcomm
- License: bsd-3-clause
- Created: 2025-10-27T11:52:15.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2026-04-02T01:51:21.000Z (3 months ago)
- Last Synced: 2026-04-02T14:39:06.912Z (3 months ago)
- Size: 10.7 KB
- Stars: 0
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
- Code of conduct: CODE-OF-CONDUCT.md
- Security: SECURITY.md
README
# VoiceAI dataset
This project provides a collection of small datasets used in the VoiceAI notebooks from QPM (Qualcomm Package Manager) releases. Downloading them from this repository avoids having to download the full source datasets.
## Dataset list
| Dataset | Description | Download |
| :---------------- | :---------------|:-------------:|
| Common Voice for Whisper notebook | a small portion of [Common Voice](https://commonvoice.mozilla.org/en/datasets) V9 English dataset | [Link](https://github.com/qualcomm/voiceai-dataset/releases/download/whisper_dataset/common_voice_9.0_for_whisper_notebook.zip) |
| LibriSpeech for Whisper notebook | a small portion of `train-clean-100` and `train-other-500` datasets from [LibriSpeech](https://www.openslr.org/12) | [Link](https://github.com/qualcomm/voiceai-dataset/releases/download/whisper_dataset/LibriSpeech_for_whisper_notebook.zip) |
| Common Voice for Zipformer notebook | a small portion of [Common Voice](https://commonvoice.mozilla.org/en/datasets) V9 English and Chinese datasets | [Link](https://github.com/qualcomm/voiceai-dataset/releases/download/zipformer_dataset/common_voice_9.0_for_zipformer_notebook.zip) |
## Usage
Download the datasets from the releases page and follow the notebook of the VoiceAI model you are using for further instructions.
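If you prefer to script the download instead of using the browser, the step above can be sketched in Python as follows. This is a minimal sketch, not part of the notebooks: the dictionary keys, the helper names `archive_name` and `download_and_extract`, and the `datasets/` output layout are all assumptions made for illustration. Only the URLs come from the table above, and the notebook you are using may expect the files in a different location.

```python
import urllib.request
import zipfile
from pathlib import Path
from urllib.parse import urlparse

# Release asset URLs copied from the dataset table in this README.
# The dictionary keys are hypothetical labels chosen for this sketch.
_BASE = "https://github.com/qualcomm/voiceai-dataset/releases/download"
DATASET_URLS = {
    "whisper_common_voice": f"{_BASE}/whisper_dataset/common_voice_9.0_for_whisper_notebook.zip",
    "whisper_librispeech": f"{_BASE}/whisper_dataset/LibriSpeech_for_whisper_notebook.zip",
    "zipformer_common_voice": f"{_BASE}/zipformer_dataset/common_voice_9.0_for_zipformer_notebook.zip",
}


def archive_name(url: str) -> str:
    """Return the file name component of a download URL."""
    return Path(urlparse(url).path).name


def download_and_extract(url: str, dest_dir: str = "datasets") -> Path:
    """Download a zip archive (skipped if already present) and unpack it.

    The archive is saved under dest_dir and extracted into a subdirectory
    named after the archive without its .zip suffix. This layout is an
    assumption; adjust it to whatever path your notebook expects.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    archive = dest / archive_name(url)
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)

    target = dest / archive.stem
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target
```

For example, `download_and_extract(DATASET_URLS["whisper_common_voice"])` would fetch the Common Voice archive for the Whisper notebook and return the directory it was unpacked into.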
## Getting in Contact
* [Report an Issue on GitHub](../../issues)
## License
The project is licensed under the [BSD-3-Clause License](https://spdx.org/licenses/BSD-3-Clause.html). See [LICENSE.txt](LICENSE.txt) for the full license text.