https://github.com/vaibhavs10/common_voice_dataset_generator
- Host: GitHub
- URL: https://github.com/vaibhavs10/common_voice_dataset_generator
- Owner: Vaibhavs10
- Created: 2023-03-06T14:27:32.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-03-30T08:52:05.000Z (almost 2 years ago)
- Last Synced: 2023-08-26T05:22:30.649Z (over 1 year ago)
- Language: Python
- Size: 49.8 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
---
duplicated_from: anton-l/common_voice_generator
---
## Common voice release generator

1. Copy the latest release id from the `RELEASES` dict in https://github.com/common-voice/common-voice/blob/main/web/src/components/pages/datasets/releases.ts
to the `VERSIONS` variable in `generate_datasets.py`.
2. Copy the languages from https://github.com/common-voice/common-voice/blob/release-v1.78.0/web/locales/en/messages.ftl
(replacing `release-v1.78.0` with the latest version tag) to the `languages.ftl` file.
3. Run `python generate_datasets.py` to generate the dataset repos.
4. `cd ..`
5. `huggingface-cli repo create --type dataset --organization mozilla-foundation common_voice_11_0`
6. `git clone https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0`
7. `cd common_voice_11_0`
8. `cp ../common_voice_generator/common_voice_11_0/* ./`
9. `git add . && git commit -m "Release" && git push`
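
Steps 4 to 9 repeat for every release with only the dataset name changing, so they can be sketched as a small Python helper. This is a minimal sketch, not part of the repository: the names `release_commands` and `publish` are hypothetical, and the defaults simply mirror the organization, dataset name, and `common_voice_generator` directory used in the steps above. It defaults to a dry run that only prints the commands.

```python
import subprocess


def release_commands(dataset_name,
                     organization="mozilla-foundation",
                     generator_dir="common_voice_generator"):
    """Return the shell commands from steps 4-9 for one dataset release.

    Hypothetical helper: the defaults are assumptions taken from the README
    steps, not values read from generate_datasets.py.
    """
    repo_url = f"https://huggingface.co/datasets/{organization}/{dataset_name}"
    return [
        "cd ..",
        f"huggingface-cli repo create --type dataset "
        f"--organization {organization} {dataset_name}",
        f"git clone {repo_url}",
        f"cd {dataset_name}",
        f"cp ../{generator_dir}/{dataset_name}/* ./",
        'git add . && git commit -m "Release" && git push',
    ]


def publish(dataset_name, dry_run=True):
    """Print the commands (dry run) or run them in a single shell session."""
    commands = release_commands(dataset_name)
    if dry_run:
        for command in commands:
            print(command)
        return commands
    # Chain with && so the `cd` steps carry over and a failure stops the run.
    subprocess.run(" && ".join(commands), shell=True, check=True)
    return commands


if __name__ == "__main__":
    # Dry run only: nothing is created or pushed.
    publish("common_voice_11_0")
```

Run it from the generator directory after step 3. Call `publish("common_voice_11_0", dry_run=False)` only once the printed commands look right, since the last step pushes to the Hub.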