{"id":16956226,"url":"https://github.com/juliuskunze/speechless","last_synced_at":"2025-03-17T08:37:30.056Z","repository":{"id":73768732,"uuid":"78672235","full_name":"juliuskunze/speechless","owner":"juliuskunze","description":"Speech-to-text based on wav2letter built for transfer learning","archived":false,"fork":false,"pushed_at":"2022-10-21T14:03:56.000Z","size":195,"stargazers_count":97,"open_issues_count":9,"forks_count":25,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-02-27T21:38:24.158Z","etag":null,"topics":["keras","python3","speech-recognition","tensorflow"],"latest_commit_sha":null,"homepage":"https://arxiv.org/pdf/1706.00290.pdf","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/juliuskunze.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-01-11T19:31:18.000Z","updated_at":"2024-12-31T09:53:47.000Z","dependencies_parsed_at":"2023-03-23T23:47:35.425Z","dependency_job_id":null,"html_url":"https://github.com/juliuskunze/speechless","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juliuskunze%2Fspeechless","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juliuskunze%2Fspeechless/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juliuskunze%2Fspeechless/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juliuskunze%2Fspeechless/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/juliuskunze","download_url":"https://c
odeload.github.com/juliuskunze/speechless/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243852499,"owners_count":20358271,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["keras","python3","speech-recognition","tensorflow"],"created_at":"2024-10-13T22:14:29.594Z","updated_at":"2025-03-17T08:37:29.736Z","avatar_url":"https://github.com/juliuskunze.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# speechless\nSpeech recognizer based on [wav2letter architecture](https://arxiv.org/pdf/1609.03193v2.pdf) built with [Keras](https://keras.io/).\n\nSupports CTC loss, KenLM and greedy decoding and transfer learning between different languages. ASG loss is currently not supported.\n\nTraining for English with the [1000h LibriSpeech corpus](http://www.openslr.org/12) works out of the box, \nwhile training for the German language requires downloading data manually.\n\n## Installation\n\nPython 3.4+ and [TensorFlow](https://www.tensorflow.org/install/) are required.\n\n    pip3 install git+git@github.com:JuliusKunze/speechless.git\n\nwill install speechless together with minimal requirements.\n\nIf you want to use the KenLM decoder, [this modified version](https://github.com/timediv/tensorflow-with-kenlm) of TensorFlow needs to be installed first.\n\nYou need to have an audio backend available, for example ffmpeg (run `brew install ffmpeg` on Mac OS).  
\n\n## Training\n\n```python\nfrom speechless.configuration import Configuration\n\nConfiguration.minimal_english().train_from_beginning()\n```\n    \nwill automatically download a small English example corpus (337MB) and \ntrain a net on it while reporting updated loss and predictions.\nIf you use a strong consumer-grade GPU, you should observe training predictions become similar to the expected text after ~12h, e. g.\n```\nExpected:  \"just thrust and parry and victory to the stronger\"\nPredicted: \"jest thcrus and pary and bettor o the stronter\"\nErrors: 10 letters (20%), 6 words (67%), loss: 37.19.\n```\n\nAll data (corpus, nets, logs) will be stored in `~/speechless-data`.\n\nThis directory can be changed:\n```python\nfrom pathlib import Path\n\nfrom speechless import configuration\nfrom speechless.configuration import Configuration, DataDirectories\n\nconfiguration.default_data_directories = DataDirectories(Path(\"/your/data/path\"))\n\nConfiguration.minimal_english().train_from_beginning()\n```\n\nTo download and train on the full 1000h LibriSpeech corpus, replace `minimal_english` with `english`.\n\n`main.py` contains various other functions that were executed to train and use models.\n\nIf you want complete flexibility over where data is saved to and loaded from, \nyou should not use `Configuration` at all but instead use the code from `net`, `corpus`, `german_corpus`, `english_corpus` and `recording` directly.\n\n## Loading\n\nBy default, all trained models are stored in the `~/speechless-data/nets` directory. \nYou can use models from [here](https://drive.google.com/drive/folders/0B0Azt-a50ylyal9JVDJnbXJJd2c?resourcekey=0-fVYtlyCyldcgeVdo_-OQ6A\u0026usp=sharing) by downloading them into this folder (keep the subfolder from Google Drive).\nTo load such a model, use `load_best_english_model` or `load_best_german_model`, e. 
g.\n\n```python\nfrom speechless.configuration import Configuration\n\nwav2letter = Configuration.german().load_best_german_model()\n```\n\nIf the model was originally trained with a different character set (e. g. on a corpus of another language),\nspecifying the `allowed_characters_for_loaded_model` parameter of `load_model` still allows you to use that model for training, \nthereby allowing transfer learning. \n\n## Recording\n\nYou can record your own audio with a microphone and get a prediction for it:\n```python\n# ... after loading a model, see above\n\nfrom speechless.recording import record_plot_and_save\n\nlabel = record_plot_and_save()\n\nprint(wav2letter.predict(label))\n```\n\nThree seconds of silence will end the recording, and the silence will be truncated.\nBy default, this will generate a `wav` file and a spectrogram plot in `~/speechless-data/recordings`.\n\n\n## Testing\n\nGiven that you downloaded the German corpus into the corpus directory, you can evaluate the German model on the test set:\n\n```python\n# ... with german = Configuration.german() and wav2letter loaded from it\ngerman.test_model_grouped_by_loaded_corpus_name(wav2letter)\n```\n\nBy default, testing will write to the standard output and to a log in `~/speechless-data/test-results`.\n\n## Plotting\n\nPlotting labeled audio examples from the corpus like [this one](https://docs.google.com/presentation/d/1X30IcB-CzCxnGt780ze0qOrbsRtDrxbWrZ_zQ91TOZQ/edit#slide=id.g1b9173e933_0_15) can be done with `LabeledExamplePlotter.save_spectrogram`.\n\n## German \u0026 Sections\n\nFor some German datasets, it is possible to retrieve which word is said at which point in time, \nwhich allows labeled sections to be extracted, e. g.:\n\n```python\nfrom speechless.configuration import Configuration\n\ngerman = Configuration.german()\nwav2letter = german.load_best_german_model()\nexample = german.corpus.examples[0]\nsections = example.sections()\nfor section in sections:\n    print(wav2letter.test_and_predict(section))\n```\n\nIf you need to access the section labels only (e. g. 
for filtering for particular words), \nuse `example.positional_label.labels` (which is faster because no audio data needs to be sliced).\nIf no positional info is available, `sections` and `positional_label` are `None`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuliuskunze%2Fspeechless","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjuliuskunze%2Fspeechless","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuliuskunze%2Fspeechless/lists"}