https://github.com/espnet/notebook
- Host: GitHub
- URL: https://github.com/espnet/notebook
- Owner: espnet
- Created: 2019-07-30T08:58:49.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2025-06-09T18:43:25.000Z (about 1 year ago)
- Last Synced: 2025-06-23T00:39:19.283Z (12 months ago)
- Language: Jupyter Notebook
- Size: 17.5 MB
- Stars: 67
- Watchers: 6
- Forks: 40
- Open Issues: 7
Metadata Files:
- Readme: README.md
README
# ESPnet Notebooks
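The notebooks listed in this README are referenced by paths relative to the repository root. As a convenience, here is a minimal sketch of turning one of those paths into a link that opens the notebook in Google Colab. It assumes the repository is `espnet/notebook` and the branch is `master` (both taken from the listing metadata), and that Colab's usual `colab.research.google.com/github/<owner>/<repo>/blob/<branch>/<path>` URL pattern applies. The helper name `colab_url` is hypothetical and is not part of ESPnet.

```python
from urllib.parse import quote

# Assumptions: repository slug and default branch as shown in the listing metadata.
REPO = "espnet/notebook"
BRANCH = "master"


def colab_url(notebook_path: str, repo: str = REPO, branch: str = BRANCH) -> str:
    """Build a Google Colab URL for a notebook hosted on GitHub.

    `notebook_path` is the path relative to the repository root,
    for example "ESPnet2/Demo/ASR/asr_realtime_demo.ipynb".
    """
    # quote() leaves "/" unescaped by default, so the directory structure is preserved.
    return (
        "https://colab.research.google.com/github/"
        f"{repo}/blob/{branch}/{quote(notebook_path)}"
    )


print(colab_url("ESPnet2/Demo/ASR/asr_realtime_demo.ipynb"))
```

The same helper works for any path in the lists below; pass a different `branch` if the default branch changes.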
## Demo
### ASR (Speech recognition)
- [`asr_realtime_demo.ipynb`](ESPnet2/Demo/ASR/asr_realtime_demo.ipynb): ASR realtime inference with various pre-trained models.
- [`asr_transfer_learning_demo.ipynb`](ESPnet2/Demo/ASR/asr_transfer_learning_demo.ipynb): Demo on how to use pre-trained ASR models for fine-tuning.
- [`streaming_asr_demo.ipynb`](ESPnet2/Demo/ASR/streaming_asr_demo.ipynb): Streaming ASR realtime inference with pre-trained models.
### SE (Speech enhancement/separation)
- [`se_demo.ipynb`](ESPnet2/Demo/SE/se_demo.ipynb): Speech enhancement/separation inference with various pre-trained models.
- [`se_demo_for_waspaa_2021.ipynb`](ESPnet2/Demo/SE/se_demo_for_waspaa_2021.ipynb): WASPAA 2021 version of the ESPnet-SE demo.
### SLU (Spoken language understanding)
- [`2pass_slu_demo.ipynb`](ESPnet2/Demo/SLU/2pass_slu_demo.ipynb): Examples of pre-trained two-pass spoken language understanding models.
### TTS (Text-to-speech)
- [`tts_realtime_demo.ipynb`](ESPnet2/Demo/TTS/tts_realtime_demo.ipynb): TTS realtime inference with various pre-trained models.
### Other utilities
- [`onnx_conversion_demo.ipynb`](ESPnet2/Demo/Others/onnx_conversion_demo.ipynb): How to convert ESPnet models into ONNX format.
## ESPnet-EZ
### ASR (Speech recognition)
- [`train_from_scratch.ipynb`](ESPnetEZ/ASR/train_from_scratch.ipynb): Training an ASR model with ESPnet-EZ on LibriSpeech-100.
- [`ASR_finetune_owsm.ipynb`](ESPnetEZ/ASR/ASR_finetune_owsm.ipynb): Fine-tuning the weakly supervised Open Whisper-style Speech Model (OWSM) with ESPnet-EZ on a custom dataset.
### ST (Speech-to-text translation)
- [`integrate_huggingface.ipynb`](ESPnetEZ/ST/integrate_huggingface.ipynb): Integrating the weakly supervised OWSM model and a Hugging Face pre-trained language model with ESPnet-EZ on MuST-C-v2.
- [`ST_finetune_owsm.ipynb`](ESPnetEZ/ST/ST_finetune_owsm.ipynb): Fine-tuning the weakly supervised OWSM model with ESPnet-EZ on MuST-C-v2.
### SLU (Spoken language understanding)
- [`SLU_finetune_owsm.ipynb`](ESPnetEZ/SLU/SLU_finetune_owsm.ipynb): Fine-tuning the weakly supervised OWSM model with ESPnet-EZ on SLURP.
### TTS (Text-to-speech)
- [`TTS_finetune_vctk_dump.ipynb`](ESPnetEZ/TTS/TTS_finetune_vctk_dump.ipynb): Fine-tuning a pre-trained VITS model with ESPnet-EZ on the VCTK dataset.
### SVS (Singing voice synthesis)
- [`SVS_finetune_ace-kising.ipynb`](ESPnetEZ/SVS/SVS_finetune_ace-kising.ipynb): Fine-tuning a pre-trained VISinger 2 model with ESPnet-EZ on ACE-KiSing.
## Course
### CMU SpeechProcessing Spring2023
- [`assignment0_data-prep.ipynb`](ESPnet2/Course/CMU_SpeechProcessing_Spring2023/assignment0_data-prep.ipynb): Course assignment on how to prepare ESPnet-format data.
- [`assignment1_espnet-tutorial.ipynb`](ESPnet2/Course/CMU_SpeechProcessing_Spring2023/assignment1_espnet-tutorial.ipynb): A simplified version of the previous year's new-task tutorial.
- [`assignment3_spk.ipynb`](ESPnet2/Course/CMU_SpeechProcessing_Spring2023/assignment3_spk.ipynb): Examples of using ESPnet to extract speaker embeddings and conduct speaker recognition.
- [`assignment4_ssl.ipynb`](ESPnet2/Course/CMU_SpeechProcessing_Spring2023/assignment4_ssl.ipynb): Exploring the use of self-supervised speech representations in ESPnet ASR training.
- [`assignment5_st.ipynb`](ESPnet2/Course/CMU_SpeechProcessing_Spring2023/assignment5_st.ipynb): Examples of state-of-the-art speech translation models in ESPnet.
- [`assignment6_slu.ipynb`](ESPnet2/Course/CMU_SpeechProcessing_Spring2023/assignment6_slu.ipynb): Examples of state-of-the-art spoken language understanding models in ESPnet.
- [`assignment7_se.ipynb`](ESPnet2/Course/CMU_SpeechProcessing_Spring2023/assignment7_se.ipynb): Examples of state-of-the-art speech enhancement/separation in ESPnet.
- [`assignment8_tts.ipynb`](ESPnet2/Course/CMU_SpeechProcessing_Spring2023/assignment8_tts.ipynb): A student version of the espnet2-tts realtime demonstration.
- [`s2st_demo.ipynb`](ESPnet2/Course/CMU_SpeechProcessing_Spring2023/s2st_demo.ipynb): An example of an existing speech-to-speech translation model in ESPnet.
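
The first assignment in the list above covers preparing ESPnet-format data. As a rough illustration only, and not the notebook's own code, the sketch below writes the Kaldi-style data directory files that ESPnet recipes typically consume (`wav.scp`, `text`, `utt2spk`, `spk2utt`). The function name `write_data_dir`, the utterance IDs, and the file paths are all hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def write_data_dir(out_dir, utterances):
    """Write a minimal Kaldi-style data directory.

    `utterances` is a list of (utt_id, spk_id, wav_path, transcript) tuples.
    This is an illustrative sketch; real recipes may need extra files.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Kaldi-style tools expect every file to be sorted by utterance ID.
    utts = sorted(utterances)
    spk2utt = defaultdict(list)

    with open(out / "wav.scp", "w") as wav_f, \
         open(out / "text", "w") as text_f, \
         open(out / "utt2spk", "w") as u2s_f:
        for utt_id, spk_id, wav_path, transcript in utts:
            wav_f.write(f"{utt_id} {wav_path}\n")     # utterance ID -> audio file
            text_f.write(f"{utt_id} {transcript}\n")  # utterance ID -> transcript
            u2s_f.write(f"{utt_id} {spk_id}\n")       # utterance ID -> speaker ID
            spk2utt[spk_id].append(utt_id)

    # spk2utt is the inverse mapping: one line per speaker with all its utterances.
    with open(out / "spk2utt", "w") as s2u_f:
        for spk_id in sorted(spk2utt):
            s2u_f.write(f"{spk_id} {' '.join(spk2utt[spk_id])}\n")
```

Prefixing each utterance ID with its speaker ID (for example `spk1_utt001`) keeps the per-utterance and per-speaker sort orders consistent, which Kaldi-style validation scripts generally expect.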
### CMU SpeechRecognition Fall2022
- [`recipe_tutorial.ipynb`](ESPnet2/Course/CMU_SpeechRecognition_Fall2022/recipe_tutorial.ipynb): A general tutorial with a stage-by-stage explanation of ESPnet2 recipes (with new functions).
- [`new_task_tutorial.ipynb`](ESPnet2/Course/CMU_SpeechRecognition_Fall2022/new_task_tutorial.ipynb): A tutorial on how to add new models/tasks to the ESPnet framework.
### CMU SpeechRecognition Fall2021
- [`general_tutorial.ipynb`](ESPnet2/Course/CMU_SpeechRecognition_Fall2021/general_tutorial.ipynb): A general tutorial with a stage-by-stage explanation of ESPnet2 recipes.
## ESPnet1 (Legacy)
- [`asr_library.ipynb`](ESPnet1/asr_library.ipynb): Speech recognition library explanation with network training.
- [`asr_recipe.ipynb`](ESPnet1/asr_recipe.ipynb): Speech recognition recipe explanation.
- [`pretrained.ipynb`](ESPnet1/pretrained.ipynb): Tutorial on how to use pre-trained models.
- [`st_demo.ipynb`](ESPnet1/st_demo.ipynb): Speech translation demonstration with a TTS model to achieve speech-to-speech translation.
- [`tts_realtime_demo.ipynb`](ESPnet1/tts_realtime_demo.ipynb): TTS demonstration with different pre-trained TTS models.
- [`tts_recipe.ipynb`](ESPnet1/tts_recipe.ipynb): Stage explanation for TTS recipes.