{"id":13958542,"url":"https://github.com/cywang97/StreamingTransformer","last_synced_at":"2025-07-21T00:31:08.576Z","repository":{"id":48164962,"uuid":"262717238","full_name":"cywang97/StreamingTransformer","owner":"cywang97","description":null,"archived":false,"fork":false,"pushed_at":"2021-01-15T08:15:35.000Z","size":167024,"stargazers_count":272,"open_issues_count":10,"forks_count":42,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-11-28T02:34:48.699Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cywang97.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-10T05:14:29.000Z","updated_at":"2024-09-05T08:28:03.000Z","dependencies_parsed_at":"2022-07-26T10:18:22.165Z","dependency_job_id":null,"html_url":"https://github.com/cywang97/StreamingTransformer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cywang97/StreamingTransformer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cywang97%2FStreamingTransformer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cywang97%2FStreamingTransformer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cywang97%2FStreamingTransformer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cywang97%2FStreamingTransformer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cywang97","download_url":"https://codeload.github.com/cywang97/StreamingTransformer/tar.gz/refs/he
ads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cywang97%2FStreamingTransformer/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266221257,"owners_count":23894965,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-08T13:01:42.776Z","updated_at":"2025-07-21T00:31:08.113Z","avatar_url":"https://github.com/cywang97.png","language":"Python","readme":"# Streaming Transformer\n**This repo contains the streaming Transformer from our work ``On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition``, which is based on ESPnet 0.6.0. The streaming Transformer includes a streaming encoder, either chunk-based or look-ahead based, and a trigger-attention based decoder.**\n\nWe will release the following models and show reproducible results on Librispeech:\n\n*  Streaming_transformer-chunk32 with ESPnet Conv2d Encoder. (https://drive.google.com/file/d/1LSBY_vK50Jxvw_GeiYrPwRtJ0DsKU6zL/view?usp=sharing)\n\n*  Streaming_transformer-chunk32 with VGG Encoder. (https://drive.google.com/file/d/12P6TsxtOCxrHezqgtk0USjSKBsYHIe7K/view?usp=sharing)\n\n*  Streaming_transformer-lookahead with ESPnet Conv2d Encoder. (https://drive.google.com/file/d/1YJQaofzsk9_KsL2W9Zb42sGLRRIKRs9X/view?usp=sharing)\n\n*  Streaming_transformer-lookahead with VGG Encoder. 
(https://drive.google.com/file/d/1LO_0pPxU5XJffqJMgtx4W4IL-Aih5m0M/view?usp=sharing)\n\n## Results on Librispeech (beam=10)\n| Model | test-clean | test-other | latency | size |\n| -------- | -----: | :----: | :----: | :----: |\n| streaming_transformer-chunk32-conv2d | 2.8 | 7.5 | 640ms | 78M |\n| streaming_transformer-chunk32-vgg | 2.8 | 7.0 | 640ms | 78M |\n| streaming_transformer-lookahead2-conv2d | 3.0 | 8.6 | 1230ms | 78M |\n| streaming_transformer-lookahead2-vgg | 2.8 | 7.5 | 1230ms | 78M |\n\n## Installation\nOur installation follows the installation process of ESPnet.\n### Step 1. Set up the environment\n    CUDAROOT=/path/to/cuda\n    \n    export PATH=$CUDAROOT/bin:$PATH\n    export LD_LIBRARY_PATH=$CUDAROOT/lib64:$LD_LIBRARY_PATH\n    export CFLAGS=\"-I$CUDAROOT/include $CFLAGS\"\n    export CUDA_HOME=$CUDAROOT\n    export CUDA_PATH=$CUDAROOT\n### Step 2. Install the tools, including Kaldi\n    cd tools\n    make -j 10\n    \n## Build a streaming Transformer model\n### Step 1. Data preparation\n    cd egs/librispeech/asr1\n    ./run.sh\n\nBy default, the processed data will be stored in the current directory. You can change the path by editing the scripts.\n### Step 2. Viterbi decoding\nTo train a trigger-attention (TA) based streaming Transformer, the alignments between CTC paths and transcriptions are required. In our work, we obtain them by applying Viterbi decoding with the offline Transformer model.\n\n    cd egs/librispeech/asr1\n    ./viterbi_decode.sh /path/to/model\n\n\n### Step 3. Train a streaming Transformer\nHere, we train a chunk-based streaming Transformer that is initialized with an offline Transformer provided by ESPnet. Set `enc-init` in `conf/train_streaming_transformer.yaml` to the path of your offline model.\n\n\tcd egs/librispeech/asr1\n\t./train.sh\n\nIf you want to train a look-ahead based streaming Transformer, set `chunk` to False and adjust the `left-window`, `right-window`, `dec-left-window`, and `dec-right-window` arguments. 
The training log is written to `exp/streaming_transformer/train.log`. You can monitor the output with `tail -f exp/streaming_transformer/train.log`.\n\n### Step 4. Decoding\nRun the following script to decode the test_clean and test_other sets:\n\n\t./decode.sh num_of_gpu job_per_gpu\n\n### Offline Transformer Reference\nFor the offline Transformer model, please visit [here](https://github.com/MarkWuNLP/SemanticMask).\n\n","funding_links":[],"categories":["语音识别"],"sub_categories":["网络服务_其他"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcywang97%2FStreamingTransformer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcywang97%2FStreamingTransformer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcywang97%2FStreamingTransformer/lists"}