{"id":23725980,"url":"https://github.com/voidful/asrp","last_synced_at":"2025-06-15T17:34:02.412Z","repository":{"id":50746173,"uuid":"368447583","full_name":"voidful/asrp","owner":"voidful","description":"ASR text preprocessing utility","archived":false,"fork":false,"pushed_at":"2024-08-05T18:26:53.000Z","size":16396,"stargazers_count":21,"open_issues_count":0,"forks_count":5,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-07T13:07:39.277Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/voidful.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-05-18T07:58:29.000Z","updated_at":"2024-12-02T15:18:04.000Z","dependencies_parsed_at":"2025-01-10T12:43:11.537Z","dependency_job_id":"913518e6-90b3-45ea-b38c-296659e0ef35","html_url":"https://github.com/voidful/asrp","commit_stats":{"total_commits":59,"total_committers":2,"mean_commits":29.5,"dds":"0.016949152542372836","last_synced_commit":"6f4f48dbfd9adf97e31f5751eff1d0165cc82f08"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voidful%2Fasrp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voidful%2Fasrp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voidful%2Fasrp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voidful%2Fasrp/manifests","owner_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/owners/voidful","download_url":"https://codeload.github.com/voidful/asrp/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252883220,"owners_count":21819161,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-31T00:18:05.088Z","updated_at":"2025-05-07T13:07:44.342Z","avatar_url":"https://github.com/voidful.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ASRP: Automatic Speech Recognition Preprocessing Utility\n\nASRP is a python package that offers a set of tools to preprocess and evaluate ASR (Automatic Speech Recognition) text.\nThe package also provides a speech-to-text transcription tool and a text-to-speech conversion tool. The code is\nopen-source and can be installed using pip.\n\nKey Features\n\n- [Preprocess ASR text with ease](#preprocess)\n- [Evaluate ASR output quality](#Evaluation)\n- [Transcribe speech to Hubert code](#speech-to-discrete-unit)\n- [Convert unit code to speech](#discrete-unit-to-speech)\n- [Enhance speech quality with a noise reduction tool](#speech-enhancement)\n- [LiveASR tool for real-time speech recognition](#liveasr---huggingfaces-model)\n- [Speaker Embedding Extraction (x-vector/d-vector)](#speaker-embedding-extraction---x-vector)\n\n## install\n\n`pip install asrp`\n\n## Preprocess\n\nASRP offers an easy-to-use set of functions to preprocess ASR text data.   \nThe input data is a dictionary with the key 'sentence', and the output is the preprocessed text.     
\nYou can either call the fun_en function directly or load it dynamically by name. Here's how to use it:\n\n```python\nimport asrp\n\nbatch_data = {\n    'sentence': \"I'm fine, thanks.\"\n}\nasrp.fun_en(batch_data)\n```\n\nDynamic loading, looking up the preprocessor by name:\n\n```python\nimport asrp\n\nbatch_data = {\n    'sentence': \"I'm fine, thanks.\"\n}\npreprocessor = getattr(asrp, 'fun_en')\npreprocessor(batch_data)\n```\n\n## Evaluation\n\nASRP provides functions to evaluate the output quality of ASR systems using     \nthe Word Error Rate (WER) and Character Error Rate (CER) metrics.   \nHere's how to use it:\n\n```python\nimport asrp\n\ntargets = ['HuggingFace is great!', 'Love Transformers!', 'Let\\'s wav2vec!']\npreds = ['HuggingFace is awesome!', 'Transformers is powerful.', 'Let\\'s finetune wav2vec!']\nprint(\"chunk size WER: {:.2f}\".format(100 * asrp.chunked_wer(targets, preds, chunk_size=None)))\nprint(\"chunk size CER: {:.2f}\".format(100 * asrp.chunked_cer(targets, preds, chunk_size=None)))\n```\n\n## Speech to Discrete Unit\n\n```python\nimport asrp\nimport nlp2\n\n# https://github.com/facebookresearch/fairseq/blob/ust/examples/speech_to_speech/docs/textless_s2st_real_data.md\n# https://github.com/facebookresearch/fairseq/tree/main/examples/textless_nlp/gslm/ulm\nnlp2.download_file(\n    'https://huggingface.co/voidful/mhubert-base/resolve/main/mhubert_base_vp_en_es_fr_it3_L11_km1000.bin', './')\nhc = asrp.HubertCode(\"voidful/mhubert-base\", './mhubert_base_vp_en_es_fr_it3_L11_km1000.bin', 11,\n                     chunk_sec=30,\n                     worker=20)\nhc('voice file path')\n```\n\n## Discrete Unit to Speech\n\n```python\nimport asrp\n\ncode = []  # discrete unit\n# https://github.com/pytorch/fairseq/tree/main/examples/textless_nlp/gslm/unit2speech\n# https://github.com/facebookresearch/fairseq/blob/ust/examples/speech_to_speech/docs/textless_s2st_real_data.md\ncs = asrp.Code2Speech(tts_checkpoint='./tts_checkpoint_best.pt', 
waveglow_checkpint='waveglow_256channels_new.pt')\ncs(code)\n\n# play on notebook\nimport IPython.display as ipd\n\nipd.Audio(data=cs(code), autoplay=False, rate=cs.sample_rate)\n```\n\nmhubert English hifigan vocoder example\n\n```python\nimport asrp\nimport nlp2\nimport IPython.display as ipd\nfrom transformers import AutoTokenizer, AutoModelForSeq2SeqLM\nnlp2.download_file(\n    'https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000',\n    './')\n\n\ntokenizer = AutoTokenizer.from_pretrained(\"voidful/mhubert-unit-tts\")\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"voidful/mhubert-unit-tts\")\nmodel.eval()\ncs = asrp.Code2Speech(tts_checkpoint='./g_00500000', vocoder='hifigan')\n\ninputs = tokenizer([\"The quick brown fox jumps over the lazy dog.\"], return_tensors=\"pt\")\ncode = tokenizer.batch_decode(model.generate(**inputs,max_length=1024))[0]\ncode = [int(i) for i in code.replace(\"\u003c/s\u003e\",\"\").replace(\"\u003cs\u003e\",\"\").split(\"v_tok_\")[1:]]\nprint(code)\nipd.Audio(data=cs(code), autoplay=False, rate=cs.sample_rate)\n\n```\n\n## Speech Enhancement\n\nASRP also provides a tool to enhance speech quality with a noise reduction tool.  
\nfrom https://github.com/facebookresearch/fairseq/tree/main/examples/speech_synthesis/preprocessing/denoiser\n\n```python\nfrom asrp import SpeechEnhancer\n\nase = SpeechEnhancer()\nprint(ase('./test/xxx.wav'))\n```\n\n## LiveASR - huggingface's model\n\n* Modified from https://github.com/oliverguhr/wav2vec2-live\n\n```python\nfrom asrp.live import LiveSpeech\n\nenglish_model = \"voidful/wav2vec2-xlsr-multilingual-56\"\nasr = LiveSpeech(english_model, device_name=\"default\")\nasr.start()\n\ntry:\n    while True:\n        text, sample_length, inference_time = asr.get_last_text()\n        print(f\"{sample_length:.3f}s\"\n              + f\"\\t{inference_time:.3f}s\"\n              + f\"\\t{text}\")\n\nexcept KeyboardInterrupt:\n    asr.stop()\n```\n\n## LiveASR - whisper's model\n\n```python\nfrom asrp.live import LiveSpeech\n\nwhisper_model = \"tiny\"\nasr = LiveSpeech(whisper_model, vad_mode=2, language='zh')\nasr.start()\nlast_text = \"\"\nwhile True:\n    asr_text = \"\"\n    try:\n        asr_text, sample_length, inference_time = asr.get_last_text()\n        if len(asr_text) \u003e 0:\n            print(asr_text, sample_length, inference_time)\n    except KeyboardInterrupt:\n        asr.stop()\n        break\n\n```\n\n## Speaker Embedding Extraction - x vector\n\nfrom https://speechbrain.readthedocs.io/en/latest/API/speechbrain.lobes.models.Xvector.html\n\n```python\nfrom asrp.speaker_embedding import extract_x_vector\n\nextract_x_vector('./test/xxx.wav')\n```\n\n## Speaker Embedding Extraction - d vector\n\nfrom https://github.com/yistLin/dvector\n\n```python\nfrom asrp.speaker_embedding import extract_d_vector\n\nextract_d_vector('./test/xxx.wav')\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvoidful%2Fasrp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvoidful%2Fasrp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvoidful%2Fasrp/lists"}