{"id":13804804,"url":"https://github.com/thecodrr/vspeech","last_synced_at":"2026-03-06T07:32:38.996Z","repository":{"id":66253903,"uuid":"227345788","full_name":"thecodrr/vspeech","owner":"thecodrr","description":"📢 Complete V bindings for Mozilla's DeepSpeech TensorFlow based Speech-to-Text library. 📜","archived":false,"fork":false,"pushed_at":"2020-01-15T17:37:00.000Z","size":30,"stargazers_count":50,"open_issues_count":1,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-29T17:41:38.505Z","etag":null,"topics":["deepspeech","machine-learning","mozilla","speech-to-text","tensorflow","v"],"latest_commit_sha":null,"homepage":"","language":"V","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thecodrr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null},"funding":{"github":null,"patreon":null,"open_collective":null,"ko_fi":"thecodrr","tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2019-12-11T11:06:00.000Z","updated_at":"2025-02-12T20:23:07.000Z","dependencies_parsed_at":"2023-02-20T21:30:48.644Z","dependency_job_id":null,"html_url":"https://github.com/thecodrr/vspeech","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/thecodrr/vspeech","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thecodrr%2Fvspeech","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thecodrr%2Fvspeech/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thecodrr%
2Fvspeech/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thecodrr%2Fvspeech/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thecodrr","download_url":"https://codeload.github.com/thecodrr/vspeech/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thecodrr%2Fvspeech/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30165636,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-06T04:43:31.446Z","status":"ssl_error","status_checked_at":"2026-03-06T04:40:30.133Z","response_time":250,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deepspeech","machine-learning","mozilla","speech-to-text","tensorflow","v"],"created_at":"2024-08-04T01:00:54.114Z","updated_at":"2026-03-06T07:32:38.974Z","avatar_url":"https://github.com/thecodrr.png","language":"V","funding_links":["https://ko-fi.com/thecodrr"],"categories":["Libraries"],"sub_categories":["Audio"],"readme":"\u003cdiv align=\"center\"\u003e\n\u003ch1\u003e📣 vSpeech 📜\u003c/h1\u003e\n\u003cp align=\"center\"\u003e\nV bindings for \u003ca href=\"\"\u003eMozilla's DeepSpeech\u003c/a\u003e \u003ca href=\"\"\u003eTensorFlow\u003c/a\u003e based library for Speech-to-Text.\n\u003c/p\u003e\n\u003ca 
href=\"https://gifyu.com/image/vOMh\"\u003e\u003cimg src=\"https://s5.gifyu.com/images/showb3037c75870403f5.gif\" alt=\"showb3037c75870403f5.gif\" border=\"0\" /\u003e\u003c/a\u003e\u003c/div\u003e\n\n## Installation:\n\nInstall using `vpkg`\n\n```bash\nvpkg get https://github.com/thecodrr/vspeech\n```\n\nInstall using `V`'s builtin `vpm` (you will need to import the module with: `import thecodrr.vspeech` with this method of installation):\n\n```shell\nv install thecodrr.vspeech\n```\n\nInstall using `git`:\n\n```bash\ncd path/to/your/project\ngit clone https://github.com/thecodrr/vspeech\n```\n\nYou can use [thecodrr.vave](https://github.com/thecodrr/vave) for reading WAV files.\n\nThen, wherever you want to use it:\n\n```v\nimport thecodrr.vspeech // OR simply vspeech, depending on how you installed\n// Optional\nimport thecodrr.vave\n```\n\n### Manual:\n\n**Perform the following steps:**\n\n1. Download the latest `native_client.\u003cyour system\u003e.tar.xz` matching your system from [DeepSpeech's Releases](https://github.com/mozilla/DeepSpeech/releases/tag/v0.6.1).\n\n2. Extract the `.tar.xz` into a `libs` folder in your project directory. **It MUST be in the libs folder. If you don't have one, create it and extract into it.**\n\n3. Download the pre-trained model from [DeepSpeech's Releases](https://github.com/mozilla/DeepSpeech/releases/tag/v0.6.1) (the file named `deepspeech-0.6.1-models.tar.gz`). It's pretty big (1.1G) so make sure you have the space.\n\n4. Extract the model anywhere you like on your system.\n\n5. **Extra:** If you don't have any audio files for testing, you can download the samples from [DeepSpeech's Releases](https://github.com/mozilla/DeepSpeech/releases/tag/v0.6.1) (the file named `audio-0.6.1.tar.gz`).\n\n6. 
When you are done, run this command in your project directory:\n\n   ```\n   export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD/lib/\n   ```\n\nAnd done!\n\n### Automatic:\n\n_// TODO_\n\nI will add a `bash` script for automating this process including the downloading and extracting etc. PRs welcome.\n\n## Usage\n\n**There is a complete example of how to use this module in [`cmd/main.v`](https://github.com/thecodrr/vspeech/tree/master/cmd/main.v)**\n\n```v\nimport thecodrr.vspeech\n// specify values for use later\nconst (\n    beam_width            = 300\n    lm_weight             = 0.75\n    valid_word_count_weight = 1.85\n)\n// create a new model\nmut model := vspeech.new(\"/path/to/the/model.pbmm\", 1)\n\nlm := \"/path/to/the/lm/file\" //its in the models archive\ntrie := \"/path/to/the/trie/file\" //its in the models archive\n// enable the decoder with language model (optional)\nmodel.enable_decoder_with_lm(lm, trie, lm_weight, valid_word_count_weight)\n\ndata := byteptr(0)//raw audio samples (use thecodrr.vave module for this)\ndata_len := 0 //the total length of the buffer\n// convert the audio to text\ntext := model.speech_to_text(data, data_len)\nprintln(text)\n\n// make sure to free everything\nunsafe {\n    model.free()\n    model.free_string(text)\n}\n```\n\n## API\n\n#### `vspeech.new(model_path, beam_size)`\n\nCreates a new `Model` with the specified `model_path` and `beam_size`.\n\n`beam_size` decides the balance between accuracy and cost. The larger the `beam_size` the more accurate the decoding will be but at the cost of time and resources.\n\n`model_path` is the path to the model file. It is the file with `.pb` extension but it is better to use `.pbmm` file as it is mmapped and is lighter on the RAM.\n\n### Model `struct`\n\nThe main `struct` represents the interface to the underlying model. It has the following methods:\n\n#### 1. 
`enable_decoder_with_lm(lm_path, trie_path, lm_weight, valid_word_count_weight)`\n\nLoad the Language Model and enable the decoder to use it. Read the method comments to know what each `param` does.\n\n#### 2. `get_model_sample_rate()`\n\nUse this to get the sample rate expected by the model. The audio samples you need converted **MUST** match this sample rate.\n\n#### 3. `speech_to_text(buffer, buffer_size)`\n\nThis is the method that you are looking for. It's where all the magic happens (and also all the bugs).\n\n`buffer` is the audio data that needs to be decoded. Currently DeepSpeech supports a 16-bit RAW PCM audio stream at the appropriate sample rate. You can use [thecodrr.vave](https://github.com/thecodrr/vave) to read audio samples from a WAV file.\n\n`buffer_size` is the total number of bytes in the buffer.\n\n#### 4. `speech_to_text_with_metadata(buffer, buffer_size)`\n\nSame as `speech_to_text` except this returns a `Metadata` struct that you can use for output analysis etc.\n\n#### 5. `create_stream()`\n\nCreate a stream for streaming audio data (from a microphone for example) into the decoder. This, however, isn't an actual stream, i.e. there's no seek etc. This will initialize the `streaming_state` in your `Model` instance, which you can use as mentioned below.\n\n#### 6. `free()`\n\nFree the `Model`.\n\n#### 7. `free_string(text)`\n\nFree the `string` the decoder returned from `speech_to_text`.\n\n### StreamingState\n\nThe streaming state is used to handle pseudo-streaming of audio content into the decoder. It exposes the following methods:\n\n#### 1. `feed_audio_content(buffer, buffer_size)`\n\nUse this for feeding multiple chunks of data into the stream continuously.\n\n#### 2. `intermediate_decode()`\n\nYou can use this to get the output of the current data in the stream. However, this is quite expensive because the decoder has no streaming capabilities. Use this only when necessary.\n\n#### 3. 
`finish_stream()`\n\nCall this when streaming is finished and you want the final output of the whole stream.\n\n#### 4. `finish_stream_with_metadata()`\n\nSame as `finish_stream` but returns a `Metadata` struct which you can use to analyze the output.\n\n#### 5. `free()`\n\nCall this when done to free the captured StreamingState.\n\n### Metadata\n\n**Fields:**\n\n`items` - An array of `MetadataItem`s\n\n`num_items` - Total number of items in the items array\n\n`confidence` - Approximate confidence value for this transcription\n\n**Methods:**\n\n`get_items()` - Converts the C pointer `MetadataItem` array into a V array which you can iterate over normally.\n\n`get_text()` - Helper method to get the combined text from all the `MetadataItem`s, outputting the result in one `string`.\n\n`free()` - Free the `Metadata` instance\n\n### MetadataItem\n\n**Fields:**\n\n`character` - The character generated for transcription\n\n`timestep` - Position of the character in units of 20ms\n\n`start_time` - Position of the character in seconds\n\n**Methods:**\n\n`str()` - Combine and output all the data in the `MetadataItem` nicely into a `string`.\n\n### Find this library useful? :heart:\n\nSupport it by joining **[stargazers](https://github.com/thecodrr/vspeech/stargazers)** for this repository. :star: or [buy me a cup of coffee](https://ko-fi.com/thecodrr)\nAnd **[follow](https://github.com/thecodrr)** me for my next creations! 
🤩\n\n# License\n\n```xml\nMIT License\n\nCopyright (c) 2019 Abdullah Atta\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthecodrr%2Fvspeech","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthecodrr%2Fvspeech","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthecodrr%2Fvspeech/lists"}