{"id":20285418,"url":"https://github.com/mramshaw/speech-recognition","last_synced_at":"2025-04-11T08:39:05.212Z","repository":{"id":92906446,"uuid":"126666357","full_name":"mramshaw/Speech-Recognition","owner":"mramshaw","description":"Speech recognition with Python","archived":false,"fork":false,"pushed_at":"2024-08-20T10:50:05.000Z","size":2526,"stargazers_count":19,"open_issues_count":3,"forks_count":11,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-03-25T06:11:18.204Z","etag":null,"topics":["microprocessor","monotonic","nlp","pocketsphinx","pyaudio","python","raspberry-pi","raspberry-pi-3","speech-recognition"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mramshaw.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-03-25T05:05:38.000Z","updated_at":"2025-02-27T23:25:14.000Z","dependencies_parsed_at":"2024-08-20T12:56:01.558Z","dependency_job_id":"358b5992-1219-40c4-955f-716637e720b1","html_url":"https://github.com/mramshaw/Speech-Recognition","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mramshaw%2FSpeech-Recognition","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mramshaw%2FSpeech-Recognition/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mramshaw%2FSpeech-Recognition/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mr
amshaw%2FSpeech-Recognition/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mramshaw","download_url":"https://codeload.github.com/mramshaw/Speech-Recognition/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248362190,"owners_count":21091065,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["microprocessor","monotonic","nlp","pocketsphinx","pyaudio","python","raspberry-pi","raspberry-pi-3","speech-recognition"],"created_at":"2024-11-14T14:26:32.559Z","updated_at":"2025-04-11T08:39:05.191Z","avatar_url":"https://github.com/mramshaw.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Speech Recognition with Python\n\n[![Known Vulnerabilities](http://snyk.io/test/github/mramshaw/Speech-Recognition/badge.svg?style=plastic\u0026targetFile=requirements.txt)](http://snyk.io/test/github/mramshaw/Speech-Recognition?style=plastic\u0026targetFile=requirements.txt)\n\nI stumbled across this great tutorial, so why not try it out?\n\n    http://realpython.com/python-speech-recognition/\n\nAs recommended, we will use [SpeechRecognition](http://github.com/Uberi/speech_recognition).\n\nAfter setting up my own repo, I found the author's:\n\n    http://github.com/realpython/python-speech-recognition\n\n![Raspberry](images/favicon.png)\n\nThis also works with Raspberry Pi (using Python 3).\n\n## Contents\n\nThe contents are as follows:\n\n* [Prerequisites](#prerequisites)\n    * [For microphone use](#for-microphone-use)\n    * [Optional: monotonic (for Python 
2)](#optional-monotonic-for-python-2)\n    * [For speech recognition](#for-speech-recognition)\n* [Speech Engine](#speech-engine)\n    * [Smoke Test](#smoke-test)\n    * [Ambient Noise](#ambient-noise)\n* [Speech testing](#speech-testing)\n* [And finally, the guessing game](#and-finally-the-guessing-game)\n* [Raspberry Pi](#raspberry-pi)\n* [To Do](#to-do)\n\n## Prerequisites\n\nPython 3 and `pip` installed (Python 2 is scheduled for End-of-life, although the instructions and\ncode have been tested with Python 2 and an appropriate `requirements` file for Python 2 is provided).\n\n#### For microphone use\n\n1. Check for `pyaudio`:\n\n    ``` Python\n    \u003e\u003e\u003e import pyaudio as pa\n    Traceback (most recent call last):\n      File \"\u003cstdin\u003e\", line 1, in \u003cmodule\u003e\n    ImportError: No module named pyaudio\n    \u003e\u003e\u003e\n    ```\n\n[The next step is for Linux; check the [pyaudio requirements](http://people.csail.mit.edu/hubert/pyaudio/#downloads) first.]\n\n2. Install `portaudio19-dev`:\n\n    ```\n    $ sudo apt-get install portaudio19-dev\n    ```\n\n3. Install `pyaudio`:\n\n    ```\n    $ pip install --user pyaudio\n    ```\n\n4. Verify installation:\n\n    ``` Python\n    \u003e\u003e\u003e import pyaudio as pa\n    \u003e\u003e\u003e pa.__version__\n    '0.2.11'\n    \u003e\u003e\u003e\n    ```\n\n#### Optional: monotonic (for Python 2)\n\n[SpeechRecognition](http://github.com/Uberi/speech_recognition#monotonic-for-python-2-for-faster-operations-in-some-functions-on-python-2)\nrecommends installing [monotonic](http://pypi.python.org/pypi/monotonic) for Python 2 users.\n\n1. Check for `monotonic`:\n\n    ```\n    $ pip list --format=freeze | grep monotonic\n    ```\n\n2. Install `monotonic`:\n\n    ```\n    $ pip install --user monotonic\n    ```\n\n3. 
Verify installation:\n\n    ```\n    $ pip list --format=freeze | grep monotonic\n    monotonic==1.4\n    $\n    ```\n\n#### For speech recognition\n\nSpeechRecognition can be used as a _sound recorder_:\n\n    http://github.com/Uberi/speech_recognition/blob/master/examples/write_audio.py\n\nThis is probably fine for occasional use - but there are better options available.\n\n1. Check for `SpeechRecognition`:\n\n    ```\n    $ pip list --format=freeze | grep SpeechRecognition\n    ```\n\n2. Install `SpeechRecognition`:\n\n    ```\n    $ pip install --user SpeechRecognition\n    ```\n\n3. Verify:\n\n    ``` Python\n    \u003e\u003e\u003e import speech_recognition as sr\n    \u003e\u003e\u003e sr.__version__\n    '3.8.1'\n    \u003e\u003e\u003e\n    ```\n\n\n## Speech Engine\n\nThe tutorial uses the __Google Web Speech API__, however installing [PocketSphinx](http://cmusphinx.github.io/)\n(which can work offline) is fairly easy.\n\n[Snowboy](http://snowboy.kitt.ai/) (which can also work offline) is an option for Hotword Detection, but perhaps\nunsuitable for speech recognition (SpeechRecognition tellingly refers to Snowboy as \"Snowboy Hotword Detection\").\n\nFor another online option, there is [Wit.ai](http://github.com/wit-ai/pywit) (which also has a [Node.js SDK](http://github.com/wit-ai/node-wit)).\n\n#### Smoke Test\n\nThe final step may take a few seconds to execute:\n\n``` Python\n\u003e\u003e\u003e import speech_recognition as sr\n\u003e\u003e\u003e r = sr.Recognizer()\n\u003e\u003e\u003e harvard = sr.AudioFile('audio_files/harvard.wav')\n\u003e\u003e\u003e with harvard as source:\n...     audio = r.record(source)\n... 
\n\u003e\u003e\u003e type(audio)\n\u003cclass 'speech_recognition.AudioData'\u003e\n\u003e\u003e\u003e r.recognize_google(audio)\nu'the stale smell of old beer lingers it takes heat to bring out the odor a cold dip restores health and zest a salt pickle taste fine with ham tacos al Pastore are my favorite a zestful food is the hot cross bun'\n\u003e\u003e\u003e \n```\n\n#### Ambient Noise\n\n``` Python\n\u003e\u003e\u003e jackhammer = sr.AudioFile('audio_files/jackhammer.wav')\n\u003e\u003e\u003e with jackhammer as source:\n...     audio = r.record(source)\n... \n\u003e\u003e\u003e r.recognize_google(audio)\nu'the snail smell of old beer drinkers'\n\u003e\u003e\u003e with jackhammer as source:\n...     r.adjust_for_ambient_noise(source)\n...     audio = r.record(source)\n... \n\u003e\u003e\u003e r.recognize_google(audio)\nu'still smell old gear vendors'\n\u003e\u003e\u003e \n```\n\n[Slightly different from the tutorial's `the snail smell of old gear vendors` and `still smell of old beer vendors`.]\n\nAnd:\n\n``` Python\n\u003e\u003e\u003e with jackhammer as source:\n...     r.adjust_for_ambient_noise(source, duration=0.5)\n...     audio = r.record(source)\n... \n\u003e\u003e\u003e r.recognize_google(audio)\nu'the snail smell like old beermongers'\n\u003e\u003e\u003e\n```\n\n[Pretty much the same as `the snail smell like old Beer Mongers`.]\n\n\n## Speech testing\n\nUsing the speech recognition module:\n\n    $ python -m speech_recognition\n    A moment of silence, please...\n    Set minimum energy threshold to 259.109953712\n    Say something!\n    Got it! Now to recognize it...\n    You said hello hello\n    Got it! Now to recognize it...\n    You said the rain in Spain\n    Say something!\n    ^C$\n\nAnd:\n\n``` Python\n\u003e\u003e\u003e with mic as source:\n...     audio = r.listen(source)\n... 
\n\u003e\u003e\u003e r.recognize_google(audio)\nu'Shazam'\n\u003e\u003e\u003e\n```\n\nAnd, as stated in the article, a loud hand-clap generates an exception:\n\n``` Python\n\u003e\u003e\u003e with mic as source:\n...     audio = r.listen(source)\n... \n\u003e\u003e\u003e r.recognize_google(audio)\nTraceback (most recent call last):\n  File \"\u003cstdin\u003e\", line 1, in \u003cmodule\u003e\n  File \"/home/owner/.local/lib/python2.7/site-packages/speech_recognition/__init__.py\", line 858, in recognize_google\n    if not isinstance(actual_result, dict) or len(actual_result.get(\"alternative\", [])) == 0: raise UnknownValueError()\nspeech_recognition.UnknownValueError\n\u003e\u003e\u003e\n```\n\n\n## And finally, the guessing game\n\nRun the guessing game as follows:\n\n    $ python guessing_game.py\n    I'm thinking of one of these words:\n    apple, banana, grape, orange, mango, lemon\n    You have 3 tries to guess which one.\n    \n    Guess 1. Speak!\n    You said: banana\n    Incorrect. Try again.\n    \n    Guess 2. Speak!\n    You said: Orange\n    Incorrect. Try again.\n    \n    Guess 3. 
Speak!\n    You said: mango\n    Sorry, you lose!\n    I was thinking of 'apple'.\n    $\n\n\n## Raspberry Pi\n\n![Raspberry Pi](images/little_pi.png)\n\nAfter hooking up a Raspberry Pi with a Logitech 4000 webcam (for its microphone)\nand configuring with AlsaMixer, everything worked pretty much as expected with\nPython 3.\n\nThere were some installation stumbles, but `sudo apt-get update` fixed them.\n\nIt turned out that `flac` was required so it was also installed.\n\n\n## To Do\n\n- [x] Add original License (this is probably 'fair use' but better safe than sorry)\n- [x] Add `monotonic` as an optional component for Python 2\n- [x] Retry with PocketSphinx (works offline)\n- [x] Retry with [Snowboy](http://snowboy.kitt.ai/) (works offline)\n- [ ] Retry with [Wit.ai](http://github.com/wit-ai/pywit) (which also has a [Node.js SDK](http://github.com/wit-ai/node-wit))\n- [x] Try with Raspberry Pi (works nicely)\n- [x] Update for recent versions of `pip`\n- [x] Update code to conform to `pylint`, `pycodestyle` and `pydocstyle`\n- [x] Update `requirements` files to fix Snyk.io quibbles\n- [x] Update code for Python 3\n- [x] Add table of Contents\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmramshaw%2Fspeech-recognition","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmramshaw%2Fspeech-recognition","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmramshaw%2Fspeech-recognition/lists"}