{"id":13586301,"url":"https://github.com/pannous/tensorflow-speech-recognition","last_synced_at":"2025-05-15T09:02:38.778Z","repository":{"id":64672497,"uuid":"47544193","full_name":"pannous/tensorflow-speech-recognition","owner":"pannous","description":"🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks","archived":false,"fork":false,"pushed_at":"2024-01-17T14:27:13.000Z","size":32630,"stargazers_count":2171,"open_issues_count":33,"forks_count":635,"subscribers_count":186,"default_branch":"master","last_synced_at":"2025-04-14T14:59:44.463Z","etag":null,"topics":["deep-learning","neural-network","speech-recognition","speech-to-text","stt","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pannous.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["pannous"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2015-12-07T10:02:43.000Z","updated_at":"2025-04-02T01:57:57.000Z","dependencies_parsed_at":"2024-08-01T16:32:19.936Z","dependency_job_id":"e9e60233-44f0-49e9-aa08-6a91643e1f3c","html_url":"https://github.com/pannous/tensorflow-speech-recognition","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pannous%2Ftensorflow-speech-r
ecognition","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pannous%2Ftensorflow-speech-recognition/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pannous%2Ftensorflow-speech-recognition/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pannous%2Ftensorflow-speech-recognition/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pannous","download_url":"https://codeload.github.com/pannous/tensorflow-speech-recognition/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254310513,"owners_count":22049468,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","neural-network","speech-recognition","speech-to-text","stt","tensorflow"],"created_at":"2024-08-01T15:05:27.654Z","updated_at":"2025-05-15T09:02:38.754Z","avatar_url":"https://github.com/pannous.png","language":"Python","funding_links":["https://github.com/sponsors/pannous"],"categories":["Python"],"sub_categories":[],"readme":"# Tensorflow Speech Recognition\nSpeech recognition using google's [tensorflow](https://github.com/tensorflow/tensorflow/) deep learning framework, [sequence-to-sequence](https://www.tensorflow.org/versions/master/tutorials/seq2seq/index.html) neural networks.\n\nReplaces [caffe-speech-recognition](https://github.com/pannous/caffe-speech-recognition), see there for some background.\n\n\n## Update 2024: Use **Whisper** !\n\nThis (relatively) old project is NO LONGER UP TO DATE. 
\nThe tensorflow 1.0 used is not compatible anymore and the theory is no longer state of the art either. \nWe highly recommend you check out and use [whisper](https://github.com/ggerganov/whisper.cpp)\n\n## Update 2020: **Mozilla** released [DeepSpeech](https://github.com/mozilla/DeepSpeech)\nThey achieve good [error rates](http://doyouunderstand.me). Free Speech is in good hands, go *there* if you are an end user.\nFor now *this* project is only maintained for educational purposes.\n\n\n## Ultimate goal\nCreate a decent standalone speech recognition for Linux etc.\nSome people say we have the models but not enough training data.\nWe disagree: There is plenty of training data (100GB [here](http://www.openslr.org/12) and 21GB [here on openslr.org](http://www.openslr.org/7/) , synthetic Text to Speech snippets, Movies with transcripts, Gutenberg, YouTube with captions etc etc) we just need a simple yet powerful model. It's only a question of time...\n\n![Sample spectrogram, That's what she said, too laid?](images/0_Karen_160.png)\n\nSample spectrogram, Karen uttering 'zero' with 160 words per minute.\n## Installation\n### clone code\n```\ngit clone https://github.com/pannous/tensorflow-speech-recognition\ncd tensorflow-speech-recognition\ngit clone https://github.com/pannous/layer.git\ngit clone https://github.com/pannous/tensorpeers.git\n```\n\n### pyaudio\n#### requirements portaudio from http://www.portaudio.com/\n```\ngit clone https://git.assembla.com/portaudio.git\ncd portaudio\n./configure --prefix=/path/to/your/local\nmake\nmake install\nexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/your/local/lib\nexport LIBRARY_PATH=$LIBRARY_PATH:/path/to/your/local/lib\nexport CPATH=$CPATH:/path/to/your/local/include\nsource ~/.bashrc\n```\n#### install pyaudio\n```\npip install pyaudio\n```\n\n## Getting started\n\nToy examples:\n`./number_classifier_tflearn.py`\n`./speaker_classifier_tflearn.py`\n\nSome less trivial 
architectures:\n`./densenet_layer.py`\n\nLater:\n`./train.sh`\n`./record.py`\n\n![Sample spectrogram or record.py](images/spectrogram.demo.png)\n\n\u003c!-- ╮⚆ᴥ⚆╭ --\u003e\n\nUpdate: Nervana [demonstrated](https://www.youtube.com/watch?v=NaqZkV_fBIM) that it is possible for 'independents' to build speech recognizers that are state of the art. \n\u003c!-- ᖗ*﹏*ᖘ --\u003e\n\n### Fun tasks for newcomers\n* Watch video : https://www.youtube.com/watch?v=u9FPqkuoEJ8\n* Understand and correct the corresponding code: [lstm-tflearn.py](/lstm-tflearn.py) \n* Data Augmentation : create on-the-fly modulation of the data: increase the speech frequency, add background noise, alter the pitch etc.\n\u003c!-- ᕮ◔‿◔ᕭ --\u003e\n\n### Extensions \n**Extensions** to current tensorflow which are probably needed:\n* [WarpCTC on the GPU](https://github.com/baidu-research/warp-ctc/tree/master/tensorflow_binding) see [issue](https://github.com/tensorflow/tensorflow/issues/2146)\n* Incremental collaborative snapshots ('[P2P learning](https://github.com/pannous/tensorpeers)') !\n* Modular graphs/models + persistence\n\u003c!-- ⤜(⨱ᴥ⨱)⤏ --\u003e\n\nEven though this project is far from finished we hope it gives you some starting points.\n\nLooking for a tensorflow collaboration / consultant / deep learning contractor? Reach out to [info@pannous.com](mailto:info@pannous.com?subject=contractor)\n\u003c!--\n Notes\nSTT https://github.com/sotelo/parrot/blob/master/model.py t\n parrot\n\n--\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpannous%2Ftensorflow-speech-recognition","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpannous%2Ftensorflow-speech-recognition","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpannous%2Ftensorflow-speech-recognition/lists"}