Projects in Awesome Lists tagged with speech-processing

https://github.com/tabahi/phoneme-converge-ser

Speech emotion recognition by phoneme type convergence

affective-computing emotion-classification emotion-recognition machine-learning phonemes speech speech-processing

Last synced: 14 Nov 2024

https://github.com/davidfoerster/kaleidok-examples

KaleidOk invites participants to use a new kind of interactive media tool and take part in an emerging experience that explores speech recognition, media retrieval and visual generation in a collaborative context (between people, and between people and machines).

affective-computing art java linguistics processing-library speech-processing synesthesia

Last synced: 10 Nov 2024

https://github.com/mindspore-courses/speech_processing

mindspore speech-processing

Last synced: 09 Nov 2024

https://github.com/zabir-nabil/tf2-speaker-recognition

Speaker recognition in TensorFlow 2

deep-learning speaker-identification speaker-recognition speaker-verification speech-processing tensorflow tensorflow2

Last synced: 02 Dec 2024

https://github.com/ebowwa/irl

your ai companion: a source for augmented memory, human interpreting workers, advocator, and much more

ai aiassistant anthropic claude emotions gpto1-mini gpto1-preview humeai ios-app o1-mini o1-preview openai python speech-processing swift

Last synced: 13 Oct 2024

https://github.com/albertaparicio/bsc-thesis

BSc thesis: Voice Conversion using Deep Learning

bachelor-thesis deep-learning keras python3 pytorch sequence-to-sequence speech-processing tensorflow voice-conversion

Last synced: 30 Oct 2024

https://github.com/alinababer/speech-emotions-recognition-by-feature-extraction-and-machine-learning-matlab

matlab signal-processing speech-emotion-recognition speech-feature-extraction speech-processing

Last synced: 16 Dec 2024

https://github.com/aitor-alvarez/emorabic

Tools for creating speech corpora by extracting audio from YouTube videos

audio corpus-tools speech speech-corpora speech-processing

Last synced: 25 Nov 2024

https://github.com/nourine-nadir/speech_processing

This repository explores speech processing techniques like noise cancellation and speech segmentation through Python code. (Speech recognition coming soon.)

artificial-intelligence noise-cancellation speech-processing speech-segmentation

Last synced: 11 Nov 2024

https://github.com/ittus/speech-processing---detect-voice-and-unvoice

Speech Processing - Detect voiced and unvoiced speech

matlab signal-processing speech-processing

Last synced: 06 Dec 2024

https://github.com/mohamedhany99/speech-to-text-python

This script takes speech from the device's microphone as input and transforms it into a string that is printed on the screen.

signal-processing speech-processing speech-to-text voice voice-activity-detection voice-chat voice-recognition

Last synced: 16 Nov 2024

https://github.com/zoobereq/emotional_speech

A script extracting features of emotionally charged speech

parselmouth praat speech speech-analysis speech-emotion-recognition speech-processing

Last synced: 17 Dec 2024

https://github.com/rahulguptagzb09/rayan-voice-assistant-android

Voice Assistant For Android

android android-app android-application android-development android-library android-sdk android-studio android-ui androidstudio java speech-processing speech-recognition speech-to-text xml

Last synced: 20 Nov 2024

https://github.com/linto-ai/sfeatpy

Library to extract MFCC features from audio signal

feature-extraction mfcc mfcc-features python3 speech-processing

Last synced: 27 Dec 2024

https://github.com/gamingpro237/project-title-master-gift-all-in-one-speech-ai-chatbot-nexus

The ~MASTER-GIFT~ All-in-One Speech AI-Chatbot Nexus is an interactive application that allows users to upload PDF or DOCX files and have the content read aloud using text-to-speech. Users can also input text for audio playback or display it on the screen. Additionally, the program features an offline chatbot that responds to user questions.

ai bot chat-application chatbot coding framework interface learning programming python python3 reading reading-notes speech-processing speech-synthesis study text-to-speech thinker thinkerview

Last synced: 09 Nov 2024

https://github.com/nyxflower/butterfly-voice-changing-bowtie

Butterfly Voice Changing Bowtie (BVCB), a simple sound regulator/simulator (PureData implementation)

puredata sound-generators speech-processing voice-changer

Last synced: 09 Nov 2024

https://github.com/a3ro-dev/voicetypingapp

Python-based application designed to convert speech to text in real-time.

dad googlespeech googlespeechapi project python python3 pyttsx3 script side-project speech-processing speech-recognition speech-synthesis speech-to-text

Last synced: 08 Nov 2024

https://github.com/mahtafetrat/virgoolinformal-speech-dataset

A dataset of informal Persian audio and text chunks, along with a fully open processing pipeline, suitable for ASR and TTS tasks. Created from crawled content on virgool.io.

asr asr-evaluation forced-alignment persian persian-speech-corpus persian-speech-dataset persian-speech-recognition persian-text-to-speech speech-data-collection speech-dataset speech-processing tts

Last synced: 25 Dec 2024

https://github.com/eesunmoon/on-device_multimodal_er

[Research] Multimodal Emotion Recognition for On-device AI

artificial-intelligence data-analysis deep-learning embedded-systems emotion-recognition heart-rate-analysis multimodal-fusion npu on-device python speech-processing speech-recognition tensorflow wearable-devices

Last synced: 22 Dec 2024

https://github.com/pprablanc/cpsrt

c pitch pitch-shift python-algorithms real-time signal-processing speech-processing voice

Last synced: 10 Nov 2024

https://github.com/alecokas/subword-embedding

A tool for generating sub-word (phone or grapheme) level embeddings from an HTK-style MLF ASR corpus

embeddings graphemes machine-learning nlp phones speech-processing subword-units word2vec

Last synced: 24 Nov 2024

https://github.com/codewithmayank-py/python-voice-assistant

Virtual Assistant: a Python tool for simple tasks like search, Wikipedia, and opening apps. CLI-based, flexible, and easy to customize.

python python3 speech-processing speech-recognition speech-to-text voice-assistant voice-commands

Last synced: 17 Nov 2024

https://github.com/utkarsh530/flipkartgridwoz

Noise Detection and Cancellation - Solving for Voice Interactions in Indian Houses & Neighborhoods

machine-learning speech-processing

Last synced: 21 Dec 2024

https://github.com/flumi3/speech-emotion-recognition

Interactive speech emotion recognition with a neural network and technical explanations.

emotion-recognition machine-learning research-project speech-processing

Last synced: 15 Dec 2024

https://github.com/alinababer/speech-emotions-recognition-by-feature-extraction-and-deep-learning-matlab

This repository presents a comprehensive thesis project on Speech Emotion Recognition using a combination of feature extraction techniques and Deep Learning in MATLAB. The research focuses on analyzing speech data from both normal and autistic subjects to classify emotions (Angry, Happy, Neutral, Sad) and detect language disorders.

deep-learning feature-engineering matlab signal-processing speech-emotion-recognition speech-processing

Last synced: 16 Dec 2024

https://github.com/mattbui/voice_macro

A project for triggering keyboard macros with voice commands

keyboard-macros snowboy speech-processing voice-commands voice-macro

Last synced: 10 Dec 2024

https://github.com/blockjayn/chatgpt-speechtotext-google-chrome-extension

Chrome extension that adds a voice recognition feature to ChatGPT's interface, converting spoken words into text. It provides a convenient and hands-free method for inputting text, enhancing productivity and communication.

chatgpt chrome-extension google-chrome-extension speech-conversion speech-processing speech-recognition speech-to-text voice voice-chat voice-commands voice-control voice-recognition

Last synced: 12 Dec 2024

https://github.com/bunyaminergen/callytics

Callytics is an advanced call analytics solution that leverages speech recognition and large language model (LLM) technologies to analyze phone conversations from customer service and call centers.

denoising diarization forced-alignment llama3 llm openai opensource sentiment-analysis speech-emotion-recognition speech-processing speech-recognition speech-to-text summary topic-modeling transcription voice-activity-detection voice-recognition

Last synced: 24 Dec 2024

https://github.com/arunr1408/dysarthric-speech-recognition-matlab

A project to classify dysarthric and non-dysarthric speech using deep learning techniques.

cnn deep-learning dysarthria matlab speech-processing speech-recognition

Last synced: 12 Dec 2024

https://github.com/hamedzarei/ms-speechprocessing-hw04

HW04 of Speech Processing

matlab sap-voicebox speech-processing

Last synced: 31 Dec 2024

https://github.com/manojc/speech-recognition

Proof of concept for speech recognition using the annyang speech recognition library.

angular angular2 css html5 speech-analysis speech-processing speech-recognition speech-recognizer speech-to-text speechtotext typescript

Last synced: 09 Nov 2024

https://github.com/navierula/subreddit-analysis-on-eating-disorders

natural-language-processing speech-processing

Last synced: 21 Nov 2024

https://github.com/alinababer/berlin-data-voice-gender-recognition-by-feature-extraction-and-machine-learning-matlab

feature-engineering machine-learning matlab signal-processing speech-processing voice-gender-detection voice-recognition

Last synced: 16 Dec 2024

https://github.com/abhinavbammidi1401/speech_processing

speech-analysis speech-processing speech-recognition speech-synthesis

Last synced: 20 Nov 2024

https://github.com/srikanth284/speech-to-text

A web-based application that uses HTML, CSS, and JavaScript to convert spoken language into text in real-time, showcasing the Web Speech API in a responsive and interactive interface

css html js speech-processing speech-recognition speech-to-text vercel

Last synced: 10 Nov 2024

https://github.com/andreaschandra/kaggle-competition

A bunch of code for Kaggle competitions

computer-vision deep-learning kaggle-competition machine-learning natural-language-processing speech-processing

Last synced: 19 Dec 2024

https://github.com/djdhairya/speech-emotion-recognition-

Predicting various emotions in human speech signals by detecting the speech components affected by emotion.

audio-processing convolutional-neural-networks data-science deep-learning emotion-detection emotion-recognition kersa librosa natural-language-processing nuralnetwork python pytorch rnn speech-emotion-recognition speech-processing speech-recognition voice

Last synced: 10 Nov 2024

https://github.com/arunr1408/vocal-removal-and-audio-analysis-tool-matlab

A MATLAB tool for removing vocals and analyzing audio using stereo cancellation.

audio-analysis matlab speech-processing vocal-remover

Last synced: 12 Dec 2024

https://github.com/lexoyo/the-bad-repeater

Continuously listens to you and says aloud what it understands. Uses talkey, sphinx and the Google Speech API.

audio speech-processing sphinx stt tts

Last synced: 24 Nov 2024

https://github.com/macabdul9/torchmm

PyTorch Data loaders and abstraction for multi-modal data.

computer-vision multimodal-deep-learning natural-language-processing python pytorch speech-processing

Last synced: 01 Oct 2024

https://github.com/dito97/stunning-potato

Speech Processing and Recognition (101803) final project at UniGe

chatbot speech-processing

Last synced: 22 Dec 2024

https://github.com/donbraulio/speechembeddings

Research on speech processing, speaker identification and audio diarization

diarization speaker-identification speech-processing speechbrain

Last synced: 19 Nov 2024

https://github.com/breadrock1/speech-2-text-static

This is the frontend module of the `speech-2-text` project, written in `JavaScript`, for testing the backend.

conference frontend gcp-project javascript recordrtc speech-processing

Last synced: 11 Nov 2024

https://github.com/sugarcane-mk/whisper

This repository provides a Python script for extracting speech embeddings using OpenAI's Whisper model. The embeddings are high-dimensional feature vectors that capture the acoustic properties of the input audio. These embeddings can be used for downstream tasks such as speech classification, clustering, and speaker recognition.

asr classification feature-extraction openai speech-processing speech-recognition speech-to-text svm-classifier whisper