{"id":22387755,"url":"https://github.com/not-ml/audio-ml","last_synced_at":"2025-03-26T21:14:22.594Z","repository":{"id":264203137,"uuid":"892674325","full_name":"Not-ML/audio-ml","owner":"Not-ML","description":"Standalone Audio ML Application: An innovative Python-based tool integrating Speech Recognition (ASR), Sentiment Analysis (NLP), and Text-to-Speech (TTS) to process audio, analyze sentiment, and generate spoken responses. Features both command-line and GUI interfaces for seamless interaction. ","archived":false,"fork":false,"pushed_at":"2024-11-22T16:25:11.000Z","size":21,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-01T02:43:52.341Z","etag":null,"topics":["asr","audio-processing","machine-learning","natural-language-processing","nlp","open-source","speech-recognition","text-to-speech","tkinter","tts","wav2vec2"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Not-ML.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-22T15:01:03.000Z","updated_at":"2024-11-22T16:21:55.000Z","dependencies_parsed_at":"2024-11-22T16:37:25.111Z","dependency_job_id":null,"html_url":"https://github.com/Not-ML/audio-ml","commit_stats":null,"previous_names":["not-ml/audio-ml"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Not-ML%2Faudio-ml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/Not-ML%2Faudio-ml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Not-ML%2Faudio-ml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Not-ML%2Faudio-ml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Not-ML","download_url":"https://codeload.github.com/Not-ML/audio-ml/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245735882,"owners_count":20663807,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","audio-processing","machine-learning","natural-language-processing","nlp","open-source","speech-recognition","text-to-speech","tkinter","tts","wav2vec2"],"created_at":"2024-12-05T02:11:00.154Z","updated_at":"2025-03-26T21:14:22.573Z","avatar_url":"https://github.com/Not-ML.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Standalone Audio ML Application\n\nThe **Standalone Audio ML Application** is an innovative tool that integrates multiple machine learning models to create a complete audio processing pipeline. It includes **Automatic Speech Recognition (ASR)** to transcribe audio, **Natural Language Processing (NLP)** to analyze the sentiment of the transcribed text, and **Text-to-Speech (TTS)** to generate a spoken response based on the sentiment. 
This project showcases the potential of combining these technologies for interactive and personalized audio-based applications.\n\nThis repository contains a Python-based implementation that allows users to either record audio through a microphone or process pre-recorded audio files. The application processes the audio in four stages:\n1. **Transcription**: The audio is transcribed into text using a Wav2Vec2 ASR model.\n2. **Sentiment Analysis**: The transcribed text is analyzed for sentiment using a fine-tuned DistilBERT model.\n3. **Response Generation**: Based on the sentiment analysis, a predefined response is generated.\n4. **Speech Synthesis**: The response text is then converted into speech using a Tacotron2-based TTS model.\n\nAdditionally, a simple graphical user interface (GUI) built with Tkinter is included to allow users to easily interact with the application. This makes it accessible for users who prefer not to use the command line interface.\n\n## Features\n\n- **Automatic Speech Recognition (ASR)**: \n  - Uses the Wav2Vec2 model from Hugging Face to transcribe speech into text with high accuracy.\n  - Capable of transcribing audio files with a sample rate of 16 kHz.\n  \n- **Sentiment Analysis (NLP)**:\n  - Uses the pre-trained DistilBERT model fine-tuned on the SST-2 dataset for sentiment classification (positive, negative, or neutral).\n  - Analyzes the transcribed text and categorizes it into one of the three sentiment classes.\n  \n- **Text-to-Speech (TTS)**:\n  - Converts the generated response text into natural-sounding speech using the Tacotron2 model.\n  - Generates a `.wav` file that can be played back to the user.\n  \n- **Audio Acquisition**:\n  - Records audio directly from the microphone using the `sounddevice` library.\n  - Option to record audio for a specified duration or process pre-recorded audio files.\n  \n- **Graphical User Interface (GUI)**:\n  - Built using `Tkinter`, the GUI provides an easy-to-use interface for interacting 
with the application.\n  - Features buttons for recording audio, processing audio files, and quitting the application.\n\n## Files\n\n### `asr.py`\nThis script contains the `ASRModel` class which initializes and uses the Wav2Vec2 model from Hugging Face to perform Automatic Speech Recognition. It includes a method `transcribe` that takes an audio file and returns the transcribed text.\n\n### `nlp.py`\nThe `SentimentAnalyzer` class in this script uses a fine-tuned DistilBERT model for sentiment analysis. It includes an `analyze` method that accepts text input and returns the sentiment of the text (positive, negative, or neutral).\n\n### `tts_module.py`\nContains the `TTSModel` class which interacts with the Tacotron2-based TTS model. It includes a `synthesize` method that converts a given text into speech and saves it as a `.wav` file.\n\n### `audio_acquisition.py`\nThis script provides functions for recording and loading audio. The `record_audio` function records audio for a specified duration, while the `load_audio` function loads an audio file and ensures it has the correct sample rate.\n\n### `main.py`\nThe main script that handles the overall audio processing pipeline. It provides functionality for both command-line and GUI usage:\n- If using the command line, it allows the user to record or select an audio file, process it through the ASR, NLP, and TTS stages, and generate a response.\n- The script also handles cleanup of temporary files after processing.\n\n### `gui_app.py`\nA simple Tkinter-based GUI for interacting with the application. Users can record audio or select an audio file to process. The results, including the transcription, sentiment analysis, and response text, are displayed in the GUI.\n\n## Installation\n\nTo set up the **Standalone Audio ML Application** on your local machine, follow these steps:\n\n1. 
**Clone the repository**:\n   ```bash\n   git clone https://github.com/Not-ML/audio-ml.git\n   cd audio-ml\n   ```\n\n2. **Install the required dependencies**: This application requires several Python libraries, which you can install via `pip`:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n3. **Run the application**:\n\n   Command-line usage: To record audio directly from the microphone:\n\n   ```bash\n   python main.py --record\n   ```\n\n   To process an existing audio file:\n\n   ```bash\n   python main.py --file path/to/audio/file\n   ```\n\n   Graphical User Interface (GUI): To launch the GUI application:\n\n   ```bash\n   python gui_app.py\n   ```\n\n## Dependencies\n\nThis project relies on several Python libraries for machine learning, audio processing, and GUI development. These include:\n\n- `transformers`: For loading the pre-trained Wav2Vec2 and DistilBERT models from Hugging Face.\n- `torch`: Required for running the deep learning models.\n- `librosa`: For audio processing (loading and manipulating audio files).\n- `TTS`: For text-to-speech synthesis with the Tacotron2-based model.\n- `sounddevice` and `soundfile`: For recording and saving audio.\n- `tkinter`: For building the GUI.\n- `simpleaudio`: For playing the generated audio.\n- `termcolor`: For adding color to terminal output (for better readability).\n\nYou can install all the dependencies using:\n\n```bash\npip install -r requirements.txt\n```\n\n## Example Usage\n\n### Command-line example\n\nRecord 5 seconds of audio from the microphone:\n\n```bash\npython main.py --record\n```\n\nProcess an existing audio file:\n\n```bash\npython main.py --file path/to/audio/file\n```\n\n### GUI example\n\n- Launch the GUI and click \"Record Audio\" to record audio from the microphone.\n- Click \"Process Audio File\" to select and process an existing audio file.\n\n## License\n\nThis project is licensed under the BSD 2-Clause License. See the LICENSE file for more details.\n\n## Contributing\n\nFeel free to open issues and pull requests to contribute to the project! 
If you'd like to add new features or fix bugs, your contributions are welcome. To get started, follow these steps:\n\n1. Fork the repository.\n2. Create a new branch for your feature or bug fix (`git checkout -b feature-branch`).\n3. Commit your changes (`git commit -am 'Add new feature'`).\n4. Push to the branch (`git push origin feature-branch`).\n5. Open a pull request to merge your changes into the main repository.\n\nWhen contributing, please ensure that:\n\n- The code follows the project's style guide.\n- Any new features are well-documented.\n- Tests are included where applicable.\n\nIf you encounter any issues or have suggestions, please open an issue in the repository.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnot-ml%2Faudio-ml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnot-ml%2Faudio-ml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnot-ml%2Faudio-ml/lists"}