{"id":30856516,"url":"https://github.com/yonatanawoke/audio-to-answer-generator","last_synced_at":"2026-05-17T00:43:48.346Z","repository":{"id":310255988,"uuid":"1039242368","full_name":"YonatanAwoke/audio-to-answer-generator","owner":"YonatanAwoke","description":"A modular multi-agent system that transcribes audio, extracts questions, generates AI-powered answers, and exports structured results (PDF/Docx/Markdown). Built with Python, Gemini API, and FPDF for seamless automation of Q\u0026A from spoken input.","archived":false,"fork":false,"pushed_at":"2025-08-23T19:53:47.000Z","size":659,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-24T07:45:24.452Z","etag":null,"topics":["agentic-ai","agentic-workflow","ai","audio-processing","ffmpeg","fpdf","gemini","huggingface","multiagent-systems"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/YonatanAwoke.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-16T19:33:02.000Z","updated_at":"2025-08-23T19:53:50.000Z","dependencies_parsed_at":"2025-08-24T02:36:30.276Z","dependency_job_id":"52cf7e3b-582d-49fd-b38d-9f344812bf81","html_url":"https://github.com/YonatanAwoke/audio-to-answer-generator","commit_stats":null,"previous_names":["yonatanawoke/audio-to-answer-generator"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/YonatanAwoke/audio-to-answer-generator","repository_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/YonatanAwoke%2Faudio-to-answer-generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YonatanAwoke%2Faudio-to-answer-generator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YonatanAwoke%2Faudio-to-answer-generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YonatanAwoke%2Faudio-to-answer-generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/YonatanAwoke","download_url":"https://codeload.github.com/YonatanAwoke/audio-to-answer-generator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/YonatanAwoke%2Faudio-to-answer-generator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274033043,"owners_count":25210793,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-07T02:00:09.463Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agentic-ai","agentic-workflow","ai","audio-processing","ffmpeg","fpdf","gemini","huggingface","multiagent-systems"],"created_at":"2025-09-07T12:01:48.935Z","updated_at":"2026-05-17T00:43:48.341Z","avatar_url":"https://github.com/YonatanAwoke.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Audio-to-Answer Generator\n\n## Introduction\n\nThis project is a 
powerful pipeline that takes an audio file as input and generates answers to questions found within the audio. It leverages a combination of speech-to-text, natural language processing, and large language models to provide a seamless experience from audio to answers.\n\n## Project Description\n\nThe Audio-to-Answer Generator is designed to process audio files, transcribe them, identify questions within the transcript, and generate corresponding answers. The pipeline is built to be robust, with features like audio enhancement, profanity detection, and speaker diarization. It can also recognize and solve mathematical equations present in the audio.\n\n## Project Structure\n\nThe project is organized into the following directories:\n\n- **agents**: Contains the different agents responsible for specific tasks in the pipeline, such as answer generation, audio enhancement, audio transcription, diarization, profanity checking, and question splitting.\n- **cache**: Caches intermediate results like transcripts and answers to speed up subsequent runs with the same audio file.\n- **orchestration**: Contains the main pipeline logic that orchestrates the different agents and tools.\n- **outputs**: Stores the final output files in the specified format (JSON, text, or PDF).\n- **prompts**: Contains the prompts used to interact with the large language models.\n- **schemas**: Contains the JSON schema for the output.\n- **tests**: Contains tests for the pipeline.\n- **tools**: Contains various utility functions and tools used by the pipeline, such as ASR, math equation solvers, and profanity filters.\n- **utils**: Contains utility functions for the project.\n\n## Features\n\n*   **Audio Transcription:** Transcribes audio files using Whisper.\n*   **Question \u0026 Answer Generation:** Generates answers to questions found in the transcript using a Large Language Model.\n*   **Math Problem Solving:** Can solve mathematical equations found in the transcript.\n*   
**Performance Evaluation:** Evaluates the performance of the question-answering system.\n*   **Human-in-the-Loop Feedback:** Allows users to provide feedback on the generated answers to improve the system's accuracy.\n\n## System Scope and Limitations\n\n### Scope\n\n-   Transcribe audio files (MP3, WAV, FLAC, M4A, OGG).\n-   Detect and answer questions from the transcribed text.\n-   Enhance audio quality for better transcription.\n-   Detect and handle profanity.\n-   Identify different speakers in the audio.\n-   Recognize and solve mathematical equations.\n-   Output results in JSON, text, or PDF format.\n-   Evaluate performance by comparing the generated output against the expected output.\n-   Collect feedback on generated answers, which can be used to fine-tune the model for better accuracy.\n\n### Limitations\n\n-   The maximum allowed audio file size is 500 MB.\n-   The system relies on external APIs (like Google Gemini and Hugging Face), so it requires an internet connection and API keys.\n-   The accuracy of the transcription and answers depends on the quality of the audio and the performance of the underlying models.\n\n## Usage\n\nTo use the Audio-to-Answer Generator, run the following command:\n\n```bash\npython main.py \u003caudio_file_path\u003e [--output_format \u003cformat\u003e] [--language \u003clang\u003e] [--enhance-audio]\n```\n\nTo run the evaluation, use the following command:\n\n```bash\npytest tests/test_evaluation.py\n```\n\nTo use the feedback mechanism, run the pipeline with the `--feedback` flag:\n\n```bash\npython -m orchestration.pipeline \u003caudio_file_path\u003e --feedback\n```\n\nFor each generated answer, you will be prompted to provide feedback:\n\n*   Enter `c` if the answer is correct.\n*   Enter `r` if the answer needs revision.\n\nIf you choose to revise the answer, you will be prompted to enter the corrected answer.\n\nThe feedback will be saved to a JSON file in the `feedback` directory.\n\n### Arguments\n\n-   `audio_file_path`: Path to the audio file 
to process.\n-   `--output_format`: Desired output format (`json`, `text`, or `pdf`). Defaults to `json`.\n-   `--language`: Language of the audio file (e.g., 'en', 'es'). If not provided, the language will be auto-detected.\n-   `--enhance-audio`: Enhance the audio before transcription to improve quality.\n-   `--feedback`: Enable the human-in-the-loop feedback mechanism.\n\n## Project Components\n\nThe pipeline consists of the following main components:\n\n-   **Audio Validator**: Validates the audio file format, size, and codec.\n-   **Audio Enhancer**: Enhances the audio quality.\n-   **Diarizer**: Identifies the different speakers in the audio.\n-   **Audio Transcriber**: Transcribes the audio file to text using Whisper.\n-   **Profanity Checker**: Detects profanity in the transcribed text.\n-   **Answer Generator**: Generates answers to the questions found in the transcript using a large language model.\n-   **Math Pipeline**: A specialized pipeline to handle mathematical equations found in the audio.\n\n## Model Input and Output\n\n### Input\n\n-   An audio file (MP3, WAV, FLAC, M4A, OGG).\n\n### Output\n\nThe output is a JSON, text, or PDF file containing the following information:\n\n-   Transcript of the audio.\n-   A list of questions and their corresponding answers.\n-   Speaker timestamps and transcripts (if diarization is enabled).\n-   Math results (if any math equations are found).\n\n## Requirements\n\nThe project requires the following Python packages:\n\n-   `google-generativeai`\n-   `langchain-google-genai`\n-   `langgraph`\n-   `python-dotenv`\n-   `fpdf`\n-   `langchain`\n-   `ffmpeg`\n-   `jsonschema`\n-   `pyannote.audio`\n-   `torch`\n-   `torchaudio`\n-   `better_profanity`\n-   `demjson`\n-   `spacy`\n\nYou will also need to have `ffprobe` installed and available in your system's PATH.\n\n## License\n\nThis project is licensed under the MIT License.\n\n## Contact\n\nFor any questions or feedback, please contact Yonatan Awoke at 
yonatanawoke@gmail.com.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyonatanawoke%2Faudio-to-answer-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyonatanawoke%2Faudio-to-answer-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyonatanawoke%2Faudio-to-answer-generator/lists"}