{"id":25946924,"url":"https://github.com/neuralwork/audio2chat","last_synced_at":"2025-03-04T10:17:24.778Z","repository":{"id":280466244,"uuid":"924136209","full_name":"neuralwork/audio2chat","owner":"neuralwork","description":"Convert multi-speaker audio files to structured chat data for LLMs","archived":false,"fork":false,"pushed_at":"2025-01-29T14:26:06.000Z","size":2121,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-03-03T16:19:02.292Z","etag":null,"topics":["chat","llm","llm-datasets","speaker-diarization","transcription","whisper"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/audio2chat/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/neuralwork.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-29T13:35:51.000Z","updated_at":"2025-01-30T14:18:42.000Z","dependencies_parsed_at":"2025-03-03T16:29:09.295Z","dependency_job_id":null,"html_url":"https://github.com/neuralwork/audio2chat","commit_stats":null,"previous_names":["neuralwork/audio2chat"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralwork%2Faudio2chat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralwork%2Faudio2chat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralwork%2Faudio2chat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralwork%2Faudio2chat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/neuralwork","download_url":"https://codeload.github.com/neuralwork/audio2chat/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241827168,"owners_count":20026601,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chat","llm","llm-datasets","speaker-diarization","transcription","whisper"],"created_at":"2025-03-04T10:17:24.005Z","updated_at":"2025-03-04T10:17:24.689Z","avatar_url":"https://github.com/neuralwork.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Audio2Chat\n\nAudio2Chat converts multi-speaker audio files into a structured chat format, using [AssemblyAI](https://www.assemblyai.com/app) for speaker diarization and, optionally, Whisper for enhanced transcription.\n\n### Features\n- Speaker diarization and transcription using AssemblyAI\n- Optional enhanced transcription using Whisper large-v3-turbo\n- YouTube video download support\n- Word-level timestamp support (useful for speech-to-text and text-to-speech tasks)\n- Structured chat format output\n\n## Installation\n\n```bash\n# Install from PyPI\npip install audio2chat\n\n# Or install from source\ngit clone https://github.com/neuralwork/audio2chat.git\ncd audio2chat\npip install -e .\n```\n\n### Requirements\n- Python \u003e=3.8\n- FFmpeg (for YouTube downloads)\n- CUDA-capable GPU (recommended for Whisper)\n\nInstall FFmpeg:\n```bash\n# Ubuntu/Debian\nsudo apt update \u0026\u0026 sudo apt install ffmpeg\n\n# macOS\nbrew install ffmpeg\n\n# Windows (using Chocolatey)\nchoco install ffmpeg\n```\n\nYou need an AssemblyAI account and API key to use audio2chat. Once you set up an account, you can find your API key on the [dashboard](https://www.assemblyai.com/app).\n\n## Usage\n\n### Command Line\n\nBasic usage:\n```bash\n# Process a local audio file\naudio2chat input.wav --api-key YOUR_ASSEMBLYAI_KEY --output output_dir\n\n# Process a YouTube video\naudio2chat \"https://youtube.com/watch?v=xxxxx\" --api-key YOUR_ASSEMBLYAI_KEY --output output_dir\n```\n\nAll options:\n```bash\naudio2chat --help\n\nrequired arguments:\n  input                   Input audio file path or YouTube URL\n  --api-key API_KEY      AssemblyAI API key\n\noutput settings:\n  --output OUTPUT        Output directory for audio and chat data (default: output)\n  --download-format {mp3,wav}\n                        Audio format for YouTube downloads (default: wav)\n\ntranscription settings:\n  --language LANGUAGE    Language code for transcription (default: en)\n  --num-speakers NUM     Expected number of speakers (default: auto-detect)\n  --use-whisper         Use Whisper for enhanced transcription (default: False)\n\nchat generation settings:\n  --min-segment-confidence CONF\n                        Minimum confidence score to include segment (default: 0.5)\n  --merge-threshold THRESH\n                        Time threshold to merge adjacent utterances (default: 1.0)\n  --min-duration DUR    Minimum duration for a chat segment (default: 0.5)\n  --include-metadata    Include additional metadata in output (default: True)\n  --include-word-timestamps\n                        Include word-level timing information (default: False)\n\nvocabulary settings:\n  --word-boost [WORDS ...]\n                        List of words to boost recognition for\n\nother:\n  --verbose, -v         Enable verbose logging\n```\n\n### Python API\n\n```python\nfrom audio2chat.pipeline import AudioChatPipeline\nfrom audio2chat.youtube_downloader import download_audio\n\n# For YouTube videos, download the audio first\naudio_path = download_audio(\n    \"https://youtube.com/watch?v=xxxxx\",\n    output_dir=\"downloads\",\n    audio_format=\"wav\"\n)\n\n# Initialize pipeline\npipeline = AudioChatPipeline(\n    api_key=\"YOUR_ASSEMBLYAI_KEY\",\n    language=\"en\",\n    num_speakers=2,  # or None for auto-detect\n    use_whisper=True,  # enable Whisper for better transcription\n    include_word_timestamps=True\n)\n\n# Process the file and write the chat data to output/chat.json\nchat_data = pipeline.process_file(audio_path, \"output/chat.json\")\n```\n\n### Output Format\n\n```json\n{\n    \"messages\": [\n        {\n            \"speaker\": \"A\",\n            \"text\": \"Hello there!\",\n            \"start\": 0,\n            \"end\": 1500,\n            \"words\": [\n                {\n                    \"text\": \"Hello\",\n                    \"start\": 0,\n                    \"end\": 750,\n                    \"confidence\": 0.98\n                },\n                {\n                    \"text\": \"there\",\n                    \"start\": 750,\n                    \"end\": 1500,\n                    \"confidence\": 0.95\n                }\n            ]\n        }\n    ],\n    \"metadata\": {\n        \"num_speakers\": 2,\n        \"speakers\": [\"A\", \"B\"],\n        \"transcription\": \"whisper+assemblyai\"\n    }\n}\n```\n\n## Development\n\nRun tests:\n```bash\n# Set up environment\nexport ASSEMBLYAI_API_KEY=your_key_here\n\n# Add a test audio file\ncp your_test_audio.wav tests/test_data/input.wav\n\n# Run tests\npytest tests/test_pipeline.py tests/test_chat_builder.py  # without Whisper\npytest tests/  # all tests, including Whisper\n```\n\n## License\nThis project is licensed under the [MIT license](https://github.com/neuralwork/audio2chat/blob/main/LICENSE).\n\nFrom [neuralwork](https://neuralwork.ai/) with :heart:\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneuralwork%2Faudio2chat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fneuralwork%2Faudio2chat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneuralwork%2Faudio2chat/lists"}