{"id":26359097,"url":"https://github.com/assemblyai/assemblyai-haystack","last_synced_at":"2025-03-16T15:58:42.959Z","repository":{"id":210261051,"uuid":"712505697","full_name":"AssemblyAI/assemblyai-haystack","owner":"AssemblyAI","description":"Haystack integration","archived":false,"fork":false,"pushed_at":"2024-01-12T10:54:50.000Z","size":63,"stargazers_count":7,"open_issues_count":0,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-05-02T01:47:28.744Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AssemblyAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-10-31T15:48:00.000Z","updated_at":"2024-04-05T01:39:39.000Z","dependencies_parsed_at":"2024-01-11T20:49:47.953Z","dependency_job_id":"ca80c21e-dadd-42a6-8cc9-27dedd700edf","html_url":"https://github.com/AssemblyAI/assemblyai-haystack","commit_stats":null,"previous_names":["assemblyai/assemblyai-haystack"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AssemblyAI%2Fassemblyai-haystack","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AssemblyAI%2Fassemblyai-haystack/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AssemblyAI%2Fassemblyai-haystack/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AssemblyAI%2Fassemblyai-haystack/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AssemblyAI",
"download_url":"https://codeload.github.com/AssemblyAI/assemblyai-haystack/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243893857,"owners_count":20364916,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-16T15:58:42.877Z","updated_at":"2025-03-16T15:58:42.951Z","avatar_url":"https://github.com/AssemblyAI.png","language":"Python","readme":"\u003cimg src=\"https://github.com/AssemblyAI/assemblyai-python-sdk/blob/master/assemblyai.png?raw=true\" width=\"500\"/\u003e\n\n---\n\n\n[![CI Passing](https://github.com/AssemblyAI/assemblyai-python-sdk/actions/workflows/test.yml/badge.svg)](https://github.com/AssemblyAI/assemblyai-haystack/actions/workflows/test.yml)\n[![GitHub License](https://img.shields.io/github/license/AssemblyAI/assemblyai-haystack)](https://github.com/AssemblyAI/assemblyai-haystack/blob/main/LICENSE)\n[![PyPI version](https://badge.fury.io/py/assemblyai-haystack.svg)](https://badge.fury.io/py/assemblyai-haystack)\n[![PyPI Python Versions](https://img.shields.io/pypi/pyversions/assemblyai-haystack)](https://pypi.python.org/pypi/assemblyai-haystack/)\n![PyPI - Wheel](https://img.shields.io/pypi/wheel/assemblyai-haystack)\n[![AssemblyAI Twitter](https://img.shields.io/twitter/follow/AssemblyAI?label=%40AssemblyAI\u0026style=social)](https://twitter.com/AssemblyAI)\n[![AssemblyAI 
YouTube](https://img.shields.io/youtube/channel/subscribers/UCtatfZMf-8EkIwASXM4ts0A)](https://www.youtube.com/@AssemblyAI)\n[![Discord](https://img.shields.io/discord/875120158014853141?logo=discord\u0026label=Discord\u0026link=https%3A%2F%2Fdiscord.com%2Fchannels%2F875120158014853141\u0026style=social)\n](https://assemblyai.com/discord)\n\n# AssemblyAITranscriber\n\nThis custom component is designed for using AssemblyAI with [Haystack (2.x)](https://github.com/deepset-ai/haystack), an open source Python framework for building custom LLM applications. It seamlessly integrates with the AssemblyAI API and enhances Haystack's capabilities.\n\nThe AssemblyAITranscriber goes beyond simple audio transcription; it also offers features such as summarization and speaker diarization. This allows you to not only convert audio to text but also obtain concise summaries and identify speakers in the conversation. To use AssemblyAITranscriber, you should pass your `ASSEMBLYAI_API_KEY` as an argument while adding a component (see usage code example below). \n\nMore info about AssemblyAI:\n\n* [Website](https://www.assemblyai.com/)\n* [Get a Free API key](https://www.assemblyai.com/dashboard/signup)\n* [AssemblyAI API Docs](https://www.assemblyai.com/docs)\n\n## Installation\n\nFirst, install the assemblyai-haystack Python package.\n\n```bash\npip install assemblyai-haystack\n```\n\nThis package installs and uses the AssemblyAI Python SDK. You can find more info about the SDK at the [assemblyai-python-sdk GitHub repo](https://github.com/AssemblyAI/assemblyai-python-sdk).\n\n## Usage\n\nThe `AssemblyAITranscriber` needs to be initialized with the AssemblyAI API key. \nThe `run` function needs at least the `file_path` argument. 
Audio files can be specified as a URL or a local file path.\nYou can also specify whether you want summarization and speaker diarization results in the `run` function.\n\n```python\nimport os\n\nfrom assemblyai_haystack.transcriber import AssemblyAITranscriber\nfrom haystack.document_stores.in_memory import InMemoryDocumentStore\nfrom haystack import Pipeline\nfrom haystack.components.writers import DocumentWriter\n\nASSEMBLYAI_API_KEY = os.environ.get(\"ASSEMBLYAI_API_KEY\")\n\n## Use AssemblyAITranscriber in a pipeline\ndocument_store = InMemoryDocumentStore()\nfile_url = \"https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3\"\n\nindexing = Pipeline()\nindexing.add_component(\"transcriber\", AssemblyAITranscriber(api_key=ASSEMBLYAI_API_KEY))\nindexing.add_component(\"writer\", DocumentWriter(document_store))\nindexing.connect(\"transcriber.transcription\", \"writer.documents\")\nindexing.run(\n    {\n        \"transcriber\": {\n            \"file_path\": file_url,\n            \"summarization\": None,\n            \"speaker_labels\": None,\n        }\n    }\n)\n\nprint(\"Indexed Document Count:\", document_store.count_documents())\n```\n\nNote: Calling `indexing.run()` blocks until the transcription is finished.\n\nThe results of the transcription, summarization and speaker diarization are returned in separate document lists:\n* transcription\n* summarization\n* speaker_labels\n\nThe metadata of the transcription document contains the transcription ID and the URL of the uploaded audio file.\n\n```json\n{\n   \"transcript_id\":\"73089e32-...-4ae9-97a4-eca7fe20a8b1\",\n   \"audio_url\":\"https://storage.googleapis.com/aai-docs-samples/nbc.mp3\"\n}\n```\n  
\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fassemblyai%2Fassemblyai-haystack","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fassemblyai%2Fassemblyai-haystack","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fassemblyai%2Fassemblyai-haystack/lists"}