{"id":15492949,"url":"https://github.com/nezhar/speech-condenser","last_synced_at":"2025-07-09T22:37:38.834Z","repository":{"id":191588302,"uuid":"585419334","full_name":"nezhar/speech-condenser","owner":"nezhar","description":"A tool for summarizing dialogues from videos or audio","archived":false,"fork":false,"pushed_at":"2023-08-29T19:37:46.000Z","size":247,"stargazers_count":82,"open_issues_count":1,"forks_count":10,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-19T17:26:01.738Z","etag":null,"topics":["asr","speach-recognition","speaker-diarization","speaker-identification","summarization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nezhar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-01-05T05:49:41.000Z","updated_at":"2025-02-27T04:24:43.000Z","dependencies_parsed_at":"2023-08-30T15:13:41.270Z","dependency_job_id":null,"html_url":"https://github.com/nezhar/speech-condenser","commit_stats":null,"previous_names":["nezhar/speech-condenser"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/nezhar/speech-condenser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nezhar%2Fspeech-condenser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nezhar%2Fspeech-condenser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nezhar%2Fspeech-condenser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nezhar%2Fspeech-condenser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/owners/nezhar","download_url":"https://codeload.github.com/nezhar/speech-condenser/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nezhar%2Fspeech-condenser/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264504616,"owners_count":23618831,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","speach-recognition","speaker-diarization","speaker-identification","summarization"],"created_at":"2024-10-02T08:02:57.878Z","updated_at":"2025-07-09T22:37:38.816Z","avatar_url":"https://github.com/nezhar.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Speech Condenser\n\nSpeech Condenser is a tool for reducing the size of a dialogue.\n\n## Pipeline\n\n![Pipeline](./docs/pipeline.png)\n\nIt combines several tools to achieve the goal of reducing the size of a dialogue. Each step of the above pipeline runs inside a container.\n\nSteps:\n\n1. Audio extraction - Extracts the audio from the video file.\n2. Speaker diarization - Identifies the speakers in the audio file.\n3. Split audio - Splits the audio file into smaller chunks based on the speaker diarization.\n4. Speech to text - Transcribes the audio chunks into text.\n5. Combine ASR and diarization - Combines the results of the ASR and diarization to get the text for each speaker as a dialogue.\n6. Summarization - Summarizes the dialogue.\n\n\n## Installation\n\nThe setup uses docker or podman to run the containers. 
A set of local scripts is provided to run the pipeline.\n\n* build.sh - Builds the containers.\n* pipeline.sh - Runs the pipeline.\n* yt-pipeline.sh - Runs the pipeline on a YouTube video.\n\nVideos need to be provided in the `data/input` directory. `yt-pipeline.sh` will use this directory to download and cache the video.\nThe output will be in the `data/output` directory.\n\nMake sure to create a `.env` file based on the `.env.example` file and provide the required values:\n\n* SC_RUNTIME - The runtime to use for the containers. Either `docker` or `podman`.\n* HF_TOKEN - The [Hugging Face token](https://huggingface.co/settings/tokens) to use for the summarization step.\n\nMake sure to visit [hf.co/pyannote/speaker-diarization](http://hf.co/pyannote/speaker-diarization) and [hf.co/pyannote/segmentation](https://hf.co/pyannote/segmentation) and accept the user conditions. This is required in order to run the speaker diarization.\n\n## Usage\n\nRun against a local video file:\n\n```bash\n./pipeline.sh \"data/input/video.mp4\"\n```\n\nRun against a YouTube video:\n\n```bash\n./yt-pipeline.sh \"https://www.youtube.com/watch?v=video_id\"\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnezhar%2Fspeech-condenser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnezhar%2Fspeech-condenser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnezhar%2Fspeech-condenser/lists"}