{"id":15642716,"url":"https://github.com/philipperemy/speaker-change-detection","last_synced_at":"2025-07-26T16:35:32.554Z","repository":{"id":70249848,"uuid":"126659354","full_name":"philipperemy/speaker-change-detection","owner":"philipperemy","description":"Paper: https://arxiv.org/abs/1702.02285","archived":false,"fork":false,"pushed_at":"2018-12-19T07:34:36.000Z","size":3725,"stargazers_count":64,"open_issues_count":0,"forks_count":18,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-19T01:32:29.703Z","etag":null,"topics":["deep-learning","keras","speaker-change-detection"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/philipperemy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-03-25T02:57:02.000Z","updated_at":"2025-04-12T18:39:41.000Z","dependencies_parsed_at":"2023-03-04T13:00:32.705Z","dependency_job_id":null,"html_url":"https://github.com/philipperemy/speaker-change-detection","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philipperemy%2Fspeaker-change-detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philipperemy%2Fspeaker-change-detection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philipperemy%2Fspeaker-change-detection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philipperemy%2Fspeaker-change-detect
ion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/philipperemy","download_url":"https://codeload.github.com/philipperemy/speaker-change-detection/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251693140,"owners_count":21628623,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","keras","speaker-change-detection"],"created_at":"2024-10-03T11:57:15.796Z","updated_at":"2025-04-30T11:45:33.153Z","avatar_url":"https://github.com/philipperemy.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Speaker Change Detection\n\nImplementation of the paper: https://arxiv.org/abs/1702.02285\n\n[![license](https://img.shields.io/badge/License-Apache_2.0-brightgreen.svg)](https://github.com/philipperemy/keras-attention-mechanism/blob/master/LICENSE) [![dep1](https://img.shields.io/badge/Tensorflow-1.6+-brightgreen.svg)](https://www.tensorflow.org/) [![dep2](https://img.shields.io/badge/Keras-2.0+-brightgreen.svg)](https://keras.io/) \n\n_The mechanism proposed here is for real-time\nspeaker change detection in conversations, which firstly trains\na neural network text-independent speaker classifier using indomain\nspeaker data._\n\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"misc/img_2.png\" width=\"500\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"misc/img_1.png\" width=\"500\"\u003e\n\u003c/p\u003e\n\nThe accuracy is very high and close to 100%, as reported in the paper.\n\n\n## Get Started\n\nBecause 
it takes a very long time to generate the cache and inputs, I packaged them and uploaded them here:\n\n- Cache uploaded at [cache-speaker-change-detection.zip](https://drive.google.com/open?id=1NRBBE7S1ecpbXQBfIyhY9O1DDNsBc0my) (unzip it in `/tmp/`)\n- [speaker-change-detection-data.pkl](https://drive.google.com/open?id=12gMYaV-ymQOtkYHCf9HxPurb9vB6dADK) (place it in `/tmp/`)\n- [speaker-change-detection-norm.pkl](https://drive.google.com/open?id=1vykyS3bxKbkuhGtk36eTWfW9ZkqwJi6e) (place it in `/tmp/`)\n\nYou should have this:\n\n- `/tmp/speaker-change-detection-data.pkl`\n- `/tmp/speaker-change-detection-norm.pkl`\n- `/tmp/speaker-change-detection/*.pkl`\n\nThe final plots are generated as `/tmp/distance_test_ID.png` where ID is the id of the plot.\n\nMake sure you have enough free space in `/tmp/`, because you might run out of disk space there. If that is the case, you can change all the `/tmp/` references inside the codebase to any folder of your choice.\n\nNow run these commands to reproduce the results.\n\n```bash\ngit clone git@github.com:philipperemy/speaker-change-detection.git\ncd speaker-change-detection\nvirtualenv -p python3.6 venv # probably will work on every python3 impl.\nsource venv/bin/activate\npip install -r requirements.txt\n# download the cache and all the files specified above (you can re-generate them yourself if you wish).\ncd ml/\nexport PYTHONPATH=..:$PYTHONPATH; python 1_generate_inputs.py\nexport PYTHONPATH=..:$PYTHONPATH; python 2_train_classifier.py\nexport PYTHONPATH=..:$PYTHONPATH; python 3_train_distance_classifier.py\n```\n\nTo regenerate only the VCTK cache, run:\n\n```bash\ncd audio/\nexport PYTHONPATH=..:$PYTHONPATH; python generate_all_cache.py\n```\n\n## Contributions\n\nContributions are welcome! 
Some ways to improve this project:\n- Given any audio file, is it possible to test it and detect any speaker change?\n\n## Questions\n\n- Given any audio file, is it possible to test it and detect any speaker change?\nYes, as long as it follows the same structure as the VCTK Corpus dataset.\n\n- Is there any way to test the trained model to detect speaker changes in our own audio files?\nYes, it's possible, but it is going to be a bit difficult. You would have to choose a dataset and convert it to the VCTK format.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilipperemy%2Fspeaker-change-detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphilipperemy%2Fspeaker-change-detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilipperemy%2Fspeaker-change-detection/lists"}