{"id":49473368,"url":"https://github.com/harmlessman/pafts","last_synced_at":"2026-04-30T18:00:56.955Z","repository":{"id":153741550,"uuid":"617290925","full_name":"harmlessman/PAFTS","owner":"harmlessman","description":"PAFTS : Library That Preprocessing Audio For TTS.","archived":false,"fork":false,"pushed_at":"2024-11-15T14:20:15.000Z","size":272,"stargazers_count":23,"open_issues_count":1,"forks_count":5,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-08T23:52:04.626Z","etag":null,"topics":["asr","diarization","separator","speech-to-text","stt","tts","whisper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/harmlessman.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-22T04:32:35.000Z","updated_at":"2025-09-03T04:28:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"ff6b86ee-66f7-4001-9e22-69d7d3e2a850","html_url":"https://github.com/harmlessman/PAFTS","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/harmlessman/PAFTS","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harmlessman%2FPAFTS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harmlessman%2FPAFTS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harmlessman%2FPAFTS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harmlessman%2FPAFTS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/owners/harmlessman","download_url":"https://codeload.github.com/harmlessman/PAFTS/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/harmlessman%2FPAFTS/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32472396,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-30T13:12:12.517Z","status":"ssl_error","status_checked_at":"2026-04-30T13:12:06.837Z","response_time":57,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","diarization","separator","speech-to-text","stt","tts","whisper"],"created_at":"2026-04-30T18:00:30.457Z","updated_at":"2026-04-30T18:00:56.924Z","avatar_url":"https://github.com/harmlessman.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PAFTS\n\n\n---\n\n### Library That Preprocessing Audio For TTS.\nThis library enables easy processing of audio files into a format suitable for TTS training data with a simple execution.\n![architecture](architecture.png)\n\n## Description \nPAFTS have three features.\n\n1. Separator\n2. Diarization\n3. 
STT\n\n* Separator : Removes background music (MR) and noise from each audio file to isolate clean voice tracks.\n* Diarization : Separates speakers within each audio file, identifying distinct voices.\n* STT : Extract text from audio.\n\n\n\n\n```\n# before run()\n\n      path\n        ├── 1_001.wav # have mr or noise\n        ├── 1_002.wav\n        ├── 1_003.wav\n        ├── 1_004.wav\n        └── abc.wav\n\n\n# after run()\n    \n       path\n        ├── SPEAKER_00\n        │   ├── SPEAKER_00_1.wav # removed mr and noise\n        │   ├── SPEAKER_00_2.wav\n        │   └── SPEAKER_00_3.wav\n        ├── SPEAKER_01\n        │   ├── SPEAKER_01_1.wav\n        │   └── SPEAKER_01_2.wav\n        ├── SPEAKER_02\n        │   ├── SPEAKER_02_1.wav\n        │   └── SPEAKER_02_2.wav\n        └── audio.json\n        \n        # audio.json\n        {\n              'SPEAKER_00_1.wav' : \"I have a note.\", \n              'SPEAKER_00_2.wav' : \"I want to eat chicken.\",\n              'SPEAKER_00_3.wav' : \"...\",\n              'SPEAKER_01_1.wav' : \"...\",\n              'SPEAKER_01_2.wav' : \"...\",   \n        }\n```\n\n\n## Features\n* Separator : Using the [UVR](https://github.com/Anjok07/ultimatevocalremovergui) project’s model and code for music source separation.\n* Diarization : Using speaker diarization from [pyannote-audio](https://github.com/pyannote/pyannote-audio)\n* STT : Using STT model whisper from [OpenAI](https://github.com/openai/whisper)\n\n\n## Setup\nThis library was developed using Python 3.10, and we recommend using Python versions 3.8 to 3.10 for compatibility.\n\nWhile the library is compatible with both Linux and Windows, all testing was conducted on Windows. \nFor any issues or errors encountered while running on Linux, please feel free to open an issue.\n\nBefore running the library, please ensure the following are installed:\n\n### PyTorch\nWe highly recommend using a GPU to optimize performance. 
For PyTorch installation, use the command that matches your GPU’s CUDA version:\n```\n# Example for installing PyTorch with CUDA 11.8\npip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118\n```\n\n### ffmpeg\n[ffmpeg](https://ffmpeg.org/) is required for audio processing tasks within this library. Please ensure it is installed and accessible from your system’s PATH.\nTo install ffmpeg:\n\n#### Windows\nDownload the latest FFmpeg release from [FFmpeg’s official website](https://ffmpeg.org/download.html), and add the bin folder to your system’s PATH.\n\n#### Linux \nUse the following commands to install FFmpeg:\n```\nsudo apt update\nsudo apt install ffmpeg\n```\n\nAfter installation, verify it by running:\n```\nffmpeg -version\n```\n\n\n### HuggingFace Access Token (required for diarization)\nTo enable diarization functionality, please complete the following steps:\n1. Accept [`pyannote/segmentation-3.0`](https://huggingface.co/pyannote/segmentation-3.0) user conditions\n2. Accept [`pyannote/speaker-diarization-3.1`](https://huggingface.co/pyannote/speaker-diarization-3.1) user conditions\n3. 
Create access token at [`hf.co/settings/tokens`](https://huggingface.co/login?next=%2Fsettings%2Ftokens).\n\n```\nfrom pafts.pafts import PAFTS\n\np = PAFTS(\n    path = 'your_audio_directory_path',\n    output_path = 'output_path',\n    hf_token=\"HUGGINGFACE_ACCESS_TOKEN_GOES_HERE\"\n)\n\n```\n\nAfter completing the setup steps above, you can install this library by running\n```\npip install pafts\n```\n\n\n## Usage\n```\nfrom pafts import PAFTS\n\np = PAFTS(\n    path = 'your_audio_directory_path',\n    output_path = 'output_path',\n    hf_token=\"HUGGINGFACE_ACCESS_TOKEN_GOES_HERE\" # if you use diarization\n    \n)\n\n# Separator\np.separator()\n\n# Diarization\np.diarization()\n\n# STT\np.STT(model_size='small')\n\n# One-Click Process\np.run()\n\n```\n\n## TODO\n- [ ] Command line\n- [ ] Clean logging\n- [ ] Separator with Model Selection\n- [ ] Update README.md\n- [ ] Add VAD\n\n## License\n\nThe code of **PAFTS** is [MIT-licensed](LICENSE)\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharmlessman%2Fpafts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fharmlessman%2Fpafts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fharmlessman%2Fpafts/lists"}