https://github.com/etienneab3d/whispertimesync
Synchronize Whisper's timestamps over an existing accurate transcription
- Host: GitHub
- URL: https://github.com/etienneab3d/whispertimesync
- Owner: EtienneAb3d
- Created: 2023-02-09T07:47:00.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-28T13:43:35.000Z (8 months ago)
- Last Synced: 2024-05-29T05:32:48.851Z (7 months ago)
- Topics: aligner, asr, nlp, subtitles, text-to-speech, whisper
- Language: Java
- Homepage:
- Size: 2.7 MB
- Stars: 109
- Watchers: 6
- Forks: 21
- Open Issues: 6
# WhisperTimeSync
Synchronize Whisper's timestamps over an existing accurate transcription

- Input 1: SRT with good timestamps and bad-quality text
- Input 2: good text-only, or SRT with good text and bad timestamps
- Output: SRT with good text and good timestamps

Python version: https://github.com/EtienneAb3d/SRT-Sync
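
To make the input/output contract above concrete, here is a minimal Python sketch of the general idea: align the words of the well-timed but poorly transcribed cues with the words of the accurate text, then let each accurate line inherit the time span of the cues its words map to. It uses the standard library's `difflib.SequenceMatcher` purely for illustration. This is an assumption, not the alignment algorithm WhisperTimeSync implements, and the names `Cue` and `transfer_timestamps` are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Cue:
    start: Optional[float]  # seconds, None when no timing could be inferred
    end: Optional[float]    # seconds, None when no timing could be inferred
    text: str


def transfer_timestamps(timed_cues, good_lines):
    """Give each accurate line the time span of the timed cues it aligns with.

    timed_cues: list of Cue with reliable times but unreliable text.
    good_lines: list of str with reliable text but no times.
    """
    # Flatten both inputs to lowercase words, remembering where each came from.
    bad_words, bad_cue_idx = [], []
    for i, cue in enumerate(timed_cues):
        for word in cue.text.lower().split():
            bad_words.append(word)
            bad_cue_idx.append(i)
    good_words, good_line_idx = [], []
    for j, line in enumerate(good_lines):
        for word in line.lower().split():
            good_words.append(word)
            good_line_idx.append(j)

    # For every accurate line, collect the cues that its matched words belong to.
    cues_per_line = [[] for _ in good_lines]
    matcher = SequenceMatcher(a=bad_words, b=good_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            line_id = good_line_idx[block.b + k]
            cues_per_line[line_id].append(bad_cue_idx[block.a + k])

    result = []
    for line, cue_ids in zip(good_lines, cues_per_line):
        if cue_ids:
            start = timed_cues[min(cue_ids)].start
            end = timed_cues[max(cue_ids)].end
        else:
            # No word matched: leave the times undefined rather than guess.
            start = end = None
        result.append(Cue(start, end, line))
    return result
```

In this simplified version, lines that fall inside the same timed cue receive the same time span, because the sketch has no word-level timing to split on.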
# Complement
The "accurate transcriptions" may be obtained using WhisperHallu:
https://github.com/EtienneAb3d/WhisperHallu

WhisperTimeSync and WhisperHallu are used to extract vocals and lyrics in karaok-AI:
https://github.com/EtienneAb3d/karaok-AI

ChatMate is a complete, versatile ChatGPT automation tool. It includes explanations on how to produce an SRT file translator to Chinese (as an example):
https://github.com/EtienneAb3d/ChatMate

# Google Colab
https://colab.research.google.com/drive/10r4m_GaTwU-JQMkRe9T0cvgrfQH1le31?usp=sharing
# Install
```
git clone https://github.com/EtienneAb3d/WhisperTimeSync.git
cd WhisperTimeSync
virtualenv -p python3 ../venvWhisperTimeSync
source ../venvWhisperTimeSync/bin/activate
pip install -U openai-whisper
sudo apt update && sudo apt install ffmpeg
```

# Transcribe
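
For orientation, a transcription step of this kind can be sketched in Python with the `openai-whisper` package installed above. The sketch loads a model, transcribes an audio file, and writes the resulting segments as SRT cues. The function names `format_timestamp`, `segments_to_srt`, and `transcribe_to_srt` are hypothetical, and the repository's actual `transcribe.py` may work differently.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render dicts with 'start', 'end' and 'text' keys as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


def transcribe_to_srt(audio_path, model_name="large"):
    """Transcribe audio_path with Whisper and save the cues next to it."""
    # Imported lazily so the formatting helpers work without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    srt_path = audio_path + ".srt"
    with open(srt_path, "w", encoding="utf-8") as f:
        f.write(segments_to_srt(result["segments"]))
    return srt_path
```

The repository's own script is run as follows: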
```
python3 transcribe.py data/KatyPerry-Firework.mp3 large

Output:
==========
Loading Whisper model...
Transcribing: data/KatyPerry-Firework.mp3 ...
Do you ever feel like a plastic bag? Drifting through the wind, wanting to start again? Do you ever feel, feel so paper-thin? Like a house apart, one blow from caving in? Do you ever feel already buried deep? Six feet under screens and no one seems to hear a thing. Do you know that there's still a chance for you? Because there's a spark in you. You just got to ignite the light and let it shine. Just own the night like the 4th of July. Cause baby you're a firework. Come on, show them what you're worth. Make them go ah, ah, ah, as you shoot across the sky. Baby, you're a firework. Come on, let your colors burst. Make them go ah, ah, ah. You're going to leave them all in awe, awe, awe. You don't have to feel like a wasted space. Your original cannot be replaced. If you only knew what the future holds. After a hurricane comes a rainbow. Maybe a reason why all the doors are closed. So you could open one that leads you to the perfect road. Like a lightning bolt, your heart will blow. And when it's time, you'll know you just got to ignite the light and let it shine. Just own the night like the 4th of July. Cause baby you're a firework. Come on, show them what you're worth. Make them go ah, ah, ah, as you shoot across the sky. Baby, you're a firework. Come on, let your colors burst. Make them go ah, ah, ah. You're going to leave them all in awe, awe, awe. Boom, boom, boom. Even brighter than the moon, moon, moon. It's always been inside of you, you, you. And now it's time to let it through. Cause baby you're a firework. Come on, show them what you're worth. Make them go ah, ah, ah, as you shoot across the sky. Baby, you're a firework. Come on, let your colors burst. Make them go ah, ah, ah. You're going to leave them all in awe, awe, awe. Boom, boom, boom. Even brighter than the moon, moon, moon. Boom, boom, boom. Even brighter than the moon, moon, moon.
Saving: data/KatyPerry-Firework.mp3.srt ...
==========
```

# Synchronize
```
java -Xmx2G -jar distrib/WhisperTimeSync.jar data/KatyPerry-Firework.mp3.srt data/KatyPerry-Firework.txt en

Output (data/KatyPerry-Firework.txt.srt):
==========
1
00:00:00,000 --> 00:00:18,020
Lyrics:
Do you ever feel
Like a plastic bag

2
00:00:18,020 --> 00:00:22,140
Drifting through the wind
Wanting to start again

3
00:00:22,140 --> 00:00:25,440
Do you ever feel
Feel so paper-thin

4
00:00:25,440 --> 00:00:29,640
Like a house of cards
One blow from caving in

5
00:00:29,640 --> 00:00:33,400
Do you ever feel
Already buried deep

[...]
==========
```

# Remarks
- Asian languages are processed char by char. The tokenizer could be improved to process them word by word.
- For large files, this aligner will be very time-consuming and RAM-consuming.

I have another aligner that can align Asian languages and large files. It can also do cross-lingual alignments. However, I won't release it as open source. It is included in the **ComPair** freeware, a universal cross-lingual document comparer.
See: http://cubaix.com/ComPair_QuickStartGuide.php
This tool is a demonstration of our know-how.
If you are interested in a commercial/industrial AI linguistic project, contact us:
https://cubaix.com