https://github.com/pablolion/whisper-note



## Key Features

- **Live Speech Recognition and Transcription:** Automatically transcribe spoken words in real time (cannot be turned off).
- **Detailed Live Transcript:** View a live transcript of your conversation, including translations, and see the queue length.
- **Select Input Language:** Choose your preferred input language.
- **Optional Real-Time Translation:** Get instant translations of spoken content if needed.
- **Choose Model Size:** Select the model size that suits your needs.
- **Language-Specific Models:** Utilize models tailored for specific languages.
- **Export Trimmed Recordings:** Easily save trimmed `.wav` files of your recordings without silent gaps.
- **Export Transcript History:** Save your entire transcript history as an `.html` file.
- **High-Quality Transcription:** Optionally receive high-quality transcription after the recording is completed.
- **Upcoming Feature:** Stay tuned for an optional summary of the transcript generated with the help of ChatGPT.

## Install

### Mac with Apple Silicon

```zsh
brew install portaudio # needed by PyAudio; fixes "src/pyaudio/device_api.c:9:10: fatal error: 'portaudio.h' file not found"
brew install ffmpeg
brew install mbedtls # provides libmbedcrypto, looked up at /opt/homebrew/Cellar/mbedtls/3.4.1/lib/libmbedcrypto.13.dylib
```

After this, mbedtls@3.4.1 provided only `libmbedcrypto.14.dylib` on my machine, so I copied it manually to the expected `libmbedcrypto.13.dylib` name:

```zsh
# for mbedtls@3.4.1
cp /opt/homebrew/Cellar/mbedtls/3.4.1/lib/libmbedcrypto.14.dylib /opt/homebrew/Cellar/mbedtls/3.4.1/lib/libmbedcrypto.13.dylib
# for mbedtls@3.5.0
cp /opt/homebrew/Cellar/mbedtls/3.5.0/lib/libmbedcrypto.15.dylib /opt/homebrew/Cellar/mbedtls/3.5.0/lib/libmbedcrypto.13.dylib
```
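
Because the number in the file name changes between mbedtls releases (14 for 3.4.1, 15 for 3.5.0), the copy step above can be generalized. The following is a minimal sketch, not part of this repo: `alias_mbedcrypto` and its arguments are hypothetical names, and it assumes the library directory contains files named `libmbedcrypto.<N>.dylib`.

```python
import re
import shutil
from pathlib import Path

# Matches only short names such as libmbedcrypto.14.dylib,
# not full-version names such as libmbedcrypto.3.4.1.dylib.
_SONAME = re.compile(r"^libmbedcrypto\.(\d+)\.dylib$")


def alias_mbedcrypto(lib_dir: str, wanted: int = 13) -> Path:
    """Copy the highest-numbered libmbedcrypto.<N>.dylib to libmbedcrypto.<wanted>.dylib.

    Hypothetical helper: `lib_dir` is whichever directory your mbedtls install uses.
    """
    directory = Path(lib_dir)
    target = directory / f"libmbedcrypto.{wanted}.dylib"
    if target.exists():
        return target  # already present, nothing to do

    candidates = []
    for path in directory.iterdir():
        match = _SONAME.match(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        raise FileNotFoundError(f"no libmbedcrypto.<N>.dylib in {directory}")

    # Pick the file with the largest number and copy it under the wanted name.
    _, newest = max(candidates, key=lambda item: item[0])
    shutil.copy(newest, target)
    return target
```

For example, `alias_mbedcrypto("/opt/homebrew/Cellar/mbedtls/3.5.0/lib")` would do the same as the second `cp` command above.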

Then set up the virtual environment and install the requirements with Poetry:

```zsh
poetry install
```

### Mac with Intel

mbedtls has since been updated to 3.5.0, so the library's version number is different; on Intel, Homebrew installs under `/usr/local`:

```zsh
cp /usr/local/opt/mbedtls/lib/libmbedcrypto.15.dylib /usr/local/opt/mbedtls/lib/libmbedcrypto.13.dylib
```

While installing with Poetry I hit the error `dyld[49347]: Library not loaded: '@loader_path/../../../../Python.framework/Versions/3.11/Python'`. Installing Poetry with `pip install poetry` solved it; the Homebrew version of Poetry did not work.

### Not supported

I don't know how to install on these platforms.

- Windows
- Linux
- Mac with Intel (only the partial notes above)

## Use

- The large model makes the script slow: even on an M1 Ultra with 128 GB of RAM, recognition falls behind a person who speaks continuously.
- The small model is recommended: it is faster and its recognition quality is decent.
- See the comment in `config.yml` for more details.
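
One way to reason about the model choice is the real-time factor: processing time divided by audio duration. Above 1.0, the transcription queue keeps growing while someone speaks without pause. The sketch below is only arithmetic; the function names are made up for illustration and are not part of this project.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time per second of audio; above 1.0 means falling behind."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def backlog_seconds(rtf: float, speech_seconds: float) -> float:
    """Seconds of audio still unprocessed after `speech_seconds` of nonstop speech."""
    if rtf <= 1.0:
        return 0.0  # the model keeps up, so nothing accumulates
    # Each second of audio costs `rtf` seconds of work, so in `speech_seconds`
    # of wall time only speech_seconds / rtf seconds of audio get processed.
    return speech_seconds - speech_seconds / rtf
```

With a hypothetical factor of 2.0, one minute of continuous speech leaves 30 seconds of audio waiting in the queue.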

## Dev

### Env setup

- These steps assume [`poetry`](https://python-poetry.org/) is installed on your machine with `python@3.11`.
- They also assume the current directory (`pwd`) is the root of this repo.

```zsh
poetry install --with dev
poetry run pre-commit install
touch .env
```

- Get an API key from DeepL and put it in `.env` in the root of this repo, like this:

```plaintext
DEEPL_API_KEY=1234567890
```

- Optionally set up your own config file. #TODO: default config file
- #TODO: use `make` to make this easier
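
For reference, a `KEY=value` file like the `.env` above can be read with only the standard library. This is a minimal sketch and not necessarily how this project loads the key (that code is not shown here); `load_env` and `get_deepl_key` are hypothetical helpers.

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    env_file = Path(path)
    if not env_file.exists():
        return {}
    values: dict[str, str] = {}
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_deepl_key(path: str = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("DEEPL_API_KEY") or load_env(path).get("DEEPL_API_KEY")
    if not key:
        raise RuntimeError("DEEPL_API_KEY is not set")
    return key
```

Failing early with a clear error when the key is missing is a design choice of this sketch; it avoids a less obvious failure later when the translation request is made.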

### Memo

Most of this should be converted to GitHub Issues when published.

- Trying to use [result](https://pypi.org/project/result/) to handle errors; not sure yet how it feels.
- The idea is to build something to substitute Otter to take notes.
- Check and try speech recognition package
- Features:
  - summary of the text with ChatGPT
  - generate `.srt` subtitles
  - Add a `DEV_MODE` env variable; in this mode the logger should be more verbose
- UI:
  - start/end control
  - Not needed: add a "Still Recording..." indicator every 5 seconds while the input is idle.
- For translation, [DeepL](https://www.deepl.com/translator) seems to be the best option, but it's not free. Since I don't need translation myself, I'm only doing the most basic thing: translating the text with some API.
- I tried to use [textual](https://github.com/Textualize/textual), but the CSS is not applied to dynamically rendered list items, and it's not easy to use. Maybe use electron / eel instead if we want a web UI.
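
As a note on the `.SRT` idea in the list above: a SubRip file is a sequence of cues, each made of a running number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the text, and a blank line. The following is a minimal sketch under the assumption that transcript segments are available as `(start_seconds, end_seconds, text)` tuples; that shape and both function names are hypothetical, not this project's data structures.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SubRip uses a comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start_seconds, end_seconds, text) tuples as SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by one blank line.
    return "\n".join(cues)
```

For example, `to_srt([(0.0, 2.5, "Hello"), (2.5, 4.0, "World")])` yields two cues, the first spanning `00:00:00,000 --> 00:00:02,500`.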

## Special thanks

- AI model: [OpenAI/whisper](https://github.com/openai/whisper)
- Real-time transcription script: [davabase/whisper_real_time](https://github.com/davabase/whisper_real_time)