https://github.com/assemblyai-solutions/realtimespellcheck
This script integrates a LLM into the AssemblyAI Real-Time API to correct spelling errors in real-time. The script is currently configured to correct dish names for an online food ordering service.
https://github.com/assemblyai-solutions/realtimespellcheck
Last synced: 8 months ago
JSON representation
This script integrates a LLM into the AssemblyAI Real-Time API to correct spelling errors in real-time. The script is currently configured to correct dish names for an online food ordering service.
- Host: GitHub
- URL: https://github.com/assemblyai-solutions/realtimespellcheck
- Owner: AssemblyAI-Solutions
- Created: 2023-11-21T19:50:50.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-21T19:57:42.000Z (over 2 years ago)
- Last Synced: 2025-01-24T04:53:33.377Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 15.9 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Real-Time LLM Spell Checker
### Overview
This script integrates a LLM into the AssemblyAI Real-Time API to correct spelling errors in real-time. The script is currently configured to correct dish names for an online food ordering service.
### Steps
1. Install the required packages in requirements.txt
2. Create a .env file with your `OPENAI_API_KEY` and `ASSEMBLYAI_API_KEY`
2. Transcode the files you want to transcribe (requires FFMPEG)
3. Add your menu items to menu.txt
4. Run the script `python transcript.py `
### Files
1. **transcript.py:** Creates websocket and streams audio to AssemblyAI API.
2. **llm.py:** Makes call to LLM and returns corrected utterance.
3. **transcode.py:** Transcodes audio files to the required PCM 16bit format.
4. **menu.txt:** List of menu items to use as reference for LLM.
### Notes
[1] This script uses our Real-Time confidence scoring to identify the potential misspellings. Ironically this aligns pretty well with the dish names that are being mis-transcribed (the words in red in the screenshot below all have confidence scores below 0.5)

[2] Using an LLM, we can try to correct these misspellings in real-time. We can provide the LLM the correct spellings of the dish names as reference.
[3] Overall, this solution would be fairly cheap to implement. For example I used GPT 3.5 Turbo in the demo below and averaged the total cost out to be ~$0.0008 per 2 minute call. To save on cost / latency, we only make the LLM call if a low-confidence word is detected in the utterance. So essentially the LLM is only used as backup and can be toggled on or off depending on restaurant / cuisine (customer mentioned Chinese & Japanese dishes were the most difficult to transcribe)
[4] Latency may be a slight concern here. Depending on OpenAI's system traffic levels, the LLM call can take anywhere from 1-5 seconds to return the corrected utterance. IMO, this added latency worth it if it fixes the downstream dish detection issue. I have a few ideas on ways we can try to reduce this including fine-tuning a smaller model (davinci-003) or combining the spell correction + dish detection NER processes. Sadly, LeMUR won't be a viable option here as the latency is just too long to be used in a Real-Time setting.
### Demo
https://github.com/AssemblyAI-Solutions/RealtimeSpellCheck/assets/106788596/83df6f05-ece4-49ba-839d-d79022aafe93