Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/der3318/audio-powered-gpt
.Net 6.0 - WPF Application for Azure Speech and OpenAI GPT
https://github.com/der3318/audio-powered-gpt
gpt speech translate windows-presentation-foundation
Last synced: about 1 month ago
JSON representation
.Net 6.0 - WPF Application for Azure Speech and OpenAI GPT
- Host: GitHub
- URL: https://github.com/der3318/audio-powered-gpt
- Owner: der3318
- License: mit
- Created: 2023-04-16T14:56:24.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-05-06T13:10:39.000Z (over 1 year ago)
- Last Synced: 2024-11-11T16:55:38.886Z (3 months ago)
- Topics: gpt, speech, translate, windows-presentation-foundation
- Language: C#
- Homepage:
- Size: 2.57 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
## 💬 Audio Powered GPT
![version](https://img.shields.io/badge/version-2.0.2-blue.svg)
![dotnetf](https://img.shields.io/badge/.net-6.0-green.svg)
[![openai](https://img.shields.io/badge/Azure.AI.OpenAI%20%28nuget%29-1.0.0%20beta.5-yellow.svg)](https://www.nuget.org/packages/Azure.AI.OpenAI)
[![speech](https://img.shields.io/badge/Microsoft.CognitiveServices.Speech%20%28nuget%29-1.27.0-pink.svg)](https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech)
![portable](https://img.shields.io/badge/portable-win%20x64%2019041+-blueviolet.svg)
[![.NET WPF App Release Builder](https://github.com/der3318/audio-powered-gpt/actions/workflows/release.yml/badge.svg?branch=main)](https://github.com/der3318/audio-powered-gpt/actions/workflows/release.yml)A tiny WPF interface that integrates [Azure cognitive service](https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices) with [GPT endpoint](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/create-resource). This requires Azure subscription resources of both speech service and OpenAI.
![Demo.png](https://github.com/der3318/audio-powered-gpt/blob/main/Images/Demo.png)
### Interactive Mode
Simply type or speak (via microphone) to ask GTP questions in this mode. Press the "start button" to trigger a speech QA session, and click the "start/stop button" again to pause.
![InteractiveMode.png](https://github.com/der3318/audio-powered-gpt/blob/main/Images/InteractiveMode.png)
### Translation Mode
This is the real time translation (into Chinese) functionality. Result texts will also be displayed as a 3-second toast in the bottom corner, so the app can be run completely in the background.
![TranslationMode.png](https://github.com/der3318/audio-powered-gpt/blob/main/Images/TranslationMode.png)
An audio redirection (from speacker to input) interface is a prerequisite to use the feature. Windows stereo mix or [VoiceMeeter](https://vb-audio.com/Voicemeeter/) is probably a good choice.
### References
* Icon: https://arstechnica.com/information-technology/2023/01/openai-and-microsoft-reaffirm-shared-quest-for-powerful-ai-with-new-investment/
* Azure Speech to Text: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-recognize-speech
* Azure OpenAI Studio: https://learn.microsoft.com/en-us/azure/cognitive-services/openai/quickstart
* Toast Notification: https://learn.microsoft.com/en-us/windows/apps/design/shell/tiles-and-notifications/send-local-toast
* Embedded WPF Markdown Viewer: https://github.com/whistyun/MdXaml