https://github.com/decipher2k/windows-ai-assistant

A voice controlled AI Assistant for Windows featuring ChatGPT and Groq

Host: GitHub
URL: https://github.com/decipher2k/windows-ai-assistant
Owner: decipher2k
License: other
Created: 2024-12-11T16:45:56.000Z
Default Branch: main
Last Pushed: 2024-12-13T12:49:55.000Z
Last Synced: 2024-12-13T13:37:42.093Z
Topics: ai, assistant, chatgpt, groq, voice-recognition
Language: C#
Homepage: https://waia.online/
Size: 140 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

README

WAIA, the Windows AI Assistant

A voice-controlled AI Assistant for Microsoft Windows

New: WAIA [Cloudless], which runs 100% on the local machine for increased privacy.

Speech recognition now detects longer sentences. Groq quota draining has been prevented. A chat history has been added, so it is now possible to hold real conversations.

Groq speech detection is no longer triggered by background noise that contains no speech. Azure speech recognition has been disabled until the quota draining issue has been fixed.

Please note that WAIA is just a client that unifies different AI services. The program itself does not provide any AI capabilities.

Windows is controlled through plugins, so you will need plugins to use this feature.

Only use plugins from the WAIA GitHub repository. Plugins from other sources could contain malware.

The URL of the repository is https://github.com/decipher2k/Windows-AI-Assistant.

Features:

Seamlessly integrates with ChatGPT and Groq

Advanced voice recognition powered by Groq

Voice controlled interaction with Windows

Plugin system

Program starter using key sentences

Webhooks using key sentences for integration with IFTTT (home automation etc.), https://ifttt.com/ (untested)

Download:

https://github.com/decipher2k/Windows-AI-Assistant/releases

Usage

After the application starts, it adds a tray icon. Double-click the icon to configure the settings.

Set up the API keys and other information using the "Settings" buttons.

A green dot on the tray icon means that speech has been recorded.

A blue dot means that the recorded text has been sent to the Chat AI.

To stop the voice output, right-click the tray icon and click "Cancel".

How to always show the tray icon:

https://www.lifewire.com/show-or-hide-icons-in-system-tray-in-windows-10-5115219

Chat History:

The chat history contains the last 3 messages, so you can hold a real conversation with the AI.
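The following C# sketch illustrates one way a rolling history of the last 3 messages could be kept. The ChatHistory class and its members are hypothetical names chosen for illustration, not WAIA's actual code.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: a rolling chat history that keeps only the most recent messages.
// The default capacity of 3 mirrors the README; the class and member names are illustrative.
public sealed class ChatHistory
{
    private readonly int _capacity;
    private readonly Queue<(string Role, string Content)> _messages = new();

    public ChatHistory(int capacity = 3) => _capacity = capacity;

    // Appends a message and drops the oldest ones once the capacity is exceeded.
    public void Add(string role, string content)
    {
        _messages.Enqueue((role, content));
        while (_messages.Count > _capacity)
        {
            _messages.Dequeue();
        }
    }

    // Returns a snapshot of the stored messages, oldest first.
    public IReadOnlyList<(string Role, string Content)> Messages => _messages.ToList();
}
```

A queue fits here because messages always leave in the order they arrived, so trimming the history is a constant-time Dequeue.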

Keyword:

You can set a custom keyword for starting speech recognition. Default is "Computer".

For example, you can say "Computer, who was John F. Kennedy?" to get information about John F. Kennedy.
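A minimal C# sketch of keyword handling, assuming the speech recognizer returns the whole utterance as plain text. KeywordParser and ExtractRequest are hypothetical names, and WAIA's actual matching may differ.

```csharp
using System;

// Hypothetical sketch: checks whether a transcript starts with the activation keyword
// and returns the request that follows it. Not WAIA's actual implementation.
public static class KeywordParser
{
    // Returns the text after the keyword, or null if the transcript does not start with it.
    public static string? ExtractRequest(string transcript, string keyword = "Computer")
    {
        string trimmed = transcript.TrimStart();
        if (!trimmed.StartsWith(keyword, StringComparison.OrdinalIgnoreCase))
        {
            return null;
        }

        // Reject words that merely begin with the keyword, such as "Computerized".
        if (trimmed.Length > keyword.Length && char.IsLetterOrDigit(trimmed[keyword.Length]))
        {
            return null;
        }

        // Drop separators the recognizer may insert after the keyword, such as "," or ":".
        return trimmed.Substring(keyword.Length).TrimStart(' ', ',', ':', '.').Trim();
    }
}
```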

Keyword Detection:

Keyword detection using Windows Speech Recognition is now available.

It can help in a noisy environment, such as when watching TV or listening to music, to prevent the speech recognition quota from being drained.

Keyword detection sets the keyword to "Computer", which cannot be changed. Reliability differs between systems.

Recognition quality can be enhanced by training:

https://www.tenforums.com/tutorials/120674-add-delete-change-speech-recognition-profiles-windows-10-a.html

To open the Control Panel in Windows 11, press the "Windows" key and type "control panel".

Windows Sound Recording Level:

If voice recognition activates too often without you saying anything, or if no speech is detected, try adjusting the microphone recording level in the Windows settings.

Suggested Services:

I had good experiences with the following setup.

All suggested services are available for free.

Voice recognition: Groq.

Chat AI: Groq, which is the fastest one.

Voice output: Windows Speech, until Google Cloud AI has been implemented.

Program Starter

The program starter can be configured using the "Commands" button.

The first column defines whether to use speech recognition or the Chat AI to start the program.

Speech recognition will listen for the exact sentence.

Using the Chat AI allows you to vary sentences. The program automatically prefixes the sentence with "if the user asks for". Thus, the sentence "starting windows explorer" allows you to say "start windows explorer", "run windows explorer", etc.

Chat AI commands have not been implemented yet.

The second column defines the sentence that the program uses to recognize the command.

The third column is the program file that should be started.

The fourth column allows you to set command-line parameters.
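The following C# sketch shows how a speech recognition row of the program starter could be matched against a spoken sentence and turned into process start information. ProgramCommand, ProgramStarter and the case-insensitive comparison are assumptions for illustration; only System.Diagnostics.ProcessStartInfo is a real .NET type.

```csharp
using System;
using System.Diagnostics;

// Hypothetical representation of one program starter row: sentence, program file, parameters.
public sealed record ProgramCommand(string Sentence, string ProgramPath, string Arguments);

public static class ProgramStarter
{
    // Speech recognition commands match the exact sentence; ignoring case is an assumption here.
    public static bool Matches(ProgramCommand command, string spoken) =>
        string.Equals(command.Sentence.Trim(), spoken.Trim(), StringComparison.OrdinalIgnoreCase);

    // Builds the start information without launching anything.
    public static ProcessStartInfo BuildStartInfo(ProgramCommand command) =>
        new ProcessStartInfo
        {
            FileName = command.ProgramPath,
            Arguments = command.Arguments,
            UseShellExecute = true, // lets the Windows shell resolve file associations
        };
}
```

To actually launch the program, the result would be passed to Process.Start.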

Webhooks

Webhooks can be configured using the "Commands" button.

They can be used to raise events in web applications, for example IFTTT. IFTTT can be used to control home automation systems etc.

The first column defines whether to use speech recognition or the Chat AI to execute the webhook.

Speech recognition will listen for the exact sentence.

Using the Chat AI allows you to vary sentences. The program automatically prefixes the sentence with "if the user asks for". Thus, the sentence "turning on the light" allows you to say "turn on the light", "switch on the light", etc.

Chat AI commands have not been implemented yet.

The second column defines the sentence that the program uses to recognize the command.

The third column is the URL of the webhook.

The fourth column defines whether to use HTTP POST or HTTP GET. For most webhooks, this will be HTTP GET.

The fifth column defines the parameters passed to the webhook.

For GET requests, these parameters are appended to the URL. For example, "?light=on" results in "https://example.com/webhook?light=on".

For POST requests, these parameters define the data sent in the request body, for example JSON data.
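A C# sketch of how a webhook row (URL, method, parameters) could be turned into an HTTP request, assuming plain string concatenation for GET and a JSON body for POST. WebhookBuilder is a hypothetical name and the JSON content type is an assumption; the request is only built, not sent.

```csharp
using System;
using System.Net.Http;
using System.Text;

// Hypothetical sketch: converts a webhook row into an HttpRequestMessage.
// The "application/json" content type for POST is an assumption; adjust it to the target service.
public static class WebhookBuilder
{
    public static HttpRequestMessage Build(string url, string method, string parameters)
    {
        if (string.Equals(method, "GET", StringComparison.OrdinalIgnoreCase))
        {
            // GET: the parameters are appended to the URL, e.g. "?light=on".
            return new HttpRequestMessage(HttpMethod.Get, new Uri(url + parameters));
        }

        // POST: the parameters become the request body.
        return new HttpRequestMessage(HttpMethod.Post, new Uri(url))
        {
            Content = new StringContent(parameters, Encoding.UTF8, "application/json"),
        };
    }
}
```

Sending the request would be a call to HttpClient.SendAsync; in a long-running application a single HttpClient instance is usually reused rather than created per call.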

Plugins

Plugins can be configured using the "Commands" button.

The media player plugin is included in the release of the program.

The first column defines whether to use speech recognition or the chat AI to start the plugin.

Speech recognition will listen for the exact sentence.

Using the Chat AI allows you to vary sentences. The program automatically prefixes the sentence with "if the user asks for". Thus, the sentence "playing media" allows you to say "play media", "play the song", etc.

Chat AI commands have not been implemented yet.

The second column defines the sentence that the program uses to recognize the command.

The third column defines the name of the plugin DLL.

The remaining columns hold the plugin's parameters. They differ from plugin to plugin, so please read the plugin's manual for more information.

Only use plugins from this GitHub repository. Plugins from other sources may contain malware.
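The following C# sketch shows one way a host could load the plugin DLL named in the third column via reflection. It is an assumption about how such loading can work in .NET, not WAIA's actual loader. The interface is declared locally as a stand-in with only the documented method; because .NET matches interfaces by assembly identity, a real plugin is only recognized when host and plugin reference the same "WAIA Plugin.dll". PluginLoader, EchoPlugin and the plugin folder layout are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;

// Stand-in for the interface shipped in "WAIA Plugin.dll"; only the documented method is declared.
public interface IWAIAPlugin
{
    string RunPlugin(string text, string[] parameters);
}

// Trivial example plugin so the loader can be tried without an external DLL.
public sealed class EchoPlugin : IWAIAPlugin
{
    public string RunPlugin(string text, string[] parameters) => $"You said: {text}";
}

// Hypothetical loader for the DLL named in the "Commands" table.
public static class PluginLoader
{
    // Finds every concrete class in the assembly that implements IWAIAPlugin.
    public static IEnumerable<Type> FindPluginTypes(Assembly assembly) =>
        assembly.GetTypes().Where(t =>
            typeof(IWAIAPlugin).IsAssignableFrom(t) && t.IsClass && !t.IsAbstract);

    // Loads the DLL from a plugin folder (an assumed layout) and instantiates the first plugin found.
    public static IWAIAPlugin Load(string pluginFolder, string dllName)
    {
        Assembly assembly = Assembly.LoadFrom(Path.Combine(pluginFolder, dllName));
        Type type = FindPluginTypes(assembly).FirstOrDefault()
            ?? throw new InvalidOperationException($"No IWAIAPlugin implementation found in {dllName}.");
        return (IWAIAPlugin)Activator.CreateInstance(type)!;
    }
}
```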

The [TEXT] variable

Whenever you enter the token [TEXT] in a parameter in the commands section, the token will be replaced with the text spoken after the command.

For example, saying "Create a note: Shopping" with the key sentence "Create a note: [TEXT]" passes the word "Shopping" in place of the [TEXT] token to a plugin, a webhook, or a program.

This only works with speech recognition commands, not with Chat AI commands.
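A minimal C# sketch of the [TEXT] substitution, assuming the token sits at the end of the key sentence as in the example. TextToken, Capture and Apply are hypothetical names; WAIA's real matching logic may differ.

```csharp
using System;

// Hypothetical sketch of [TEXT] handling; assumes [TEXT] is the last part of the key sentence.
public static class TextToken
{
    public const string Token = "[TEXT]";

    // Returns the words spoken in place of [TEXT], or null if the sentence does not match.
    public static string? Capture(string keySentence, string spoken)
    {
        int index = keySentence.IndexOf(Token, StringComparison.Ordinal);
        if (index < 0)
        {
            return null;
        }

        string prefix = keySentence.Substring(0, index);
        if (!spoken.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
        {
            return null;
        }

        return spoken.Substring(prefix.Length).Trim();
    }

    // Replaces every [TEXT] token in a parameter (plugin argument, webhook data or program arguments).
    public static string Apply(string parameter, string capturedText) =>
        parameter.Replace(Token, capturedText);
}
```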

Speech recognition

Groq Speech Recognition:

Groq can be found at https://groq.com

The API keys can be created at https://console.groq.com/keys

AI Chat

ChatGPT:

https://medium.com/latinxinai/how-to-get-api-key-for-chat-gpt-3-5-or-4-0-fce40b35aa00

You will need ChatGPT API credits, not ChatGPT Plus!

Groq LLM API:

Groq can be found at https://groq.com

The API keys can be created at https://console.groq.com/keys
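To illustrate where the Groq API key is used, the following C# sketch builds, but does not send, a request to Groq's OpenAI-compatible chat completions endpoint. GroqChatRequest is a hypothetical helper, not part of WAIA, and any model name passed in is only an example that may no longer be offered; check the Groq console for current models.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// Hypothetical sketch: builds a chat request for Groq's OpenAI-compatible API without sending it.
public static class GroqChatRequest
{
    public const string Endpoint = "https://api.groq.com/openai/v1/chat/completions";

    public static HttpRequestMessage Build(string apiKey, string model, string userText)
    {
        // OpenAI-style body: a model name plus a list of role/content messages.
        var body = new
        {
            model,
            messages = new[] { new { role = "user", content = userText } },
        };

        var request = new HttpRequestMessage(HttpMethod.Post, Endpoint)
        {
            Content = new StringContent(JsonSerializer.Serialize(body), Encoding.UTF8, "application/json"),
        };

        // The API key is sent as a bearer token.
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        return request;
    }
}
```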

Speech Synthesis

Elevenlabs:

Log in to your Elevenlabs account.

In the top-right corner, click your profile icon, then "Profile".

Next to the API Key field, click the eye icon to view and copy your API key, and store it in a safe place.

Please note: The "voice" field refers to the name of the voice, not to its ID.

Windows Speech Synthesis:

Average quality.

You may need to select a voice that matches your language in the settings.

Costs

Speech Recognition - one of the following:

-Groq (available for free, usage limits, fast)

AI Chat - one of the following:

-ChatGPT (about 10$/month)

-Groq LLM API (available for free, usage limits, fast)

Speech output - one of the following:

-Microsoft Windows Speech (free, average quality)

-Elevenlabs (about 10$-20$/month, good quality)

Please note that prices depend on actual usage and may vary.

Writing a plugin

To write a plugin, create a new Visual Studio 2022 .NET 8.0 class library project, add a reference to "WAIA Plugin.dll", and implement the interface IWAIAPlugin with the following method:

public String RunPlugin(String text, String[] parameters);

The parameter "String text" is the spoken input.

The parameter "String[] parameters" allows you to pass parameters to the plugin.

The return value of the method will be sent to the speech synthesis engine.

If the third element of "String[] parameters" is "AI", the return value will be sent to the Chat AI.
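A minimal example plugin in C#, written as a sketch under stated assumptions: the interface is declared locally as a stand-in so the file compiles on its own (in a real plugin, reference "WAIA Plugin.dll" and delete the local declaration), and ClockPlugin with its injectable clock is hypothetical. It keeps a parameterless constructor on the assumption that the host creates plugins that way, and it checks parameters[2] for "AI" as described above.

```csharp
using System;
using System.Globalization;

// Local stand-in for the interface from "WAIA Plugin.dll"; remove it when referencing the real DLL.
public interface IWAIAPlugin
{
    string RunPlugin(string text, string[] parameters);
}

// Hypothetical example plugin that answers with the current time.
public sealed class ClockPlugin : IWAIAPlugin
{
    private readonly Func<DateTime> _now;

    // Parameterless constructor, assumed to be what the host uses to create the plugin.
    public ClockPlugin() : this(() => DateTime.Now) { }

    // Allows a fixed clock to be injected, which keeps the plugin testable.
    public ClockPlugin(Func<DateTime> now) => _now = now;

    public string RunPlugin(string text, string[] parameters)
    {
        string time = _now().ToString("HH:mm", CultureInfo.InvariantCulture);

        // Per the README, if parameters[2] is "AI" the return value is sent to the Chat AI,
        // so this sketch phrases the result as a prompt in that case.
        bool toChatAi = parameters.Length > 2 && parameters[2] == "AI";
        return toChatAi
            ? $"The current time is {time}. Tell the user in one short sentence."
            : $"It is {time}.";
    }
}
```

String in the README signature and string in the sketch are the same .NET type.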

Troubleshooting

If the speech doesn't get detected, try the following:

-Adjust the microphone level of Windows

-Enable online speech recognition in Windows settings

-Speak slowly and clearly

Planned Features

More Plugins for Windows

Dictation

Sound Volume

Maximizing/Minimizing/Closing windows

Alt+Tab

Shutdown/Restart

Macros

More Services

Claude

Microsoft Chat AI

Microsoft Azure Cortex Speech Recognition

Google Speech Synthesis

Google AI
