https://github.com/sameerqureshii/jarvis-ai-assistant
A voice-controlled AI assistant built with Python that combines speech recognition, text-to-speech, and Google's Gemini AI API to create a personalized virtual assistant experience.
Last synced: 5 months ago
- Host: GitHub
- URL: https://github.com/sameerqureshii/jarvis-ai-assistant
- Owner: sameerqureshii
- License: mit
- Created: 2025-08-12T14:55:33.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-08-12T15:54:46.000Z (11 months ago)
- Last Synced: 2025-08-12T17:18:35.066Z (11 months ago)
- Size: 15.6 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Jarvis AI Assistant
A voice-controlled AI assistant built with Python that combines speech recognition, text-to-speech, and Google's Gemini AI to create a personalized virtual assistant experience.
# ✨ Features:
🎤 `Voice Recognition`: Wake word detection with "Jarvis"
🗣️ `Text-to-Speech`: Natural voice responses using Google TTS
🧠 `AI Integration`: Powered by Google Gemini for intelligent responses
🌐 `Web Navigation`: Quick access to popular websites
🎵 `Music Playback`: Voice-controlled music from your library
📰 `News Updates`: Fetch latest headlines from News API
⚡ `Fast Commands`: Optimized hardcoded commands for common tasks
# 🛠️ Installation:
**Prerequisites:**
1. Python 3.7 or higher
2. A microphone for voice input
3. Valid API keys for NewsAPI and Google Gemini (the news and AI features do not work without them)
4. An internet connection for gTTS, NewsAPI, and Gemini AI
# Required Dependencies:
pip install `speechrecognition`
pip install `pyttsx3`
pip install `requests`
pip install `google-generativeai`
pip install `gtts`
pip install `pygame`
pip install `pyaudio`
# API Keys Setup:
• News API: Get your free API key from NewsAPI.
• Google Gemini API: Get your API key from Google AI Studio.
# 📁 Project Structure:
jarvis-ai-assistant/
├── main.py # Main assistant code
├── musicLibrary.py # Predefined song names & YouTube links
└── README.md # Project documentation
# ⚙️ Configuration:
Before running the assistant, update the API keys in `main.py`:
• NEWS_API_KEY = "Your-API"
• GEMINI_API_KEY = "Your-API"
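A minimal sketch of a startup check that catches a forgotten placeholder key. Reading the keys from environment variables is an assumption here, not something the project does (it stores them in `main.py`), and `load_api_key` is a hypothetical helper name.

```python
import os

PLACEHOLDER = "Your-API"  # placeholder value shown in this README


def load_api_key(name: str, default: str = PLACEHOLDER) -> str:
    """Return the key from the environment, else the in-file default.

    Hypothetical helper: raises if the key is missing or still the placeholder.
    """
    key = os.environ.get(name, default).strip()
    if not key or key == PLACEHOLDER:
        raise ValueError(f"{name} is not set; replace the placeholder first")
    return key
```

Calling `load_api_key("NEWS_API_KEY")` at startup would fail early with a clear message instead of failing later inside an API request.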
# 🚀 Usage:
**Run the assistant:**
python `main.py`
1. Wait for "Initializing Jarvis..." message
2. Say "Jarvis" to wake the assistant
3. Wait for "Yes, Sameer" response
4. Give your command within 6 seconds
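The wake-word flow above can be sketched as one cycle of listen, greet, listen, dispatch. This is an illustration only, not the code in `main.py`: `wake_and_dispatch`, the injected `listen`/`speak`/`handle` callables, and the 2-second wake-word window are all assumptions made so the logic can be read and tested without a microphone.

```python
from typing import Callable, Optional

WAKE_WORD = "jarvis"        # wake word named in this README
GREETING = "Yes, Sameer"    # response named in this README
COMMAND_WINDOW_SECONDS = 6  # command window named in this README


def wake_and_dispatch(
    listen: Callable[[float], Optional[str]],
    speak: Callable[[str], None],
    handle: Callable[[str], None],
) -> bool:
    """Run one wake-word cycle; return True if a command was handled."""
    heard = listen(2.0)  # hypothetical short window for the wake word
    if heard is None or heard.strip().lower() != WAKE_WORD:
        return False  # not the wake word, keep waiting
    speak(GREETING)
    command = listen(COMMAND_WINDOW_SECONDS)
    if command is None:
        return False  # nothing was said within the command window
    handle(command)
    return True
```

In the real assistant, `listen` would wrap the microphone, `speak` the TTS function, and `handle` the command processor.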
# 📋 Function Documentation:
**Core Functions:**
`speak(text)`
**Purpose:** Converts text to speech using Google's TTS engine.
**How it works:**
1. Creates a gTTS (Google Text-to-Speech) object
2. Saves audio to temporary MP3 file
3. Uses pygame to play the audio
4. Automatically cleans up temporary files
5. Handles the error if TTS fails
**Parameters:**
• `text` (string): Text to be spoken
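A rough sketch of the steps listed above, assuming the `gtts` and `pygame` packages from the dependency list. The boolean return value, the empty-text guard, and the polling interval are assumptions added for illustration; the project's own `speak` may differ.

```python
import os
import tempfile
import time


def speak(text: str) -> bool:
    """Speak text via gTTS and pygame; return True if playback finished."""
    if not text.strip():
        return False  # nothing to say
    fd, path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)  # gTTS reopens the path itself
    try:
        # Imported here so a missing audio package is handled like any other failure.
        from gtts import gTTS
        import pygame

        gTTS(text=text, lang="en").save(path)  # needs an internet connection
        pygame.mixer.init()
        pygame.mixer.music.load(path)
        pygame.mixer.music.play()
        while pygame.mixer.music.get_busy():
            time.sleep(0.1)  # wait for playback to finish
        pygame.mixer.music.unload()  # release the file before deleting it (pygame 2.0+)
        return True
    except Exception as exc:
        print(f"TTS failed: {exc}")
        return False
    finally:
        # Clean up the temporary MP3 whether or not playback worked.
        if os.path.exists(path):
            os.remove(path)
```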
`speak_old(text)`
**Purpose:** Alternative text-to-speech using offline pyttsx3 engine.
**How it works:**
• Uses the local pyttsx3 engine for offline TTS
• More reliable because it needs no network connection, but less natural-sounding than Google TTS
**Parameters:**
• `text` (string): Text to be spoken
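A minimal offline sketch, assuming the `pyttsx3` package from the dependency list. The `rate` parameter and its default are hypothetical additions; the documented function takes only `text`.

```python
def speak_old(text: str, rate: int = 175) -> None:
    """Speak text with the local pyttsx3 engine (no network needed)."""
    import pyttsx3  # third-party; imported lazily so this module loads without it

    engine = pyttsx3.init()
    engine.setProperty("rate", rate)  # speaking rate in words per minute
    engine.say(text)
    engine.runAndWait()  # blocks until the queued speech has been spoken
```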
`aiProcess(command)`
**Purpose:** Processes natural language commands using Google Gemini AI.
**How it works:**
1. Sends user command to Gemini 1.5 Flash model
2. Uses system instruction to maintain Jarvis personality
3. Returns AI-generated response text
4. Includes error handling for API failures
**Parameters:**
• `command` (string): User's voice command
**Returns:** AI-generated response text
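The steps above might look roughly like this with the `google-generativeai` package. The `api_key` parameter, the wording of the system instruction, the empty-command guard, and the fallback messages are assumptions for illustration, not the project's actual values.

```python
def aiProcess(command: str, api_key: str = "Your-API") -> str:
    """Send a command to Gemini and return the reply text."""
    if not command.strip():
        return "I did not catch that."  # hypothetical guard for empty input
    try:
        # Imported lazily so a missing package is handled like an API failure.
        import google.generativeai as genai

        genai.configure(api_key=api_key)
        model = genai.GenerativeModel(
            "gemini-1.5-flash",
            # Hypothetical personality text; replace with your own.
            system_instruction="You are Jarvis, a concise voice assistant.",
        )
        return model.generate_content(command).text
    except Exception as exc:
        return f"Sorry, the AI request failed: {exc}"
```

Returning a spoken-friendly string on failure keeps the caller simple: whatever comes back can be passed straight to the TTS function.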
`processCommand(c)`
**Purpose:** Main command processor that handles both hardcoded and AI commands.
**How it works:**
**1. Website Navigation (Hardcoded for speed)**:
• "open google" → Opens Google
• "open chatgpt" → Opens ChatGPT
• "open github" → Opens developer's GitHub profile
• "open linkedin" → Opens developer's LinkedIn profile
**2. Music Playback:**
• Command format: "play [song_name]"
• Searches musicLibrary.py for song links
• Opens YouTube links in default browser
• Provides feedback if song not found
**3. News Updates:**
• Triggers on "news" keyword
• Fetches top US headlines from News API
• Reads first 3 headlines aloud
• Handles API errors gracefully
**4. AI Fallback:**
• Any unrecognized command goes to Gemini AI
• Provides intelligent responses for general queries
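For the News Updates step, a sketch of fetching and trimming headlines with `requests` against NewsAPI's top-headlines endpoint. The function names `first_headlines` and `fetch_headlines` and the 10-second timeout are assumptions; the project may structure this differently.

```python
NEWS_URL = "https://newsapi.org/v2/top-headlines"


def first_headlines(payload: dict, limit: int = 3) -> list:
    """Pick the first few non-empty titles from a NewsAPI response body."""
    if payload.get("status") != "ok":
        return []
    titles = [article.get("title") for article in payload.get("articles", [])]
    return [title for title in titles if title][:limit]


def fetch_headlines(api_key: str) -> list:
    """Fetch top US headlines; return an empty list on any request error."""
    import requests  # third-party; imported lazily

    try:
        resp = requests.get(
            NEWS_URL,
            params={"country": "us", "apiKey": api_key},
            timeout=10,  # hypothetical timeout
        )
        resp.raise_for_status()
        return first_headlines(resp.json())
    except (requests.RequestException, ValueError):
        return []
```

Keeping the parsing in `first_headlines` means the headline selection can be checked with a plain dictionary, without a network call.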
**Parameters:**
• `c` (string): Voice command from user
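The routing order described for `processCommand` can be sketched as a pure function that decides what to do before doing it. `route` and the `(kind, payload)` tuples are hypothetical names for illustration; the Google and ChatGPT URLs are assumed, while the GitHub and LinkedIn URLs come from this page.

```python
SITES = {
    "google": "https://www.google.com",
    "chatgpt": "https://chatgpt.com",
    "github": "https://github.com/sameerqureshii",
    "linkedin": "https://www.linkedin.com/in/sameer-ahmed-qureshi56",
}


def route(c: str, music: dict) -> tuple:
    """Classify a voice command as ("open"|"play"|"news"|"ai", payload)."""
    text = c.lower().strip()
    if text.startswith("open "):
        site = text[len("open "):].strip()
        if site in SITES:
            return ("open", SITES[site])  # hardcoded for speed
    if text.startswith("play "):
        song = text[len("play "):].strip()
        return ("play", music.get(song))  # None means the song was not found
    if "news" in text:
        return ("news", None)
    return ("ai", c)  # anything unrecognized goes to Gemini
```

The caller would then act on the result, for example opening the URL with `webbrowser.open` for `"open"` and `"play"`, or passing the text to `aiProcess` for `"ai"`.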
# 🎵 Music Library:
The `musicLibrary.py` file contains a dictionary of song names mapped to YouTube URLs:
music = {
    "stealth": "YouTube URL",
    "march": "YouTube URL",
    "skyfall": "YouTube URL",
    "wolf": "YouTube URL",
}
**Usage:** Say "play [song_name]" where song_name matches a key in the dictionary.
**Adding Songs:** Add new entries to the music dictionary in the format `"song_name": "youtube_url"`
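A small sketch of the lookup, assuming a hypothetical `play_song` helper. The `open_url` parameter defaults to the standard library's `webbrowser.open` and exists only so the function can be exercised without launching a browser; the feedback strings are illustrative.

```python
import webbrowser


def play_song(name: str, library: dict, open_url=webbrowser.open) -> str:
    """Open the song's link if it is in the library; return feedback text."""
    key = name.strip().lower()  # speech recognizers vary in capitalization
    url = library.get(key)
    if url is None:
        return f"Sorry, I could not find {key} in the music library."
    open_url(url)
    return f"Playing {key}."
```

Because dictionary keys are matched after lowercasing the spoken name, new entries are best added in lowercase.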
# Modifying Voice Settings:
Adjust recognition parameters in the main loop:
• `timeout`: Maximum wait time for voice input
• `phrase_time_limit`: Maximum length of spoken phrase
• `duration`: Ambient noise adjustment time
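To show where these three parameters plug in, here is a sketch using the `SpeechRecognition` package. `listen_once` and its default values are assumptions for illustration, not the values used in `main.py`.

```python
def listen_once(timeout: float = 5, phrase_time_limit: float = 6, duration: float = 0.5):
    """Capture one phrase from the microphone; return its text or None."""
    import speech_recognition as sr  # third-party; imported lazily

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # duration: seconds spent sampling background noise before listening
        recognizer.adjust_for_ambient_noise(source, duration=duration)
        try:
            # timeout: max seconds to wait for speech to start
            # phrase_time_limit: max seconds a single phrase may last
            audio = recognizer.listen(
                source, timeout=timeout, phrase_time_limit=phrase_time_limit
            )
        except sr.WaitTimeoutError:
            return None  # nobody spoke in time
    try:
        return recognizer.recognize_google(audio)
    except (sr.UnknownValueError, sr.RequestError):
        return None  # speech was unintelligible or the service was unreachable
```

A longer `duration` tends to help in noisy rooms at the cost of a slower start, while a shorter `phrase_time_limit` cuts off long commands.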
# Changing AI Personality:
Modify the system instruction in the Gemini model configuration:
`system_instruction="Your custom personality instructions here"`
# 📄 License:
This project is open source and available under the MIT License.
# 🤝 Contributing:
Feel free to fork this project and submit pull requests for improvements!
# 👨‍💻 Developer:
**Sameer Ahmed Qureshi**
`Portfolio`: https://sameer-personall-portfolio.vercel.app
`LinkedIn`: https://www.linkedin.com/in/sameer-ahmed-qureshi56
Built with Python and Google AI