{"id":31845944,"url":"https://github.com/psychip/vox","last_synced_at":"2025-10-12T08:44:20.516Z","repository":{"id":317041318,"uuid":"1023024047","full_name":"PsyChip/VOX","owner":"PsyChip","description":"Conversational Voice Agent with Tool Support","archived":false,"fork":false,"pushed_at":"2025-10-08T03:17:56.000Z","size":34620,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-08T04:21:32.985Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://vox.psychip.net","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PsyChip.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-20T11:13:51.000Z","updated_at":"2025-10-08T03:17:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"1756b52c-a2a9-4dcf-8757-899827b0c6ce","html_url":"https://github.com/PsyChip/VOX","commit_stats":null,"previous_names":["psychip/vox"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/PsyChip/VOX","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PsyChip%2FVOX","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PsyChip%2FVOX/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PsyChip%2FVOX/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PsyChip%2FVOX/manifests","owner_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/owners/PsyChip","download_url":"https://codeload.github.com/PsyChip/VOX/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PsyChip%2FVOX/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279010792,"owners_count":26084807,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-12T02:00:06.719Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-12T08:44:11.387Z","updated_at":"2025-10-12T08:44:20.510Z","avatar_url":"https://github.com/PsyChip.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# VOX - Conversational Voice Agent\n\nA multilingual conversational AI agent powered by ElevenLabs, featuring real-time audio visualization, geographic location awareness, and 30+ integrated tools including weather, news, search, image gallery, navigation, flight search, and academic research.\n\n## Team Members\n- Alec Fritsch (@flokzybtw)\n- Mehmet Ali Dolgun (@psychip_)\n\n## Live Demo\n[vox.psychip.net](https://vox.psychip.net)\n\n## Project Overview\n\nThis application demonstrates an advanced conversational AI interface with:\n- **Real-time voice conversation** using ElevenLabs Conversational AI\n- **Multi-language support** with Turkish, English, German, and Spanish\n- **Dynamic audio visualization** with 
speech activity detection\n- **Geographic awareness** with IP-based location detection\n- **30+ integrated tools** for weather, news, search, navigation, flights, and more\n- **Touch-friendly interface** with automatic device detection\n- **Image gallery system** with automated visual search and modal view\n- **Responsive web interface** with mobile optimization\n\n## Available Tools\n\n### Information \u0026 Search\n- **web-search** - Search the web using Google\n  - *\"Search for quantum computing\"*\n  - *\"Look up climate change effects\"*\n\n- **image-search** - Find images across the web\n  - *\"Show me pictures of Mount Everest\"*\n  - *\"Find images of sports cars\"*\n  - **Automatically triggers** when discussing celebrities, places, landmarks, movies, products, animals, or any visual subject\n\n- **latest-news** - Get recent news articles by location or topic\n  - *\"What's the latest news?\"*\n  - *\"Get technology news\"*\n  - *\"News about Istanbul\"*\n  - Automatically filters out sports news unless specifically requested\n\n- **latest-earthquakes** - Check recent earthquakes near location\n  - *\"Any earthquakes nearby?\"*\n  - *\"Recent earthquakes in California\"*\n  - Reports magnitude, location, and depth\n\n### Weather \u0026 Location\n- **get-weather** - Get current weather and forecast\n  - *\"What's the weather?\"*\n  - *\"Weather in London\"*\n  - *\"Will it rain today?\"*\n\n- **poi-search** - Find nearby points of interest\n  - *\"Find a hospital nearby\"*\n  - *\"Where's the nearest gas station?\"*\n  - *\"Show me restaurants\"*\n  - Types: hospital, pharmacy, gas station, charging station, atm, parking, hotel, cafe, bank, police\n\n- **save-location** - Save current location as KML file\n  - *\"Save this location\"*\n  - *\"Mark this as parking spot\"*\n\n- **local-events** - Find upcoming local events\n  - *\"What's happening this weekend?\"*\n  - *\"Any concerts in Berlin?\"*\n\n- **get-address** - Reverse geocoding to identify current 
location\n  - *\"Where am I?\"*\n  - *\"What is this place?\"*\n  - *\"I'm lost\"*\n\n### Travel \u0026 Navigation\n- **flight-search** - Search for available flights between cities\n  - *\"Find flights to Berlin\"*\n  - *\"Flights from Istanbul to Berlin tomorrow\"*\n  - *\"Fly to London today\"*\n  - Automatically finds airport IATA codes via web search for any city\n  - Supports date parsing (today, tomorrow, YYYY-MM-DD)\n  - Converts USD prices to local currency\n\n- **Google Maps Navigation** - Get driving directions\n  - *\"Navigate to Istanbul\"*\n  - *\"Take me to the airport\"*\n  - *\"Directions to the nearest hospital\"*\n\n- **Hotel Search** - Find accommodation via Hotels.com\n  - *\"Find a hotel\"*\n  - *\"Hotels in Paris\"*\n  - *\"Where to stay in Tokyo\"*\n\n### Media \u0026 Entertainment\n- **music-search** - Search and play music\n  - *\"Play Bohemian Rhapsody\"*\n  - *\"Play Mozart Symphony No 40\"*\n\n- **YouTube Search** - Find videos and music\n  - *\"Show me the Thriller music video\"*\n  - *\"How to tie a tie\"*\n\n- **SoundCloud Search** - Find music, remixes, DJ sets\n  - *\"Find lo-fi hip hop on SoundCloud\"*\n  - *\"Search for deadmau5 live set\"*\n\n### Shopping\n- **Amazon Search** - Search for products\n  - *\"Find wireless headphones on Amazon\"*\n  - *\"Search for Sony cameras\"*\n\n- **eBay Search** - Find used items, collectibles\n  - *\"Find used MacBook Pro on eBay\"*\n  - *\"Search for vintage watches\"*\n\n- **app-search** - Find apps for your platform\n  - *\"Find Spotify\"*\n  - *\"Search for WhatsApp\"*\n  - Auto-detects platform (Android/iOS/Windows/Linux)\n\n### Academic \u0026 Research\n- **Google Scholar** - Search academic papers across all disciplines\n  - *\"Find research on climate change\"*\n\n- **Semantic Scholar** - AI-powered academic search with citation context\n  - *\"Find machine learning papers\"*\n\n- **PubMed** - Medical and life sciences research\n  - *\"Search for diabetes treatment research\"*\n\n- 
**JSTOR** - Humanities and social sciences archives\n  - *\"Find articles on ancient philosophy\"*\n\n- **ResearchGate** - Academic networking and paper sharing\n  - *\"Find papers on renewable energy\"*\n\n### Social Media\n- **Reddit Search** - Find community discussions\n  - *\"Search Reddit for gaming PC builds\"*\n\n- **X/Twitter Search** - Real-time updates and reactions\n  - *\"Search Twitter for AI news\"*\n\n### Entertainment Info\n- **IMDB Search** - Find movies, TV shows, actors\n  - *\"Find Inception on IMDB\"*\n  - *\"Search for Breaking Bad\"*\n\n### Utilities\n- **calculator** - Complex mathematical calculations\n  - *\"Calculate square root of 144 plus 5 squared\"*\n  - *\"Convert 3.5 inches to centimeters\"*\n  - *\"Multiply matrix [1,2][3,4] by [5,6][7,8]\"*\n\n- **currency-convert** - Convert between currencies\n  - *\"Convert 100 USD to EUR\"*\n  - *\"How much is 50 dollars in my currency?\"*\n\n- **visible-aircraft** - Check aircraft overhead\n  - *\"How many planes are in the sky?\"*\n  - *\"Show visible aircraft\"*\n\n- **author** - Generate long-form content (recipes, code, guides)\n  - *\"Write a Python script to backup files\"*\n  - *\"Give me a chocolate cake recipe\"*\n  - *\"Create a Linux installation guide\"*\n\n### Image Gallery\n- **pick-card** - Randomly select and open an image with personalized comment\n  - *\"Pick one\"*\n  - *\"Show me one\"*\n  - *\"Open one of those\"*\n  - Agent provides unique contextual comments for each selection\n\n- **next-card** - Navigate to next image in modal\n  - *\"Next\"*\n  - *\"Show me another\"*\n\n- **close-card** - Close image modal\n  - *\"Close\"*\n  - *\"That's enough\"*\n\n### Personal\n- **take-note** - Capture spoken notes\n  - *\"Take a note: meeting at 3 PM\"*\n  - *\"Remember to buy milk\"*\n\n- **save-name** - Save your name for personalization\n  - *\"My name is John\"*\n  - *\"I'm Sarah\"*\n\n### System\n- **volume-adjust** - Adjust master volume by 10%\n  - *\"Turn it up\"*\n  - 
*\"I can't hear you\"*\n  - *\"Too loud\"*\n  - Recognizes casual volume requests\n\n- **reset** - Factory reset with data clearing\n  - *\"Forget about me\"*\n  - *\"Delete everything\"*\n  - *\"Reset to factory settings\"*\n  - Clears all user data and preferences\n\n- **end-session** - End conversation\n  - *\"Goodbye\"*\n  - *\"End session\"*\n\n## Keyboard Shortcuts\n\n- **Tab** - Toggle text input window\n- **\\` (Backtick)** - Toggle debug console\n- **Escape** - Close image modal\n- **Arrow Left** - Previous image in modal\n- **Arrow Right** - Next image in modal\n\n## Core Technologies\n- **Node.js** with Express.js server\n- **Webpack** for module bundling\n- **Web Audio API** for real-time audio processing\n- **Canvas API** for audio visualization and image gallery\n- **MaxMind GeoIP2** for location detection\n\n## APIs \u0026 Services\n- **ElevenLabs API** - Voice synthesis and conversation management\n- **SerpAPI** - Web search, image search, news, events, and flight data\n- **OpenWeather API** - Weather information and forecasts\n- **Google Places API** - Points of interest search\n- **AltınKaynak API** - Turkish Lira currency rates\n- **OpenExchangeRates API** - Global currency conversion\n- **EMSC \u0026 USGS** - Earthquake data feeds\n- **MaxMind GeoLite2** - Local IP geolocation\n- **AviationStack API** - Visible aircraft tracking\n- **Math.js** - Complex mathematical calculations\n\n## Installation \u0026 Setup\n\n### 1. Clone the Repository\n```bash\ngit clone https://github.com/psychip/berlin-hackathon\ncd berlin-hackathon\n```\n\n### 2. Install Dependencies\n```bash\nnpm install\n```\n\n### 3. 
Environment Configuration\nCreate a `.env` file in the root directory:\n```env\n# ElevenLabs Configuration\nXI_API_KEY=your_elevenlabs_api_key\nAGENT_ID=your_elevenlabs_agent_id\n\n# API Keys\nSERPAPI_KEY=your_serpapi_key\nOPENWEATHER_KEY=your_openweather_key\nOPENEXCHANGERATES_KEY=your_openexchangerates_key\nGPLACES_KEY=your_google_places_key\n\n# Server Configuration\nPORT=3388\n```\n\n### 4. Database Setup\nThe application includes MaxMind GeoLite2 databases for IP geolocation:\n- `db/GeoLite2-City.mmdb` - City-level geolocation\n- `db/GeoLite2-ASN.mmdb` - ISP/Organization data\n\nThese are included in the repository for development purposes.\n\n## Running the Application\n\n```bash\nnpm run build\nnode server.js\n```\n\nThe application will be available at `http://localhost:3388`\n\n## Project Structure\n\n```\nVOX/\n├── src/                    # Frontend source files\n│   ├── app.js             # Main application logic\n│   ├── index.html         # HTML template\n│   ├── styles.css         # Stylesheets\n├── dist/                  # Built/compiled files\n│   ├── bundle.js          # Webpack compiled bundle\n│   ├── index.html         # Production HTML\n│   └── static/            # Static assets (sound effects)\n├── content/               # Agent configuration\n│   ├── system.md          # System prompt and tool definitions\n│   ├── drift.md           # Critical reminders\n│   ├── character.md       # Character definitions\n│   ├── greetings.json     # Greeting templates\n│   └── tool.md            # Tool implementation guide\n├── db/                    # Databases\n│   ├── GeoLite2-*.mmdb   # MaxMind GeoIP databases\n│   ├── api.json          # API endpoint configurations\n│   ├── currency.json     # Currency data\n│   └── lang.json         # Language settings\n├── server.js              # Express.js backend server\n├── token.py              # Token counter utility\n├── webpack.config.js      # Webpack configuration\n└── package.json          # Project 
dependencies\n```\n\n## Language Support\n\nVOX supports 4 languages with full localization:\n\n- **Turkish (tr)** - Türkçe - Default for Turkey\n- **English (en)** - English - Default for most regions\n- **German (de)** - Deutsch - Default for Germany, Austria, Switzerland\n- **Spanish (es)** - Español - Default for Spain and Latin America\n\nLanguage is automatically detected from user's IP location and can be changed via the language selection screen on first launch.\n\n## Configuration\n\n### Audio Processing\n- **FFT Size**: 256 (standard), 64 (low-end devices)\n- **Smoothing**: 0.6 (standard), 0.25 (low-end)\n- **Speech Detection Threshold**: 15\n- **Silence Detection**: 800ms pause for sentence end\n- **Subtitle Speed**: 75 characters per second\n\n### Touch UI (Tablets/Smartphones)\n- **UI Timeout**: 5000ms (5 seconds) - configurable in `src/app.js` via `TOUCH_UI_TIMEOUT`\n- Controls auto-hide after timeout, reappear on touch\n\n### Visualization\n- **Circle Radius**: 80px\n- **Audio Multiplier**: 40 (standard), 15 (low-end)\n- **Color Speed**: 10\n- **Glow Effect**: 8 (disabled on low-end devices)\n\n### Performance Optimization\n- Automatic device capability detection\n- Low-end mode for devices with \u003c8GB RAM\n- Manual override: `?lowperf=true/false`\n\n## Features\n\n### Audio Visualization\n- Real-time FFT analysis\n- Circular spectrum display with rotation\n- Speech activity detection with visual feedback\n- Agent/user state differentiation\n- Performance-adaptive rendering\n\n### Image Gallery\n- Animated image display with random placement\n- Collision detection and smart layout\n- Click to view full-size in modal\n- Keyboard navigation (arrow keys)\n- Automatic fade-out on disconnect\n- Hover effects with scaling\n\n### Touch-Friendly UI\n- Automatic touch device detection\n- Auto-hiding controls after 5 seconds\n- Show on touch/tap\n- Affects volume bar, call controls, topic display\n\n### Subtitle System\n- Intelligent sentence splitting 
(respects abbreviations like \"Mr.\", \"Dr.\")\n- Dynamic display timing (30 chars/second)\n- Automatic handling of transcription errors\n\n### Topic Display\n- Shows current conversation topic\n- Color-coded tags\n- Hover to view (desktop) or touch to show (mobile)\n- Persists across sessions\n\n### Conversation Management\n- Time-based greetings\n- **Multi-language support**: Turkish, English, German, Spanish\n- Location and timezone awareness\n- Session history tracking\n- Error handling with audio feedback\n- Proactive image search for visual subjects\n- Automatic tool triggering based on context\n\n## Common Issues\n\n**Agent Not Connecting**\n- Verify ElevenLabs API key and Agent ID\n- Check network connectivity\n- Confirm microphone permissions\n\n**Performance Issues**\n- Try low performance mode: `?lowperf=true`\n- Close other audio applications\n- Use supported browsers (Chrome, Firefox, Safari)\n\n**No Audio/Microphone**\n- Grant microphone permissions\n- Check microphone is not stereo mix\n- Verify no other application is using microphone\n\n## Development\n\n### Adding New Tools\n1. Define tool in `content/system.md` with trigger patterns and examples\n2. Add API endpoint to `db/api.json` if needed\n3. Implement handler in `server.js` (for server-side tools)\n4. Add client-side handler in `src/app.js` if needed\n5. Test tool across all supported languages\n\n### Adding New Languages\n1. Create language folder in `content/[language-code]/`\n2. Add `agent.md` with localized instructions\n3. Add `greetings.json` with time-based greeting templates\n4. Update `db/lang.json` with language configuration\n5. Add language card to `src/index.html`\n6. 
Test all tools and responses in the new language\n\n### Modifying System Prompt\nEdit `content/system.md` - changes take effect after the agent is restarted.\n\n### Adjusting Touch UI Timeout\nModify the `TOUCH_UI_TIMEOUT` constant in `src/app.js` (line 20).\n\n## Browser Compatibility\n- **Chrome/Edge**: Full support ✅\n- **Firefox**: Full support ✅\n- **Safari**: Full support ✅\n- **Mobile browsers**: Touch-optimized ✅\n\n## License\n\nThis project was developed for educational and demonstration purposes as part of the {Tech:Europe} Berlin Hackathon 2025.\n\n---\n\nBuilt with ❤️ in 48 hours for the Berlin Hackathon\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpsychip%2Fvox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpsychip%2Fvox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpsychip%2Fvox/lists"}