https://github.com/psychip/berlin-hackathon

11labs powered conversational ai agent

Host: GitHub
URL: https://github.com/psychip/berlin-hackathon
Owner: PsyChip
Created: 2025-07-19T09:02:10.000Z (11 months ago)
Default Branch: main
Last Pushed: 2025-07-19T13:01:17.000Z (11 months ago)
Last Synced: 2025-07-19T14:57:53.921Z (11 months ago)
Language: JavaScript
Size: 68.4 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Conversational Voice Agent with Tool Support

A conversational AI agent powered by ElevenLabs conversational agents, featuring real-time audio visualization, geographic location awareness, and integrated tool capabilities including weather, directions, and search functionality.

## Team Members
- Alec Fritsch (@flokzybtw)
- Mehmet Ali Dolgun (@psychip_)

## Live Demo
[vox.psychip.net](https://vox.psychip.net)

## Project Overview

This application demonstrates an advanced conversational AI interface with:
- **Real-time voice conversation** using ElevenLabs Conversational AI
- **Dynamic audio visualization** with speech activity detection
- **Geographic awareness** with IP-based location detection
- **Integrated tools** for weather, directions, and search
- **Responsive web interface** with mobile optimization

### Core Technologies
- **Node.js** with Express.js server
- **Webpack** for module bundling and development
- **Web Audio API** for real-time audio processing
- **Canvas API** for audio visualization
- **MaxMind GeoIP2** for location detection

### APIs & Services
- **ElevenLabs API** - Voice synthesis and conversation management
- **Google Routes API** - Driving directions (ElevenLabs tool)
- **OpenWeather API** - Weather information (ElevenLabs tool)
- **Google Custom Search API** - Web search capabilities (ElevenLabs tool)
- **MaxMind GeoLite2** - Local IP geolocation databases

### Frontend Libraries
- **Sound.js** - Sound effects and noise generation
- **Web Audio API** - Real-time audio analysis and effects

## Prerequisites
- **Node.js** (v16 or higher)
- **npm** package manager
- **ElevenLabs account** with API access
- **Google Cloud Platform** account (for Routes and Search APIs)
- **SerpAPI** account (for local news)
- **OpenWeather** account for weather data

## Installation & Setup

### 1. Clone the Repository
```bash
git clone https://github.com/psychip/berlin-hackathon
cd berlin-hackathon
```

### 2. Install Dependencies
```bash
npm install
```

### 3. Environment Configuration
Create a `.env` file in the root directory:
```env
# ElevenLabs Configuration
XI_API_KEY=your_elevenlabs_api_key
AGENT_ID=your_elevenlabs_agent_id

# Server Configuration
PORT=3388
```

**Note:** The Google Cloud and SerpAPI keys are hardcoded into the ElevenLabs tool calls and are not set in `.env`.
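
A minimal sketch of how the server could validate these variables at startup. The `loadConfig` helper and its error messages are hypothetical and not taken from the repository's `server.js`; only the variable names `XI_API_KEY`, `AGENT_ID`, and `PORT` come from the `.env` example.

```javascript
// Hypothetical helper: validates the variables from the .env example.
// `env` defaults to process.env so the function can be tested with a plain object.
function loadConfig(env = process.env) {
  const missing = ["XI_API_KEY", "AGENT_ID"].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  // Fall back to the port from the example when PORT is unset.
  const port = Number.parseInt(env.PORT ?? "3388", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT value: ${env.PORT}`);
  }

  return { apiKey: env.XI_API_KEY, agentId: env.AGENT_ID, port };
}
```

Failing fast on a missing key tends to be easier to debug than an agent that silently never connects. If the project loads `.env` with a package such as `dotenv`, that load would need to happen before `loadConfig()` is called.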

### 4. ElevenLabs Agent Setup
1. Create an account at [ElevenLabs](https://elevenlabs.io)
2. Navigate to the Conversational AI section
3. Create a new agent with the following configuration:
   - **Voice**: Choose your preferred voice model
   - **Tools**: Enable the required tools; see the screenshots in the `./doc` folder for detailed setup
4. Copy the Agent ID to your `.env` file

### 5. Database Setup
The application includes MaxMind GeoLite2 databases for IP geolocation:
- `db/GeoLite2-City.mmdb` - City-level geolocation
- `db/GeoLite2-ASN.mmdb` - ISP/Organization data

These are included in the repository for development purposes.

## Running the Application

```bash
npm run build
node server.js
```

## Project Structure

```
berlin-hackathon/
├── src/                 # Frontend source files
│   ├── app.js           # Main application logic
│   ├── index.html       # HTML template
│   └── styles.css       # Stylesheets
├── dist/                # Built/compiled files
│   ├── bundle.js        # Webpack compiled bundle
│   ├── index.html       # Production HTML
│   └── static/          # Static assets
├── db/                  # MaxMind GeoIP databases
│   ├── GeoLite2-City.mmdb
│   └── GeoLite2-ASN.mmdb
├── server.js            # Express.js backend server
├── system_prompt.txt    # AI agent system prompt
├── webpack.config.js    # Webpack configuration
├── package.json         # Project dependencies
└── README.md            # This documentation
```

## Configuration Details

### Audio Processing
- **FFT Size**: 256 (standard), 64 (low-end devices)
- **Smoothing**: 0.6 (standard), 0.25 (low-end)
- **Speech Detection Threshold**: 15 (adjustable)
- **Silence Detection**: 800ms pause for sentence end
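
The sketch below shows one way the threshold and the silence window could work together. It assumes the threshold of 15 is compared against the mean of the byte frequency data (0-255) that a Web Audio `AnalyserNode` produces; the README does not say which quantity is actually compared. `createSpeechDetector` and `averageLevel` are hypothetical names.

```javascript
// Hypothetical detector; the defaults mirror the values listed above.
const SPEECH_THRESHOLD = 15; // assumed: mean byte-frequency level (0-255)
const SILENCE_MS = 800;      // pause length treated as the end of a sentence

function averageLevel(bins) {
  if (bins.length === 0) return 0;
  let sum = 0;
  for (const value of bins) sum += value;
  return sum / bins.length;
}

function createSpeechDetector(threshold = SPEECH_THRESHOLD, silenceMs = SILENCE_MS) {
  let speaking = false;
  let lastLoudAt = null;

  // Feed one analyser frame; returns "speech-start", "sentence-end", or null.
  return function update(bins, nowMs) {
    if (averageLevel(bins) >= threshold) {
      lastLoudAt = nowMs;
      if (!speaking) {
        speaking = true;
        return "speech-start";
      }
      return null;
    }
    if (speaking && nowMs - lastLoudAt >= silenceMs) {
      speaking = false;
      return "sentence-end";
    }
    return null;
  };
}
```

In the browser, `bins` would typically be a `Uint8Array` of length `analyser.frequencyBinCount` (128 for an FFT size of 256) filled by `analyser.getByteFrequencyData(bins)` on each animation frame, with `performance.now()` as the timestamp.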

### Visualization Settings
- **Circle Radius**: 80px
- **Audio Multiplier**: 40 (standard), 15 (low-end)
- **Color Speed**: 10
- **Glow Effect**: 8 (disabled on low-end devices)
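
As an illustration of how the radius and multiplier might shape a circular spectrum, the sketch below converts frequency bins into line segments radiating from the base circle. The scaling (byte value normalized to 0-1, then multiplied by the audio multiplier) and the `spectrumSegments` name are assumptions, not the repository's implementation.

```javascript
const CIRCLE_RADIUS = 80;    // px, from the settings above
const AUDIO_MULTIPLIER = 40; // standard profile; 15 on low-end devices

// Returns one line segment per frequency bin, radiating from the base circle.
function spectrumSegments(bins, cx, cy, rotation = 0, multiplier = AUDIO_MULTIPLIER) {
  const segments = [];
  for (let i = 0; i < bins.length; i++) {
    const angle = rotation + (i / bins.length) * 2 * Math.PI;
    const length = (bins[i] / 255) * multiplier; // assumed scaling
    const outer = CIRCLE_RADIUS + length;
    segments.push({
      x1: cx + Math.cos(angle) * CIRCLE_RADIUS,
      y1: cy + Math.sin(angle) * CIRCLE_RADIUS,
      x2: cx + Math.cos(angle) * outer,
      y2: cy + Math.sin(angle) * outer,
    });
  }
  return segments;
}
```

Each segment can then be drawn on a Canvas 2D context with `moveTo(x1, y1)` and `lineTo(x2, y2)`; increasing `rotation` slightly every frame would produce the rotating effect.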

### Performance Optimization
The application automatically detects device capabilities:
- **Mobile devices** or devices with less than 8 GB of RAM use optimized settings
- **Manual override** available via URL parameter: `?lowperf=true` or `?lowperf=false`
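
A sketch of how this detection could be structured, with the URL override taking precedence. The function names, the user-agent pattern, and the use of `navigator.deviceMemory` as the RAM source are assumptions; that property is only exposed by some browsers and reports at most 8, so a check like this treats unsupported browsers as standard devices.

```javascript
// Hypothetical detection logic; inputs are passed in so it can run outside a browser.
function isLowPerformance({ search = "", userAgent = "", deviceMemory } = {}) {
  // 1. A manual ?lowperf=true or ?lowperf=false override wins.
  const override = new URLSearchParams(search).get("lowperf");
  if (override === "true") return true;
  if (override === "false") return false;

  // 2. Mobile devices use the optimized settings (simple user-agent check).
  if (/Android|iPhone|iPad|iPod|Mobile/i.test(userAgent)) return true;

  // 3. Less than 8 GB of reported RAM; undefined where the API is unsupported.
  return typeof deviceMemory === "number" && deviceMemory < 8;
}

// Settings profiles built from the configuration values listed earlier.
// glow: 0 is an assumed way to represent "disabled on low-end devices".
const PROFILES = {
  standard: { fftSize: 256, smoothing: 0.6, audioMultiplier: 40, glow: 8 },
  lowEnd: { fftSize: 64, smoothing: 0.25, audioMultiplier: 15, glow: 0 },
};

function selectProfile(env) {
  return isLowPerformance(env) ? PROFILES.lowEnd : PROFILES.standard;
}
```

In a browser this might be called as `selectProfile({ search: location.search, userAgent: navigator.userAgent, deviceMemory: navigator.deviceMemory })`.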

## API Integrations

### ElevenLabs Conversational AI
- Real-time voice synthesis and recognition
- Custom system prompts with location awareness
- Tool integration for external API calls
- WebSocket-based communication
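
To illustrate what a location-aware system prompt could look like, the sketch below fills placeholders in a prompt template. The `{{name}}` placeholder syntax, the `renderPrompt` helper, and the sample template text are all assumptions; the actual contents of `system_prompt.txt` are not shown in this README.

```javascript
// Hypothetical template renderer; the {{name}} placeholder syntax is an assumption.
// Unknown placeholders are left untouched so that gaps stay visible.
function renderPrompt(template, values) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match
  );
}

// Illustrative values only; in practice these would come from the geolocation lookup.
const template =
  "You are a voice assistant. The user is in {{city}}, {{country}} ({{timezone}}).";
const prompt = renderPrompt(template, {
  city: "Berlin",
  country: "Germany",
  timezone: "Europe/Berlin",
});
```

Leaving unknown placeholders in place, rather than replacing them with an empty string, makes a failed lookup easier to spot in the rendered prompt.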

### Location Services
- IP-based geolocation using MaxMind GeoLite2
- Automatic timezone and location detection
- Privacy-focused (no external API calls for basic geolocation)
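
A rough sketch of the request-side steps around the lookup: picking the client IP and flattening a GeoLite2 City record into the fields an agent would need. The field paths (`city.names.en`, `country.iso_code`, `location.time_zone`) follow the GeoLite2 City record layout; the helper names are hypothetical, and opening the `.mmdb` files is omitted because the README does not name the reader library used.

```javascript
// Extracts the client IP, preferring the first X-Forwarded-For entry (proxy setups).
function clientIp(headers, socketAddress) {
  const forwarded = headers["x-forwarded-for"];
  const first = forwarded ? forwarded.split(",")[0].trim() : "";
  const ip = first || socketAddress || "";
  // Strip the IPv4-mapped IPv6 prefix that Node reports on dual-stack sockets.
  return ip.replace(/^::ffff:/, "");
}

// Flattens a GeoLite2 City record into a small, null-safe summary.
function summarizeLocation(record) {
  if (!record) return null; // private or unknown addresses have no entry
  return {
    city: record.city?.names?.en ?? null,
    country: record.country?.names?.en ?? null,
    countryCode: record.country?.iso_code ?? null,
    latitude: record.location?.latitude ?? null,
    longitude: record.location?.longitude ?? null,
    timezone: record.location?.time_zone ?? null,
  };
}
```

Trusting `X-Forwarded-For` is only reasonable when the server sits behind a proxy that sets it; otherwise the socket address is the safer source.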

## Features

### Audio Visualization
- Real-time FFT analysis
- Circular spectrum display with rotation
- Speech activity detection with visual feedback
- Agent/user state differentiation
- Performance-adaptive rendering

### Conversation Management
- Automatic greeting based on time of day
- Subtitle display
- List formatting for structured responses
- Connection status monitoring
- Error handling with audio feedback
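
The time-of-day greeting could be derived from the detected timezone roughly as follows. The hour boundaries and greeting strings are assumptions, since the README does not state the exact cutoffs; `localHour` relies on the standard `Intl.DateTimeFormat` API.

```javascript
// Returns the hour (0-23) of `date` in the given IANA time zone.
function localHour(date, timeZone) {
  const parts = new Intl.DateTimeFormat("en-US", {
    hour: "numeric",
    hourCycle: "h23",
    timeZone,
  }).formatToParts(date);
  return Number(parts.find((part) => part.type === "hour").value);
}

// Hypothetical greeting boundaries; adjust to taste.
function greetingFor(hour) {
  if (hour >= 5 && hour < 12) return "Good morning";
  if (hour >= 12 && hour < 18) return "Good afternoon";
  return "Good evening";
}
```

For example, `greetingFor(localHour(new Date(), "Europe/Berlin"))` would greet a visitor according to Berlin local time regardless of where the server runs.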

## Troubleshooting

### Common Issues

**Agent Not Connecting**
- Verify ElevenLabs API key and Agent ID
- Check network connectivity
- Confirm agent configuration matches requirements

**Performance Issues**
- Try low performance mode: `?lowperf=true`
- Close other audio applications
- Use supported browsers (Chrome, Firefox, Safari)

This project was developed in 48 hours for the {Tech:Europe} Berlin Hackathon (19/07/2025). For evaluation purposes, please review:
1. Code architecture and organization
2. API integration implementations
3. Real-time audio processing
4. User experience design
5. Error handling and performance optimization

## 📄 License

This project is developed for educational and demonstration purposes as part of a hackathon competition.
