https://github.com/jkaraskiewicz/audio-ai
Multi-Platform Audio Transcription System
- Host: GitHub
- URL: https://github.com/jkaraskiewicz/audio-ai
- Owner: jkaraskiewicz
- Created: 2025-08-15T15:06:12.000Z (10 months ago)
- Default Branch: master
- Last Pushed: 2025-10-05T10:16:34.000Z (8 months ago)
- Last Synced: 2025-10-05T12:04:43.243Z (8 months ago)
- Language: TypeScript
- Size: 5.41 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Audio-AI
AI-powered audio transcription and analysis system that transforms voice recordings into structured, actionable markdown documents.
## Quick Start
### Option 1: Automatic Setup (Recommended)
```bash
git clone https://github.com/jkaraskiewicz/audio-ai.git
cd audio-ai
./setup.sh
```
The setup script will guide you through:
- Full setup with local Whisper service
- Connect to existing Whisper service
- Use cloud services (Hugging Face, etc.)
- Development setup
### Option 2: Manual Setup
1. **Copy configuration**:
```bash
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY
```
2. **Choose your setup**:
**Local Whisper (Recommended)**:
```bash
docker-compose --profile full up -d
```
**External Whisper Service**:
```bash
# Edit .env: WHISPER_SERVICE_URL=http://your-whisper-host:port
docker-compose up audio-ai -d
```
**Cloud Services**:
```bash
# Edit .env: TRANSCRIPTION_PROVIDER=huggingface
# Add your API tokens
docker-compose up audio-ai -d
```
## Test Your Setup
```bash
# Health check
curl http://localhost:3000/health
# Process audio file
curl -X POST http://localhost:3000/process-file -F "file=@audio.mp3"
# Process text directly
curl -X POST http://localhost:3000/process \
-H "Content-Type: application/json" \
-d '{"transcript":"Meeting notes: discuss project timeline"}'
```
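The same `/process` call can be made from TypeScript. This is a minimal sketch, assuming the endpoint accepts the JSON body shown in the curl example; `buildProcessRequest` and `processTranscript` are hypothetical helper names and are not part of the Audio-AI codebase. The response format is not documented here, so the sketch returns the raw body text.

```typescript
// Hypothetical client helpers; only the endpoint path and JSON body come from the README.
interface RequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the same request as the `curl -X POST .../process` example.
function buildProcessRequest(baseUrl: string, transcript: string): RequestSpec {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/process`, // tolerate a trailing slash
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ transcript }),
  };
}

// Minimal structural type so the sketch does not depend on a particular fetch typing.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Send the transcript and resolve with the raw response body.
function processTranscript(
  fetchFn: FetchLike,
  baseUrl: string,
  transcript: string
): Promise<string> {
  const spec = buildProcessRequest(baseUrl, transcript);
  return fetchFn(spec.url, {
    method: spec.method,
    headers: spec.headers,
    body: spec.body,
  }).then((res) => {
    if (!res.ok) throw new Error(`Audio-AI returned HTTP ${res.status}`);
    return res.text();
  });
}
```

On Node 18+ or in a browser, the global `fetch` can be passed as `fetchFn`, for example `processTranscript(fetch, "http://localhost:3000", "Meeting notes: discuss project timeline")`.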
## Output
Processed files are saved automatically as `./processed/category/YYYY-MM-DD_filename.md`, where `category` is the content category assigned during analysis. For example:
```markdown
# Weekly Team Meeting - Project Alpha
## Summary
Discussion of Q1 goals, upcoming deadlines...
## Action Items
- [ ] John: Complete API documentation by Friday
- [ ] Sarah: Review budget proposal
## Key Ideas
- Implement user feedback system
- Mobile-first approach
## Tags
meeting, project-alpha, q1-goals
```
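The path pattern and the sample document above could be produced along these lines. This is only an illustrative sketch: the `Analysis` shape, its field names, and the helpers `outputPath` and `renderMarkdown` are assumptions, and the date is taken in UTC, which may differ from what Audio-AI actually does.

```typescript
// Hypothetical shape for an analysed recording; the field names are illustrative only.
interface Analysis {
  title: string;
  summary: string;
  actionItems: string[];
  keyIdeas: string[];
  tags: string[];
}

// Build ./processed/<category>/YYYY-MM-DD_<filename>.md (date taken in UTC, an assumption).
function outputPath(category: string, filename: string, date: Date): string {
  const day = date.toISOString().slice(0, 10); // "YYYY-MM-DD"
  return `./processed/${category}/${day}_${filename}.md`;
}

// Render the sections shown in the sample document.
function renderMarkdown(a: Analysis): string {
  return [
    `# ${a.title}`,
    "## Summary",
    a.summary,
    "## Action Items",
    ...a.actionItems.map((item) => `- [ ] ${item}`),
    "## Key Ideas",
    ...a.keyIdeas.map((idea) => `- ${idea}`),
    "## Tags",
    a.tags.join(", "),
  ].join("\n");
}
```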
## Configuration Options
### Transcription Providers
| Provider | Use Case | Setup |
|----------|----------|-------|
| **openai_whisper_webservice** | Production, existing Whisper | `WHISPER_SERVICE_URL=http://your-host:port` |
| **huggingface** | Cloud processing | `HUGGINGFACE_API_TOKEN=your_token` |
| **gemini_audio** | Google ecosystem | Uses the same `GEMINI_API_KEY` |
| **free_web_speech** | Testing, development | No additional setup |
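
The table can be read as a mapping from provider name to the environment variable it needs. The sketch below checks a configuration against that mapping. The provider and variable names come from the table; `resolveProvider` itself is hypothetical, and rejecting an unset `TRANSCRIPTION_PROVIDER` is an assumption (the real backend may fall back to a default).

```typescript
// Provider names and variable names are taken from the table above;
// the validation logic itself is a hypothetical sketch.
type Provider =
  | "openai_whisper_webservice"
  | "huggingface"
  | "gemini_audio"
  | "free_web_speech";

type Env = Record<string, string | undefined>;

// Environment variable each provider needs (null = no additional setup).
const REQUIRED_VAR: Record<Provider, string | null> = {
  openai_whisper_webservice: "WHISPER_SERVICE_URL",
  huggingface: "HUGGINGFACE_API_TOKEN",
  gemini_audio: "GEMINI_API_KEY",
  free_web_speech: null,
};

function isProvider(value: string): value is Provider {
  return Object.prototype.hasOwnProperty.call(REQUIRED_VAR, value);
}

// Return the selected provider, or throw with a message naming what is missing.
function resolveProvider(env: Env): Provider {
  const name = env["TRANSCRIPTION_PROVIDER"];
  if (!name || !isProvider(name)) {
    throw new Error(`Unknown TRANSCRIPTION_PROVIDER: ${name ?? "(unset)"}`);
  }
  const required = REQUIRED_VAR[name];
  if (required && !env[required]) {
    throw new Error(`${name} requires ${required} to be set`);
  }
  return name;
}
```

Failing at startup with a message that names the missing variable is usually easier to debug than a transcription request that fails later.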
### Port Configuration
```bash
# Default ports (customizable in .env)
AUDIO_AI_PORT=3000 # Audio-AI backend
WHISPER_PORT=9000 # Local Whisper service
AUDIO_AI_DEV_PORT=3001 # Development server
```
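A small sketch of how these variables might be read with their defaults. The default values mirror the listing above; `readPort` is a hypothetical helper, and the range check is an added assumption, not documented behaviour.

```typescript
// Defaults mirror the values listed above; the parsing helper is hypothetical.
const DEFAULT_PORTS = {
  AUDIO_AI_PORT: 3000,
  WHISPER_PORT: 9000,
  AUDIO_AI_DEV_PORT: 3001,
} as const;

type PortName = keyof typeof DEFAULT_PORTS;

// Read a port from an env-like object, falling back to the default when unset.
function readPort(env: Record<string, string | undefined>, name: PortName): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return DEFAULT_PORTS[name];
  const port = Number(raw);
  const isWholeNumber = isFinite(port) && Math.floor(port) === port;
  if (!isWholeNumber || port < 1 || port > 65535) {
    throw new Error(`${name} must be an integer between 1 and 65535, got "${raw}"`);
  }
  return port;
}
```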
### Volume Mounting
```yaml
volumes:
- ./processed:/usr/src/app/processed # Your transcribed files
- ./backend/.env:/usr/src/app/.env:ro # Configuration
```
## External Whisper Integration
### Connect to Existing Service
If you have [ahmetoner/whisper-asr-webservice](https://github.com/ahmetoner/whisper-asr-webservice) running:
```bash
# In .env file
TRANSCRIPTION_PROVIDER=openai_whisper_webservice
WHISPER_SERVICE_URL=http://localhost:1991
# Or via environment
WHISPER_SERVICE_URL=http://localhost:1991 docker-compose up audio-ai -d
```
### Supported Whisper Configurations
- **Local Docker**: `http://whisper:9000`
- **Host Service**: `http://host.docker.internal:1991`
- **Remote Service**: `http://your-server-ip:9000`
- **Local Network**: `http://192.168.1.100:9000`
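
Any of these base URLs can be turned into a transcription endpoint. The sketch below assumes the `/asr` path and the `task`, `output`, and `language` query parameters described in the whisper-asr-webservice documentation; confirm them against the version you run. `buildAsrUrl` is a hypothetical helper, and how Audio-AI itself calls the service is not documented here.

```typescript
// Build the transcription endpoint for a whisper-asr-webservice instance.
// The /asr path and the query parameters follow that project's documented
// API as far as known; verify them against the version you deploy.
function buildAsrUrl(serviceUrl: string, language?: string): string {
  // Throws on a malformed base URL; any path prefix on serviceUrl is replaced by /asr.
  const url = new URL("/asr", serviceUrl);
  url.searchParams.set("task", "transcribe");
  url.searchParams.set("output", "json");
  if (language) url.searchParams.set("language", language);
  return url.toString();
}
```

Because `new URL` throws on malformed input, calling this helper at startup doubles as a basic check that `WHISPER_SERVICE_URL` is well-formed.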
## Android App
```bash
cd android
./gradlew assembleDebug
./gradlew installDebug
# Configure server URL in app: http://your-server-ip:3000
```
## Development
```bash
# Development with hot reload
docker-compose --profile dev up -d
# Backend development
npm run dev # Start development server
npm run test # Run all tests
npm run build # Production build
# Android development
cd android && ./gradlew test
```
## Advanced Features
- **Audio Format Conversion**: Automatic m4a→mp3 conversion for Whisper compatibility
- **AI Analysis**: Rich content structuring with action items, ideas, and commentary
- **Smart Organization**: Auto-categorization by content type
- **Provider Flexibility**: Easy switching between transcription services
- **Docker-First**: Complete containerized deployment
- **Health Monitoring**: Built-in health checks and logging
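
As an illustration of the format conversion step, the sketch below decides whether an upload needs converting and what the converted file would be called. The README states only that m4a input is converted to mp3; `conversionTarget` is a hypothetical helper, and the actual conversion tool is not specified.

```typescript
// Hypothetical helper: returns the mp3 filename to convert to,
// or null when the file is not an .m4a and needs no conversion.
function conversionTarget(filename: string): string | null {
  const match = /^(.*)\.m4a$/i.exec(filename); // case-insensitive extension check
  return match ? `${match[1]}.mp3` : null;
}
```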
## Documentation
- **[DEVELOPER.md](DEVELOPER.md)** - Complete developer guide
- **[setup.sh](setup.sh)** - Interactive setup script
- **[.env.example](.env.example)** - Configuration examples
## Use Cases
- **Meeting Notes**: Transform recordings into action items
- **Voice Memos**: Convert ideas into structured documents
- **Interviews**: Generate transcripts with key insights
- **Lectures**: Create searchable study materials
- **Brainstorming**: Capture and organize creative sessions
Transform your voice into actionable intelligence!