https://github.com/codervivek5/acko-voice-assistant


Host: GitHub
URL: https://github.com/codervivek5/acko-voice-assistant
Owner: codervivek5
Created: 2025-09-11T05:45:42.000Z (10 months ago)
Default Branch: main
Last Pushed: 2025-09-11T06:02:03.000Z (10 months ago)
Last Synced: 2025-09-11T09:39:23.383Z (10 months ago)
Language: JavaScript
Homepage: https://acko-voice-assistant.vercel.app
Size: 127 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md

README

# 🏥 Acko Medical Voice Assistant

An AI-powered assistant that combines real-time voice transcription with reflexive question generation, designed for Medical Examination Reports (MER) in health insurance underwriting. It was built for the ACKO hackathon to enhance doctor-customer interactions during tele/video consultations.

## ✨ Features

### 🎤 Enhanced Speech Recognition
- **Real-Time Transcription**: High-accuracy speech-to-text with Indian accent support
- **Bilingual Support**: Hindi + English with dynamic language switching
- **Indian Accent Recognition**: Optimized for diverse Indian dialects and pronunciation
- **Speaker Annotation**: Automatic Doctor/Patient identification
- **Confidence Scoring**: Real-time accuracy feedback and language detection
- **Advanced Statistics**: Duration, word count, language switches, and confidence metrics
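
The statistics listed above (word count, language switches, confidence) could be derived from a list of transcript segments. The following is a minimal sketch, not the project's implementation: the `computeTranscriptStats` name and the segment fields (`text`, `lang`, `confidence`) are assumptions made for illustration.

```javascript
// Hypothetical segment shape: { speaker, text, lang, confidence }.
// These field names are illustrative and do not come from the project source.
function computeTranscriptStats(segments) {
  let wordCount = 0;
  let languageSwitches = 0;
  let confidenceSum = 0;
  let previousLang = null;

  for (const segment of segments) {
    // Count whitespace-separated tokens; blank text contributes zero words.
    const words = segment.text.trim().split(/\s+/).filter(Boolean);
    wordCount += words.length;
    confidenceSum += segment.confidence;

    // A switch is any change of language between consecutive segments.
    if (previousLang !== null && segment.lang !== previousLang) {
      languageSwitches += 1;
    }
    previousLang = segment.lang;
  }

  // Avoid dividing by zero when no segments have been recorded yet.
  const averageConfidence =
    segments.length > 0 ? confidenceSum / segments.length : 0;

  return { wordCount, languageSwitches, averageConfidence };
}
```

Recomputing from the full segment list keeps the function pure; a long session might prefer incremental counters instead.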

### 🤖 Medical AI Intelligence
- **Reflexive Question Generation**: Context-aware medical questions for insurance underwriting
- **Sentiment Analysis**: Detect patient distress, confusion, or emotional state
- **Risk Assessment**: Automatic flagging of high-risk medical indicators
- **Medical Context Extraction**: Identify symptoms, conditions, and medications
- **Question Categorization**: Binary, Scale, and Open-ended question types
- **Priority-based Display**: High, Medium, Normal priority questions
- **Comprehensive Medical Summaries**: Professional MER reports with risk flags
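
Priority-based display can be illustrated with a small sorting helper. This is a sketch under assumptions: the `priority` field and the `sortByPriority` name are hypothetical, and only the three labels (High, Medium, Normal) come from the list above.

```javascript
// Priority labels from the feature list; the numeric ranks are an assumption.
const PRIORITY_RANK = { High: 0, Medium: 1, Normal: 2 };

// Unknown or missing priorities are treated as Normal.
function rankOf(question) {
  return PRIORITY_RANK[question.priority] ?? PRIORITY_RANK.Normal;
}

// Returns a new array ordered High -> Medium -> Normal. Array.prototype.sort
// is stable, so questions with equal priority keep their original order.
function sortByPriority(questions) {
  return [...questions].sort((a, b) => rankOf(a) - rankOf(b));
}
```

Copying the array before sorting leaves the caller's list untouched, which suits React state that should not be mutated in place.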

### 🧠 Session Memory & Context
- **Multi-day Context Retention**: Maintain conversation context across sessions
- **Session History Search**: Find and load previous consultations
- **Context-aware Questions**: Generate questions based on prior session data
- **Risk Level Tracking**: Monitor patient risk across multiple consultations
- **Session Analytics**: Track consultation patterns and outcomes
- **Cloud Storage**: Secure Firebase integration for session persistence
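
Context-aware questions need some way to pick which earlier sessions to feed back in. One possible approach, sketched here with hypothetical field names (`patientId`, `startedAt`) that are not taken from the project, is to select the most recent sessions for the same patient:

```javascript
// Hypothetical session shape: { id, patientId, startedAt (ISO date string) }.
// Returns up to `limit` sessions for the patient, newest first.
function recentSessionsFor(sessions, patientId, limit = 3) {
  return sessions
    .filter((session) => session.patientId === patientId) // filter() returns a copy
    .sort((a, b) => new Date(b.startedAt) - new Date(a.startedAt))
    .slice(0, limit);
}
```

In practice the filtering and ordering would more likely be pushed into the database query; the in-memory version only shows the selection logic.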

### 🎨 Medical Doctor Interface
- **Real-time Annotations**: Live speaker identification and medical insights
- **Interactive Question Management**: Select, edit, and swap suggested questions
- **Visual Risk Indicators**: Color-coded risk levels and priority alerts
- **Professional Medical Theme**: Healthcare-focused design with medical icons
- **Responsive Design**: Optimized for various devices and screen sizes
- **Accessibility**: Keyboard navigation and screen reader support

## 🏆 Hackathon Requirements Met

### ✅ Mandatory Requirements
1. **Real-Time Voice Transcription**
   - Robust speech-to-text with Indian accent support
   - Bilingual capabilities (Hindi + English)
   - Dynamic language switching during sessions

2. **Reflexive Question Generation**
   - Medical-specific NLU processing
   - Clinically appropriate, context-aware questions
   - Emotion/sentiment detection and alerts
   - Multiple question types (open-ended, binary, scale-based)

3. **Interactive Doctor Interface**
   - Real-time transcribed text with speaker annotations
   - Accept/edit/swap suggested questions
   - Visual cues and organized interaction flow

### ✅ Good-to-Have Features
4. **Consultation Summary Generator**
   - Automatic session summaries with key responses
   - Risk flags and missing information alerts
   - Professional medical report formatting

5. **Context Retention & Adaptability**
   - Session memory across conversation turns
   - Multi-day conversation support
   - Adaptive follow-up questions based on prior sessions

6. **System Scalability and Performance**
   - Low-latency transcription and response generation
   - High concurrency support for multiple sessions
   - Optimized for real-time medical consultations

## 🚀 Quick Start

### Prerequisites
- Node.js 16+ and npm
- Modern browser with speech recognition support (Chrome, Edge recommended)
- Firebase project for data persistence
- Google Gemini API key for AI features

### Installation

1. **Clone the repository**
```bash
git clone https://github.com/codervivek5/acko-voice-assistant.git
cd acko-voice-assistant
```

2. **Install dependencies**
```bash
npm install
```

3. **Environment Setup**
Create a `.env` file in the root directory:
```env
VITE_FIREBASE_API_KEY=your_firebase_api_key
VITE_FIREBASE_AUTH_DOMAIN=your_project.firebaseapp.com
VITE_FIREBASE_PROJECT_ID=your_project_id
VITE_FIREBASE_STORAGE_BUCKET=your_project.appspot.com
VITE_FIREBASE_MESSAGING_SENDER_ID=your_sender_id
VITE_FIREBASE_APP_ID=your_app_id
VITE_GEMINI_API_KEY=your_gemini_api_key
```

4. **Start the development server**
```bash
npm run dev
```

5. **Open your browser**
Navigate to `http://localhost:5173`
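
A missing or blank `.env` entry from step 3 tends to surface later as an opaque Firebase or Gemini error. As a hedged sketch, the keys could be checked once at startup; the `findMissingEnvKeys` helper is hypothetical and not part of the project, while the key names are copied from the `.env` example above.

```javascript
// Key names copied from the .env example in step 3.
const REQUIRED_ENV_KEYS = [
  'VITE_FIREBASE_API_KEY',
  'VITE_FIREBASE_AUTH_DOMAIN',
  'VITE_FIREBASE_PROJECT_ID',
  'VITE_FIREBASE_STORAGE_BUCKET',
  'VITE_FIREBASE_MESSAGING_SENDER_ID',
  'VITE_FIREBASE_APP_ID',
  'VITE_GEMINI_API_KEY',
];

// Returns the names of required keys that are absent or blank in `env`.
function findMissingEnvKeys(env) {
  return REQUIRED_ENV_KEYS.filter((key) => {
    const value = env[key];
    return typeof value !== 'string' || value.trim() === '';
  });
}
```

In a Vite app the argument would be `import.meta.env`, for example `findMissingEnvKeys(import.meta.env)` near the top of `main.jsx`, logging a warning when the returned list is not empty.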

## 🏗️ Project Structure

```
src/
├── components/
│ └── SpeechRecognition.jsx # Main speech recognition component
├── config/
│ ├── firebase.js # Firebase configuration
│ └── gemini.js # Gemini AI configuration
├── services/
│ ├── aiService.js # AI service for questions and summaries
│ └── databaseService.js # Database operations
├── App.jsx # Main application component
├── App.css # Application styles
├── index.css # Global styles and CSS variables
└── main.jsx # Application entry point
```

## 🔧 Configuration

### Firebase Setup
1. Create a new Firebase project at [Firebase Console](https://console.firebase.google.com)
2. Enable Firestore Database
3. Configure security rules for your use case
4. Add your Firebase config to the `.env` file

### Gemini AI Setup
1. Get your API key from [Google AI Studio](https://aistudio.google.com/app/apikey)
2. Add the key to your `.env` file as `VITE_GEMINI_API_KEY`
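
This README does not show how the project prompts Gemini, so the following is only a sketch of one plausible shape: a prompt builder plus a tolerant parser for the model's reply. The function names, the prompt wording, and the `text`/`type`/`priority` fields are all assumptions; the actual call to Gemini is left to whatever client `src/config/gemini.js` sets up.

```javascript
// Builds a prompt that asks for follow-up questions as a JSON array.
// The wording and the requested fields are illustrative assumptions.
function buildQuestionPrompt(transcript, maxQuestions = 3) {
  return [
    'You are assisting a doctor completing a Medical Examination Report.',
    `Suggest up to ${maxQuestions} follow-up questions for the transcript below.`,
    'Respond with a JSON array of objects with "text", "type" and "priority".',
    '',
    'Transcript:',
    transcript.trim(),
  ].join('\n');
}

// Models sometimes wrap JSON in Markdown code fences; strip them, then parse.
function parseQuestions(responseText) {
  const cleaned = responseText.replace(/`{3}(?:json)?/g, '').trim();
  try {
    const parsed = JSON.parse(cleaned);
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    // Malformed output yields no suggestions rather than a crash.
    return [];
  }
}
```

Keeping prompt construction and response parsing separate from the network call makes both halves easy to unit test without an API key.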

## 📱 Usage

### Starting a Session
1. Click "Start Recording" to begin speech recognition
2. Speak clearly into your microphone
3. Watch real-time transcription appear
4. Use the settings panel to adjust language preferences
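
Assuming the app relies on the browser's Web Speech API (the prerequisites suggest it, but this README does not confirm it), the real-time transcription in step 3 would come from `result` events. The sketch below shows how finalized text and confidence could be pulled out of such an event; `extractFinalResults` is a hypothetical helper name.

```javascript
// Collects finalized text from a Web Speech API `result` event.
// `event.resultIndex` marks the first result that changed in this event;
// each result is a list of alternatives with `transcript` and `confidence`.
function extractFinalResults(event) {
  const finals = [];
  for (let i = event.resultIndex; i < event.results.length; i += 1) {
    const result = event.results[i];
    if (result.isFinal) {
      const best = result[0]; // the first alternative is the most likely one
      finals.push({
        transcript: best.transcript.trim(),
        confidence: best.confidence,
      });
    }
  }
  return finals;
}

// Possible wiring in the browser (not executed here):
//   recognition.onresult = (event) => append(extractFinalResults(event));
```

Interim (non-final) results are skipped here; a live caption view would typically render them separately and replace them once the final version arrives.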

### Generating Questions
- Questions are automatically generated as you speak
- Click the copy button to copy questions to clipboard
- Questions are contextually relevant to your consultation

### Session Management
- Click "Generate Summary" to create a session summary
- Use "Save Session" to store the session in Firebase
- Export sessions as JSON files for external use
- View recent sessions in the sidebar
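
The JSON export mentioned above could be prepared with a small serializer. This is a sketch only: the session shape and the `serializeSession` name are hypothetical, and the project's real export format is not shown in this README.

```javascript
// Hypothetical session shape: any plain object with an `id` field.
// Returns a suggested filename and the pretty-printed JSON content.
function serializeSession(session, exportedAt = new Date()) {
  const iso = exportedAt.toISOString();
  // Colons and dots are replaced so the timestamp is safe in filenames.
  const stamp = iso.replace(/[:.]/g, '-');
  return {
    filename: `session-${session.id}-${stamp}.json`,
    content: JSON.stringify({ ...session, exportedAt: iso }, null, 2),
  };
}
```

In the browser, the returned `content` could be wrapped in a `Blob` and offered for download through an object URL.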

### Settings & Customization
- Language selection for speech recognition
- Mute/unmute functionality
- Real-time statistics display
- Error handling and notifications

## 🛠️ Development

### Available Scripts
- `npm run dev` - Start development server
- `npm run build` - Build for production
- `npm run preview` - Preview production build
- `npm run lint` - Run ESLint

### Code Style
- ESLint configuration for consistent code quality
- Modern React patterns with hooks
- CSS custom properties for theming
- Responsive design principles

## 🔒 Security & Privacy

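- **In-Browser Recognition**: Speech recognition runs through the browser's built-in speech recognition support; some browsers, including Chrome, send audio to a server-side service for processing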
- **Secure Storage**: Firebase provides encrypted cloud storage
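- **API Security**: Environment variables keep API keys out of source control; `VITE_`-prefixed variables are still embedded in the client bundle, so restrict each key in its provider console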
- **Data Privacy**: No data is shared with third parties except configured services

## 🌐 Browser Support

- **Chrome**: Full support (recommended)
- **Edge**: Full support
- **Firefox**: Limited speech recognition support
- **Safari**: Limited speech recognition support
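
Because support varies by browser, it helps to detect the speech recognition constructor before offering the recording controls. A minimal sketch, assuming the app uses the Web Speech API; the `getSpeechRecognition` helper name is hypothetical.

```javascript
// Returns the SpeechRecognition constructor if the environment provides one,
// otherwise null. Chromium-based browsers expose it with a `webkit` prefix.
function getSpeechRecognition(globalObject) {
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null
  );
}
```

In the browser this would be called as `getSpeechRecognition(window)`; a `null` result is the cue to show an unsupported-browser message instead of the "Start Recording" button.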

## 📊 Performance

- **Optimized Bundle**: Vite for fast development and building
- **Lazy Loading**: Components load as needed
- **Efficient State Management**: React hooks for optimal performance
- **Caching**: Firebase caching for improved data access

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🆘 Support

- **Documentation**: Check this README and inline code comments
- **Issues**: Report bugs and request features via GitHub Issues
- **Discussions**: Join community discussions for help and ideas

## 🔮 Roadmap

- [ ] Multi-user support with authentication
- [ ] Advanced AI insights and analytics
- [ ] Integration with EHR systems
- [ ] Mobile app development
- [ ] Voice commands for navigation
- [ ] Custom question templates
- [ ] Session collaboration features

## 🙏 Acknowledgments

- **Google**: Speech Recognition API and Gemini AI
- **Firebase**: Backend infrastructure
- **Lucide React**: Beautiful icons
- **Vite**: Fast build tooling
- **React**: Frontend framework

---

**Made with ❤️ for healthcare professionals**
