https://github.com/codervivek5/acko-voice-assistant
- Host: GitHub
- URL: https://github.com/codervivek5/acko-voice-assistant
- Owner: codervivek5
- Created: 2025-09-11T05:45:42.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-09-11T06:02:03.000Z (10 months ago)
- Last Synced: 2025-09-11T09:39:23.383Z (10 months ago)
- Language: JavaScript
- Homepage: https://acko-voice-assistant.vercel.app
- Size: 127 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 1
# 🏥 Acko Medical Voice Assistant
An AI-powered real-time voice transcription and reflexive question generator specifically designed for Medical Examination Reports (MER) in health insurance underwriting. Built for the ACKO hackathon to enhance doctor-customer interactions during tele/video consultations.
## ✨ Features
### 🎤 Enhanced Speech Recognition
- **Real-Time Transcription**: High-accuracy speech-to-text with Indian accent support
- **Bilingual Support**: Hindi + English with dynamic language switching
- **Indian Accent Recognition**: Optimized for diverse Indian dialects and pronunciation
- **Speaker Annotation**: Automatic Doctor/Patient identification
- **Confidence Scoring**: Real-time accuracy feedback and language detection
- **Advanced Statistics**: Duration, word count, language switches, and confidence metrics
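The statistics listed above (duration, word count, language switches, confidence) can be derived from the transcript itself. The following is a minimal sketch, not the project's actual implementation: the function name `computeSessionStats` and the segment shape `{ text, lang, confidence, startMs, endMs }` are assumptions made for illustration.

```javascript
// Hypothetical segment shape: { text, lang, confidence, startMs, endMs }
function computeSessionStats(segments) {
  let wordCount = 0;
  let languageSwitches = 0;
  let confidenceSum = 0;
  let previousLang = null;

  for (const segment of segments) {
    // Count whitespace-separated tokens, which covers both English and Hindi text.
    const words = segment.text.trim().split(/\s+/).filter(Boolean);
    wordCount += words.length;
    confidenceSum += segment.confidence;

    // A switch is any change of language between consecutive segments.
    if (previousLang !== null && segment.lang !== previousLang) {
      languageSwitches += 1;
    }
    previousLang = segment.lang;
  }

  // Duration spans from the start of the first segment to the end of the last.
  const durationMs = segments.length === 0
    ? 0
    : segments[segments.length - 1].endMs - segments[0].startMs;

  return {
    wordCount,
    languageSwitches,
    durationMs,
    averageConfidence: segments.length === 0 ? 0 : confidenceSum / segments.length,
  };
}
```

Keeping this as a pure function means the statistics can be recomputed on every new transcript segment without touching component state.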
### 🤖 Medical AI Intelligence
- **Reflexive Question Generation**: Context-aware medical questions for insurance underwriting
- **Sentiment Analysis**: Detect patient distress, confusion, or emotional state
- **Risk Assessment**: Automatic flagging of high-risk medical indicators
- **Medical Context Extraction**: Identify symptoms, conditions, and medications
- **Question Categorization**: Binary, Scale, and Open-ended question types
- **Priority-based Display**: High, Medium, Normal priority questions
- **Comprehensive Medical Summaries**: Professional MER reports with risk flags
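As an illustration of priority-based display, the sketch below orders generated questions so that high-priority items come first. The question shape `{ text, type, priority }`, the lowercase priority labels, and the name `sortByPriority` are assumptions for this example; the repository's own data model may differ.

```javascript
// Lower rank sorts first. Labels mirror the High / Medium / Normal levels above.
const PRIORITY_ORDER = { high: 0, medium: 1, normal: 2 };

function sortByPriority(questions) {
  // Copy first so the caller's array is not mutated.
  return [...questions].sort((a, b) => {
    // Unknown or missing priorities are treated as "normal".
    const rankA = PRIORITY_ORDER[a.priority] ?? PRIORITY_ORDER.normal;
    const rankB = PRIORITY_ORDER[b.priority] ?? PRIORITY_ORDER.normal;
    return rankA - rankB;
  });
}
```

Because `Array.prototype.sort` is stable in modern engines, questions with the same priority keep the order in which they were generated.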
### 🧠 Session Memory & Context
- **Multi-day Context Retention**: Maintain conversation context across sessions
- **Session History Search**: Find and load previous consultations
- **Context-aware Questions**: Generate questions based on prior session data
- **Risk Level Tracking**: Monitor patient risk across multiple consultations
- **Session Analytics**: Track consultation patterns and outcomes
- **Cloud Storage**: Secure Firebase integration for session persistence
### 🎨 Medical Doctor Interface
- **Real-time Annotations**: Live speaker identification and medical insights
- **Interactive Question Management**: Select, edit, and swap suggested questions
- **Visual Risk Indicators**: Color-coded risk levels and priority alerts
- **Professional Medical Theme**: Healthcare-focused design with medical icons
- **Responsive Design**: Optimized for various devices and screen sizes
- **Accessibility**: Keyboard navigation and screen reader support
## 🏆 Hackathon Requirements Met
### ✅ Mandatory Requirements
1. **Real-Time Voice Transcription**
   - Robust speech-to-text with Indian accent support
   - Bilingual capabilities (Hindi + English)
   - Dynamic language switching during sessions
2. **Reflexive Question Generation**
   - Medical-specific NLU processing
   - Clinically appropriate, context-aware questions
   - Emotion/sentiment detection and alerts
   - Multiple question types (open-ended, binary, scale-based)
3. **Interactive Doctor Interface**
   - Real-time transcribed text with speaker annotations
   - Accept/edit/swap suggested questions
   - Visual cues and organized interaction flow
### ✅ Good-to-Have Features
4. **Consultation Summary Generator**
   - Automatic session summaries with key responses
   - Risk flags and missing information alerts
   - Professional medical report formatting
5. **Context Retention & Adaptability**
   - Session memory across conversation turns
   - Multi-day conversation support
   - Adaptive follow-up questions based on prior sessions
6. **System Scalability and Performance**
   - Low-latency transcription and response generation
   - High concurrency support for multiple sessions
   - Optimized for real-time medical consultations
## 🚀 Quick Start
### Prerequisites
- Node.js 16+ and npm
- Modern browser with speech recognition support (Chrome, Edge recommended)
- Firebase project for data persistence
- Google Gemini API key for AI features
### Installation
1. **Clone the repository**
```bash
git clone https://github.com/codervivek5/acko-voice-assistant.git
cd acko-voice-assistant
```
2. **Install dependencies**
```bash
npm install
```
3. **Environment Setup**
Create a `.env` file in the root directory:
```env
VITE_FIREBASE_API_KEY=your_firebase_api_key
VITE_FIREBASE_AUTH_DOMAIN=your_project.firebaseapp.com
VITE_FIREBASE_PROJECT_ID=your_project_id
VITE_FIREBASE_STORAGE_BUCKET=your_project.appspot.com
VITE_FIREBASE_MESSAGING_SENDER_ID=your_sender_id
VITE_FIREBASE_APP_ID=your_app_id
VITE_GEMINI_API_KEY=your_gemini_api_key
```
4. **Start the development server**
```bash
npm run dev
```
5. **Open your browser**
Navigate to `http://localhost:5173`
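A missing variable in `.env` tends to surface later as an opaque Firebase or Gemini error, so it can help to check the configuration at startup. The helper below is a sketch under that assumption; `findMissingEnv` is a hypothetical name and is not part of the repository.

```javascript
// The variable names listed in the Environment Setup step.
const REQUIRED_ENV_KEYS = [
  "VITE_FIREBASE_API_KEY",
  "VITE_FIREBASE_AUTH_DOMAIN",
  "VITE_FIREBASE_PROJECT_ID",
  "VITE_FIREBASE_STORAGE_BUCKET",
  "VITE_FIREBASE_MESSAGING_SENDER_ID",
  "VITE_FIREBASE_APP_ID",
  "VITE_GEMINI_API_KEY",
];

// Returns the names of required variables that are absent or blank.
function findMissingEnv(env) {
  return REQUIRED_ENV_KEYS.filter((key) => {
    const value = env[key];
    return typeof value !== "string" || value.trim() === "";
  });
}
```

In a Vite app the argument would be `import.meta.env`, for example `const missing = findMissingEnv(import.meta.env)`, followed by a console warning if `missing.length > 0`.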
## 🏗️ Project Structure
```
src/
├── components/
│ └── SpeechRecognition.jsx # Main speech recognition component
├── config/
│ ├── firebase.js # Firebase configuration
│ └── gemini.js # Gemini AI configuration
├── services/
│ ├── aiService.js # AI service for questions and summaries
│ └── databaseService.js # Database operations
├── App.jsx # Main application component
├── App.css # Application styles
├── index.css # Global styles and CSS variables
└── main.jsx # Application entry point
```
## 🔧 Configuration
### Firebase Setup
1. Create a new Firebase project at [Firebase Console](https://console.firebase.google.com)
2. Enable Firestore Database
3. Configure security rules for your use case
4. Add your Firebase config to the `.env` file
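For step 4, the `.env` values need to be mapped onto the configuration object that the Firebase web SDK expects. The contents of `src/config/firebase.js` are not shown in this README, so the following is only a sketch of one way to do it; `buildFirebaseConfig` is a hypothetical helper name.

```javascript
// Maps the VITE_-prefixed variables onto the standard Firebase web config keys.
function buildFirebaseConfig(env) {
  return {
    apiKey: env.VITE_FIREBASE_API_KEY,
    authDomain: env.VITE_FIREBASE_AUTH_DOMAIN,
    projectId: env.VITE_FIREBASE_PROJECT_ID,
    storageBucket: env.VITE_FIREBASE_STORAGE_BUCKET,
    messagingSenderId: env.VITE_FIREBASE_MESSAGING_SENDER_ID,
    appId: env.VITE_FIREBASE_APP_ID,
  };
}
```

The returned object can be passed to `initializeApp` from `firebase/app`, and the resulting app to `getFirestore` from `firebase/firestore`.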
### Gemini AI Setup
1. Get your API key from [Google AI Studio](https://aistudio.google.com/app/apikey)
2. Add the key to your `.env` file as `VITE_GEMINI_API_KEY`
## 📱 Usage
### Starting a Session
1. Click "Start Recording" to begin speech recognition
2. Speak clearly into your microphone
3. Watch real-time transcription appear
4. Use the settings panel to adjust language preferences
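Under the hood, a recording session like the one described above can be driven by the browser's Web Speech API. The sketch below shows how a recognizer might be configured for a given language; it is not the code in `SpeechRecognition.jsx`, and the names `createRecognizer` and `onTranscript` as well as the emitted segment shape are assumptions. The constructor is passed in as an argument so the function does not depend on browser globals.

```javascript
// RecognitionCtor is the browser's SpeechRecognition (or webkitSpeechRecognition) constructor.
function createRecognizer(RecognitionCtor, lang, onTranscript) {
  const recognition = new RecognitionCtor();
  recognition.lang = lang;            // BCP 47 tag, e.g. "hi-IN" or "en-IN"
  recognition.continuous = true;      // keep listening across pauses
  recognition.interimResults = true;  // emit partial results while the user speaks

  recognition.onresult = (event) => {
    // Only results from event.resultIndex onward are new or changed in this event.
    for (let i = event.resultIndex; i < event.results.length; i += 1) {
      const result = event.results[i];
      const best = result[0]; // highest-ranked alternative
      onTranscript({
        text: best.transcript,
        confidence: best.confidence,
        isFinal: result.isFinal,
        lang: recognition.lang,
      });
    }
  };

  return recognition;
}
```

In a browser, calling `start()` on the returned object begins listening. To switch between Hindi and English mid-session, one option is to call `stop()`, assign a new `lang`, and call `start()` again.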
### Generating Questions
- Questions are automatically generated as you speak
- Click the copy button to copy questions to clipboard
- Questions are contextually relevant to your consultation
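If the language model is asked to return questions as a JSON array, its reply still needs defensive parsing before display. The following sketch assumes a hypothetical response format of `[{ "text": ..., "type": "binary" | "scale" | "open" }]`; the README does not document the actual format used by `aiService.js`, and `parseGeneratedQuestions` is an illustrative name.

```javascript
const QUESTION_TYPES = ["binary", "scale", "open"];
const FENCE = "`".repeat(3); // Markdown code fence that models often wrap JSON in

// Removes a surrounding Markdown fence, if present, and returns the inner text.
function stripFence(text) {
  let body = text.trim();
  if (body.startsWith(FENCE)) {
    // Drop the opening fence line, which may carry a language tag such as "json".
    const firstNewline = body.indexOf("\n");
    body = firstNewline === -1 ? "" : body.slice(firstNewline + 1);
  }
  if (body.endsWith(FENCE)) {
    body = body.slice(0, -FENCE.length);
  }
  return body.trim();
}

// Returns only well-formed questions; malformed replies yield an empty list.
function parseGeneratedQuestions(rawText) {
  let parsed;
  try {
    parsed = JSON.parse(stripFence(rawText));
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];
  return parsed.filter(
    (q) =>
      q !== null &&
      typeof q === "object" &&
      typeof q.text === "string" &&
      q.text.trim() !== "" &&
      QUESTION_TYPES.includes(q.type)
  );
}
```

Returning an empty list on failure lets the interface keep showing the previous questions instead of crashing on an unexpected reply.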
### Session Management
- Click "Generate Summary" to create a session summary
- Use "Save Session" to store the session in Firebase
- Export sessions as JSON files for external use
- View recent sessions in the sidebar
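The JSON export can be sketched as a function that turns a session into a file name and a JSON string. The session fields used here (`transcript`, `questions`, `summary`) and the name `buildSessionExport` are assumptions for illustration, not the project's actual schema.

```javascript
// Builds a downloadable JSON export. `now` is injectable to keep the output predictable.
function buildSessionExport(session, now = new Date()) {
  const payload = {
    exportedAt: now.toISOString(),
    transcript: session.transcript,
    questions: session.questions,
    summary: session.summary ?? null, // null when no summary has been generated yet
  };

  // ISO timestamps contain ":" and ".", which are awkward in file names on some systems.
  const stamp = payload.exportedAt.replace(/[:.]/g, "-");

  return {
    filename: `session-${stamp}.json`,
    json: JSON.stringify(payload, null, 2),
  };
}
```

In the browser, the `json` string can be wrapped in a `Blob`, turned into a link with `URL.createObjectURL`, and downloaded through an anchor element whose `download` attribute is set to `filename`.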
### Settings & Customization
- Language selection for speech recognition
- Mute/unmute functionality
- Real-time statistics display
- Error handling and notifications
## 🛠️ Development
### Available Scripts
- `npm run dev` - Start development server
- `npm run build` - Build for production
- `npm run preview` - Preview production build
- `npm run lint` - Run ESLint
### Code Style
- ESLint configuration for consistent code quality
- Modern React patterns with hooks
- CSS custom properties for theming
- Responsive design principles
## 🔒 Security & Privacy
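- **Browser-based Recognition**: Speech recognition is driven by the browser's speech recognition API; some browsers, including Chrome, send audio to a server-side recognition service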
- **Secure Storage**: Firebase provides encrypted cloud storage
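- **API Security**: API keys are read from environment variables rather than hard-coded; note that Vite bundles `VITE_`-prefixed variables into client-side code, so they are visible in the browser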
- **Data Privacy**: No data is shared with third parties except configured services
## 🌐 Browser Support
- **Chrome**: Full support (recommended)
- **Edge**: Full support
- **Firefox**: Limited speech recognition support
- **Safari**: Limited speech recognition support
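Because support varies by browser, it is worth detecting the speech recognition constructor before enabling the record button. The sketch below checks both the standard and the `webkit`-prefixed names; `getSpeechRecognitionCtor` is a hypothetical helper, and the global object is passed in so the function can be exercised outside a browser.

```javascript
// Returns the available speech recognition constructor, or null if unsupported.
function getSpeechRecognitionCtor(globalObject) {
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition || // prefixed name used by Chromium-based browsers and Safari
    null
  );
}
```

In the app this would be called as `getSpeechRecognitionCtor(window)`; a `null` result is the cue to show an "unsupported browser" message instead of the recording controls.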
## 📊 Performance
- **Optimized Bundle**: Vite for fast development and building
- **Lazy Loading**: Components load as needed
- **Efficient State Management**: React hooks for optimal performance
- **Caching**: Firebase caching for improved data access
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🆘 Support
- **Documentation**: Check this README and inline code comments
- **Issues**: Report bugs and request features via GitHub Issues
- **Discussions**: Join community discussions for help and ideas
## 🔮 Roadmap
- [ ] Multi-user support with authentication
- [ ] Advanced AI insights and analytics
- [ ] Integration with EHR systems
- [ ] Mobile app development
- [ ] Voice commands for navigation
- [ ] Custom question templates
- [ ] Session collaboration features
## 🙏 Acknowledgments
- **Google**: Speech Recognition API and Gemini AI
- **Firebase**: Backend infrastructure
- **Lucide React**: Beautiful icons
- **Vite**: Fast build tooling
- **React**: Frontend framework
---
**Made with ❤️ for healthcare professionals**