https://github.com/samarthpandeydev/voicebook
Voicebook transforms PDF documents and YouTube videos into podcast-style conversations between two AI personas, Alex and Sarah. It combines content processing, embedding generation, and natural language understanding to create dynamic dialogues and an interactive chat experience.
gemini groq groq-api javascript notebooklm pinecone pineconedb typescript vercel
- Host: GitHub
- URL: https://github.com/samarthpandeydev/voicebook
- Owner: samarthpandeydev
- Created: 2024-11-19T22:32:14.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-20T07:21:01.000Z (over 1 year ago)
- Last Synced: 2025-03-31T14:58:09.494Z (about 1 year ago)
- Topics: gemini, groq, groq-api, javascript, notebooklm, pinecone, pineconedb, typescript, vercel
- Language: TypeScript
- Homepage: https://voicebookkk.vercel.app
- Size: 246 KB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Voicebook - PDF & YouTube to Podcast Converter
Voicebook is a Next.js web application that transforms PDF documents and YouTube videos into podcast-style conversations between two AI personas (Alex and Sarah). It uses Google Gemini for embeddings, Pinecone for vector storage and semantic search, and Groq-hosted LLMs for podcast script generation and chat.
## 🚀 Features
### PDF Processing
- Upload and process PDF documents
- Automatic content chunking and embedding
- Vector storage in Pinecone database
- Generate AI-powered podcast conversations
- Interactive chat with document context
### YouTube Integration
- Process YouTube videos via URL
- Automatic caption/transcript extraction
- Content vectorization and storage
- Generate podcast discussions about video content
- Context-aware chat about video content
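Before captions can be fetched, a submitted URL has to be reduced to a video ID. The helper below is a minimal sketch of that step, assuming the app accepts standard `watch?v=`, `youtu.be`, `/embed/` and `/shorts/` links; `extractYouTubeId` is a hypothetical name, not a function from the Voicebook source.

```typescript
// Hypothetical helper (not from the Voicebook source): reduce a YouTube link to its video ID.
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/; // YouTube IDs are 11 URL-safe characters

export function extractYouTubeId(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input.trim());
  } catch {
    return null; // not a parseable absolute URL
  }

  const host = url.hostname.replace(/^(www|m)\./, "");
  let candidate: string | null = null;

  if (host === "youtu.be") {
    // Short link: https://youtu.be/<id>
    candidate = url.pathname.split("/")[1] ?? null;
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      // Standard link: https://www.youtube.com/watch?v=<id>
      candidate = url.searchParams.get("v");
    } else {
      // Embed or Shorts link: /embed/<id> or /shorts/<id>
      const match = url.pathname.match(/^\/(?:embed|shorts)\/([^/]+)/);
      candidate = match?.[1] ?? null;
    }
  }

  return candidate !== null && VIDEO_ID.test(candidate) ? candidate : null;
}
```

Returning `null` instead of throwing lets a route such as `/api/youtube` answer unsupported links with a 400 response.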
### Podcast Generation
- Dynamic conversation generation between Alex and Sarah
- Minimum 55 lines of detailed dialogue
- Structured discussion format:
  - Introduction/Overview
  - Main points analysis
  - Critical discussion
  - Real-world implications
  - Personal perspectives
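A generated script can be checked against the 55-line minimum before it is sent to playback. This sketch assumes the model returns one turn per line in the form `Alex: ...` or `Sarah: ...`; that output format and the names `parseScript` and `validateScript` are illustrative assumptions, not taken from the project.

```typescript
// Illustrative only: assumes the model emits one "Alex: ..." or "Sarah: ..." turn per line.
export type Speaker = "Alex" | "Sarah";

export interface Turn {
  speaker: Speaker;
  text: string;
}

const TURN_LINE = /^\s*(Alex|Sarah)\s*:\s*(.+)$/;

export function parseScript(raw: string): Turn[] {
  const turns: Turn[] = [];
  for (const line of raw.split(/\r?\n/)) {
    const match = line.match(TURN_LINE);
    // Blank lines, headings and lines without a speaker label are skipped.
    if (match) {
      const text = match[2].trim();
      if (text) turns.push({ speaker: match[1] as Speaker, text });
    }
  }
  return turns;
}

export function validateScript(turns: Turn[], minLines = 55): string[] {
  const problems: string[] = [];
  if (turns.length < minLines) {
    problems.push(`expected at least ${minLines} lines, got ${turns.length}`);
  }
  for (const speaker of ["Alex", "Sarah"] as const) {
    if (!turns.some((turn) => turn.speaker === speaker)) {
      problems.push(`${speaker} never speaks`);
    }
  }
  return problems;
}
```

An empty array from `validateScript` means the script passed; otherwise the listed problems could drive a retry of the generation request.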
### Interactive Features
- Real-time audio playback
- Voice-enabled chat interface
- Context-aware responses
- PDF/Video content reference
- Semantic search capabilities
## 🛠 Tech Stack
### Frontend
- **Next.js 15.0.3** - React framework
- **React 19** - UI library
- **TailwindCSS** - Styling
- **TypeScript** - Type safety
- **React Icons** - Icon components
### Backend (API Routes)
- **Next.js API Routes** - Serverless functions
- **Pinecone** - Vector database
- **Google AI (Gemini)** - Embeddings generation
- **Groq** - LLM for conversation generation
- **LangChain** - Document processing
- **PDF Parse** - PDF text extraction
### AI/ML Components
- **Gemini Embedding Model** - Vector embeddings
- **Llama 3.2 90B** - Podcast generation
- **Mixtral 8x7B** - Chat responses
- **Web Speech API** - Voice interface
## 📦 Key Dependencies
```json
{
"@google/generative-ai": "^0.21.0",
"@langchain/community": "^0.3.14",
"@pinecone-database/pinecone": "^4.0.0",
"groq-sdk": "^0.8.0",
"langchain": "^0.3.5",
"next": "15.0.3"
}
```
## 🏗 Architecture
### Document Processing Flow
1. PDF upload or YouTube URL submission
2. Content chunking and preprocessing
3. Embedding generation via Gemini AI
4. Vector storage in Pinecone
5. Podcast script generation via Groq
6. Interactive chat capabilities
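Step 2 splits the extracted text into overlapping pieces so that each one fits the embedding model and neighboring chunks share some context. The repository lists LangChain for document processing; the sketch below instead uses a plain fixed-size character window, and the defaults of 1000 characters with 200 characters of overlap are assumptions.

```typescript
// Simplified stand-in for a text splitter: fixed-size character windows with overlap.
export interface Chunk {
  id: string;
  text: string;
  start: number; // offset into the whitespace-normalized text
}

export function chunkText(text: string, size = 1000, overlap = 200, prefix = "doc"): Chunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  // Collapse runs of whitespace (PDF extraction often leaves ragged line breaks).
  const clean = text.replace(/\s+/g, " ").trim();
  const step = size - overlap;
  const chunks: Chunk[] = [];
  for (let start = 0; start < clean.length; start += step) {
    chunks.push({ id: `${prefix}-${chunks.length}`, text: clean.slice(start, start + size), start });
    if (start + size >= clean.length) break; // this window already reaches the end
  }
  return chunks;
}
```

Each window starts `size - overlap` characters after the previous one, so the last `overlap` characters of one chunk reappear at the start of the next.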
### Data Flow
1. Content Ingestion → Chunking → Embedding → Storage
2. Query Processing → Semantic Search → Context Retrieval → Response Generation
3. Chat Interface → Voice Processing → Context-Aware Responses
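With the index metric set to cosine, the semantic search step ranks stored chunks by cosine similarity to the query embedding. Pinecone does this server-side; the in-memory version below only illustrates the ranking, and `StoredVector` and `topK` are hypothetical names.

```typescript
// In-memory illustration of cosine-similarity ranking (Pinecone performs this server-side).
export interface StoredVector {
  id: string;
  values: number[];
  text: string;
}

export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

export function topK(query: number[], store: StoredVector[], k = 3): { id: string; text: string; score: number }[] {
  return store
    .map((v) => ({ id: v.id, text: v.text, score: cosineSimilarity(query, v.values) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k);
}
```

The `text` of the top-ranked hits is what the context retrieval step would pass on to response generation.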
## 🔧 Environment Setup
### 1. API Keys Required
#### Pinecone API Key
1. Visit [Pinecone Console](https://app.pinecone.io/)
2. Sign up or log in to your account
3. Navigate to the API Keys section
4. Create a new API key
5. Copy the key and environment
#### Google AI (Gemini) API Key
1. Go to [Google AI Studio](https://makersuite.google.com/app/apikey)
2. Sign in to your Google Cloud account, or create one
3. Enable the Gemini API
4. Create a new API key
5. Copy the key
#### Groq API Key
1. Visit [Groq Console](https://console.groq.com/)
2. Create an account or sign in
3. Go to the API section
4. Generate a new API key
5. Copy the key
### 2. Environment Configuration
1. Clone the repository:
```bash
git clone https://github.com/samarthpandeydev/voicebook.git
cd voicebook
```
2. Copy the environment example file:
```bash
cp .env.example .env
```
3. Update the .env file with your API keys:
```env
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX_NAME=your_index_name
PINECONE_ENVIRONMENT=your_environment
GOOGLE_API_KEY=your_google_api_key
GROQ_API_KEY=your_groq_api_key
```
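A missing or unedited key usually surfaces later as an opaque API error. The sketch below fails fast instead: it checks the five variables listed above and treats values that still start with `your_` or `your-` as unfilled placeholders (an assumption based on the example values). `loadConfig` is a hypothetical helper.

```typescript
// Hypothetical fail-fast check for the variables listed in the .env example.
const REQUIRED_KEYS = [
  "PINECONE_API_KEY",
  "PINECONE_INDEX_NAME",
  "PINECONE_ENVIRONMENT",
  "GOOGLE_API_KEY",
  "GROQ_API_KEY",
] as const;

export type Config = Record<(typeof REQUIRED_KEYS)[number], string>;

export function loadConfig(env: Record<string, string | undefined>): Config {
  const config = {} as Config;
  const problems: string[] = [];
  for (const key of REQUIRED_KEYS) {
    const value = env[key]?.trim();
    if (!value) {
      problems.push(`${key} is not set`);
    } else if (/^your[_-]/.test(value)) {
      // Assumption: values copied unchanged from the example start with "your_" or "your-".
      problems.push(`${key} still holds the placeholder "${value}"`);
    } else {
      config[key] = value;
    }
  }
  if (problems.length > 0) {
    throw new Error(`Invalid environment:\n- ${problems.join("\n- ")}`);
  }
  return config;
}
```

In a Next.js API route it could be called as `loadConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.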
### 3. Pinecone Index Setup
1. Create a new index in the Pinecone console with:
   - Dimensions: 768 (Gemini embeddings)
   - Metric: Cosine
   - Pod Type: s1.x1 (recommended)
2. Update your .env with the index name:
```env
PINECONE_INDEX_NAME=your-index-name
```
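The index dimension has to match the embedding length exactly, otherwise Pinecone rejects the upsert. This sketch checks each vector against 768 before building records in the `{ id, values, metadata }` shape that Pinecone upserts use; the ID scheme and the choice to store chunk text in metadata are assumptions, and `toRecords` is a hypothetical name.

```typescript
// Hypothetical helper: pair chunks with embeddings and check them against the index dimension.
export const INDEX_DIMENSION = 768; // must equal the "Dimensions" value chosen for the index

// Mirrors the { id, values, metadata } record shape used by Pinecone upserts.
export interface VectorRecord {
  id: string;
  values: number[];
  metadata: { source: string; text: string };
}

export function toRecords(source: string, texts: string[], embeddings: number[][]): VectorRecord[] {
  if (texts.length !== embeddings.length) {
    throw new Error(`got ${texts.length} chunks but ${embeddings.length} embeddings`);
  }
  return texts.map((text, i) => {
    const values = embeddings[i];
    if (values.length !== INDEX_DIMENSION) {
      throw new Error(`chunk ${i}: ${values.length} dimensions, index expects ${INDEX_DIMENSION}`);
    }
    return { id: `${source}-${i}`, values, metadata: { source, text } };
  });
}
```

Catching a mismatch locally gives a clearer error than a failed network call, for example after switching to an embedding model with a different output size.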
### 4. Development Setup
1. Install dependencies:
```bash
npm install
# or
yarn install
```
2. Run the development server:
```bash
npm run dev
# or
yarn dev
```
3. Build for production:
```bash
npm run build
npm start
# or
yarn build
yarn start
```
## 🎯 Key Components
### Content Processing
- PDF document chunking and embedding generation
- YouTube transcript extraction and processing
- Vector storage and retrieval
### Conversation Generation
- Structured podcast script generation
- Context-aware chat responses
- Voice interface integration
### User Interface
- Responsive design with TailwindCSS
- Audio playback controls
- Interactive chat interface
- Voice command support
## 📝 API Routes
### Main Endpoints
- `/api/upload` - PDF processing
- `/api/youtube` - YouTube video processing
- `/api/generate-podcast` - Podcast script generation
- `/api/chat` - Context-aware chat
- `/api/podcast-chat` - Podcast-specific chat
- `/api/podcast-yt-chat` - YouTube podcast chat
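A context-aware chat endpoint such as `/api/chat` has to combine the retrieved passages with the user's question. The sketch below assumes the OpenAI-style `{ role, content }` message list that `groq-sdk` chat completions accept, caps the context at a character budget, and numbers each passage; `buildChatMessages`, the prompt wording and the 6000-character budget are all assumptions.

```typescript
// Sketch of prompt assembly for a context-aware chat route; wording and budget are assumptions.
export interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

export function buildChatMessages(
  question: string,
  passages: string[], // retrieved chunks, best match first
  maxContextChars = 6000,
): ChatMessage[] {
  const kept: string[] = [];
  let remaining = maxContextChars;
  for (const passage of passages) {
    if (passage.length > remaining) break; // stop rather than skip ahead to lower-ranked chunks
    kept.push(passage);
    remaining -= passage.length;
  }
  const context = kept.map((p, i) => `[${i + 1}] ${p}`).join("\n\n");
  return [
    {
      role: "system",
      content:
        "Answer using only the numbered context passages below. " +
        "If they do not contain the answer, say so.\n\n" +
        context,
    },
    { role: "user", content: question },
  ];
}
```

Stopping at the first passage that does not fit keeps the highest-ranked context intact rather than packing in smaller, lower-ranked chunks.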
## 🔒 Security Considerations
- Environment variables for API keys
- Server-side processing of sensitive operations
- Rate limiting implementation
- Error handling and validation
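The README does not say how rate limiting is implemented. As one possibility, here is a minimal fixed-window limiter keyed by a client identifier such as an IP address; `FixedWindowLimiter` is hypothetical, and because its state lives in process memory it would not be shared across serverless instances on Vercel.

```typescript
// Hypothetical fixed-window limiter; in-memory state is per process, not shared across instances.
export class FixedWindowLimiter {
  private readonly hits = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit; // max requests per key in one window
    this.windowMs = windowMs; // window length in milliseconds
  }

  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First request for this key, or the previous window has expired: start a new one.
      this.hits.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count < this.limit) {
      entry.count += 1;
      return true;
    }
    return false; // limit reached until the window resets
  }
}
```

In a route handler, a `false` result would map to an HTTP 429 (Too Many Requests) response.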
## 🎨 UI/UX Features
- Clean, modern interface
- Responsive design
- Loading states and animations
- Error handling and user feedback
- Voice interaction capabilities
## 📚 Documentation References
- [Next.js Documentation](https://nextjs.org/docs)
- [Pinecone Documentation](https://docs.pinecone.io/)
- [Google AI Documentation](https://ai.google.dev/docs)
- [Groq Documentation](https://console.groq.com/docs)
- [LangChain Documentation](https://js.langchain.com/docs)
## 🤝 Contributing
Contributions are welcome! Please read our contributing guidelines and submit pull requests for any enhancements.
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.