{"id":21205956,"url":"https://github.com/samarthpandeydev/voicebook","last_synced_at":"2025-10-25T09:37:43.931Z","repository":{"id":263691936,"uuid":"891194919","full_name":"samarthpandeydev/voicebook","owner":"samarthpandeydev","description":"Voicebook transforms PDF documents and YouTube videos into engaging podcast-style conversations between two AI personas, Alex and Sarah. Powered by cutting-edge AI, it combines content processing, embedding generation, and natural language understanding to create dynamic dialogues and interactive chat experiences.","archived":false,"fork":false,"pushed_at":"2024-11-20T07:21:01.000Z","size":252,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-31T14:58:09.494Z","etag":null,"topics":["gemini","groq","groq-api","javascript","notebooklm","pinecone","pineconedb","typescript","vercel"],"latest_commit_sha":null,"homepage":"https://voicebookkk.vercel.app","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/samarthpandeydev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-19T22:32:14.000Z","updated_at":"2025-02-02T10:41:15.000Z","dependencies_parsed_at":null,"dependency_job_id":"6ce47c52-0353-4e8a-8b86-051de15dda93","html_url":"https://github.com/samarthpandeydev/voicebook","commit_stats":null,"previous_names":["samarthpandeydev/voicebook"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samarthpandeydev%2Fvoicebook
","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samarthpandeydev%2Fvoicebook/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samarthpandeydev%2Fvoicebook/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samarthpandeydev%2Fvoicebook/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/samarthpandeydev","download_url":"https://codeload.github.com/samarthpandeydev/voicebook/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252964448,"owners_count":21832695,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gemini","groq","groq-api","javascript","notebooklm","pinecone","pineconedb","typescript","vercel"],"created_at":"2024-11-20T20:53:50.365Z","updated_at":"2025-10-25T09:37:43.872Z","avatar_url":"https://github.com/samarthpandeydev.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Voicebook - PDF \u0026 YouTube to Podcast Converter\n\nVoicebook is a sophisticated web application that transforms PDF documents and YouTube videos into engaging podcast-style conversations between two AI personas (Alex and Sarah). 
The application leverages advanced AI technologies for content processing, embedding generation, and natural language understanding.\n\n## 🚀 Features\n\n### PDF Processing\n- Upload and process PDF documents\n- Automatic content chunking and embedding\n- Vector storage in Pinecone database\n- Generate AI-powered podcast conversations\n- Interactive chat with document context\n\n### YouTube Integration\n- Process YouTube videos via URL\n- Automatic caption/transcript extraction\n- Content vectorization and storage\n- Generate podcast discussions about video content\n- Context-aware chat about video content\n\n### Podcast Generation\n- Dynamic conversation generation between Alex and Sarah\n- Minimum 55 lines of detailed dialogue\n- Structured discussion format:\n  - Introduction/Overview\n  - Main points analysis\n  - Critical discussion\n  - Real-world implications\n  - Personal perspectives\n\n### Interactive Features\n- Real-time audio playback\n- Voice-enabled chat interface\n- Context-aware responses\n- PDF/Video content reference\n- Semantic search capabilities\n\n## 🛠 Tech Stack\n\n### Frontend\n- **Next.js 15.0.3** - React framework\n- **React 19** - UI library\n- **TailwindCSS** - Styling\n- **TypeScript** - Type safety\n- **React Icons** - Icon components\n\n### Backend (API Routes)\n- **Next.js API Routes** - Serverless functions\n- **Pinecone** - Vector database\n- **Google AI (Gemini)** - Embeddings generation\n- **Groq** - LLM for conversation generation\n- **LangChain** - Document processing\n- **PDF Parse** - PDF text extraction\n\n### AI/ML Components\n- **Gemini Embedding Model** - Vector embeddings\n- **Llama 3.2 90B** - Podcast generation\n- **Mixtral 8x7B** - Chat responses\n- **Web Speech API** - Voice interface\n\n## 📦 Key Dependencies\n```json\n{\n  \"@google/generative-ai\": \"^0.21.0\",\n  \"@langchain/community\": \"^0.3.14\",\n  \"@pinecone-database/pinecone\": \"^4.0.0\",\n  \"groq-sdk\": \"^0.8.0\",\n  \"langchain\": \"^0.3.5\",\n  
\"next\": \"15.0.3\"\n}\n```\n\n## 🏗 Architecture\n\n### Document Processing Flow\n1. PDF/YouTube content upload\n2. Content chunking and preprocessing\n3. Embedding generation via Gemini AI\n4. Vector storage in Pinecone\n5. Podcast script generation via Groq\n6. Interactive chat capabilities\n\n### Data Flow\n1. Content Ingestion → Chunking → Embedding → Storage\n2. Query Processing → Semantic Search → Context Retrieval → Response Generation\n3. Chat Interface → Voice Processing → Context-Aware Responses\n\n## 🔧 Environment Setup\n\n### 1. API Keys Required\n\n#### Pinecone API Key\n1. Visit [Pinecone Console](https://app.pinecone.io/)\n2. Sign up or log in to your account\n3. Navigate to the API Keys section\n4. Create a new API key\n5. Copy the key and the environment name\n\n#### Google AI (Gemini) API Key\n1. Go to [Google AI Studio](https://makersuite.google.com/app/apikey)\n2. Create a Google Cloud account or sign in to an existing one\n3. Enable the Gemini API\n4. Create a new API key\n5. Copy the key\n\n#### Groq API Key\n1. Visit [Groq Console](https://console.groq.com/)\n2. Create an account or sign in\n3. Go to the API section\n4. Generate a new API key\n5. Copy the key\n\n### 2. Environment Configuration\n\n1. Clone the repository:\n```bash\ngit clone https://github.com/samarthpandeydev/voicebook.git\ncd voicebook\n```\n\n2. Copy the environment example file:\n```bash\ncp .env.example .env\n```\n\n3. Update the .env file with your API keys:\n```env\nPINECONE_API_KEY=your_pinecone_api_key\nPINECONE_INDEX_NAME=your_index_name\nPINECONE_ENVIRONMENT=your_environment\nGOOGLE_API_KEY=your_google_api_key\nGROQ_API_KEY=your_groq_api_key\n```\n\n### 3. Pinecone Index Setup\n\n1. Create a new index in the Pinecone console with:\n   - Dimensions: 768 (Gemini embeddings)\n   - Metric: Cosine\n   - Pod Type: s1.x1 (recommended)\n\n2. Update your .env with the index name:\n```env\nPINECONE_INDEX_NAME=your-index-name\n```\n\n### 4. Development Setup\n\n1. 
Install dependencies:\n```bash\nnpm install\n# or\nyarn install\n```\n\n2. Run the development server:\n```bash\nnpm run dev\n# or\nyarn dev\n```\n\n3. Build for production:\n```bash\nnpm run build\nnpm start\n# or\nyarn build\nyarn start\n```\n\n## 🎯 Key Components\n\n### Content Processing\n- PDF document chunking and embedding generation\n- YouTube transcript extraction and processing\n- Vector storage and retrieval\n\n### Conversation Generation\n- Structured podcast script generation\n- Context-aware chat responses\n- Voice interface integration\n\n### User Interface\n- Responsive design with TailwindCSS\n- Audio playback controls\n- Interactive chat interface\n- Voice command support\n\n## 📝 API Routes\n\n### Main Endpoints\n- `/api/upload` - PDF processing\n- `/api/youtube` - YouTube video processing\n- `/api/generate-podcast` - Podcast script generation\n- `/api/chat` - Context-aware chat\n- `/api/podcast-chat` - Podcast-specific chat\n- `/api/podcast-yt-chat` - YouTube podcast chat\n\n## 🔒 Security Considerations\n\n- Environment variables for API keys\n- Server-side processing of sensitive operations\n- Rate limiting implementation\n- Error handling and validation\n\n## 🎨 UI/UX Features\n\n- Clean, modern interface\n- Responsive design\n- Loading states and animations\n- Error handling and user feedback\n- Voice interaction capabilities\n\n## 📚 Documentation References\n\n- [Next.js Documentation](https://nextjs.org/docs)\n- [Pinecone Documentation](https://docs.pinecone.io/)\n- [Google AI Documentation](https://ai.google.dev/docs)\n- [Groq Documentation](https://console.groq.com/docs)\n- [LangChain Documentation](https://js.langchain.com/docs)\n\n## 🤝 Contributing\n\nContributions are welcome! 
Please read our contributing guidelines and submit pull requests for any enhancements.\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsamarthpandeydev%2Fvoicebook","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsamarthpandeydev%2Fvoicebook","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsamarthpandeydev%2Fvoicebook/lists"}