{"id":30495479,"url":"https://github.com/ginga1402/studybuddy-open-source-rag-notebooklm","last_synced_at":"2025-08-25T00:22:49.311Z","repository":{"id":306457610,"uuid":"1025092229","full_name":"Ginga1402/studybuddy-open-source-rag-notebooklm","owner":"Ginga1402","description":"StudyBuddy is an open source, RAG-based AI assistant that lets you query, summarize, and interact with your documents—built as a developer-friendly alternative to Google NotebookLM.","archived":false,"fork":false,"pushed_at":"2025-08-20T17:56:22.000Z","size":59,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-20T19:43:53.203Z","etag":null,"topics":["generative-ai","langchain","llm","python","rag","streamlit"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Ginga1402.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-23T17:54:56.000Z","updated_at":"2025-08-20T17:56:25.000Z","dependencies_parsed_at":"2025-07-30T08:37:50.536Z","dependency_job_id":null,"html_url":"https://github.com/Ginga1402/studybuddy-open-source-rag-notebooklm","commit_stats":null,"previous_names":["ginga1402/studybuddy","ginga1402/studybuddy-open-source-rag-notebooklm"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Ginga1402/studybuddy-open-source-rag-notebooklm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ginga1402%2Fstudybuddy-open-source-rag-notebooklm","tags_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ginga1402%2Fstudybuddy-open-source-rag-notebooklm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ginga1402%2Fstudybuddy-open-source-rag-notebooklm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ginga1402%2Fstudybuddy-open-source-rag-notebooklm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Ginga1402","download_url":"https://codeload.github.com/Ginga1402/studybuddy-open-source-rag-notebooklm/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ginga1402%2Fstudybuddy-open-source-rag-notebooklm/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271984326,"owners_count":24853813,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-24T02:00:11.135Z","response_time":111,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["generative-ai","langchain","llm","python","rag","streamlit"],"created_at":"2025-08-25T00:22:47.960Z","updated_at":"2025-08-25T00:22:49.259Z","avatar_url":"https://github.com/Ginga1402.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 📚 StudyBuddy – Open Source RAG-Based AI Notebook and Google NotebookLM Alternative\n\n**StudyBuddy is a free, open source alternative to Google NotebookLM that lets you 
chat with your documents, generate smart summaries, and organize knowledge with AI—all securely on your terms.**\n\n## 🎯 Project Description\n\nStudyBuddy is a comprehensive Retrieval-Augmented Generation (RAG) system that transforms static PDF documents into interactive learning experiences. Built with FastAPI and Streamlit, it leverages advanced AI technologies including HuggingFace embeddings, FAISS vector databases, and Large Language Models to create an intelligent document analysis platform.\n\n### Technical Architecture\n- **RAG Pipeline**: Implements state-of-the-art retrieval-augmented generation for accurate, context-aware responses\n- **Vector Database**: Uses FAISS for efficient similarity search and document retrieval\n- **Document Processing**: Converts PDFs to structured markdown using Docling for optimal text extraction\n- **Embeddings**: Utilizes BGE-small-en-v1.5 embeddings with CUDA acceleration for fast processing\n- **LLM Integration**: Powered by Gemma3:12b-it-q4_K_M via Ollama for high-quality text generation\n- **Topic Modeling**: Implements LDA (Latent Dirichlet Allocation) for intelligent topic discovery\n\n### Key Features\n- **Document Ingestion**: Upload and process multiple PDFs into searchable knowledge bases\n- **Interactive Q\u0026A**: Chat with your documents using natural language queries\n- **Smart Summaries**: Generate student-friendly summaries with key concepts and formulas\n- **Visual Diagrams**: Create ASCII flowcharts and concept maps\n- **Quiz Generation**: Automatically generate multiple-choice questions for self-assessment\n- **FAQ Creation**: Extract frequently asked questions from document content\n- **Topic Extraction**: Discover and organize important themes using AI-powered analysis\n- **Analytics Dashboard**: Monitor usage patterns and system performance\n\n## 📁 Project Structure\n\n```\nStudyBuddy/\n├── configuration.py          # System configuration and model setup\n├── FastAPI.py                # Main API 
server with all endpoints\n├── streamlit_ui_fixed.py     # Enhanced web interface\n├── ingestion.py              # PDF processing and vector store creation\n├── QA_Rag.py                 # Question-answering RAG implementation\n├── create_summary.py         # Document summarization module\n├── create_diagram.py         # ASCII diagram generation\n├── create_quiz.py            # Quiz generation system\n├── create_faq.py             # FAQ extraction module\n├── create_topics.py          # Topic modeling and extraction\n├── Sample_outputs/           # Example outputs and demonstrations\n├── Data/                     # PDF document storage directory\n└── vector_store/             # FAISS vector database storage\n```\n\n## 🚀 Use Cases\n\n### 📚 **Education \u0026 Learning**\n- **Students**: Convert textbooks into interactive study materials with summaries, quizzes, and Q\u0026A\n- **Teachers**: Create educational content and assessments from curriculum documents\n- **Researchers**: Analyze academic papers and extract key insights quickly\n\n### 💼 **Professional \u0026 Business**\n- **Knowledge Management**: Transform company documents into searchable knowledge bases\n- **Training Materials**: Create interactive training modules from policy documents\n- **Research \u0026 Development**: Analyze technical documentation and generate insights\n\n### 🏥 **Specialized Domains**\n- **Medical Education**: Process medical textbooks for case studies and exam preparation\n- **Legal Research**: Analyze legal documents and extract relevant precedents\n- **Technical Documentation**: Create interactive guides from complex technical manuals\n\n---\n\n## 🎬 Demo Video\n\nSee StudyBuddy in action! 
Watch our comprehensive demo showcasing all features:\n\n[![▶️ Watch Demo](https://img.shields.io/badge/▶️_Watch_Demo-4CAF50?style=for-the-badgen\u0026logoColor=white)](link-to-your-video-file.mp4)\n\n\n### 📹 What's Covered in the Demo:\n- **PDF Upload \u0026 Vector Store Creation**: See how easy it is to upload documents\n- **Interactive Q\u0026A**: Real-time question answering with source references\n- **Content Generation**: Summaries, quizzes, diagrams, and FAQs in action\n- **Analytics Dashboard**: Comprehensive usage tracking and metrics\n- **User Interface**: Complete walkthrough of the enhanced Streamlit UI\n\n\n---\n\n## 🛠️ Installation Instructions\n\n### Prerequisites\n- Python 3.9+\n- CUDA-compatible GPU (optional, for faster processing)\n- 8GB+ RAM recommended\n\n### Step 1: Clone Repository\n```bash\ngit clone https://github.com/Ginga1402/studybuddy-open-source-rag-notebooklm.git\ncd studybuddy-open-source-rag-notebooklm\n```\n\n### Step 2: Install Dependencies\n```bash\npip install -r requirements.txt\n```\n\n\n### Step 3: Set Up Ollama (LLM Backend)\n```bash\n# Install Ollama\ncurl -fsSL https://ollama.ai/install.sh | sh\n\n# Pull the required model\nollama pull gemma3:12b-it-q4_K_M\n```\n\n### Step 4: Configure Paths\nUpdate the paths in `configuration.py` to match your system:\n```python\nDIRECTORY_PATH = \"your/path/to/Data\"\nVECTORSORE_PATH = \"your/path/to/vector_store\"\n```\n\n## 📖 Usage\n\n### Starting the Application\n\n1. **Start the FastAPI Server**:\n```bash\npython FastAPI.py\n```\nThe API will be available at `http://localhost:9000`\n\n2. **Launch the Streamlit Interface**:\n```bash\nstreamlit run streamlit_ui_fixed.py\n```\nThe web interface will open at `http://localhost:8501`\n\n### Basic Workflow\n\n1. **Create Vector Store**: Upload PDF files and create a searchable knowledge base\n2. **Generate Content**: Use various features to create summaries, quizzes, diagrams, and FAQs\n3. 
**Interactive Q\u0026A**: Ask questions about your documents and get contextual answers\n4. **Download Results**: Save generated content as text files for offline use\n5. **Monitor Analytics**: Track usage patterns and system performance\n\n### API Endpoints\n\n- `POST /create-vectorstore/` - Create vector store from PDFs\n- `POST /QA-Guide/` - Question-answering with RAG\n- `POST /generate-summary/` - Generate document summaries\n- `POST /generate-diagram/` - Create ASCII diagrams\n- `POST /generate-quiz/` - Generate multiple-choice quizzes\n- `POST /generate-FAQ/` - Create frequently asked questions\n- `POST /generate-important-topics/` - Extract key topics\n- `GET /heartbeat` - Health check endpoint\n- `GET /metrics` - System usage metrics\n\n## 🔧 Technologies Used\n\n| Technology | Description | Link |\n|------------|-------------|------|\n| **FastAPI** | High-performance web framework for building APIs | [fastapi.tiangolo.com](https://fastapi.tiangolo.com/) |\n| **Streamlit** | Framework for creating interactive web applications | [streamlit.io](https://streamlit.io/) |\n| **LangChain** | Framework for developing LLM-powered applications | [langchain.com](https://www.langchain.com/) |\n| **FAISS** | Library for efficient similarity search and clustering | [github.com/facebookresearch/faiss](https://github.com/facebookresearch/faiss) |\n| **HuggingFace Transformers** | State-of-the-art ML models and embeddings | [huggingface.co](https://huggingface.co/) |\n| **Ollama** | Local LLM runtime for running language models | [ollama.ai](https://ollama.ai/) |\n| **Docling** | Advanced document processing and conversion | [github.com/DS4SD/docling](https://github.com/DS4SD/docling) |\n| **NLTK** | Natural language processing toolkit | [nltk.org](https://www.nltk.org/) |\n| **Gensim** | Topic modeling and document similarity analysis | [radimrehurek.com/gensim](https://radimrehurek.com/gensim/) |\n| **PyTorch** | Deep learning framework with CUDA support | 
[pytorch.org](https://pytorch.org/) |\n| **Pandas** | Data manipulation and analysis library | [pandas.pydata.org](https://pandas.pydata.org/) |\n| **Pydantic** | Data validation using Python type annotations | [pydantic.dev](https://pydantic.dev/) |\n\n## 🤝 Contributing\n\nContributions to this project are welcome! If you have ideas for improvements, bug fixes, or new features, feel free to open an issue or submit a pull request.\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n---\n\n## 🌟 Star History\n\nIf you find StudyBuddy useful, please consider giving it a star ⭐ on GitHub!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fginga1402%2Fstudybuddy-open-source-rag-notebooklm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fginga1402%2Fstudybuddy-open-source-rag-notebooklm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fginga1402%2Fstudybuddy-open-source-rag-notebooklm/lists"}