{"id":27210741,"url":"https://github.com/modsetter/surfsense","last_synced_at":"2026-04-08T04:02:11.597Z","repository":{"id":252779714,"uuid":"835975784","full_name":"MODSetter/SurfSense","owner":"MODSetter","description":"An open source, privacy focused alternative to NotebookLM for teams with no data limit's. Join our Discord: https://discord.gg/ejRNvftDp9","archived":false,"fork":false,"pushed_at":"2026-04-03T03:40:19.000Z","size":157230,"stargazers_count":13641,"open_issues_count":97,"forks_count":1247,"subscribers_count":80,"default_branch":"main","last_synced_at":"2026-04-03T11:53:25.272Z","etag":null,"topics":["agent","agents","ai","chrome-extension","extension","fastapi","langchain","langgraph","nextjs","notebooklm","ollama","perplexity","python","rag","typescript"],"latest_commit_sha":null,"homepage":"https://www.surfsense.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"xpertdev/SurfSense","license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MODSetter.png","metadata":{"files":{"readme":"README.es.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"MODSetter"}},"created_at":"2024-07-30T23:00:09.000Z","updated_at":"2026-04-03T11:30:33.000Z","dependencies_parsed_at":"2024-10-06T12:02:31.234Z","dependency_job_id":"385e3ce6-de9d-47f2-968a-97272aff87d2","html_url":"https://github.com/MODSetter/SurfSense","commit_stats":null,"previous_names":["modsetter/surfsense"],"tags_count":187,"template":false,"template_full_name":"cording12/next-fast-turbo","pur
l":"pkg:github/MODSetter/SurfSense","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MODSetter%2FSurfSense","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MODSetter%2FSurfSense/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MODSetter%2FSurfSense/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MODSetter%2FSurfSense/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MODSetter","download_url":"https://codeload.github.com/MODSetter/SurfSense/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MODSetter%2FSurfSense/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31539229,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T16:28:08.000Z","status":"online","status_checked_at":"2026-04-08T02:00:06.127Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","agents","ai","chrome-extension","extension","fastapi","langchain","langgraph","nextjs","notebooklm","ollama","perplexity","python","rag","typescript"],"created_at":"2025-04-10T01:26:56.627Z","updated_at":"2026-04-08T04:02:11.581Z","avatar_url":"https://github.com/MODSetter.png","language":"Python","readme":"\n![new_header](https://github.com/user-attachments/assets/e236b764-0ddc-42ff-a1f1-8fbb3d2e0e65)\n\n\n\u003cdiv 
align=\"center\"\u003e\n\u003ca href=\"https://discord.gg/ejRNvftDp9\"\u003e\n\u003cimg src=\"https://img.shields.io/discord/1359368468260192417\" alt=\"Discord\"\u003e\n\u003c/a\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n[English](README.md) | [简体中文](README.zh-CN.md)\n\n\u003c/div\u003e\n\n# SurfSense\nConnect any LLM to your internal knowledge sources and chat with it in real time alongside your team. OSS alternative to NotebookLM, Perplexity, and Glean.\n\nSurfSense is a highly customizable AI research agent, connected to external sources such as Search Engines (SearxNG, Tavily, LinkUp), Google Drive, Slack, Linear, Jira, ClickUp, Confluence, BookStack, Gmail, Notion, YouTube, GitHub, Discord, Airtable, Google Calendar, Luma, Circleback, Elasticsearch and more to come.\n\n\u003cdiv align=\"center\"\u003e\n\u003ca href=\"https://trendshift.io/repositories/13606\" target=\"_blank\"\u003e\u003cimg src=\"https://trendshift.io/api/badge/repositories/13606\" alt=\"MODSetter%2FSurfSense | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"/\u003e\u003c/a\u003e\n\u003c/div\u003e\n\n\n# Video \n\nhttps://github.com/user-attachments/assets/42a29ea1-d4d8-4213-9c69-972b5b806d58\n\n\n\n## Podcast Sample\n\nhttps://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7\n\n\n\n\n## Key Features\n\n### 💡 **Idea**: \n- Open source alternative to NotebookLM, Perplexity, and Glean. 
Connect any LLM to your internal knowledge sources and collaborate with your team in real time.\n### 📁 **Multiple File Format Upload Support**\n- Save content from your personal files *(documents, images, videos; **50+ file extensions** supported)* to your own personal knowledge base.\n### 🔍 **Powerful Search**\n- Quickly research or find anything in your saved content.\n### 💬 **Chat with your Saved Content**\n- Interact in natural language and get cited answers.\n### 📄 **Cited Answers**\n- Get cited answers, just like Perplexity.\n### 🔔 **Privacy \u0026 Local LLM Support**\n- Works with local LLMs served through Ollama.\n### 🏠 **Self Hostable**\n- Open source and easy to deploy locally.\n### 👥 **Team Collaboration with RBAC**\n- Role-Based Access Control for Search Spaces\n- Invite team members with customizable roles (Owner, Admin, Editor, Viewer)\n- Granular permissions for documents, chats, connectors, and settings\n- Share knowledge bases securely within your organization\n### 🎙️ Podcasts \n- Blazingly fast podcast generation agent. 
(Creates a 3-minute podcast in under 20 seconds.)\n- Convert your chat conversations into engaging audio content\n- Support for local TTS providers (Kokoro TTS)\n- Support for multiple TTS providers (OpenAI, Azure, Google Vertex AI)\n\n### 🤖 **Deep Agent Architecture**\n\n#### Built-in Agent Tools\n| Tool | Description |\n|------|-------------|\n| **search_knowledge_base** | Search your personal knowledge base with semantic + full-text hybrid search, date filtering, and connector-specific queries |\n| **generate_podcast** | Generate audio podcasts from chat conversations or knowledge base content |\n| **link_preview** | Fetch rich Open Graph metadata for URLs to display preview cards |\n| **display_image** | Display images in chat with metadata and source attribution |\n| **scrape_webpage** | Extract full content from webpages for analysis and summarization (supports Firecrawl or local Chromium/Trafilatura) |\n\n#### Extensible Tools Registry\nContributors can easily add new tools via the registry pattern:\n1. Create a tool factory function in `surfsense_backend/app/agents/new_chat/tools/`\n2. 
Register it in the `BUILTIN_TOOLS` list in `registry.py`\n\n#### Configurable System Prompts\n- Custom system instructions via LLM configuration\n- Toggle citations on/off per configuration\n- Supports 100+ LLMs via LiteLLM integration\n\n### 📊 **Advanced RAG Techniques**\n- Supports 100+ LLMs.\n- Supports 6000+ embedding models.\n- Supports all major rerankers (Pinecone, Cohere, FlashRank, etc.).\n- Uses hierarchical indices (a two-tiered RAG setup).\n- Utilizes hybrid search (semantic + full-text search combined with Reciprocal Rank Fusion).\n\n### ℹ️ **External Sources**\n- Search Engines (Tavily, LinkUp)\n- SearxNG (self-hosted instances)\n- Google Drive\n- Slack\n- Linear\n- Jira\n- ClickUp\n- Confluence\n- BookStack\n- Notion\n- Gmail\n- YouTube Videos\n- GitHub\n- Discord\n- Airtable\n- Google Calendar\n- Luma\n- Circleback\n- Elasticsearch\n- and more to come...\n\n## 📄 **Supported File Extensions**\n\n| ETL Service | Formats | Notes |\n|-------------|---------|-------|\n| **LlamaCloud** | 50+ formats | Documents, presentations, spreadsheets, images |\n| **Unstructured** | 34+ formats | Core formats + email support |\n| **Docling** | Core formats | Local processing, no API key required |\n\n**Audio/Video** (via STT Service): `.mp3`, `.wav`, `.mp4`, `.webm`, etc.\n\n### 🔖 Cross-Browser Extension\n- The SurfSense extension can be used to save any webpage you like.\n- Its main use case is saving webpages that sit behind authentication.\n\n\n\n## FEATURE REQUESTS AND FUTURE\n\n\n**SurfSense is actively being developed.** While it's not yet production-ready, you can help us speed up the process.\n\nJoin the [SurfSense Discord](https://discord.gg/ejRNvftDp9) and help shape the future of SurfSense!\n\n## 🚀 Roadmap\n\nStay up to date with our development progress and upcoming features!  
\nCheck out our public roadmap and contribute your ideas or feedback:\n\n**📋 Roadmap Discussion:** [SurfSense 2025-2026 Roadmap: Deep Agents, Real-Time Collaboration \u0026 MCP Servers](https://github.com/MODSetter/SurfSense/discussions/565)\n\n**📊 Kanban Board:** [SurfSense Project Board](https://github.com/users/MODSetter/projects/3)\n\n\n## How to get started?\n\n### Quick Start with Docker 🐳\n\n\u003e [!TIP]\n\u003e For production deployments, use the full [Docker Compose setup](https://www.surfsense.com/docs/docker-installation) which offers more control and scalability.\n\n**Linux/macOS:**\n\n```bash\ndocker run -d -p 3000:3000 -p 8000:8000 \\\n  -v surfsense-data:/data \\\n  --name surfsense \\\n  --restart unless-stopped \\\n  ghcr.io/modsetter/surfsense:latest\n```\n\n**Windows (PowerShell):**\n\n```powershell\ndocker run -d -p 3000:3000 -p 8000:8000 `\n  -v surfsense-data:/data `\n  --name surfsense `\n  --restart unless-stopped `\n  ghcr.io/modsetter/surfsense:latest\n```\n\n**With Custom Configuration:**\n\nYou can pass any environment variable using `-e` flags:\n\n```bash\ndocker run -d -p 3000:3000 -p 8000:8000 \\\n  -v surfsense-data:/data \\\n  -e EMBEDDING_MODEL=openai://text-embedding-ada-002 \\\n  -e OPENAI_API_KEY=your_openai_api_key \\\n  -e AUTH_TYPE=GOOGLE \\\n  -e GOOGLE_OAUTH_CLIENT_ID=your_google_client_id \\\n  -e GOOGLE_OAUTH_CLIENT_SECRET=your_google_client_secret \\\n  -e ETL_SERVICE=LLAMACLOUD \\\n  -e LLAMA_CLOUD_API_KEY=your_llama_cloud_key \\\n  --name surfsense \\\n  --restart unless-stopped \\\n  ghcr.io/modsetter/surfsense:latest\n```\n\n\u003e [!NOTE]\n\u003e - If deploying behind a reverse proxy with HTTPS, add `-e BACKEND_URL=https://api.yourdomain.com`\n\nAfter starting, access SurfSense at:\n- **Frontend**: [http://localhost:3000](http://localhost:3000)\n- **Backend API**: [http://localhost:8000](http://localhost:8000)\n- **API Docs**: [http://localhost:8000/docs](http://localhost:8000/docs)\n\n**Useful 
Commands:**\n\n```bash\ndocker logs -f surfsense      # View logs\ndocker stop surfsense         # Stop\ndocker start surfsense        # Start\ndocker rm surfsense           # Remove (data preserved in volume)\n```\n\n### Installation Options\n\nSurfSense provides multiple options to get started:\n\n1. **[SurfSense Cloud](https://www.surfsense.com/login)** - The easiest way to try SurfSense without any setup.\n   - No installation required\n   - Instant access to all features\n   - Perfect for getting started quickly\n\n2. **Quick Start Docker (Above)** - Single command to get SurfSense running locally.\n   - All-in-one image with PostgreSQL, Redis, and all services bundled\n   - Perfect for evaluation, development, and small deployments\n   - Data persisted via Docker volume\n\n3. **[Docker Compose (Production)](https://www.surfsense.com/docs/docker-installation)** - Full stack deployment with separate services.\n   - Includes pgAdmin for database management through a web UI\n   - Supports environment variable customization via `.env` file\n   - Flexible deployment options (full stack or core services only)\n   - Better for production with separate scaling of services\n\n4. 
**[Manual Installation](https://www.surfsense.com/docs/manual-installation)** - For users who prefer more control over their setup or need to customize their deployment.\n\nDocker and manual installation guides include detailed OS-specific instructions for Windows, macOS, and Linux.\n\nBefore a self-hosted installation, make sure to complete the [prerequisite setup steps](https://www.surfsense.com/docs/), including:\n- Auth setup (optional - defaults to LOCAL auth)\n- **File Processing ETL Service** (optional - defaults to Docling):\n  - Docling (default, local processing, no API key required, supports PDF, Office docs, images, HTML, CSV)\n  - Unstructured.io API key (supports 34+ formats)\n  - LlamaCloud API key (enhanced parsing, supports 50+ formats)\n- Other API keys as needed for your use case\n\n\n\n## Tech Stack\n\n\n ### **Backend** \n\n-  **FastAPI**: Modern, fast web framework for building APIs with Python\n  \n-  **PostgreSQL with pgvector**: Database with vector search capabilities for similarity searches\n\n-  **SQLAlchemy**: SQL toolkit and ORM (Object-Relational Mapping) for database interactions\n\n-  **Alembic**: A database migrations tool for SQLAlchemy.\n\n-  **FastAPI Users**: Authentication and user management with JWT and OAuth support\n\n-  **Deep Agents**: Custom agent framework built on LangGraph for reasoning and acting AI agents with configurable tools\n\n-  **LangGraph**: Framework for developing stateful AI agents with conversation persistence\n\n-  **LangChain**: Framework for developing AI-powered applications.\n\n-  **LiteLLM**: Universal LLM integration supporting 100+ models (OpenAI, Anthropic, Ollama, etc.)\n\n-  **Rerankers**: Advanced result ranking for improved search relevance\n\n-  **Hybrid Search**: Combines vector similarity and full-text search using Reciprocal Rank Fusion (RRF) for better relevance\n\n-  **Vector Embeddings**: Document and text embeddings for semantic search\n\n-  **pgvector**: PostgreSQL extension for 
efficient vector similarity operations\n\n-  **Redis**: In-memory data structure store used as the message broker and result backend for Celery\n\n-  **Celery**: Distributed task queue for handling asynchronous background jobs (document processing, podcast generation, etc.)\n\n-  **Flower**: Real-time monitoring and administration tool for Celery task queues\n\n-  **Chonkie**: Advanced document chunking and embedding library\n\n  \n---\n ### **Frontend**\n\n-  **Next.js**: React framework featuring App Router, server components, automatic code-splitting, and optimized rendering.\n\n-  **React**: JavaScript library for building user interfaces.\n\n-  **TypeScript**: Static type-checking for JavaScript, enhancing code quality and developer experience.\n\n-  **Vercel AI SDK UI Stream Protocol**: Used to build a scalable chat UI.\n\n-  **Tailwind CSS**: Utility-first CSS framework for building custom UI designs.\n\n-  **Shadcn**: Headless components library.\n\n-  **Motion (Framer Motion)**: Animation library for React.\n\n\n\n ### **DevOps**\n\n-  **Docker**: Container platform for consistent deployment across environments\n  \n-  **Docker Compose**: Tool for defining and running multi-container Docker applications\n\n-  **pgAdmin**: Web-based PostgreSQL administration tool included in the Docker Compose setup\n\n\n### **Extension** \n Manifest V3 extension built with Plasmo\n\n\n## Contribute \n\nContributions are very welcome! A contribution can be as small as a ⭐ or as simple as finding and reporting issues.\nBackend improvements are always appreciated.\n\n### Adding New Agent Tools\n\nWant to add a new tool to the SurfSense agent? It's easy:\n\n1. Create your tool file in `surfsense_backend/app/agents/new_chat/tools/my_tool.py`\n2. 
Register it in `registry.py`:\n\n```python\nToolDefinition(\n    name=\"my_tool\",\n    description=\"What my tool does\",\n    factory=lambda deps: create_my_tool(\n        search_space_id=deps[\"search_space_id\"],\n        db_session=deps[\"db_session\"],\n    ),\n    requires=[\"search_space_id\", \"db_session\"],\n),\n```\n\nFor detailed contribution guidelines, please see our [CONTRIBUTING.md](CONTRIBUTING.md) file.\n\n## Star History\n\n\u003ca href=\"https://www.star-history.com/#MODSetter/SurfSense\u0026Date\"\u003e\n \u003cpicture\u003e\n   \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://api.star-history.com/svg?repos=MODSetter/SurfSense\u0026type=Date\u0026theme=dark\" /\u003e\n   \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"https://api.star-history.com/svg?repos=MODSetter/SurfSense\u0026type=Date\" /\u003e\n   \u003cimg alt=\"Star History Chart\" src=\"https://api.star-history.com/svg?repos=MODSetter/SurfSense\u0026type=Date\" /\u003e\n \u003c/picture\u003e\n\u003c/a\u003e\n\n---\n---\n\u003cp align=\"center\"\u003e\n    \u003cimg \n      src=\"https://github.com/user-attachments/assets/329c9bc2-6005-4aed-a629-700b5ae296b4\" \n      alt=\"Catalyst Project\" \n      width=\"200\"\n    /\u003e\n\u003c/p\u003e\n\n---\n---\n","funding_links":["https://github.com/sponsors/MODSetter"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodsetter%2Fsurfsense","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmodsetter%2Fsurfsense","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodsetter%2Fsurfsense/lists"}