https://github.com/kazuph/mcp-qdrant-docs
  
  
An MCP server that scrapes websites, indexes content into Qdrant, and provides a query tool.
- Host: GitHub
 - URL: https://github.com/kazuph/mcp-qdrant-docs
 - Owner: kazuph
 - Created: 2025-04-08T06:59:09.000Z (7 months ago)
 - Default Branch: main
 - Last Pushed: 2025-04-09T03:29:43.000Z (7 months ago)
 - Last Synced: 2025-04-14T14:17:13.596Z (7 months ago)
 - Language: TypeScript
 - Size: 186 KB
 - Stars: 0
 - Watchers: 1
 - Forks: 0
 - Open Issues: 0
 - Metadata Files:
   - Readme: README.md
Awesome Lists containing this project
- awesome-mcp-servers - **mcp-qdrant-docs** - An MCP server that scrapes websites, indexes content into Qdrant, and provides a query tool. `typescript` `mcp` `server` `web` `npm install kazuph/mcp-qdrant-docs` (🌐 Web Development)
 
README
# mcp-qdrant-docs MCP Server
A Model Context Protocol server
This is a TypeScript-based MCP server that scrapes website content, indexes it into a Qdrant vector database, and provides a tool to answer questions about the indexed content.
## Usage Examples
```
ask_hono_docs: Ask questions about the Hono.dev documentation
ask_reactrouter_docs: Ask questions about the ReactRouter.com documentation
ask_gradio_docs: Ask questions about the Gradio.app LLMs documentation
```
## Features
### Tool: `ask_<site>_docs`
- **Name:** Dynamically generated based on the `DOCS_URL` or `--start-url` (e.g., `ask_reactrouter_docs`).
- **Functionality:** Allows users to ask natural language questions about the content scraped from the specified website.
- **Process:**
    1. On startup (or if `--force-reindex` is specified), the server scrapes the target website.
    2. The scraped content is processed, chunked, and embedded using a sentence transformer model.
    3. These embeddings and content chunks are stored in a Qdrant collection specific to the website.
    4. When the tool is called with a query, the server embeds the query, searches Qdrant for relevant chunks, and returns the found content as the answer.
- **Input:** A natural language query (string).
- **Output:** Text containing the relevant content chunks found in the Qdrant index.
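The embed, store, and search flow described above can be pictured with a small in-memory sketch. This is not the package's implementation: `chunkText`, `toyEmbed`, and `InMemoryIndex` are hypothetical names, the hashed bag-of-words vector merely stands in for a sentence transformer model, and the array scan stands in for a Qdrant collection.

```typescript
// Illustrative sketch only: a toy stand-in for the chunk/embed/store/search flow.
// None of these names come from the package; a real server would call an
// embedding model and a Qdrant collection instead.

type IndexedChunk = { text: string; vector: number[] };

// Split text into chunks of at most `maxWords` words.
function chunkText(text: string, maxWords: number): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}

// Toy embedding: hash each lowercase word into one of `dims` buckets.
function toyEmbed(text: string, dims = 64): number[] {
  const vector = new Array<number>(dims).fill(0);
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const word of words) {
    let hash = 0;
    for (let i = 0; i < word.length; i++) {
      hash = (hash * 31 + word.charCodeAt(i)) % dims;
    }
    vector[hash] += 1;
  }
  return vector;
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

class InMemoryIndex {
  private chunks: IndexedChunk[] = [];

  // Embed a chunk and store it alongside its text.
  add(text: string): void {
    this.chunks.push({ text, vector: toyEmbed(text) });
  }

  // Embed the query and return the `topK` most similar chunk texts.
  search(query: string, topK: number): string[] {
    const queryVector = toyEmbed(query);
    return this.chunks
      .map((c) => ({ text: c.text, score: cosineSimilarity(queryVector, c.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
      .map((c) => c.text);
  }
}
```

In this sketch, each piece returned by `chunkText` would be passed to `add`, and a tool call would map to `search(query, topK)`, whose results are returned as the answer text.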
## Installation
Install the package globally with npm:
```bash
npm i -g @kazuph/mcp-qdrant-docs
```
To use it with Claude Desktop, add the server config to one of the following files:
- On macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- On Windows: `%APPDATA%/Claude/claude_desktop_config.json`
### Using `npx`
The recommended way to run this server is using `npx` within your MCP client configuration (e.g., Claude Desktop's `claude_desktop_config.json`). This avoids the need for global installation.
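**Example `claude_desktop_config.json` using `npx`** (the `//` comments are explanatory only; standard JSON does not allow comments, so remove them, along with the trailing comma left in `args`, before saving):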
```json
{
  "mcpServers": {
    "react-router-docs": { // A unique name for this server instance
      "command": "npx",
      "args": [
        "@kazuph/mcp-qdrant-docs", // The command registered in package.json bin
        // Optional: Add command-line arguments here if needed
        // "--start-url", "https://some-default-url.com/",
        // "--debug"
      ],
      // Optional: Set environment variables for configuration
      "env": {
        "DOCS_URL": "https://reactrouter.com/",
        "QDRANT_URL": "http://your-qdrant-instance:6333",
        "COLLECTION_NAME": "react-router-docs", // Base name for the collection
        "EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2"
        // "DEBUG": "true" // Alternative way to enable debug logging
      }
    }
    // You can add more server instances for different documentation sites here
  }
}
```
## Command-Line Options
When running the server directly (e.g., using `npx @kazuph/mcp-qdrant-docs` or `npm run dev --`), you can use the following command-line options. These options override the corresponding environment variables if both are set.
-   `--start-url <url>` or `-s <url>`:
    -   **Required (if `DOCS_URL` env var is not set).**
    -   The starting URL of the website to scrape.
    -   Overrides the `DOCS_URL` environment variable.
-   `--limit <number>` or `-l <number>`:
    -   Maximum number of pages to scrape.
    -   Default: `300`.
-   `--match <pattern>` or `-m <pattern>`:
    -   URL path patterns (prefix match) to limit scraping. Can be specified multiple times.
    -   Example: `--match /docs/ --match /api/`
    -   Default: Scrapes all pages under the `--start-url` domain.
-   `--force-reindex`:
    -   Force re-scraping and re-indexing even if the Qdrant collection already exists.
    -   Default: `false`.
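-   `--collection-name <name>` or `-c <name>`:
    -   Base name for the Qdrant collection. The final collection name will be `<base-name>-<site>`, where the suffix is derived from the site's domain (e.g., `react-router-docs-reactrouter_com`).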
    -   Overrides the `COLLECTION_NAME` environment variable.
    -   Default: `docs-collection`.
-   `--qdrant-url <url>`:
    -   URL of the Qdrant instance.
    -   Overrides the `QDRANT_URL` environment variable.
    -   Default: `http://localhost:6333`.
-   `--embedding-model <name>`:
    -   Name of the sentence transformer model to use for embeddings (from Hugging Face or local).
    -   Overrides the `EMBEDDING_MODEL` environment variable.
    -   Default: `Xenova/all-MiniLM-L6-v2`.
-   `--debug`:
    -   Enable detailed debug logging.
    -   Overrides the `DEBUG` environment variable (if set to `true`).
    -   Default: `false`.
-   `--help` or `-h`:
    -   Show the help message listing all options.
**Example using command-line options:**
```bash
npx @kazuph/mcp-qdrant-docs --start-url https://example-docs.com/ --collection-name my-docs --limit 50 --debug
```
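As a rough illustration of how `--match` prefixes and `--limit` could narrow a crawl, here is a minimal sketch. The helper names `shouldScrape` and `selectPages` are hypothetical, and treating "same domain" as "same hostname as the start URL" is an assumption rather than something the README specifies.

```typescript
// Hypothetical helpers, not taken from the package source.

// Decide whether a discovered URL qualifies for scraping.
function shouldScrape(
  candidateUrl: string,
  startUrl: string,
  matchPrefixes: string[],
): boolean {
  const candidate = new URL(candidateUrl);
  const start = new URL(startUrl);
  // Assumption: "under the start-url domain" means an identical hostname.
  if (candidate.hostname !== start.hostname) return false;
  // With no --match patterns, every page on that host qualifies.
  if (matchPrefixes.length === 0) return true;
  // Otherwise the path must begin with at least one of the prefixes.
  return matchPrefixes.some((prefix) => candidate.pathname.startsWith(prefix));
}

// Keep qualifying URLs in discovery order, skipping duplicates, up to `limit` pages.
function selectPages(
  discovered: string[],
  startUrl: string,
  matchPrefixes: string[],
  limit = 300,
): string[] {
  const selected: string[] = [];
  const seen = new Set<string>();
  for (const url of discovered) {
    if (selected.length >= limit) break;
    if (seen.has(url) || !shouldScrape(url, startUrl, matchPrefixes)) continue;
    seen.add(url);
    selected.push(url);
  }
  return selected;
}
```

With `--match /docs/ --match /api/`, a page such as `/blog/post` on the same host would be skipped in this sketch, while `/docs/intro` would be kept.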
**Configuration Priority:**
The server uses the following priority for settings:
1.  **Command-line arguments:** (e.g., `--start-url`, `--collection-name`) - Highest priority.
2.  **Environment variables:** (e.g., `DOCS_URL`, `COLLECTION_NAME`) - Used if command-line arguments are not provided.
3.  **Default values:** (Defined within the code) - Lowest priority.
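The precedence order above amounts to taking the first source that is set. A minimal sketch, assuming string-valued settings; `resolveSetting` and `resolveQdrantUrl` are hypothetical helper names, and only the `QDRANT_URL` variable and its default come from this README:

```typescript
// Hypothetical helpers illustrating the precedence order; not the package's code.

type EnvLike = Record<string, string | undefined>;

// Return the command-line value if set, else the environment value, else the default.
function resolveSetting(
  cliValue: string | undefined,
  envValue: string | undefined,
  defaultValue: string,
): string {
  // `??` falls through only on undefined or null, so the first defined source wins.
  return cliValue ?? envValue ?? defaultValue;
}

// Example for one setting: --qdrant-url > QDRANT_URL > http://localhost:6333.
function resolveQdrantUrl(cliArgs: { qdrantUrl?: string }, env: EnvLike): string {
  return resolveSetting(cliArgs.qdrantUrl, env["QDRANT_URL"], "http://localhost:6333");
}
```

Passing the environment in as a plain object rather than reading `process.env` inside the helper keeps the sketch easy to test.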
## Example: Adding React Router Documentation
To add a server instance specifically for querying React Router documentation, add the following entry to your `mcpServers` configuration (e.g., in `claude_desktop_config.json`), omitting the `//` comments, which standard JSON does not allow:
```json
{
  "mcpServers": {
    // ... other servers ...
    "react-router-docs": {
      "command": "npx", // Or the direct path if not installed globally
      "args": [
        "@kazuph/mcp-qdrant-docs"
        // No need to specify --start-url etc. if using env vars
      ],
      "env": {
        "DOCS_URL": "https://reactrouter.com/",
        "QDRANT_URL": "http://your-qdrant-instance:6333", // Replace with your Qdrant URL
        "COLLECTION_NAME": "react-router-docs", // Base name, will become 'react-router-docs-reactrouter_com'
        "EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2" // Or your preferred model
        // "DEBUG": "true" // Enable debug logs if needed
      }
    }
    // ... other servers ...
  }
}
```
**Resulting Tool:**
Once this server instance is running and connected to your MCP client, it will provide a tool with a name like `ask_reactrouter_docs` (not `ask_reactrouter_com_docs`).
-   **Tool Name:** `ask_<site>_docs` (e.g., `ask_reactrouter_docs`)
-   **Description:** Ask a question about the content of the site specified by `DOCS_URL` (or `--start-url`).
-   **Input:** A natural language query about the documentation.
The server will automatically scrape the site (if the collection doesn't exist or `--force-reindex` is used), index the content into the specified Qdrant collection (`react-router-docs-reactrouter_com` in this example), and then use the index to answer your queries via the provided tool (`ask_reactrouter_docs`).
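The tool and collection names in this example suggest a naming scheme along the following lines. This is a hypothetical reconstruction inferred only from the names shown above (`ask_reactrouter_docs` and `react-router-docs-reactrouter_com`); the function names, the stripping of a leading `www.`, and the handling of subdomains are all assumptions, and the package's actual rules may differ.

```typescript
// Hypothetical reconstruction of the naming scheme; not taken from the package source.

// Lowercased hostname without a leading "www." (the "www." stripping is an assumption).
function hostOf(docsUrl: string): string {
  return new URL(docsUrl).hostname.toLowerCase().replace(/^www\./, "");
}

// "https://reactrouter.com/" -> "ask_reactrouter_docs": the final domain label is dropped.
function deriveToolName(docsUrl: string): string {
  const labels = hostOf(docsUrl).split(".");
  const withoutTld = labels.length > 1 ? labels.slice(0, -1) : labels;
  const site = withoutTld.join("_").replace(/[^a-z0-9_]/g, "_");
  return `ask_${site}_docs`;
}

// "react-router-docs" + "https://reactrouter.com/" -> "react-router-docs-reactrouter_com":
// the full hostname is kept, with non-alphanumeric characters replaced by underscores.
function deriveCollectionName(baseName: string, docsUrl: string): string {
  const suffix = hostOf(docsUrl).replace(/[^a-z0-9]/g, "_");
  return `${baseName}-${suffix}`;
}
```

Under the same assumptions, `deriveToolName("https://hono.dev/")` would give `ask_hono_docs`, matching the name listed in the usage examples.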