An open API service indexing awesome lists of open source software.

https://github.com/kazuph/mcp-qdrant-docs

An MCP server that scrapes websites, indexes content into Qdrant, and provides a query tool.
https://github.com/kazuph/mcp-qdrant-docs

Last synced: 7 months ago
JSON representation

An MCP server that scrapes websites, indexes content into Qdrant, and provides a query tool.

Awesome Lists containing this project

README

          

# mcp-qdrant-docs MCP Server

A Model Context Protocol server

This is a TypeScript-based MCP server that scrapes website content, indexes it into a Qdrant vector database, and provides a tool to answer questions about the indexed content.

## 使用例

```
ask_hono_docs: Hono.devのドキュメント内容について質問できます
ask_reactrouter_docs: ReactRouter.comのドキュメント内容について質問できます
ask_gradio_docs: Gradio.appのLLMsドキュメントについて質問できます
```

## Features

### Tool: `ask__docs`
- **Name:** Dynamically generated based on the `DOCS_URL` or `--start-url` (e.g., `ask_reactrouter_docs`).
- **Functionality:** Allows users to ask natural language questions about the content scraped from the specified website.
- **Process:**
1. On startup (or if `force-reindex` is specified), the server scrapes the target website.
2. The scraped content is processed, chunked, and embedded using a sentence transformer model.
3. These embeddings and content chunks are stored in a Qdrant collection specific to the website.
4. When the tool is called with a query, the server embeds the query, searches Qdrant for relevant chunks, and returns the found content as the answer.
- **Input:** A natural language query (string).
- **Output:** Text containing the relevant content chunks found in the Qdrant index.

## Installation

To use with Claude Desktop, add the server config:

On MacOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
On Windows: `%APPDATA%/Claude/claude_desktop_config.json`

```
npm i -g @kazuph/mcp-qdrant-docs
```

### Using `npx`

The recommended way to run this server is using `npx` within your MCP client configuration (e.g., Claude Desktop's `claude_desktop_config.json`). This avoids the need for global installation.

**Example `claude_desktop_config.json` using `npx`:**

```json
{
"mcpServers": {
"react-router-docs": { // A unique name for this server instance
"command": "npx",
"args": [
"@kazuph/mcp-qdrant-docs", // The command registered in package.json bin
// Optional: Add command-line arguments here if needed
// "--start-url", "https://some-default-url.com/",
// "--debug"
],
// Optional: Set environment variables for configuration
"env": {
"DOCS_URL": "https://reactrouter.com/",
"QDRANT_URL": "http://your-qdrant-instance:6333",
"COLLECTION_NAME": "react-router-docs", // Base name for the collection
"EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2"
// "DEBUG": "true" // Alternative way to enable debug logging
}
}
// You can add more server instances for different documentation sites here
}
}
```

## Command-Line Options

When running the server directly (e.g., using `npx mcp-qdrant-docs` or `npm run dev --`), you can use the following command-line options. These options override corresponding environment variables if both are set.

- `--start-url ` or `-s `:
- **Required (if `DOCS_URL` env var is not set).**
- The starting URL of the website to scrape.
- Overrides the `DOCS_URL` environment variable.
- `--limit ` or `-l `:
- Maximum number of pages to scrape.
- Default: `300`.
- `--match ` or `-m `:
- URL path patterns (prefix match) to limit scraping. Can be specified multiple times.
- Example: `--match /docs/ --match /api/`
- Default: Scrapes all pages under the `start-url` domain.
- `--force-reindex`:
- Force re-scraping and re-indexing even if the Qdrant collection already exists.
- Default: `false`.
- `--collection-name ` or `-c `:
- Base name for the Qdrant collection. The final collection name will be `-`.
- Overrides the `COLLECTION_NAME` environment variable.
- Default: `docs-collection`.
- `--qdrant-url `:
- URL of the Qdrant instance.
- Overrides the `QDRANT_URL` environment variable.
- Default: `http://localhost:6333`.
- `--embedding-model `:
- Name of the sentence transformer model to use for embeddings (from Hugging Face or local).
- Overrides the `EMBEDDING_MODEL` environment variable.
- Default: `Xenova/all-MiniLM-L6-v2`.
- `--debug`:
- Enable detailed debug logging.
- Overrides the `DEBUG` environment variable (if set to `true`).
- Default: `false`.
- `--help` or `-h`:
- Show the help message listing all options.

**Example using command-line options:**

```bash
npx @kazuph/mcp-qdrant-docs --start-url https://example-docs.com/ --collection-name my-docs --limit 50 --debug
```

**Configuration Priority:**

The server uses the following priority for settings:

1. **Command-line arguments:** (e.g., `--start-url`, `--collection-name`) - Highest priority.
2. **Environment variables:** (e.g., `DOCS_URL`, `COLLECTION_NAME`) - Used if command-line arguments are not provided.
3. **Default values:** (Defined within the code) - Lowest priority.

## Example: Adding React Router Documentation

To add a server instance specifically for querying React Router documentation, add the following entry to your `mcpServers` configuration (e.g., in `claude_desktop_config.json`):

```json
{
"mcpServers": {
// ... other servers ...
"react-router-docs": {
"command": "npx", // Or the direct path if not installed globally
"args": [
"@kazuph/mcp-qdrant-docs"
// No need to specify --start-url etc. if using env vars
],
"env": {
"DOCS_URL": "https://reactrouter.com/",
"QDRANT_URL": "http://your-qdrant-instance:6333", // Replace with your Qdrant URL
"COLLECTION_NAME": "react-router-docs", // Base name, will become 'react-router-docs-reactrouter_com'
"EMBEDDING_MODEL": "sentence-transformers/all-MiniLM-L6-v2" // Or your preferred model
// "DEBUG": "true" // Enable debug logs if needed
}
}
// ... other servers ...
}
}

```

**Resulting Tool:**

Once this server instance is running and connected to your MCP client, it will provide a tool named similar to `ask_reactrouter_docs` (not `ask_reactrouter_com_docs`).

- **Tool Name:** `ask__docs` (e.g., `ask_reactrouter_docs`)
- **Description:** Ask a question about the content of the site specified by `DOCS_URL` (or `--start-url`).
- **Input:** A natural language query about the documentation.

The server will automatically scrape the site (if the collection doesn't exist or `--force-reindex` is used), index the content into the specified Qdrant collection (`react-router-docs-reactrouter_com` in this example), and then use the index to answer your queries via the provided tool (`ask_reactrouter_docs`).