https://github.com/yigitkonur/mcp-researchpowerpack
The ultimate research MCP toolkit: Reddit mining, web search with CTR aggregation, AI-powered deep research, and intelligent web scraping
ai-agents deep-research mcp mcp-server model-context-protocol reddit research-automation typescript web-scraping web-search
- Host: GitHub
- URL: https://github.com/yigitkonur/mcp-researchpowerpack
- Owner: yigitkonur
- Created: 2026-03-08T12:48:56.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-12T05:02:12.000Z (3 months ago)
- Last Synced: 2026-03-12T10:07:42.362Z (3 months ago)
- Language: TypeScript
- Size: 759 KB
- Stars: 1
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
# 🔬 MCP Research Powerpack
Five research tools for AI assistants: search, scrape, mine Reddit, and synthesize with LLMs.
`npx mcp-research-powerpack`
---
An [MCP](https://modelcontextprotocol.io) server that gives Claude, Cursor, Windsurf, and any MCP-compatible AI assistant a complete research toolkit: Google search, Reddit deep-dives, web scraping with AI extraction, and multi-model deep research, all exposed as tools that chain into each other.
Zero config to start. Each API key you add unlocks more capabilities.
## Tools
| Tool | What it does | Requires |
|:-----|:-------------|:---------|
| **`web_search`** | Parallel Google search across 3–100 keywords with CTR-weighted ranking and consensus detection | `SERPER_API_KEY` |
| **`search_reddit`** | Same search engine filtered to reddit.com, with 10–50 queries in parallel | `SERPER_API_KEY` |
| **`get_reddit_post`** | Fetch 2–50 Reddit posts with full comment trees and smart comment budget allocation | `REDDIT_CLIENT_ID` + `REDDIT_CLIENT_SECRET` |
| **`scrape_links`** | Scrape 1–50 URLs with JS rendering fallback, HTML→Markdown conversion, and optional AI extraction | `SCRAPEDO_API_KEY` |
| **`deep_research`** | Send questions to research-capable models (Grok, Gemini) with web search, file attachments | `OPENROUTER_API_KEY` |
Tools are designed to **chain**: `web_search` → `scrape_links` → `search_reddit` → `get_reddit_post` → `deep_research` for synthesis. Each tool suggests the next logical step in its output.
## Quick Start
### Claude Desktop / Claude Code
Add to your MCP config (on macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "research-powerpack": {
      "command": "npx",
      "args": ["-y", "mcp-research-powerpack"],
      "env": {
        "SERPER_API_KEY": "your-key-here",
        "OPENROUTER_API_KEY": "your-key-here"
      }
    }
  }
}
```
### Cursor
Add to `.cursor/mcp.json` in your project:
```json
{
  "mcpServers": {
    "research-powerpack": {
      "command": "npx",
      "args": ["-y", "mcp-research-powerpack"],
      "env": {
        "SERPER_API_KEY": "your-key-here"
      }
    }
  }
}
```
### From Source
```bash
git clone https://github.com/yigitkonur/mcp-researchpowerpack.git
cd mcp-researchpowerpack
pnpm install && pnpm build
pnpm start
```
### HTTP Transport
```bash
MCP_TRANSPORT=http MCP_PORT=3000 npx mcp-research-powerpack
```
Exposes an `/mcp` endpoint (POST/GET/DELETE with session headers) and a `/health` endpoint.
## API Keys
Each key unlocks a capability. Missing keys silently disable their tools; the server never crashes.
| Variable | Enables | Free Tier |
|:---------|:--------|:----------|
| `SERPER_API_KEY` | `web_search`, `search_reddit` | 2,500 searches/mo ([serper.dev](https://serper.dev)) |
| `REDDIT_CLIENT_ID` + `REDDIT_CLIENT_SECRET` | `get_reddit_post` | Unlimited ([reddit.com/prefs/apps](https://www.reddit.com/prefs/apps), script type) |
| `SCRAPEDO_API_KEY` | `scrape_links` | 1,000 credits/mo ([scrape.do](https://scrape.do)) |
| `OPENROUTER_API_KEY` | `deep_research`, LLM extraction | Pay-per-token ([openrouter.ai](https://openrouter.ai)) |
| `CEREBRAS_API_KEY` | Cerebras LLM extraction | - |
| `USE_CEREBRAS` | Enable Cerebras for extraction (set `true`) | - (flag, default `false`) |
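
The key-to-tool mapping in the table above can be sketched as a small capability check. This is an illustration only: `TOOL_REQUIREMENTS` and `enabledTools` are hypothetical names, not the project's actual exports, and the environment is passed in as a plain object so the snippet stays runtime-agnostic.

```typescript
// Hypothetical sketch: derive the set of enabled tools from environment variables.
type Env = Record<string, string | undefined>;

// Requirements copied from the Tools table; the constant name is an assumption.
const TOOL_REQUIREMENTS: Record<string, string[]> = {
  web_search: ["SERPER_API_KEY"],
  search_reddit: ["SERPER_API_KEY"],
  get_reddit_post: ["REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET"],
  scrape_links: ["SCRAPEDO_API_KEY"],
  deep_research: ["OPENROUTER_API_KEY"],
};

// A tool is enabled only when every variable it needs is present and non-empty.
function enabledTools(env: Env): string[] {
  return Object.entries(TOOL_REQUIREMENTS)
    .filter(([, keys]) => keys.every((key) => (env[key] ?? "").trim() !== ""))
    .map(([tool]) => tool);
}
```

With only `SERPER_API_KEY` set, this sketch reports `web_search` and `search_reddit`; a lone `REDDIT_CLIENT_ID` without its secret enables nothing.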
## Configuration
Optional tuning via environment variables:
| Variable | Default | Description |
|:---------|:--------|:------------|
| `RESEARCH_MODEL` | `x-ai/grok-4-fast` | Primary deep research model |
| `RESEARCH_FALLBACK_MODEL` | `google/gemini-2.5-flash` | Fallback when primary fails |
| `LLM_EXTRACTION_MODEL` | `openai/gpt-oss-120b:nitro` | Model for scrape/reddit AI extraction |
| `DEFAULT_REASONING_EFFORT` | `high` | Research depth: `low`, `medium`, `high` |
| `DEFAULT_MAX_URLS` | `100` | Max search results per research question (10–200) |
| `API_TIMEOUT_MS` | `1800000` | Request timeout in ms (default: 30 min) |
| `MCP_TRANSPORT` | `stdio` | Transport mode: `stdio` or `http` |
| `MCP_PORT` | `3000` | Port for HTTP mode |
| `USE_CEREBRAS` | `false` | Set to `true` to use Cerebras for extraction instead of OpenRouter |
| `CEREBRAS_API_KEY` | - | API key for Cerebras cloud ([cloud.cerebras.ai](https://cloud.cerebras.ai)) |
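
As a rough illustration of how these defaults could be applied, here is a minimal sketch of an env parser. `loadConfig`, `ResearchConfig`, and `intOr` are hypothetical names, and clamping `DEFAULT_MAX_URLS` into the documented 10–200 range is an assumption about how out-of-range values are handled.

```typescript
// Hypothetical sketch of env parsing with the defaults from the table above.
type Env = Record<string, string | undefined>;

interface ResearchConfig {
  researchModel: string;
  fallbackModel: string;
  reasoningEffort: "low" | "medium" | "high";
  maxUrls: number;
  apiTimeoutMs: number;
  transport: "stdio" | "http";
  port: number;
}

// Parse an integer, falling back when the value is missing or not numeric.
function intOr(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isFinite(parsed) ? parsed : fallback;
}

function loadConfig(env: Env): ResearchConfig {
  const effort = env["DEFAULT_REASONING_EFFORT"];
  return {
    researchModel: env["RESEARCH_MODEL"] ?? "x-ai/grok-4-fast",
    fallbackModel: env["RESEARCH_FALLBACK_MODEL"] ?? "google/gemini-2.5-flash",
    reasoningEffort: effort === "low" || effort === "medium" ? effort : "high",
    // Assumption: values outside 10-200 are clamped rather than rejected.
    maxUrls: Math.min(200, Math.max(10, intOr(env["DEFAULT_MAX_URLS"], 100))),
    apiTimeoutMs: intOr(env["API_TIMEOUT_MS"], 1_800_000), // 30 minutes
    transport: env["MCP_TRANSPORT"] === "http" ? "http" : "stdio",
    port: intOr(env["MCP_PORT"], 3000),
  };
}
```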
### Cerebras Support
When `USE_CEREBRAS=true` and `CEREBRAS_API_KEY` is set, the `scrape_links` tool uses Cerebras (Z.ai GLM 4.7) for AI content extraction instead of OpenRouter. This provides:
- **Ultra-fast extraction**: Cerebras inference is optimized for speed
- **Independence from OpenRouter**: extraction works even without `OPENROUTER_API_KEY`
- **Automatic fallback**: if Cerebras is not configured, extraction falls back to OpenRouter
```bash
# Enable Cerebras for extraction
USE_CEREBRAS=true CEREBRAS_API_KEY=your-key npx mcp-research-powerpack
```
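
The selection rule described in this section can be sketched as a single function. `pickExtractionProvider` is a hypothetical name, and returning `"none"` when neither key is present is an assumption about how the no-key case is represented.

```typescript
// Hypothetical sketch of choosing the extraction backend from env flags.
type Env = Record<string, string | undefined>;
type ExtractionProvider = "cerebras" | "openrouter" | "none";

function pickExtractionProvider(env: Env): ExtractionProvider {
  const wantsCerebras = env["USE_CEREBRAS"] === "true";
  // Cerebras is used only when the flag is on AND a key is present.
  if (wantsCerebras && env["CEREBRAS_API_KEY"]) return "cerebras";
  // Otherwise fall back to OpenRouter when its key is available.
  if (env["OPENROUTER_API_KEY"]) return "openrouter";
  return "none";
}
```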
### Network Resilience
All LLM API calls include built-in stability protections:
- **Request deadlines**: a hard timeout prevents calls from hanging indefinitely
- **Stall detection**: if no response arrives within a threshold, the request is aborted and retried
- **Exponential backoff**: transient failures (429, 5xx) retry with jitter to avoid a thundering herd
- **Connection loss recovery**: network errors (ECONNRESET, ECONNREFUSED) trigger an automatic retry
- **Graceful degradation**: all tools return structured errors instead of crashing
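
The retry behaviour listed above can be sketched roughly as follows. This covers only error classification and backoff with jitter, not deadlines or stall detection. The function names, the attempt count, and the delay values are all assumptions for illustration, and "full jitter" is one common strategy, not necessarily the one the project uses.

```typescript
// Hypothetical sketch: classify retryable errors and retry with exponential backoff.
function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const { status, code } = err as { status?: unknown; code?: unknown };
  // HTTP-style errors: retry on rate limiting and server errors only.
  if (typeof status === "number") return status === 429 || status >= 500;
  // Network-level errors named in the list above.
  return code === "ECONNRESET" || code === "ECONNREFUSED";
}

// Full jitter: pick a random delay in [0, min(maxMs, baseMs * 2^attempt)).
function backoffDelay(
  attempt: number,
  baseMs: number,
  maxMs: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3, // assumed value
  baseMs = 500, // assumed value
  maxMs = 10_000, // assumed value
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Give up on non-retryable errors or when attempts are exhausted.
      if (!isRetryable(err) || attempt === maxAttempts - 1) break;
      const delay = backoffDelay(attempt, baseMs, maxMs);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Injecting `random` keeps `backoffDelay` deterministic in tests while production code uses `Math.random`.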
## How It Works
### Search Ranking
Results from multiple queries are deduplicated by normalized URL and scored using **CTR-weighted position values** (position 1 = 100.0, position 10 = 12.56). URLs appearing across multiple queries get a consensus marker. The frequency threshold starts at ≥3, then falls back to ≥2 and finally ≥1 so that results are always returned.
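
A minimal sketch of this ranking idea follows. Only the two endpoint weights (100.0 and 12.56) come from the text above; the power-law curve between them, the URL normalization rules, summing weights as the score, and all names (`ctrWeight`, `normalizeUrl`, `rankUrls`) are assumptions made for illustration.

```typescript
// Hypothetical sketch of CTR-weighted scoring with a consensus threshold fallback.
interface SearchHit {
  url: string;
  position: number; // 1-based rank within one query's results
}

interface RankedUrl {
  url: string;
  score: number;
  frequency: number; // number of queries the URL appeared in
}

// Assumed curve: a power law pinned to position 1 = 100.0 and position 10 = 12.56.
const EXPONENT = Math.log10(100 / 12.56);
function ctrWeight(position: number): number {
  return 100 * Math.pow(position, -EXPONENT);
}

// Assumed normalization: drop scheme, "www.", fragment, and trailing slashes.
function normalizeUrl(raw: string): string {
  const parsed = new URL(raw);
  const host = parsed.hostname.replace(/^www\./, "");
  const path = parsed.pathname.replace(/\/+$/, "");
  return `${host}${path}${parsed.search}`;
}

function rankUrls(resultsPerQuery: SearchHit[][]): RankedUrl[] {
  const byUrl = new Map<string, RankedUrl>();
  for (const hits of resultsPerQuery) {
    const seen = new Set<string>(); // count each URL once per query
    for (const hit of hits) {
      const key = normalizeUrl(hit.url);
      if (seen.has(key)) continue;
      seen.add(key);
      const entry = byUrl.get(key) ?? { url: key, score: 0, frequency: 0 };
      entry.score += ctrWeight(hit.position);
      entry.frequency += 1;
      byUrl.set(key, entry);
    }
  }
  const all = Array.from(byUrl.values()).sort((a, b) => b.score - a.score);
  // Threshold fallback: require >=3 queries, then >=2, then >=1.
  for (const threshold of [3, 2, 1]) {
    const kept = all.filter((entry) => entry.frequency >= threshold);
    if (kept.length > 0) return kept;
  }
  return [];
}
```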
### Reddit Comment Budget
Global budget of **1,000 comments**, max 200 per post. After the first pass, surplus from posts with fewer comments is redistributed to truncated posts in a second fetch pass.
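
The two-pass allocation can be sketched as below. The budget numbers come from the text above; the even first-pass split, the equal-share redistribution, keeping the 200-per-post cap in the second pass, and the name `allocateCommentBudget` are assumptions.

```typescript
// Hypothetical sketch of a two-pass comment budget allocation.
const GLOBAL_BUDGET = 1000;
const MAX_PER_POST = 200;

// `available[i]` is how many comments post i actually has.
function allocateCommentBudget(available: readonly number[]): number[] {
  if (available.length === 0) return [];
  // First pass: an even share of the global budget, capped per post.
  const fairShare = Math.min(MAX_PER_POST, Math.floor(GLOBAL_BUDGET / available.length));
  const allocation = available.map((count) => Math.min(count, fairShare));
  // Surplus is whatever short posts left unused from their share.
  let surplus = allocation.reduce((sum, used) => sum + (fairShare - used), 0);

  // How many more comments post i could still take.
  const room = (i: number): number =>
    Math.min(available[i] ?? 0, MAX_PER_POST) - (allocation[i] ?? 0);

  // Second pass: hand the surplus to truncated posts in equal shares.
  let truncated = allocation.map((_, i) => i).filter((i) => room(i) > 0);
  while (surplus > 0 && truncated.length > 0) {
    const share = Math.max(1, Math.floor(surplus / truncated.length));
    for (const i of truncated) {
      const extra = Math.min(share, room(i), surplus);
      allocation[i] = (allocation[i] ?? 0) + extra;
      surplus -= extra;
    }
    truncated = truncated.filter((i) => room(i) > 0);
  }
  return allocation;
}
```

For ten posts where one has only 20 comments and two have several hundred, the 80 unused comments from the short post are split between the two long ones.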
### Scraping Pipeline
**Three-mode fallback** per URL: basic → JS rendering → JS + US geo-targeting. Results go through HTML→Markdown conversion (Turndown), then optional AI extraction with a 100K char input cap and 8,000 token output per URL.
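
The fallback order can be sketched with an injected fetcher, so the example does not depend on the real Scrape.do client. The mode labels, `scrapeWithFallback`, `capForExtraction`, and treating an empty body as a failure are all assumptions.

```typescript
// Hypothetical sketch of the three-mode scraping fallback.
type ScrapeMode = "basic" | "js" | "js+us-geo"; // assumed labels
type Fetcher = (url: string, mode: ScrapeMode) => Promise<string>;

const MODES: ScrapeMode[] = ["basic", "js", "js+us-geo"];
const MAX_EXTRACTION_INPUT_CHARS = 100_000; // input cap from the text above

// Try each mode in order and return the first non-empty response.
async function scrapeWithFallback(
  url: string,
  fetchPage: Fetcher,
): Promise<{ mode: ScrapeMode; html: string }> {
  let lastError: unknown = new Error("no scrape modes configured");
  for (const mode of MODES) {
    try {
      const html = await fetchPage(url, mode);
      if (html.trim().length > 0) return { mode, html };
      lastError = new Error(`empty response in ${mode} mode`);
    } catch (err) {
      lastError = err; // remember the failure and try the next mode
    }
  }
  throw lastError;
}

// Trim converted Markdown to the extraction input cap.
function capForExtraction(markdown: string): string {
  return markdown.slice(0, MAX_EXTRACTION_INPUT_CHARS);
}
```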
### Deep Research
**32,000 token budget** divided across questions (1 question = 32K, 10 questions = 3.2K each). Gemini models get `google_search` tool access. Grok/Perplexity get `search_parameters` with citations. If the primary model fails, the request automatically falls back to the secondary model.
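
The budget arithmetic and the model fallback can be sketched as follows. `tokensPerQuestion` and `askWithFallback` are hypothetical names, and rounding down when the budget does not divide evenly is an assumption.

```typescript
// Hypothetical sketch of per-question budgeting and primary/fallback calls.
const TOTAL_TOKEN_BUDGET = 32_000;

// Split the shared budget evenly; assumed to round down on uneven splits.
function tokensPerQuestion(questionCount: number): number {
  if (!Number.isInteger(questionCount) || questionCount < 1) {
    throw new RangeError("questionCount must be a positive integer");
  }
  return Math.floor(TOTAL_TOKEN_BUDGET / questionCount);
}

// Run the primary model call; on any failure, run the fallback instead.
async function askWithFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
): Promise<T> {
  try {
    return await primary();
  } catch {
    return fallback();
  }
}
```

This matches the figures above: one question gets 32,000 tokens and ten questions get 3,200 each.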
### File Attachments
`deep_research` can read **local files** and include them as context. Files over 600 lines are smart-truncated (first 500 + last 100 lines). Line ranges are supported, and line numbers are preserved in the output.
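
The truncation rule can be sketched like this. The thresholds come from the text above; the `N: text` line-number format, the omission marker, and the name `smartTruncate` are assumptions.

```typescript
// Hypothetical sketch of head + tail truncation that keeps original line numbers.
const MAX_LINES = 600;
const HEAD_LINES = 500;
const TAIL_LINES = 100;

function smartTruncate(content: string): string {
  const lines = content.split("\n");
  // Number every line first so the tail keeps its original numbering.
  const numbered = lines.map((text, i) => `${i + 1}: ${text}`);
  if (lines.length <= MAX_LINES) return numbered.join("\n");
  const omitted = lines.length - HEAD_LINES - TAIL_LINES;
  return [
    ...numbered.slice(0, HEAD_LINES),
    `... ${omitted} lines omitted ...`, // assumed marker format
    ...numbered.slice(-TAIL_LINES),
  ].join("\n");
}
```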
## Concurrency
| Operation | Parallel Limit |
|:----------|:---------------|
| Web search keywords | 8 |
| Reddit search queries | 8 |
| Reddit post fetches per batch | 5 (batches of 10) |
| URL scraping per batch | 10 (batches of 30) |
| LLM extraction | 3 |
| Deep research questions | 3 |
All clients use **manual retry with exponential backoff and jitter**. The OpenAI SDK's built-in retry is disabled (`maxRetries: 0`).
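
A bounded parallel map of the kind these limits imply can be sketched in a few lines. This shares only its name with the project's `pMap` in `utils/concurrency.ts`; the signature and implementation here are assumptions.

```typescript
// Hypothetical sketch: run `worker` over `items` with at most `limit` in flight.
async function pMap<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until the list is exhausted.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const runnerCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results; // same order as `items`
}
```

For example, `pMap(keywords, 8, searchOne)` would mirror the web search limit in the table, with `searchOne` standing in for whatever per-keyword call is used.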
## Architecture
```
src/
├── index.ts                  Entry point: STDIO + HTTP transport, graceful shutdown
├── worker.ts                 Cloudflare Workers entry (Durable Objects)
├── config/
│   ├── index.ts              Env parsing, capability detection, lazy Proxy config
│   ├── loader.ts             YAML → Zod → JSON Schema pipeline
│   └── yaml/tools.yaml       Single source of truth for tool definitions
├── schemas/                  Zod input validation (deep-research, scrape-links, web-search)
├── tools/
│   ├── registry.ts           Tool lookup → capability check → validate → execute
│   ├── search.ts             web_search handler
│   ├── reddit.ts             search_reddit + get_reddit_post handlers
│   ├── scrape.ts             scrape_links handler
│   └── research.ts           deep_research handler
├── clients/
│   ├── search.ts             Google Serper API client
│   ├── reddit.ts             Reddit OAuth + comment tree parser
│   ├── scraper.ts            Scrape.do client with fallback modes
│   └── research.ts           OpenRouter client with model-specific handling
├── services/
│   ├── llm-processor.ts      Shared LLM extraction (singleton OpenAI client)
│   ├── markdown-cleaner.ts   HTML → Markdown via Turndown
│   └── file-attachment.ts    Local file reading with line ranges
└── utils/
    ├── retry.ts              Shared backoff + retry constants
    ├── concurrency.ts        Bounded parallel execution (pMap, pMapSettled)
    ├── url-aggregator.ts     CTR-weighted scoring + consensus detection
    ├── errors.ts             Error classification + structured errors
    ├── logger.ts             MCP logging protocol
    └── response.ts           Standardized 70/20/10 output formatting
```
## Deploy
### Cloudflare Workers
```bash
npx wrangler deploy
```
Uses Durable Objects with SQLite storage. YAML-based tool definitions are replaced with inline definitions since there's no filesystem in Workers.
### npm
Published as [`mcp-research-powerpack`](https://www.npmjs.com/package/mcp-research-powerpack). Binary names: `mcp-research-powerpack`, `research-powerpack-mcp`.
## Development
```bash
pnpm install # Install dependencies
pnpm dev # Run with tsx (live TypeScript)
pnpm build # Compile to dist/
pnpm typecheck # Type-check without emitting
pnpm start # Run compiled output
```
### Testing
```bash
pnpm test:web-search # Test web search tool
pnpm test:reddit-search # Test Reddit search
pnpm test:scrape-links # Test scraping
pnpm test:deep-research # Test deep research
pnpm test:all # Run all tests
pnpm test:check # Check environment setup
```
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Run `pnpm typecheck && pnpm build` to verify
5. Commit (`git commit -m 'feat: add amazing feature'`)
6. Push to your branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request
## License
[MIT](https://opensource.org/licenses/MIT) © [Yiğit Konur](https://github.com/yigitkonur)