https://github.com/scaile-it/g-mcp-tools-fast
Production-ready data enrichment API with 9 AI-powered tools: web scraping, email intel, phone validation, company data, and more. SaaS-ready with OpenAPI docs.
api data-intelligence email-validation enrichment gemini modal openapi phone-validation saas web-scraping
- Host: GitHub
- URL: https://github.com/scaile-it/g-mcp-tools-fast
- Owner: SCAILE-it
- License: mit
- Created: 2025-10-26T17:51:45.000Z (8 months ago)
- Default Branch: master
- Last Pushed: 2025-10-26T18:52:06.000Z (8 months ago)
- Last Synced: 2025-10-26T20:41:49.006Z (8 months ago)
- Topics: api, data-intelligence, email-validation, enrichment, gemini, modal, openapi, phone-validation, saas, web-scraping
- Language: Python
- Size: 64.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# g-mcp-tools-fast - Production-Ready Enrichment API
**Enterprise-grade data intelligence API with 9 enrichment tools + bulk processing**
**Status:** Production-Ready | SaaS-Ready | Fully Deployed
**Live Endpoint:** `https://scaile--g-mcp-tools-fast-api.modal.run`
**Interactive Docs:** [Swagger UI](https://scaile--g-mcp-tools-fast-api.modal.run/docs) | [ReDoc](https://scaile--g-mcp-tools-fast-api.modal.run/redoc)
---
## Overview
A complete data enrichment API built on Modal.com, combining AI-powered web scraping with 8 specialized intelligence tools. Perfect for sales intelligence, market research, lead enrichment, and data validation.
### Key Features
- ✅ **9 Enrichment Tools** - Web scraping, email intel, company data, phone validation, and more
- ✅ **Bulk Processing** - Process 100s-1000s of records in parallel with auto-detection
- ✅ **Smart Auto-Detection** - Automatically detect data types and apply appropriate tools
- ✅ **Multi-Tool Enrichment** - Combine multiple tools on a single record
- ✅ **AI-Powered Extraction** - Uses Gemini 2.5 Flash for intelligent data extraction
- ✅ **Production-Ready** - Authentication, health checks, comprehensive error handling
- ✅ **Auto-Scaling** - Serverless architecture handles traffic spikes automatically
- ✅ **24-Hour Cache** - Reduces costs and improves response times
- ✅ **OpenAPI Docs** - Swagger/ReDoc for easy integration
- ✅ **Type-Safe** - Pydantic models for all inputs/outputs
---
## Bulk Processing & Power Features
**NEW:** Process multiple records in parallel with intelligent auto-detection!
### Multi-Tool Enrichment (`/enrich`)
Enrich a single record with multiple tools at once:
```bash
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/enrich \
  -H 'Content-Type: application/json' \
  -d '{
    "data": {
      "phone": "+14155552671",
      "email": "john@anthropic.com"
    },
    "tools": ["phone-validation", "email-intel", "email-pattern"]
  }'
```
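The same call can be made from Python. The sketch below uses only the standard library and builds the request without sending it; `build_enrich_request` is a hypothetical helper name, not part of this project.

```python
import json
import urllib.request

BASE_URL = "https://scaile--g-mcp-tools-fast-api.modal.run"


def build_enrich_request(data, tools):
    """Build (but do not send) a POST request for the /enrich endpoint.

    Hypothetical helper: the payload shape mirrors the curl example above.
    """
    body = json.dumps({"data": data, "tools": tools}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/enrich",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_enrich_request(
    {"phone": "+14155552671", "email": "john@anthropic.com"},
    ["phone-validation", "email-intel", "email-pattern"],
)
# To actually send it:
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       result = json.load(resp)
```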
### Auto-Detection (`/enrich/auto`)
Automatically detect data types and apply appropriate tools:
```bash
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/enrich/auto \
  -H 'Content-Type: application/json' \
  -d '{
    "data": {
      "contact_phone": "+14155552671",
      "work_email": "john@anthropic.com",
      "company_domain": "anthropic.com"
    }
  }'
```
**Response:** Automatically detected and enriched with 5 tools (phone validation, email intel, email pattern, WHOIS, tech stack)!
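How the service detects types internally is not documented here. As a rough illustration only, value-based detection could look like the sketch below; the regular expressions, the `detect_type` and `plan_tools` names, and the type-to-tool mapping are all assumptions chosen to match the five tools listed above.

```python
import re

# Deliberately loose patterns; real validation would be stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,20}$")
DOMAIN_RE = re.compile(r"^(?!-)[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}$")

# Assumed mapping from detected type to tool names.
TOOLS_BY_TYPE = {
    "email": ["email-intel", "email-pattern"],
    "phone": ["phone-validation"],
    "domain": ["whois", "tech-stack"],
}


def detect_type(value):
    """Classify a single value as 'email', 'phone', 'domain', or None."""
    if not isinstance(value, str):
        return None
    value = value.strip()
    if EMAIL_RE.match(value):
        return "email"
    if PHONE_RE.match(value) and sum(c.isdigit() for c in value) >= 7:
        return "phone"
    if DOMAIN_RE.match(value):
        return "domain"
    return None


def plan_tools(record):
    """Return the de-duplicated list of tools to run for a record."""
    plan = []
    for value in record.values():
        for tool in TOOLS_BY_TYPE.get(detect_type(value), []):
            if tool not in plan:
                plan.append(tool)
    return plan


plan = plan_tools({
    "contact_phone": "+14155552671",
    "work_email": "john@anthropic.com",
    "company_domain": "anthropic.com",
})
```

Detecting by value rather than by field name means keys such as `contact_phone` or `work_email` need no fixed naming convention.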
### Bulk Processing (`/bulk`)
Process multiple records in parallel with explicit tools:
```bash
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/bulk \
  -H 'Content-Type: application/json' \
  -d '{
    "rows": [
      {"name": "Alice Johnson", "email": "alice@example.com"},
      {"name": "Bob Smith", "email": "bob@example.com"}
    ],
    "tools": ["email-intel", "email-pattern"]
  }'
```
**Response:**
```json
{
  "success": true,
  "batch_id": "batch_1761503726531_7AzCBh1nHak",
  "status": "completed",
  "total_rows": 2,
  "successful": 2,
  "failed": 0,
  "processing_time_seconds": 1.24,
  "results": [ /* enriched rows */ ]
}
```
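A client will usually want a quick summary of a batch before walking through `results`. The helper below is a minimal sketch that relies only on the fields shown in the response above; `summarize_batch` is a hypothetical name.

```python
def summarize_batch(payload):
    """Derive a success rate and throughput from a /bulk response dict."""
    total = payload.get("total_rows", 0)
    successful = payload.get("successful", 0)
    seconds = payload.get("processing_time_seconds") or 0
    return {
        "batch_id": payload.get("batch_id"),
        "complete": payload.get("status") == "completed",
        "success_rate": successful / total if total else 0.0,
        "rows_per_second": round(total / seconds, 2) if seconds else None,
    }


# Values copied from the sample response above.
sample = {
    "success": True,
    "batch_id": "batch_1761503726531_7AzCBh1nHak",
    "status": "completed",
    "total_rows": 2,
    "successful": 2,
    "failed": 0,
    "processing_time_seconds": 1.24,
    "results": [],
}
summary = summarize_batch(sample)
```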
### Bulk Auto-Processing (`/bulk/auto`)
Process multiple records with automatic tool detection:
```bash
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/bulk/auto \
  -H 'Content-Type: application/json' \
  -d '{
    "rows": [
      {"name": "Alice", "email": "alice@example.com", "website": "example.com"},
      {"name": "Bob", "phone": "+14155551234"}
    ]
  }'
```
**Smart Features:**
- ✅ Automatically detects emails, phones, domains, companies, GitHub usernames
- ✅ Applies appropriate tools (email-intel, email-pattern, whois, tech-stack, etc.)
- ✅ Processes rows in parallel using asyncio
- ✅ Handles up to 10,000 rows per batch
- ✅ Returns detailed success/error stats
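The parallel row processing described in this list can be sketched with `asyncio.gather` and a semaphore. This is an illustration under assumptions, not the project's code: `enrich_row` is a placeholder for the real per-row work, and the `concurrency` limit of 50 is arbitrary.

```python
import asyncio

MAX_ROWS = 10_000  # per-batch limit stated in the list above


async def enrich_row(row):
    """Placeholder for the real per-row enrichment (hypothetical)."""
    await asyncio.sleep(0)  # stands in for network or tool latency
    if "email" not in row:
        raise ValueError("row has no email")
    return {**row, "enriched": True}


async def process_batch(rows, concurrency=50):
    """Enrich rows concurrently and collect per-row success/error stats."""
    if len(rows) > MAX_ROWS:
        raise ValueError(f"batch exceeds {MAX_ROWS} rows")
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(row):
        async with semaphore:  # cap the number of rows in flight
            return await enrich_row(row)

    # return_exceptions=True keeps one bad row from failing the whole batch.
    outcomes = await asyncio.gather(
        *(guarded(row) for row in rows), return_exceptions=True
    )
    results = []
    for row, outcome in zip(rows, outcomes):
        if isinstance(outcome, Exception):
            results.append({"row": row, "success": False, "error": str(outcome)})
        else:
            results.append({"row": outcome, "success": True})
    failed = sum(1 for r in results if not r["success"])
    return {
        "total_rows": len(rows),
        "successful": len(rows) - failed,
        "failed": failed,
        "results": results,
    }


summary = asyncio.run(
    process_batch([{"email": "alice@example.com"}, {"name": "Bob"}])
)
```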
---
## Individual Enrichment Tools
### 1. **Web Scraper** (`/scrape`)
Extract structured data from any website using natural language prompts.
**Capabilities:**
- AI-powered extraction with Gemini 2.5 Flash
- Multi-page scraping with auto-discovery
- Custom JSON schema support
- Link extraction
- 24-hour intelligent caching
**Example:**
```bash
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/scrape \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://anthropic.com",
    "prompt": "Extract the company mission and product names",
    "max_pages": 1
  }'
```
**Response:**
```json
{
  "success": true,
  "data": {
    "company_mission": "Build safe, beneficial AI...",
    "product_names": ["Claude", "Claude Code", "Opus", "Sonnet", "Haiku"]
  },
  "metadata": {
    "extraction_time": 10.31,
    "pages_scraped": 1,
    "model": "gemini-2.5-flash"
  }
}
```
---
### 2. **Email Intel** (`/email-intel`)
Check which platforms an email is registered on (holehe).
```bash
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/email-intel \
-H 'Content-Type: application/json' \
-d '{"email": "user@example.com"}'
```
---
### 3. **Email Finder** (`/email-finder`)
Find email addresses associated with a domain (theHarvester).
```bash
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/email-finder \
-H 'Content-Type: application/json' \
-d '{"domain": "anthropic.com", "limit": 10}'
```
---
### 4. **Company Data** (`/company-data`)
Get company registration and corporate information (OpenCorporates).
```bash
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/company-data \
-H 'Content-Type: application/json' \
-d '{"companyName": "Anthropic", "domain": "anthropic.com"}'
```
---
### 5. **Phone Validation** (`/phone-validation`)
Validate phone numbers with carrier, location, and line type info.
```bash
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/phone-validation \
-H 'Content-Type: application/json' \
-d '{"phoneNumber": "+14155552671", "defaultCountry": "US"}'
```
**Response:**
```json
{
  "success": true,
  "data": {
    "valid": true,
    "formatted": {
      "e164": "+14155552671",
      "international": "+1 415-555-2671",
      "national": "(415) 555-2671"
    },
    "country": "San Francisco, CA",
    "carrier": "Unknown",
    "lineType": "FIXED_LINE_OR_MOBILE",
    "lineTypeCode": 2
  }
}
```
---
### 6. **Tech Stack** (`/tech-stack`)
Detect technologies and frameworks used by a website.
```bash
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/tech-stack \
-H 'Content-Type: application/json' \
-d '{"domain": "anthropic.com"}'
```
**Response:**
```json
{
  "success": true,
  "data": {
    "domain": "anthropic.com",
    "technologies": [
      {"name": "Next.js", "category": "Framework"},
      {"name": "cloudflare", "category": "Web Server"}
    ],
    "totalFound": 2
  }
}
```
---
### 7. **Email Pattern** (`/email-pattern`)
Generate common email patterns for a domain.
```bash
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/email-pattern \
-H 'Content-Type: application/json' \
-d '{"domain": "anthropic.com", "firstName": "John", "lastName": "Doe"}'
```
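The exact patterns the endpoint returns are not listed here. As an illustration, a generator for a few widely used corporate address conventions might look like this; the pattern list and the `email_patterns` name are assumptions.

```python
def email_patterns(domain, first_name, last_name):
    """Return candidate addresses built from common naming conventions."""
    first = first_name.strip().lower()
    last = last_name.strip().lower()
    domain = domain.strip().lower()
    local_parts = [
        f"{first}.{last}",    # john.doe
        f"{first}{last}",     # johndoe
        f"{first[0]}{last}",  # jdoe
        first,                # john
        f"{first}_{last}",    # john_doe
        f"{last}.{first}",    # doe.john
    ]
    return [f"{local}@{domain}" for local in local_parts]


candidates = email_patterns("anthropic.com", "John", "Doe")
```

These are guesses about address shape only; none of them is verified to exist.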
---
### 8. **WHOIS** (`/whois`)
Look up domain registration information.
```bash
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/whois \
-H 'Content-Type: application/json' \
-d '{"domain": "anthropic.com"}'
```
**Response:**
```json
{
  "success": true,
  "data": {
    "domain": "anthropic.com",
    "registrar": "MarkMonitor, Inc.",
    "creationDate": "2001-10-02",
    "expirationDate": "2033-10-02",
    "nameServers": ["ISLA.NS.CLOUDFLARE.COM", "RANDY.NS.CLOUDFLARE.COM"]
  }
}
```
---
### 9. **GitHub Intel** (`/github-intel`)
Analyze GitHub user profiles and repositories.
```bash
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/github-intel \
-H 'Content-Type: application/json' \
-d '{"username": "anthropics"}'
```
**Response:**
```json
{
  "success": true,
  "data": {
    "username": "anthropics",
    "name": "Anthropic",
    "location": "United States of America",
    "publicRepos": 54,
    "followers": 14565,
    "languages": {
      "Python": 6,
      "TypeScript": 3,
      "JavaScript": 1
    }
  }
}
```
---
## Authentication
The API supports optional API key authentication via the `x-api-key` header.
### Enable Authentication
1. **Create Modal secret:**
```bash
modal secret create modal-api-key MODAL_API_KEY=your-secret-key-here
```
2. **Redeploy the API:**
```bash
./DEPLOY_G_MCP_TOOLS.sh
```
3. **Include API key in requests:**
```bash
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/scrape \
-H 'Content-Type: application/json' \
-H 'x-api-key: your-secret-key-here' \
-d '{"url": "https://example.com", "prompt": "Extract data"}'
```
**Note:** If `MODAL_API_KEY` is not set, the API is publicly accessible (useful for development).
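The optional behaviour in the note can be pictured as a small check. This is a sketch of the described logic rather than the deployed code: `is_authorized` is a hypothetical name, and the use of a constant-time comparison is an assumption about good practice, not a statement about the implementation.

```python
import hmac
import os


def is_authorized(provided_key, expected_key=None):
    """Return True if the request may proceed.

    When no key is configured (MODAL_API_KEY unset or empty), every
    request is allowed, matching the note above.
    """
    if expected_key is None:
        expected_key = os.environ.get("MODAL_API_KEY", "")
    if not expected_key:
        return True  # authentication disabled
    if not provided_key:
        return False  # key required but x-api-key header missing
    # compare_digest avoids leaking key contents through timing differences.
    return hmac.compare_digest(
        provided_key.encode("utf-8"), expected_key.encode("utf-8")
    )
```

Because an unset key leaves every endpoint open, it is worth confirming the secret exists before exposing a deployment publicly.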
---
## Deployment
### Prerequisites
1. **Install Modal CLI:**
```bash
pip install modal
```
2. **Authenticate:**
```bash
modal setup
```
3. **Create Gemini API secret:**
```bash
modal secret create gemini-secret GOOGLE_GENERATIVE_AI_API_KEY=your-gemini-key
```
### Deploy
```bash
chmod +x DEPLOY_G_MCP_TOOLS.sh
./DEPLOY_G_MCP_TOOLS.sh
```
Or manually:
```bash
modal deploy g-mcp-tools-complete.py
```
---
## Health Check
Monitor API status:
```bash
curl https://scaile--g-mcp-tools-fast-api.modal.run/health
```
**Response:**
```json
{
  "status": "healthy",
  "service": "g-mcp-tools-fast",
  "version": "1.0.0",
  "tools": 9,
  "timestamp": "2025-10-26T17:30:00.000000Z"
}
```
---
## Response Format
All endpoints follow a consistent response format:
### Success Response
```json
{
  "success": true,
  "data": { ... },
  "metadata": {
    "source": "tool-name",
    "timestamp": "2025-10-26T17:30:00.000000Z"
  }
}
```
### Error Response
```json
{
  "success": false,
  "error": "Error message",
  "metadata": {
    "source": "tool-name",
    "timestamp": "2025-10-26T17:30:00.000000Z"
  }
}
```
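Since both shapes share `success` and `metadata`, a client can parse them into one type and branch on `success`. The sketch below assumes only the fields shown above; `ApiResult` and `parse_response` are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ApiResult:
    """Client-side view of the success/error envelope."""
    success: bool
    data: Optional[Any] = None
    error: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def parse_response(raw: str) -> ApiResult:
    """Parse a JSON response body into an ApiResult."""
    payload = json.loads(raw)
    return ApiResult(
        success=bool(payload.get("success", False)),
        data=payload.get("data"),
        error=payload.get("error"),
        metadata=payload.get("metadata") or {},
    )


ok = parse_response(
    '{"success": true, "data": {"valid": true},'
    ' "metadata": {"source": "phone-validation"}}'
)
err = parse_response('{"success": false, "error": "Error message"}')
```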
---
## Cost Optimization
The API includes several cost-saving features:
1. **24-Hour Cache** - Repeated requests return cached results
2. **Timeouts** - Prevents runaway processes (30s default, 120s max)
3. **Container Idle Timeout** - Containers shut down after 120s of inactivity
4. **Efficient Resource Usage** - Only runs when needed
**Estimated costs (Modal pricing):**
- Web scraping: ~$0.001 per request
- Other tools: ~$0.0001 per request
- Cache hits: $0 (served from memory)
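To illustrate the caching idea, here is a minimal in-memory cache with a 24-hour time-to-live. It is a sketch under assumptions: the class name, the key scheme (a hash of tool name plus parameters), and the injectable clock are illustrative and do not describe how the service stores entries.

```python
import hashlib
import json
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=24 * 60 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._store = {}

    @staticmethod
    def key_for(tool, params):
        """Stable key: identical requests map to the same cache entry."""
        canonical = json.dumps({"tool": tool, "params": params}, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # drop the stale entry
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)
```

Sorting keys before hashing means `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` hit the same entry, which keeps the hit rate from depending on client-side field order.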
---
## Testing
### Run All Tests
```bash
# Test all 9 endpoints
./test-all-endpoints.sh
```
### Individual Endpoint Tests
```bash
# Email pattern
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/email-pattern \
-H 'Content-Type: application/json' \
-d '{"domain": "anthropic.com"}'
# Phone validation
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/phone-validation \
-H 'Content-Type: application/json' \
-d '{"phoneNumber": "+14155552671"}'
# GitHub intel
curl -X POST https://scaile--g-mcp-tools-fast-api.modal.run/github-intel \
-H 'Content-Type: application/json' \
-d '{"username": "anthropics"}'
```
---
## SaaS Readiness Checklist
- [x] **Health Check Endpoint** - `/health` for monitoring
- [x] **API Authentication** - Optional `x-api-key` header
- [x] **OpenAPI Documentation** - Swagger UI + ReDoc
- [x] **Error Handling** - Comprehensive error responses
- [x] **Input Validation** - Pydantic models
- [x] **Rate Limiting** - Handled by Modal platform
- [x] **Monitoring** - Modal dashboard + logs
- [x] **Auto-Scaling** - Serverless architecture
- [x] **Cost Optimization** - Caching + timeouts
- [x] **Type Safety** - Pydantic models for all inputs/outputs
### Ready to Sell As:
- ✅ B2B SaaS API
- ✅ Data Enrichment Service
- ✅ Lead Intelligence Platform
- ✅ Market Research Tool
---
## Monitoring & Logs
### View Logs
```bash
modal app logs g-mcp-tools-fast --follow
```
### Check App Status
```bash
modal app list | grep g-mcp-tools
```
### View Secrets
```bash
modal secret list
```
---
## Architecture
```
Client Request
    ↓
FastAPI (Modal ASGI)
    ↓
Authentication Check (optional)
    ↓
Input Validation (Pydantic)
    ↓
Cache Check (24h TTL)
    ↓ (cache miss)
Tool Execution
    ├─ Web Scraper (crawl4ai + Gemini)
    ├─ Email Intel (holehe)
    ├─ Email Finder (theHarvester)
    ├─ Company Data (OpenCorporates API)
    ├─ Phone Validation (libphonenumber)
    ├─ Tech Stack (custom detection)
    ├─ Email Pattern (pattern generation)
    ├─ WHOIS (python-whois)
    └─ GitHub Intel (GitHub API)
    ↓
Cache Result
    ↓
JSON Response
```
---
## License
Released under the MIT License. See the `LICENSE` file in the repository.
---
## Support
- **Documentation:** [Swagger UI](https://scaile--g-mcp-tools-fast-api.modal.run/docs)
- **Issues:** Report via GitHub Issues
- **Modal Support:** [modal.com/docs](https://modal.com/docs)
---
## Use Cases
### Sales Intelligence
- Enrich lead data with company info
- Find contact emails and phone numbers
- Validate contact information
### Market Research
- Scrape competitor websites
- Analyze tech stacks
- Track company changes via WHOIS
### Developer Intelligence
- Analyze GitHub profiles
- Detect technologies used
- Research developer ecosystems
### Data Validation
- Validate phone numbers
- Verify email patterns
- Check domain registrations
---
**Built with:** Modal.com | FastAPI | Gemini 2.5 Flash | crawl4ai