https://github.com/shreshthmohan/av-cf-scraper
A scraper based on two Cloudflare Workers for getting the latest Aranya Vihara Permit status data
- Host: GitHub
- URL: https://github.com/shreshthmohan/av-cf-scraper
- Owner: shreshthmohan
- Created: 2025-09-08T07:43:13.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-09-08T09:32:20.000Z (9 months ago)
- Last Synced: 2025-09-22T12:03:55.267Z (9 months ago)
- Language: JavaScript
- Size: 46.9 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README/local-testing-guide.md
README
# Local Testing Guide for Two-Worker Architecture
This document provides instructions for testing the two-worker architecture locally using Wrangler's development mode before deploying to production.
## Architecture Overview
The two-worker system uses **RPC (Remote Procedure Call)** via Cloudflare service bindings for inter-worker communication:
- **Discovery Worker**: Calls `availabilityBinding.processAvailability(requestData)`
- **Availability Worker**: Exports anonymous class extending `WorkerEntrypoint` as default
- **Communication**: Direct method calls (no HTTP requests in production)
Local testing uses RPC service bindings, which work seamlessly in wrangler dev mode.
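The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the payload fields, the return shape, and the `dispatchTrek` helper are assumptions. A local stand-in replaces the runtime-provided `WorkerEntrypoint` so the snippet runs under plain Node.js; in a real Worker the class is imported from `cloudflare:workers` and the anonymous class is the module's default export.

```javascript
// Stand-in for the runtime class. In a real Worker, use instead:
//   import { WorkerEntrypoint } from "cloudflare:workers";
class WorkerEntrypoint {
  constructor(ctx, env) {
    this.ctx = ctx;
    this.env = env;
  }
}

// Availability worker: in production this anonymous class would be written as
// `export default class extends WorkerEntrypoint { ... }`.
const AvailabilityWorker = class extends WorkerEntrypoint {
  // Public methods are what the service binding exposes to callers.
  async processAvailability(requestData) {
    // Hypothetical body: the real worker fetches and stores availability data.
    return { ok: true, trek_id: requestData.trek_id };
  }
};

// Discovery worker side: `availabilityBinding` is the service binding
// (read from `env` in a real Worker). The call is a direct method call.
async function dispatchTrek(availabilityBinding, trek) {
  const requestData = { trek_id: trek.id, trek_name: trek.name };
  return availabilityBinding.processAvailability(requestData);
}

// Local usage: an instance of the class plays the role of the binding.
const binding = new AvailabilityWorker({}, {});
dispatchTrek(binding, { id: "85", name: "Test Trek Local" }).then((result) => {
  console.log(result);
});
```

The discovery side never builds a URL or an HTTP request; it only awaits a method on the binding, which is why no inter-worker HTTP traffic should appear in the logs.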
## Prerequisites
- [Wrangler CLI](https://developers.cloudflare.com/workers/wrangler/install-and-update/) installed
- Node.js and npm/pnpm installed
- Local environment variables configured
- Supabase database accessible from local environment
## Local Environment Setup
### 1. Install Dependencies
```bash
# Install project dependencies
npm install
# or if using pnpm
pnpm install
```
### 2. Create Local Environment Files
#### Create `.dev.vars.discovery` for Discovery Worker
```bash
# Create environment file for discovery worker
cat > .dev.vars.discovery << EOF
SUPABASE_URL=your_supabase_url_here
SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_key_here
AVAILABILITY_DAYS=5
DEBUG=true
VERBOSE=true
MAX_CONCURRENT_REQUESTS=3
RETRY_ATTEMPTS=2
EOF
```
#### Create `.dev.vars.availability` for Availability Worker
```bash
# Create environment file for availability worker
cat > .dev.vars.availability << EOF
SUPABASE_URL=your_supabase_url_here
SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_key_here
AVAILABILITY_DAYS=5
DEBUG=true
VERBOSE=true
EOF
```
**Note**: Replace the placeholder values with your actual Supabase credentials.
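Values loaded from these files reach the worker as strings, so numeric and boolean settings need converting before use. The sketch below shows one way that could look. The `readConfig` helper and its fallback defaults are hypothetical (the defaults simply mirror the sample values above) and are not taken from the repository.

```javascript
// Hypothetical helper: turn string env vars into typed settings with defaults.
function readConfig(env) {
  const toInt = (value, fallback) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };
  return {
    supabaseUrl: env.SUPABASE_URL,
    availabilityDays: toInt(env.AVAILABILITY_DAYS, 5),
    maxConcurrentRequests: toInt(env.MAX_CONCURRENT_REQUESTS, 3),
    retryAttempts: toInt(env.RETRY_ATTEMPTS, 2),
    // Only the exact string "true" enables a flag.
    debug: env.DEBUG === "true",
    verbose: env.VERBOSE === "true",
  };
}

const config = readConfig({ AVAILABILITY_DAYS: "5", DEBUG: "true" });
console.log(config.availabilityDays, config.debug, config.maxConcurrentRequests);
```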
### 3. Verify Environment Files
```bash
# Check that environment files exist and have correct content
ls -la .dev.vars.*
head -n 3 .dev.vars.discovery
head -n 3 .dev.vars.availability
```
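The manual check above can miss a placeholder that was never replaced. As an optional extra, a small script along these lines could flag such entries. It is only a sketch: `findUnsetVars` is a hypothetical helper, and it assumes placeholders follow the `your_..._here` pattern used in this guide.

```javascript
// Parse KEY=VALUE lines and report keys that are empty or still placeholders.
function findUnsetVars(fileText) {
  const unset = [];
  for (const line of fileText.split("\n")) {
    const trimmed = line.trim();
    // Skip blank lines and comments.
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const idx = trimmed.indexOf("=");
    if (idx === -1) continue;
    const key = trimmed.slice(0, idx).trim();
    const value = trimmed.slice(idx + 1).trim();
    // Placeholders in this guide look like `your_..._here`.
    if (value === "" || /^your_.*_here$/.test(value)) unset.push(key);
  }
  return unset;
}

const sample = "SUPABASE_URL=your_supabase_url_here\nAVAILABILITY_DAYS=5\n";
console.log(findUnsetVars(sample)); // reports SUPABASE_URL only
```

To run it against a real file, pass the file contents, for example the result of `fs.readFileSync(".dev.vars.discovery", "utf8")` from `node:fs`.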
## Local Development Testing
### RPC Service Bindings Testing
Service bindings work seamlessly in local development with `wrangler dev`. Both workers run locally and communicate via RPC.
#### Step 1: Start Availability Worker First
```bash
# Terminal 1 - Start availability worker (must start first for binding to work)
wrangler dev --config wrangler-availability.toml --port 8787 --env-file .dev.vars.availability
```
#### Step 2: Start Discovery Worker
```bash
# Terminal 2 - Start discovery worker (automatically binds to availability worker)
wrangler dev --config wrangler-discovery.toml --port 8788 --env-file .dev.vars.discovery
```
**Note**: You should see binding confirmation in the terminal output when the discovery worker starts.
#### Step 3: Test Individual Workers
```bash
# Terminal 3 - Test availability worker directly
curl -X GET "http://localhost:8787/health" | jq
# Test availability worker with sample data
curl -X POST "http://localhost:8787/fetch-availability" \
-H "Content-Type: application/json" \
-d '{
"trek_id": "85",
"trek_name": "Test Trek Local",
"district_id": "17",
"availability_days": 5,
"start_date": "'$(date +%F)'"
}' | jq
```
#### Step 4: Test RPC Communication Between Workers
```bash
# Test discovery worker health
curl -X GET "http://localhost:8788/health" | jq
curl -X GET "http://localhost:8788/status" | jq
# Test discovery with RPC calls to availability worker
curl -X POST "http://localhost:8788/discover-only" | jq
```
**What to Look For:**
- Discovery worker logs: `"📤 RPC: Sending availability request"`
- Availability worker logs: `"🔗 RPC: Processing trek via service binding"`
- No HTTP requests between workers - everything goes through RPC
## Local Testing Scenarios
### Scenario 1: Basic Health Checks
```bash
# Test both workers are running
curl -X GET "http://localhost:8787/health" # Availability worker
curl -X GET "http://localhost:8788/health" # Discovery worker
curl -X GET "http://localhost:8788/status" # Discovery worker status
```
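The same checks can be scripted when they need to run repeatedly. The following is a sketch only: `checkHealth` is a hypothetical helper, it assumes nothing about the `/health` response beyond its HTTP status, and the fetch implementation is injectable so the function can be exercised without running workers.

```javascript
// Query each worker's /health endpoint; `fetchImpl` is injectable for testing.
async function checkHealth(baseUrls, fetchImpl = fetch) {
  const report = {};
  for (const baseUrl of baseUrls) {
    try {
      const response = await fetchImpl(`${baseUrl}/health`);
      report[baseUrl] = response.ok ? "up" : `http ${response.status}`;
    } catch (error) {
      // Connection refused, DNS failure, and similar network errors.
      report[baseUrl] = "unreachable";
    }
  }
  return report;
}
```

With both workers running, `checkHealth(["http://localhost:8787", "http://localhost:8788"]).then(console.log)` prints one status per worker. This relies on the global `fetch` available in recent Node.js versions.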
### Scenario 2: Single Trek Processing
```bash
# Process a single trek for 3 days
curl -X POST "http://localhost:8787/fetch-availability" \
-H "Content-Type: application/json" \
-d '{
"trek_id": "17-85",
"trek_name": "Local Test Trek",
"district_id": "17",
"availability_days": 3
}' | jq
```
### Scenario 3: Discovery Only (Safe Testing)
```bash
# Run discovery without triggering availability collection
curl -X POST "http://localhost:8788/discover-only" | jq
```
### Scenario 4: Limited Discovery with Orchestration
```bash
# This will process ALL treks in your database - use carefully!
curl -X POST "http://localhost:8788/discover"
```
### Scenario 5: Test Full RPC Workflow
```bash
# Test that RPC calls are working properly
curl -X POST "http://localhost:8788/discover-only"
# Check logs for "🔗 RPC: Processing trek" messages indicating RPC calls
# Monitor both worker logs to see RPC communication:
# Discovery worker: "📤 RPC: Sending availability request"
# Availability worker: "🔗 RPC: Processing trek via service binding"
```
## Local Testing Best Practices
### 1. Use Small Data Sets
- Test with a small number of treks initially
- Use `AVAILABILITY_DAYS=2` or `3` for faster testing
- Set `MAX_CONCURRENT_REQUESTS=2` to avoid overwhelming local setup
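
To illustrate what a setting like `MAX_CONCURRENT_REQUESTS` typically controls, here is a minimal sketch of a concurrency-limited map. It is not the repository's implementation: `mapWithConcurrency` and the per-trek callback are hypothetical names used only for illustration.

```javascript
// Run `worker(item)` for every item with at most `limit` calls in flight.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until the list is exhausted.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const size = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: size }, () => runner()));
  return results;
}

const treks = ["85", "86", "87", "88"];
mapWithConcurrency(treks, 2, async (trekId) => `processed ${trekId}`).then(
  (out) => console.log(out)
);
```

Lowering the limit reduces how many upstream requests and database writes overlap, which is why a value of 1 or 2 tends to be gentler on a local setup.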
### 2. Monitor Logs
```bash
# Watch logs in real-time
# Terminal 1: Availability worker logs
# Terminal 2: Discovery worker logs
# Terminal 3: Your test commands
```
### 3. Database Verification
```sql
-- Check local test data in Supabase
SELECT sr.id, sr.status, sr.started_at, sr.completed_at
FROM av_scrape_runs sr
WHERE sr.started_by LIKE '%local%'
ORDER BY sr.started_at DESC
LIMIT 5;
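-- Check availability data from local tests.
-- scrape_run_id is a UUID (see Troubleshooting), so LIKE cannot be applied to it directly;
-- this assumes it references av_scrape_runs.id and filters via started_by instead.
SELECT COUNT(*) AS records, DATE(ta.scraped_at) AS test_date
FROM av_trek_availability ta
JOIN av_scrape_runs sr ON sr.id = ta.scrape_run_id
WHERE sr.started_by LIKE '%local%'
GROUP BY DATE(ta.scraped_at);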
```
### 4. Error Testing
```bash
# Test invalid request
curl -X POST "http://localhost:8787/fetch-availability" \
-H "Content-Type: application/json" \
-d '{"invalid": "data"}'
# Test missing worker (stop availability worker and test discovery)
```
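The invalid request above is meant to exercise input validation. A sketch of what such validation might look like is shown below. The field names come from the sample payloads in this guide, but which fields are required, the error messages, and the `validateAvailabilityRequest` name are assumptions rather than the worker's actual behavior.

```javascript
// Hypothetical validator for the /fetch-availability request body.
function validateAvailabilityRequest(body) {
  if (body === null || typeof body !== "object") {
    return ["body must be a JSON object"];
  }
  const errors = [];
  // Assumed required string fields, taken from the sample payloads.
  for (const field of ["trek_id", "trek_name", "district_id"]) {
    if (typeof body[field] !== "string" || body[field] === "") {
      errors.push(`${field} is required`);
    }
  }
  // availability_days is treated as optional, but must be a positive integer.
  const days = body.availability_days;
  if (days !== undefined && !(Number.isInteger(days) && days > 0)) {
    errors.push("availability_days must be a positive integer");
  }
  return errors;
}

console.log(validateAvailabilityRequest({ invalid: "data" }));
```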
## Cleanup After Local Testing
### 1. Stop All Workers
```bash
# Stop wrangler dev processes (Ctrl+C in each terminal)
```
### 2. Clean Up Test Data (Optional)
```sql
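-- Remove local test data from database.
-- scrape_run_id is a UUID (see Troubleshooting), so match runs by started_by
-- (assumes av_trek_availability.scrape_run_id references av_scrape_runs.id).
DELETE FROM av_trek_availability
WHERE scrape_run_id IN (
  SELECT id FROM av_scrape_runs
  WHERE started_by LIKE '%local%' OR started_by LIKE '%test%'
);
DELETE FROM av_scrape_runs
WHERE started_by LIKE '%local%' OR started_by LIKE '%test%';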
```
### 3. Remove Local Files
```bash
# Remove local testing files
rm -f .dev.vars.*
```
## Troubleshooting Local Development
### Common Issues:
1. **"Port already in use"**
```bash
# Kill processes on specific ports
lsof -ti:8787 | xargs kill -9
lsof -ti:8788 | xargs kill -9
```
2. **"Database connection failed"**
- Check that the `.dev.vars.discovery` and `.dev.vars.availability` files contain the correct Supabase credentials
- Verify your network can reach Supabase
- Test database connection separately
3. **"Service binding not found"**
- Ensure availability worker is started first and is running
- Verify the service binding configuration in `wrangler-discovery.toml` (no `entrypoint` is needed when binding to the default export)
- Verify that the anonymous `WorkerEntrypoint` class is exported as the default export
- Look for binding confirmation messages in discovery worker startup logs
4. **"Module not found" errors**
```bash
# Reinstall dependencies
rm -rf node_modules package-lock.json
npm install
```
5. **High resource usage**
- Reduce `AVAILABILITY_DAYS` to 1-2
- Set `MAX_CONCURRENT_REQUESTS=1`
- Test with fewer treks
6. **RPC method not found**
- Verify anonymous class is exported with `export default class extends WorkerEntrypoint`
- Check method names match: `processAvailability()` and `getHealth()`
- Ensure `import { WorkerEntrypoint } from "cloudflare:workers"` is present
7. **"Database save failed: invalid input syntax for type uuid"**
- Use proper UUID format for `scrape_run_id`: `$(uuidgen | tr '[:upper:]' '[:lower:]')`
- Avoid simple strings like `"local-test-123"` - database expects UUID format
- Check that database schema expects UUID type for scrape_run_id field
8. **"Foreign key constraint violation" (trek_id_fkey)**
- The `trek_id` you're testing with doesn't exist in the database
- Either create test trek data or use a real trek_id from your database
- The worker will automatically create the scrape_run_id in the database
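
For issue 7, a quick format check before sending a request can catch non-UUID values early. The sketch below is illustrative only, and `isUuid` is a hypothetical helper. In the Workers runtime and recent Node.js versions, `crypto.randomUUID()` can be used to generate a valid value.

```javascript
// Matches the canonical 8-4-4-4-12 hexadecimal UUID layout (any version).
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isUuid(value) {
  return typeof value === "string" && UUID_PATTERN.test(value);
}

console.log(isUuid("local-test-123")); // false
console.log(isUuid("3f2b8c1e-9d4a-4e6b-8a7c-1b2c3d4e5f60")); // true
```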
## Local vs. Production Differences
| Aspect | Local Development | Production |
| -------------- | ------------------------------------- | ----------------------------- |
| Communication | RPC via service bindings | RPC via service bindings |
| Concurrency | Limited (2-3) | Higher (10+) |
| Data Volume | Small test sets | Full dataset |
| Error Handling | More verbose logging | Production logging |
| Performance | Slower (development) | Optimized |
| RPC Classes | WorkerEntrypoint classes work locally | Same WorkerEntrypoint classes |
## Next Steps
After successful local testing:
1. Commit your changes (excluding local test files)
2. Follow the main [testing-guide.md](./testing-guide.md) for deployment
3. Deploy to staging environment first
4. Monitor production deployment
Remember: Local testing is great for development and debugging, but always test in a staging environment before production deployment!