https://github.com/shreshthmohan/av-cf-scraper

A scraper based on two Cloudflare Workers for getting the latest Aranya Vihara permit status data

Host: GitHub
URL: https://github.com/shreshthmohan/av-cf-scraper
Owner: shreshthmohan
Created: 2025-09-08T07:43:13.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-09-08T09:32:20.000Z (9 months ago)
Last Synced: 2025-09-22T12:03:55.267Z (9 months ago)
Language: JavaScript
Size: 46.9 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README/local-testing-guide.md

README

# Local Testing Guide for Two-Worker Architecture

This document provides instructions for testing the two-worker architecture locally using Wrangler's development mode before deploying to production.

## Architecture Overview

The two-worker system uses **RPC (Remote Procedure Call)** via Cloudflare service bindings for inter-worker communication:

- **Discovery Worker**: Calls `availabilityBinding.processAvailability(requestData)` on its service binding
- **Availability Worker**: Exports an anonymous class extending `WorkerEntrypoint` as its default export
- **Communication**: Direct method calls (no HTTP requests between workers in production)

The same RPC service bindings work under `wrangler dev`, so local tests use the same communication mechanism as production.

## Prerequisites

- [Wrangler CLI](https://developers.cloudflare.com/workers/wrangler/install-and-update/) installed
- Node.js and npm/pnpm installed
- Local environment variables configured
- Supabase database accessible from local environment

## Local Environment Setup

### 1. Install Dependencies

```bash
# Install project dependencies
npm install
# or if using pnpm
pnpm install
```

### 2. Create Local Environment Files

#### Create `.dev.vars.discovery` for Discovery Worker

```bash
# Create environment file for discovery worker
cat > .dev.vars.discovery << EOF
SUPABASE_URL=your_supabase_url_here
SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_key_here
AVAILABILITY_DAYS=5
DEBUG=true
VERBOSE=true
MAX_CONCURRENT_REQUESTS=3
RETRY_ATTEMPTS=2
EOF
```

#### Create `.dev.vars.availability` for Availability Worker

```bash
# Create environment file for availability worker
cat > .dev.vars.availability << EOF
SUPABASE_URL=your_supabase_url_here
SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_key_here
AVAILABILITY_DAYS=5
DEBUG=true
VERBOSE=true
EOF
```

**Note**: Replace the placeholder values with your actual Supabase credentials.

### 3. Verify Environment Files

```bash
# Check that environment files exist and have correct content
ls -la .dev.vars.*
head -n 3 .dev.vars.discovery
head -n 3 .dev.vars.availability
```
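
A stricter check than `head` is to confirm that each required key is present and no longer holds a placeholder. The sketch below is one way to do that. The function name `check_dev_vars` is hypothetical; the key names and the `your_..._here` placeholder pattern are taken from the files created above.

```bash
# Hypothetical helper: verify a .dev.vars file defines the Supabase keys
# and that none of them still holds a "your_..._here" placeholder.
check_dev_vars() (
  file="$1"
  status=0
  if [ ! -f "$file" ]; then
    echo "missing file: $file" >&2
    exit 1
  fi
  for key in SUPABASE_URL SUPABASE_SERVICE_ROLE_KEY; do
    # Value of the last KEY=... line, or empty if the key is absent
    value=$(sed -n "s/^${key}=//p" "$file" | tail -n 1)
    case "$value" in
      "")
        echo "$file: $key is not set" >&2
        status=1
        ;;
      your_*_here)
        echo "$file: $key still holds a placeholder" >&2
        status=1
        ;;
    esac
  done
  exit "$status"
)
```

For example, `check_dev_vars .dev.vars.discovery && check_dev_vars .dev.vars.availability` succeeds only when both files look usable.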

## Local Development Testing

### RPC Service Bindings Testing

Service bindings work seamlessly in local development with `wrangler dev`. Both workers run locally and communicate via RPC.

#### Step 1: Start Availability Worker First

```bash
# Terminal 1 - Start availability worker (must start first for binding to work)
wrangler dev --config wrangler-availability.toml --port 8787 --env-file .dev.vars.availability
```

#### Step 2: Start Discovery Worker

```bash
# Terminal 2 - Start discovery worker (automatically binds to availability worker)
wrangler dev --config wrangler-discovery.toml --port 8788 --env-file .dev.vars.discovery
```

**Note**: You should see binding confirmation in the terminal output when the discovery worker starts.
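
Because the discovery worker depends on the availability worker already running, a script that drives both can poll before moving on. The following is a minimal sketch of a generic retry helper; `wait_until` is a hypothetical name, not part of Wrangler.

```bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command...>
wait_until() (
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      exit 0
    fi
    # Sleep only between attempts, not after the last one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  exit 1
)
```

With the ports used in this guide, `wait_until 30 1 curl -fsS http://localhost:8787/health` would wait up to roughly 30 seconds for the availability worker's `/health` endpoint to respond.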

#### Step 3: Test Individual Workers

```bash
# Terminal 3 - Test availability worker directly
curl -X GET "http://localhost:8787/health" | jq

# Test availability worker with sample data
curl -X POST "http://localhost:8787/fetch-availability" \
-H "Content-Type: application/json" \
-d '{
"trek_id": "85",
"trek_name": "Test Trek Local",
"district_id": "17",
"availability_days": 5,
"start_date": "'$(date +%F)'"
}' | jq
```

#### Step 4: Test RPC Communication Between Workers

```bash
# Test discovery worker health
curl -X GET "http://localhost:8788/health" | jq
curl -X GET "http://localhost:8788/status" | jq

# Test discovery with RPC calls to availability worker
curl -X POST "http://localhost:8788/discover-only" | jq
```

**What to Look For:**

- Discovery worker logs: `"📤 RPC: Sending availability request"`
- Availability worker logs: `"🔗 RPC: Processing trek via service binding"`
- No HTTP requests between workers - everything goes through RPC
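
If each worker's terminal output has been captured to a file (an assumption; for example by piping it through `tee`), the two log markers can be checked automatically. This is a sketch only: `check_rpc_logs` and the log file arguments are hypothetical, and the match is on the text after the emoji to avoid encoding issues.

```bash
# Hypothetical helper: confirm both RPC log markers appear in captured logs.
# Usage: check_rpc_logs <discovery_log> <availability_log>
check_rpc_logs() (
  discovery_log="$1"
  availability_log="$2"
  status=0
  if ! grep -qF "RPC: Sending availability request" "$discovery_log"; then
    echo "discovery log: no outgoing RPC call found" >&2
    status=1
  fi
  if ! grep -qF "RPC: Processing trek via service binding" "$availability_log"; then
    echo "availability log: no incoming RPC call found" >&2
    status=1
  fi
  exit "$status"
)
```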

## Local Testing Scenarios

### Scenario 1: Basic Health Checks

```bash
# Test both workers are running
curl -X GET "http://localhost:8787/health" # Availability worker
curl -X GET "http://localhost:8788/health" # Discovery worker
curl -X GET "http://localhost:8788/status" # Discovery worker status
```

### Scenario 2: Single Trek Processing

```bash
# Process a single trek for 3 days
curl -X POST "http://localhost:8787/fetch-availability" \
-H "Content-Type: application/json" \
-d '{
"trek_id": "17-85",
"trek_name": "Local Test Trek",
"district_id": "17",
"availability_days": 3
}' | jq
```

### Scenario 3: Discovery Only (Safe Testing)

```bash
# Run discovery without triggering availability collection
curl -X POST "http://localhost:8788/discover-only" | jq
```

### Scenario 4: Full Discovery with Orchestration

```bash
# This will process ALL treks in your database - use carefully!
curl -X POST "http://localhost:8788/discover"
```

### Scenario 5: Test Full RPC Workflow

```bash
# Test that RPC calls are working properly
curl -X POST "http://localhost:8788/discover-only"
# Check logs for "🔗 RPC: Processing trek" messages indicating RPC calls

# Monitor both worker logs to see RPC communication:
# Discovery worker: "📤 RPC: Sending availability request"
# Availability worker: "🔗 RPC: Processing trek via service binding"
```

## Local Testing Best Practices

### 1. Use Small Data Sets

- Test with a small number of treks initially
- Use `AVAILABILITY_DAYS=2` or `3` for faster testing
- Set `MAX_CONCURRENT_REQUESTS=2` to avoid overwhelming local setup
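
To switch these values without editing the files by hand, a small helper can rewrite a single key. The sketch below assumes the `KEY=VALUE` layout of the `.dev.vars.*` files created earlier; `set_dev_var` is a hypothetical name.

```bash
# Hypothetical helper: set KEY=VALUE in a .dev.vars file by dropping any
# existing line for KEY and appending the new value at the end.
# Usage: set_dev_var <file> <key> <value>
set_dev_var() (
  file="$1"
  key="$2"
  value="$3"
  tmp=$(mktemp)
  # grep -v exits non-zero when nothing is left to print, so ignore its status
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
)
```

For example, `set_dev_var .dev.vars.discovery AVAILABILITY_DAYS 2` followed by `set_dev_var .dev.vars.discovery MAX_CONCURRENT_REQUESTS 2` applies the smaller settings suggested above.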

### 2. Monitor Logs

```bash
# Watch logs in real-time
# Terminal 1: Availability worker logs
# Terminal 2: Discovery worker logs
# Terminal 3: Your test commands
```

### 3. Database Verification

```sql
-- Check local test data in Supabase
SELECT sr.id, sr.status, sr.started_at, sr.completed_at
FROM av_scrape_runs sr
WHERE sr.started_by LIKE '%local%'
ORDER BY sr.started_at DESC
LIMIT 5;

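-- Check availability data from local tests
-- (scrape_run_id is a UUID, so match on the run's started_by instead;
-- assumes scrape_run_id references av_scrape_runs.id)
SELECT COUNT(*) AS records, DATE(ta.scraped_at) AS test_date
FROM av_trek_availability ta
JOIN av_scrape_runs sr ON sr.id = ta.scrape_run_id
WHERE sr.started_by LIKE '%local%'
GROUP BY DATE(ta.scraped_at);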
```

### 4. Error Testing

```bash
# Test invalid request
curl -X POST "http://localhost:8787/fetch-availability" \
-H "Content-Type: application/json" \
-d '{"invalid": "data"}'

# Test missing worker (stop availability worker and test discovery)
```

## Cleanup After Local Testing

### 1. Stop All Workers

```bash
# Stop wrangler dev processes (Ctrl+C in each terminal)
```

### 2. Clean Up Test Data (Optional)

```sql
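-- Remove local test data from database
-- (IDs are UUIDs, so match on started_by; assumes scrape_run_id references av_scrape_runs.id)
DELETE FROM av_trek_availability
WHERE scrape_run_id IN (
  SELECT id FROM av_scrape_runs
  WHERE started_by LIKE '%local%' OR started_by LIKE '%test%'
);

DELETE FROM av_scrape_runs
WHERE started_by LIKE '%local%' OR started_by LIKE '%test%';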
```

### 3. Remove Local Files

```bash
# Remove local testing files
rm -f .dev.vars.*
```

## Troubleshooting Local Development

### Common Issues:

1. **"Port already in use"**

```bash
# Kill processes on specific ports
lsof -ti:8787 | xargs kill -9
lsof -ti:8788 | xargs kill -9
```

2. **"Database connection failed"**

- Check that `.dev.vars.discovery` and `.dev.vars.availability` contain the correct Supabase credentials
- Verify your network can reach Supabase
- Test database connection separately

3. **"Service binding not found"**

- Ensure availability worker is started first and is running
- Verify the service binding configuration in `wrangler-discovery.toml` (no entrypoint needed for default export)
- Verify the anonymous `WorkerEntrypoint` class is properly exported as default
- Look for binding confirmation messages in discovery worker startup logs

4. **"Module not found" errors**

```bash
# Reinstall dependencies
rm -rf node_modules package-lock.json
npm install
```

5. **High resource usage**

- Reduce `AVAILABILITY_DAYS` to 1-2
- Set `MAX_CONCURRENT_REQUESTS=1`
- Test with fewer treks

6. **RPC method not found**

- Verify anonymous class is exported with `export default class extends WorkerEntrypoint`
- Check method names match: `processAvailability()` and `getHealth()`
- Ensure `import { WorkerEntrypoint } from "cloudflare:workers"` is present

7. **"Database save failed: invalid input syntax for type uuid"**

- Use proper UUID format for `scrape_run_id`: `$(uuidgen | tr '[:upper:]' '[:lower:]')`
- Avoid simple strings like `"local-test-123"` - database expects UUID format
- Check that database schema expects UUID type for scrape_run_id field

8. **"Foreign key constraint violation" (`trek_id_fkey`)**

- The `trek_id` you're testing with doesn't exist in the database
- Either create test trek data or use a real `trek_id` from your database
- The worker will automatically create the `scrape_run_id` in the database
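
For the UUID error in issue 7, it can help to validate a run ID before sending it. The sketch below checks for the lowercase, hyphenated form that the `uuidgen | tr` command above produces; `is_uuid` and `new_run_id` are hypothetical helper names, and `new_run_id` assumes `uuidgen` is installed.

```bash
# Hypothetical helper: succeed only if the argument is a lowercase,
# hyphenated UUID (8-4-4-4-12 hexadecimal digits).
is_uuid() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Generate a run ID the same way the troubleshooting tip does.
# Some uuidgen implementations print uppercase, hence the tr.
new_run_id() {
  uuidgen | tr '[:upper:]' '[:lower:]'
}
```

A value such as `local-test-123` fails `is_uuid`, which matches the database error described in issue 7.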

## Local vs. Production Differences

| Aspect | Local Development | Production |
| -------------- | ------------------------------------- | ----------------------------- |
| Communication | RPC via service bindings | RPC via service bindings |
| Concurrency | Limited (2-3) | Higher (10+) |
| Data Volume | Small test sets | Full dataset |
| Error Handling | More verbose logging | Production logging |
| Performance | Slower (development) | Optimized |
| RPC Classes | WorkerEntrypoint classes work locally | Same WorkerEntrypoint classes |

## Next Steps

After successful local testing:

1. Commit your changes (excluding local test files)
2. Follow the main [testing-guide.md](./testing-guide.md) for deployment
3. Deploy to staging environment first
4. Monitor production deployment

Remember: Local testing is great for development and debugging, but always test in a staging environment before production deployment!
